These are some general comments arising from work on the Exercises that was marked recently. First, some people included the categorical factor gender in the PCA, but none of them ['correctly'] recoded the three-level variable gender into two binary variables. It is certainly misleading to include the raw variable gender in the analysis: the code 3 stands for 'unknown', and it is not a quantitative value lying as far from category 2 as category 2 lies from category 1. However, PCA is quite robust and you obtain much the same score plots on the PCs. It is better to do the PCA on the raw linear measurements. There are good arguments for using the correlation matrix, but the differences from using the covariance matrix are not substantial. The interpretation of the PCs is much clearer from the correlation matrix, since with the covariance matrix it looks as if just one variable dominates the first PC (this is also a signal that the variances of the original variables may be very unequal), and you should make allowance for that in interpretation; the scatterplots of the scores on the PCs look broadly similar whether you use the covariance or the correlation matrix. I think one person transformed to log measurements, which seems to produce slightly more interpretable PCs (especially PCs 2 and 3), but if you do this it would be coherent to use the logs for the discrimination as well.
Many people left out part 1(iii) (what happens if LDA is performed on the PCs), and I hope that is now clear. It would be sensible to check empirically for yourself that the answer holds. Some people thought I was asking whether the plots on the PCs and on the crimcoords would look the same, and some claimed that they would, apart from a sign reversal. This does happen with the Iris Data, where the plots on PCs and on crimcoords are, by accident, very similar. This is probably true only for the Iris Data and for no other data in the whole world, certainly not the Thai dogs. However, crimcoord plots from the raw data and from the full set of PCs will be identical up to arbitrary reflections.
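One way to check part 1(iii) empirically is sketched below. It is written in Python/NumPy rather than R, uses synthetic data (three hypothetical groups, four variables), and computes crimcoords from scratch by solving B a = λ W a, with W the pooled within-group covariance and B the between-group covariance. It is not the MASS `lda` implementation, and the helper names are hypothetical. Because the full set of PCs is just a rotation of the centred data, the crimcoord scores should agree with those from the raw data up to a sign flip in each column.

```python
import numpy as np

def crimcoord_scores(X, y):
    """Eigenvalues and crimcoord scores (first g - 1 directions) for data X, labels y."""
    X = np.asarray(X, dtype=float)
    groups = np.unique(y)
    n, p = X.shape
    g = len(groups)
    m = X.mean(axis=0)
    W = np.zeros((p, p))  # pooled within-group covariance
    B = np.zeros((p, p))  # between-group covariance
    for k in groups:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        W += (Xk - mk).T @ (Xk - mk)
        B += len(Xk) * np.outer(mk - m, mk - m)
    W /= n - g
    B /= g - 1
    # Reduce B a = lambda W a to a symmetric problem via the Cholesky factor W = L L'.
    Linv = np.linalg.inv(np.linalg.cholesky(W))
    evals, V = np.linalg.eigh(Linv @ B @ Linv.T)
    order = np.argsort(evals)[::-1][: g - 1]
    A = Linv.T @ V[:, order]  # directions scaled so that a' W a = 1
    return evals[order], (X - m) @ A

def all_pc_scores(X):
    """Scores on the full set of PCs of the covariance matrix."""
    Xc = X - X.mean(axis=0)
    evals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ vecs[:, np.argsort(evals)[::-1]]

# Synthetic data: three hypothetical groups of 30, four measurements.
rng = np.random.default_rng(1)
means = np.array([[0, 0, 0, 0], [2, 1, 0, 0], [0, 2, 1, 1]], dtype=float)
y = np.repeat(np.arange(3), 30)
X = means[y] + rng.normal(size=(90, 4)) @ np.diag([1.0, 2.0, 0.5, 3.0])

ev_raw, s_raw = crimcoord_scores(X, y)
ev_pc, s_pc = crimcoord_scores(all_pc_scores(X), y)

# The reflections are arbitrary, so align the sign of each column first.
sign = np.sign(np.sum(s_raw * s_pc, axis=0))
print("eigenvalues agree:", np.allclose(ev_raw, ev_pc))
print("scores agree up to reflection:", np.allclose(s_raw, s_pc * sign))
```

The same argument covers PCs from the correlation matrix, since standardising and rotating is still a non-singular linear transformation, and the crimcoords are unaffected by any such transformation.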
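The remark that, with the covariance matrix, one variable can appear to dominate the first PC can be illustrated with a small Python/NumPy sketch. The data are synthetic and hypothetical: three measurements on a small scale and one on a much larger scale, all driven by a common "size" factor. The helper `pc_loadings` is a name introduced here for illustration.

```python
import numpy as np

def pc_loadings(X, use_correlation=False):
    """Eigenvalues (descending) and loadings (columns) from the covariance or correlation matrix."""
    S = np.corrcoef(X, rowvar=False) if use_correlation else np.cov(X, rowvar=False)
    evals, vecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    return evals[order], vecs[:, order]

rng = np.random.default_rng(0)
n = 200
size = rng.normal(size=n)  # hypothetical common "size" factor
# Hypothetical measurements: three on a small scale, one on a much larger scale.
X = np.column_stack([size + 0.3 * rng.normal(size=n) for _ in range(3)]
                    + [10.0 * size + 8.0 * rng.normal(size=n)])

cov_ev, cov_load = pc_loadings(X)
cor_ev, cor_load = pc_loadings(X, use_correlation=True)

print("variances of the variables:", np.round(X.var(axis=0, ddof=1), 1))
print("covariance PC1 loadings: ", np.round(cov_load[:, 0], 2))
print("correlation PC1 loadings:", np.round(cor_load[:, 0], 2))
```

With these settings the covariance PC1 loads almost entirely on the large-variance fourth variable, while the correlation PC1 spreads its loadings across all four, which is the pattern to watch for before interpreting the covariance-based PCs.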
A couple of people misunderstood Q2 and looked instead at classical scaling. If you have raw data, classical scaling can never tell you anything that analysing the data directly would not: applied to Euclidean distances computed from the data, for example, it simply reproduces the scores on the PCs. Scaling methods are ONLY useful when the 'raw data' are actually [pseudo-]distances, or equivalent, as with the languages example in lectures a couple of weeks ago: there are no [obvious] variables measured on each language, but you can measure the similarity [/distance] between any pair of languages.
Some people used predict.lda(.) correctly to project the data on the prehistoric dogs into discriminant space (creating an object prehist.predicted, say) and then extracted the best prediction with prehist.predicted$class, which said that all of them would be classified as type 1 (modern). This is sensible but not quite what I wanted you to see. It is indeed the best available prediction, and when you plot on crimcoords 1 and 2 the prehistoric dogs do indeed seem mixed up with the modern ones, but a plot on crimcoord 3 reveals a separation. This illustrates that there is no substitute for actually looking at the data instead of relying on the ['forced'] model that the prehistoric dogs must belong to one of the groups for which you happen to have data. Some people said that LD3 contained "only 3 per cent of the discrimination". Agreed, the third eigenvalue is around three per cent of the sum of all the eigenvalues, but interpreting this as 3 per cent of the discrimination is invalid, since "discrimination" is not something that can be measured quantitatively (unlike variation). Further, even if 'discrimination' could be quantified, it would not be valid to partition it into separate bits, since the crimcoords are non-orthogonal: the "percentages of discrimination" attributed to the separate crimcoords could then add up to more than 100. Thus it is not safe to attribute the separation of the prehistoric from the modern dogs on crimcoord 3 to random noise (as is implied by saying "only 3% of the discrimination, and therefore we can discard it").
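The point that the crimcoords are non-orthogonal can be checked numerically. The Python/NumPy sketch below uses synthetic data with four hypothetical groups (so there are three crimcoords) and computes the directions from scratch by solving B a = λ W a; it is not the MASS `lda` code, and `lda_directions` is a name introduced here. The eigenvalue ratios it prints correspond to what MASS's printed `lda` output labels "Proportion of trace". The directions are uncorrelated within groups (A' W A is the identity), but the cosines between them in the space of the original measurements are not zero.

```python
import numpy as np

def lda_directions(X, y):
    """Eigenvalues, crimcoord directions A (columns, scaled so A' W A = I) and W."""
    X = np.asarray(X, dtype=float)
    groups = np.unique(y)
    n, p = X.shape
    g = len(groups)
    m = X.mean(axis=0)
    W = np.zeros((p, p))  # pooled within-group covariance
    B = np.zeros((p, p))  # between-group covariance
    for k in groups:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        W += (Xk - mk).T @ (Xk - mk)
        B += len(Xk) * np.outer(mk - m, mk - m)
    W /= n - g
    B /= g - 1
    Linv = np.linalg.inv(np.linalg.cholesky(W))
    evals, V = np.linalg.eigh(Linv @ B @ Linv.T)
    order = np.argsort(evals)[::-1][: g - 1]
    return evals[order], Linv.T @ V[:, order], W

# Synthetic data: four hypothetical groups of 25, five correlated measurements.
rng = np.random.default_rng(2)
means = np.array([[0, 0, 0, 0, 0], [3, 1, 0, 0, 0],
                  [0, 3, 1, 0, 0], [0, 0, 0, 1, 2]], dtype=float)
mix = np.array([[1.0, 0.9, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.9, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.9, 0.0],
                [0.0, 0.0, 0.0, 1.0, 0.9],
                [0.0, 0.0, 0.0, 0.0, 3.0]])
y = np.repeat(np.arange(4), 25)
X = means[y] + rng.normal(size=(len(y), 5)) @ mix

evals, A, W = lda_directions(X, y)
print("eigenvalue proportions:", np.round(evals / evals.sum(), 3))

# Uncorrelated within groups: A' W A is the identity ...
print(np.round(A.T @ W @ A, 6))
# ... but the directions are not orthogonal in measurement space.
U = A / np.linalg.norm(A, axis=0)
print("cosines between directions:\n", np.round(U.T @ U, 3))
```

The eigenvalue proportions always sum to one by construction, which is exactly why they look like a partition; the non-zero cosines show the directions do not split the measurement space into orthogonal pieces.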
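The claim that classical scaling adds nothing when you already have the raw data can be seen in a short Python/NumPy sketch: computing Euclidean distances between the rows of a (synthetic, hypothetical) data matrix and then applying classical scaling recovers the scores on the leading PCs, up to a sign in each column. `classical_scaling` is written from scratch here for illustration.

```python
import numpy as np

def classical_scaling(D, k):
    """Classical (Torgerson) scaling of an n x n distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    Bmat = -0.5 * J @ (D ** 2) @ J         # doubly centred squared distances
    evals, vecs = np.linalg.eigh(Bmat)
    order = np.argsort(evals)[::-1][:k]
    return vecs[:, order] * np.sqrt(evals[order])

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])

# Euclidean distances between the rows of the raw data matrix.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
mds = classical_scaling(D, k=2)

# Scores on the first two PCs of the covariance matrix.
Xc = X - X.mean(axis=0)
evals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pcs = Xc @ vecs[:, np.argsort(evals)[::-1][:2]]

sign = np.sign(np.sum(mds * pcs, axis=0))
print("same configuration up to reflection:", np.allclose(mds, pcs * sign))
```

This works because, for Euclidean distances, the doubly centred matrix equals Xc Xc', whose eigenvectors scaled by the square roots of the eigenvalues are exactly the PC scores.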