Correlation and Application of Statistics to Problems of Heredity 5

the vast system of factorial genetics which has arisen from Mendel's peas, and this even in the theory of heredity. We see now what Galton might have done: he might have provided us with data to check Johannsen's later bean-weight experiments, he might have thrown light on the "pure line." He might possibly have reached the correlation coefficient instead of the regression slope in his first attempt to get a measure of correlation. Whatever he might have done, he reached the idea of regression before he reached that of the coefficient of correlation. As long as he was dealing with heredity in the same sex, the approximate equality of variabilities in the two generations preserved him from any great error.

Galton was driven to his second problem by Bertillon's system for the identification of criminals. Bertillon claimed, as I remember Dr Garson did at a much later date, that the measurements chosen were practically independent. Galton needed a criterion to show whether such measurements as head length, foot length, stature, etc. were or were not associated. He saw that the problem closely resembled that of heredity, but he was troubled by the fact that the slope of his regression line depended on the units in which its two component variables were measured. It was not till more than 13 years* after his first attack on the subject that Galton realised, namely in 1889 during a walk in Naworth Park, that the two problems were identical, provided each character were measured in its own variability as unit (see our Vol. ii, p. 393). With that provision the slope of the regression line becomes what we now term the coefficient of correlation. It is needful to realise this history of Galton's progress: namely that he reached regression and even the constancy of the array variabilities 12 to 14 years before he formulated his coefficient of correlation, in order to understand fully the sequence of his memoirs on this topic.
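
The point about units may be made concrete by a small modern sketch, which is in no way Galton's own procedure. It assumes ordinary least squares for the regression slope and the population standard deviation as the "variability"; the stature and foot-length figures are hypothetical numbers chosen for illustration, not Galton's or Bertillon's data, and all function names are ours.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    # Population standard deviation: the "variability" used as the unit.
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def slope(xs, ys):
    # Least-squares slope of the regression of ys on xs.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def standardise(xs):
    # Express each value as a deviation from the mean in units of variability.
    m, s = mean(xs), std(xs)
    return [(x - m) / s for x in xs]

# Hypothetical illustrative measurements (inches); not historical data.
stature_in = [64.0, 66.0, 67.0, 68.0, 70.0, 72.0]
foot_in = [9.4, 9.9, 9.8, 10.3, 10.4, 11.0]

# The same statures expressed in centimetres.
stature_cm = [2.54 * x for x in stature_in]

# The raw regression slope changes when the unit of stature changes...
slope_inches = slope(stature_in, foot_in)
slope_cm = slope(stature_cm, foot_in)

# ...but the slope between standardised characters does not.
r_inches = slope(standardise(stature_in), standardise(foot_in))
r_cm = slope(standardise(stature_cm), standardise(foot_in))

# It is also the same whichever character is regressed on the other.
r_reversed = slope(standardise(foot_in), standardise(stature_in))

print(slope_inches, slope_cm)
print(r_inches, r_cm, r_reversed)
```

The two raw slopes differ by the factor 2.54, whereas the standardised slope is unchanged by the choice of unit and is symmetrical in the two characters; it is this quantity that we now term the coefficient of correlation.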

One further fact it is necessary to bear in mind in order to measure his achievements. He started like Quetelet from the normal curve as describing the deviations of a population or of any selected population, e.g. that of an array of offspring from a parent of given character. He did not start with a general definition of correlation and see whither that would lead him. His justification was that he was dealing with anthropometric characters or measurements on living forms whose deviations from type approximately followed this special law of distribution. Thus he naturally reached a straight regression line, and the constant variability for all arrays of one character for a given value of a second†. It was, perhaps, best for the progress of the correlational calculus that this simple special case should be promulgated first; it is so easily grasped by the beginner. But it has had the disadvantage that certain branches of science, as psychology for example, have rarely got further, and, without taking the trouble to apply tests, adopt linear

* In his Natural Inheritance, 1889, p. 79, Galton says his sweet-pea data were collected more than 10 years previously. His lecture at the Royal Institution, Feb. 1877, shows that he was then already in possession of sweet-pea data, and the first measurements seem to have been made in 1875.

† What we now term "homoscedasticity."