Life of Francis Galton by Karl Pearson Vol 3a : image 0074

Correlation and Application of Statistics to Problems of Heredity 55

Table V on Galton's p. 143 is noteworthy. In Column 3 we have the coefficients of correlation tabled under the now familiar symbol r. In Column 4 we have the values of $\sqrt{1-r^2}$, to enable the Quartile of the arrays to be found. In Column 5 we have, placed one under the other, the two regression coefficients, and in Column 6 in the same manner the Quartiles of the arrays (i.e. $.67449\,\sigma_x\sqrt{1-r^2}$ and $.67449\,\sigma_y\sqrt{1-r^2}$)*. Throughout, without referring directly to the matter, Galton assumes linear regression and homoscedasticity, i.e. he is thinking in terms of the bivariate normal surface. Next he draws attention to the relation of his present work to his former work on heredity.
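The quantities tabled in Columns 4 to 6 can be illustrated by a minimal sketch. It assumes the bivariate normal surface referred to above; the function names and the numerical values of the standard deviations and of r are hypothetical, chosen only for illustration, and are not taken from Galton's Table V.

```python
import math

# Ratio of the quartile (probable error) to the standard deviation
# for a normal distribution.
PROBABLE_ERROR_FACTOR = 0.67449


def array_quartile(sigma, r):
    """Quartile of an array: 0.67449 * sigma * sqrt(1 - r^2)."""
    return PROBABLE_ERROR_FACTOR * sigma * math.sqrt(1.0 - r * r)


def regression_coefficients(sigma_x, sigma_y, r):
    """Return the two regression coefficients (y on x, x on y)."""
    return r * sigma_y / sigma_x, r * sigma_x / sigma_y


# Hypothetical illustrative values, not Galton's data.
sigma_x, sigma_y, r = 2.0, 0.5, 0.8

b_yx, b_xy = regression_coefficients(sigma_x, sigma_y, r)
q_x = array_quartile(sigma_x, r)
q_y = array_quartile(sigma_y, r)

# Either regression coefficient, rescaled by the ratio of the
# standard deviations, returns the same coefficient of correlation.
r_from_first = b_yx * sigma_x / sigma_y
r_from_second = b_xy * sigma_y / sigma_x
```

Under these assumptions both regression lines lead back to the same r, and the quartile of each array is the quartile of the whole population reduced by the factor $\sqrt{1-r^2}$.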

On the fifth line of p. 144, he has the words: "from 117 to 3 x 112 = 1 to 0.44, which is practically the same." This should read "from 7 to 3 x 112 = 1 to 0 which is identically the same," as it should be since it expresses the coefficient of correlation found from the second regression line. Galton emphasises the importance of the reduction in the variability of the array, as measured by $\sqrt{1-r^2}$, and points out how this affects the efficiency of Bertillon's system of identification by anthropometric measurements. Bertillon had asserted that his measurements were independent variates. A reference to Plate LII of our second volume will show that Galton had chosen several of Bertillon's "independent" measurements and determined their actual correlation.

Galton next outlines a method by which the influence of n variates on another might be determined. He suggests that after transmuting the variates we should sum them, when the probable error of the sum would "be $\sqrt{n}$ if the variates were perfectly independent, and n if they were rigidly and perfectly related. The observed value would be almost always somewhere intermediate between these extremes, and would give the information that is wanted" (p. 145).

This would not, I believe, be a feasible method of approaching multiple correlation; it neglects the possibility of negative correlations, and does not provide for the influence on one variate of all the remainder. It is an attempt to obtain a sort of average value of the interlinkage of a system of n variates†. I do not think that at this time Galton had realised the existence and importance of negative correlation.
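The objection about negative correlations can be checked numerically. The sketch below assumes that "transmuting" means reducing each variate to unit standard deviation, so that the variance of the sum of n such variates is n plus twice the sum of the correlations over all pairs. The function names and the equally correlated systems are hypothetical illustrations, not anything given by Galton or in the text above.

```python
import math


def sd_of_standardised_sum(corr):
    """Standard deviation of the sum of n variates of unit variance.

    corr is the n x n correlation matrix; the variance of the sum is
    n + 2 * (sum of the correlations over all pairs s < t).
    """
    n = len(corr)
    pair_sum = sum(corr[s][t] for s in range(n) for t in range(s + 1, n))
    return math.sqrt(n + 2.0 * pair_sum)


def equicorrelated(n, r):
    """Hypothetical system in which every pair has the same correlation r."""
    return [[1.0 if s == t else r for t in range(n)] for s in range(n)]


n = 4
independent = sd_of_standardised_sum(equicorrelated(n, 0.0))   # sqrt(n)
rigid = sd_of_standardised_sum(equicorrelated(n, 1.0))         # n
negative = sd_of_standardised_sum(equicorrelated(n, -0.25))    # below sqrt(n)
```

With four variates the two extremes are 2 and 4, as the quoted passage states, but a system with every pair correlated at -0.25 gives a sum with standard deviation 1, which lies outside the supposed range. A single summed value also says nothing about how any one variate depends on the others.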

* A large proportion of values in the 5th and 6th columns have rather serious numerical errors, corrected by Galton on a copy of the paper in my possession. He also states thereon that he wishes to change the symbol r to p, presumably because he was thinking of it as the "correlation coefficient," not as the regression coefficient, when units are reduced to respective variabilities. The regression coefficient without reduction he had termed r in his memoir on stature.

† Let $x_1, x_2, \ldots x_s, \ldots x_n$ be the n variates, and $\sigma_1, \sigma_2, \ldots \sigma_s, \ldots \sigma_n$ their standard deviations, $\bar{x}_1, \bar{x}_2, \ldots \bar{x}_s, \ldots \bar{x}_n$ their means. Then if $X = S\left(\frac{x_s - \bar{x}_s}{\sigma_s}\right)$ we have

$$\sigma_X^2 = n + 2S'(r_{ss'})$$

$= n$, if all the correlations $r_{ss'}$ are zero,

$= n + 2 \cdot \tfrac{1}{2}\,n(n-1) = n^2$, if all the correlations are plus one.

Hence $\sigma_X = \sqrt{n}$ and $n$ in the two cases respectively, as Galton says. But the actual value of $\sigma_X$
