
56   Life and Letters of Francis Galton

Galton sums up his results as follows*. Let $x$ be the deviation of the subject, and $y_1, y_2, y_3$, etc. the corresponding deviations of the correlative, all deviations being reduced to their proper unit of variability, and also let the mean of the $y$ deviations for the given $x$ be $y_x$, then we find:

(1) That $y_x = rx$ for all values of $x$; (2) that $r$ is the same, whichever of the two variables is taken for the subject; (3) that $r$ is always less than 1; (4) that $r$ measures the closeness of correlation.
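The first three statements can be checked numerically on a small example. The sketch below is illustrative only: the six data pairs are invented, the function names are hypothetical, and r is computed by the product-moment formula (the mean product of deviations in standard units), which is an assumption of the sketch and not the procedure of Galton's paper.

```python
import math

def standardise(values):
    """Deviations from the mean, in units of the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(a, b):
    """Mean product of standardised deviations (assumed product-moment form of r)."""
    za, zb = standardise(a), standardise(b)
    return sum(p * q for p, q in zip(za, zb)) / len(za)

def array_means(subject, relative):
    """Mean standardised deviation of the relative within each array of the subject."""
    zs, zr = standardise(subject), standardise(relative)
    groups = {}
    for s, t in zip(zs, zr):
        # Round the key so equal subject values fall into the same array.
        groups.setdefault(round(s, 12), []).append(t)
    return {s: sum(ts) / len(ts) for s, ts in groups.items()}

# Invented data: the mean of y in each x-array is exactly linear in x.
xs = [-1, -1, 0, 0, 1, 1]
ys = [-2, 0, -1, 1, 0, 2]

r = correlation(xs, ys)       # subject x, correlative y
r_swapped = correlation(ys, xs)  # subject y, correlative x
means = array_means(xs, ys)   # maps standardised x to the mean standardised y

for x_std, y_mean in means.items():
    print(f"x = {x_std:+.4f}   mean y = {y_mean:+.4f}   r*x = {r * x_std:+.4f}")
```

On these data the array means of y fall exactly on the line rx, and r is unchanged when the two variables exchange roles. With observed data the array means would only approximate that line.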

It will be seen at once that we have here the first fundamental statement as to the correlation coefficient and its properties. Probably Galton did not recognise that r = 0 does not signify independence of the two variates, only the independence of means of arrays. In addition to this, complete independence involves the arrays being similar and similarly placed curves. It was not till normal distributions were seen to be non-universal that the distinction between the vanishing of r and the absolute independence of variates was fully recognised. For the same reason the idea of non-linear regression did not cross Galton's mind. He got as far as an acceptance of the normal frequency distribution permitted. Only when we look at what has happened since 1888, do we realise the importance of that short paper on "Co-relations"! Thousands of correlation coefficients are now calculated annually, the memoirs and text-books on psychology abound in them; they form, it may be in a generalised manner, the basis of investigations in medical statistics, in sociology and anthropology. Shortly, Galton's very modest paper of ten pages from which a revolution in our scientific ideas has spread is in its permanent influence, perhaps, the most important of his writings. Formerly the quantitative scientist could only think in terms of causation, now he can

would not be proportional to the sum of the r's even if they were all positive. Perhaps a better measure of the same type would be to use $\sigma^2_{\chi^2}$, where

$$\chi^2 = S\,(x_s - \bar{x}_s)^2/\sigma_s^2 \quad \text{and} \quad \overline{\chi^2} = n;$$

hence:

$$\sigma^2_{\chi^2} = \operatorname{mean}\,(\chi^2 - \overline{\chi^2})^2$$

$$= \operatorname{mean}\,\bigl\{ S\,(x_s - \bar{x}_s)^4/\sigma_s^4 + 2S'\,(x_s - \bar{x}_s)^2 (x_{s'} - \bar{x}_{s'})^2/(\sigma_s^2 \sigma_{s'}^2) - 2n\,S\,(x_s - \bar{x}_s)^2/\sigma_s^2 + n^2 \bigr\}$$

$$= 3n + 2S'\,(1 + 2r_{ss'}^2) - 2n^2 + n^2 = 2n + 4S'\,(r_{ss'}^2),$$

if the variates follow normal distributions, and thus $\sigma^2_{\chi^2}$ lies between $2n$ and $2n^2$. This at any rate would present no difficulty arising from the existence of negative correlations. We see, however, from this result that possibly the best measure, $u$, of the total correlativity in a system would be simply to take

$$u = \frac{2S'\,(r_{ss'}^2)}{n(n-1)},$$

for in this case u will always lie between 0 and 1, the former value corresponding to no association in the variates of the system, and the latter to perfect correlation of all of them.
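The two quantities of this note can be evaluated directly from a table of correlation coefficients. What follows is a minimal sketch under stated assumptions: S' is read as a sum over the n(n-1)/2 distinct pairs of variates, the variates are taken to be normally distributed as the note requires, and the function names and the three example tables are hypothetical.

```python
from itertools import combinations

def sum_squared_pairs(corr):
    """S'(r^2): sum of squared correlations over distinct pairs (assumed reading of S')."""
    n = len(corr)
    return sum(corr[s][t] ** 2 for s, t in combinations(range(n), 2))

def chi_square_variance(corr):
    """Variance of chi^2 for normal variates, 2n + 4 S'(r^2)."""
    return 2 * len(corr) + 4 * sum_squared_pairs(corr)

def total_correlativity(corr):
    """The measure u = 2 S'(r^2) / (n (n - 1))."""
    n = len(corr)
    return 2 * sum_squared_pairs(corr) / (n * (n - 1))

# Hypothetical systems of three variates.
no_association = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]]
perfect = [[1.0, 1.0, 1.0],
           [1.0, 1.0, 1.0],
           [1.0, 1.0, 1.0]]
mixed = [[1.0, 0.5, -0.5],   # includes a negative correlation
         [0.5, 1.0, 0.0],
         [-0.5, 0.0, 1.0]]

for name, table in [("none", no_association), ("perfect", perfect), ("mixed", mixed)]:
    print(name, chi_square_variance(table), total_correlativity(table))
```

Because only squared coefficients enter, a negative correlation contributes in the same way as a positive one of equal magnitude, which is the property the note asks of the measure.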

* Galton has interchanged his x and y variates. The paper shows here as elsewhere signs of haste in preparation.