Correlation and Application of Statistics to Problems of Heredity 7

all. It is not possible to say whether the observed "reversion" was due to the weight of a single seed not representing the true maternal character, to the hypothesis of self-fertilisation not being correct, or to other causes. Theoretically the important point is that Galton reached linear regression as a first feature of his correlation table. The next point Galton reached was the homoscedasticity or equal variability of the arrays of daughter seeds corresponding to a given mother seed*. "I was certainly astonished to find the family variability of the produce of the little seeds to be equal to that of the big ones; but so it was, and I thankfully accept the fact; for if it had been otherwise, I cannot imagine, from theoretical considerations, how the typical problem could be solved" (p. 10).

The second logical stage in Galton's analysis is mathematical; he endeavours, assuming that the population is stable and is distributed normally, to find what relation must exist between the "reversion" coefficient and

* Thus far I have not been able to find Galton's data for the weights of sweet-peas in the Galtoniana here. It is not easy, however, to find a special topic in the mass of note-books and undated and unindexed papers. Quite possibly, however, he lent his measurements to somebody, as he lent many series of observations to myself. It would be interesting to see exactly the data from which he deduced the two fundamental principles of a normal bivariate distribution, i.e. the straight-line regression and the equivariability of the arrays. Galton gives the correlation table of filial and parental seeds in the Appendix, p. 226, of his Natural Inheritance for lengths not weights. This shows that the mean length and variability of the parent seeds were arbitrarily chosen, there being 70 of each. Further, in the table the offspring seeds are modified to show 100 in each array. We do not know therefore the true means or standard deviations of either parental or offspring populations. This does not, however, affect the determination of either means or standard deviations of arrays. I find in hundredths of an inch:

Diameter of      Mean Diameter of Array      Standard Deviation
Parent Seed      of Filial Seeds             of the Array

    21                17.26                       1.988
    20                17.07                       1.938
    19                16.37                       1.896
    18                16.40                       2.037
    17                16.13                       1.654
    16                16.17                       1.594
    15                15.98                       1.763

My means do not agree with Galton's; possibly he found his before reducing his whole numbers to percentages. (It could not be by the distribution of the filial diameters "Under 15," as this would tend, I think, to reduce all his means below mine.) He does not give his array standard deviations nor the quartiles. However, on some such numbers as these Galton reached his results. The array means are not incompatible with a straight-line relation; the standard deviations suggest that the smaller parental seeds had offspring seeds of less variability than those of the larger seeds, rather than equivariability being the rule. This view might be modified if we knew the actual distribution of the filial seeds "Under 15." Many of these dwarf seeds I suspect were abortions, as their lumping up at the tail of the arrays really prevents the latter from being considered as "normal curves." Galton states (loc. cit. supra) that he had obtained confirmatory results for the foliage and length of pod; this indicates that his experiments must have been carried on for a second year, as he started only with the parental seed.
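
The two statements about the tabulated arrays (means roughly on a straight line, standard deviations tending to rise with parental diameter) can be checked numerically. What follows is only a minimal sketch in Python, not a reconstruction of any calculation made in the text. It assumes an unweighted least-squares line, which seems reasonable because each array in the table was reduced to 100 seeds. The function name `least_squares_line` and the variable names are illustrative.

```python
# Array statistics as tabulated in the footnote, in hundredths of an inch.
parent_diameter = [21, 20, 19, 18, 17, 16, 15]
array_mean = [17.26, 17.07, 16.37, 16.40, 16.13, 16.17, 15.98]
array_sd = [1.988, 1.938, 1.896, 2.037, 1.654, 1.594, 1.763]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the unweighted least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Regression of the filial array means on the parental diameter.
mean_slope, mean_intercept = least_squares_line(parent_diameter, array_mean)

# Departures of the tabulated array means from the fitted straight line.
residuals = [
    y - (mean_intercept + mean_slope * x)
    for x, y in zip(parent_diameter, array_mean)
]

# Trend of the array standard deviations with parental diameter;
# a slope near zero would correspond to equal variability of the arrays.
sd_slope, sd_intercept = least_squares_line(parent_diameter, array_sd)

print(f"slope of array means: {mean_slope:.3f}")
print(f"largest departure from the line: {max(abs(r) for r in residuals):.3f}")
print(f"slope of array standard deviations: {sd_slope:.4f}")
```

On the figures tabulated above, the fitted slope of the array means is about 0.21 hundredths of an inch per hundredth of parental diameter, and the slope of the standard deviations is positive. This agrees with the remark that the smaller parental seeds gave less variable offspring. With only seven arrays, and with the "Under 15" seeds unresolved, neither figure should be pressed far.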