Machine learning provides the most accurate method for genomic prediction of height

A LASSO algorithm using about 20 thousand genomic variants predicted height with an accuracy of about 4 centimeters, after training on a database of 500 thousand genomes.

The constructed predictor explains about 40% of the total variance in height. Its correlation with actual height reaches 0.65.
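The two figures above are linked: for a linear predictor, the fraction of variance explained is the square of the correlation between predicted and actual values. The short sketch below illustrates that arithmetic. The function names `pearson_r` and `variance_explained` are made up for this illustration and do not come from the paper.

```python
from math import sqrt

def pearson_r(predicted, actual):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / sqrt(var_p * var_a)

def variance_explained(r):
    """Fraction of variance explained by a linear predictor with correlation r."""
    return r ** 2

# The reported correlation of 0.65 corresponds to roughly 40% of variance.
print(round(variance_explained(0.65), 4))  # prints 0.4225
```

So a correlation of 0.65 and "about 40% of variance explained" are two ways of stating the same result.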

In contrast to common GWAS methods, the study uses a machine learning algorithm to capture as many relevant variants as possible. As the researchers write:

“Recent estimates suggest that common SNPs account for significant heritability of complex traits such as height, heel bone density, and educational attainment (EA). Large genome-wide association studies (GWAS) of these traits have identified many statistically associated SNPs at genome-wide significance. However, the total variance accounted for by these SNPs is still a small fraction of the trait heritability and of the proportion of variance. (…) In the case of height, this gap is largely closed by our results since the squared-correlation captured by our predictor is close to the total estimated common SNP heritability of 0.5.”

Out of a maximum of 100 thousand candidate variants, only about 20 thousand were needed for accurate prediction. Increasing the number of variants used to 50 or 100 thousand did not significantly improve the results.
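Why does LASSO end up using only a subset of the candidate variants? Its L1 penalty pushes the coefficients of uninformative variants to exactly zero. The toy sketch below shows this on small synthetic data, using plain cyclic coordinate descent. It is not the authors' pipeline: the data sizes, effect sizes, penalty value, and all function names are assumptions chosen for illustration.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_sweeps=50):
    """Minimize (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of variant j with the residual, adding back j's own part.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_beta
    return beta

def simulate(n_people=200, n_variants=50, n_causal=5, seed=0):
    """Synthetic genotypes (0, 1, or 2 minor alleles) and a noisy trait."""
    rng = random.Random(seed)
    X = [[int(rng.random() < 0.3) + int(rng.random() < 0.3)
          for _ in range(n_variants)] for _ in range(n_people)]
    # Only the first n_causal variants influence the trait (toy assumption).
    effects = [1.0] * n_causal + [0.0] * (n_variants - n_causal)
    y = [sum(e * g for e, g in zip(effects, row)) + rng.gauss(0.0, 1.0) for row in X]
    # Center columns and trait so that no intercept term is needed.
    means = [sum(row[j] for row in X) / n_people for j in range(n_variants)]
    X = [[row[j] - means[j] for j in range(n_variants)] for row in X]
    y_mean = sum(y) / n_people
    y = [value - y_mean for value in y]
    return X, y

X, y = simulate()
beta = lasso_coordinate_descent(X, y, penalty=0.15)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(f"{len(selected)} of {len(beta)} candidate variants kept:", selected)
```

Raising `penalty` keeps fewer variants and lowering it keeps more. The observation reported above amounts to saying that, for height, loosening the penalty beyond about 20 thousand active variants stopped helping.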

The model outperforms all previous approaches to height prediction, including those of the GIANT consortium. However, the model is not publicly available at the moment.

More: “Accurate Genomic Prediction of Human Height”, L. Lello et al., 2018, doi:10.1534/genetics.118.301267.