A high-performance SNP panel developed by machine-learning approaches for characterizing genetic differences of Southern and Northern Han Chinese, Korean, and Japanese individuals.
Electrophoresis; 43(11): 1183-1192, 2022 Jun.
Article | En | MEDLINE | ID: mdl-35297530
Population stratification analyses of genetically closely related East Asians have revealed detectable differentiation among Han Chinese, Korean, and Japanese individuals, as well as between southern (S-) and northern (N-) Han Chinese. Previous studies have proposed a number of ancestry-informative single nucleotide polymorphisms (AISNPs) for discriminating East Asian populations. In this study, we collected 1185 AISNPs and examined their efficiency using frequency and genotype data from various publicly available databases. To perform fine-scale classification of S-Han, N-Han, Korean, and Japanese subjects, machine-learning methods (Softmax and Random Forest) were used to screen a panel of highly informative AISNPs and to develop a superior classification model. Stepwise classification was implemented during AISNP selection to increase and balance discrimination: Han, Korean, and Japanese individuals were discriminated first, and the stratification between S-Han and N-Han was characterized second. The final 272-AISNP panel, an alternative optimization of various previous works, promises reliable classification of the four East Asian groups with >90% accuracy. This AISNP panel and the machine-learning model could be a useful and superior choice for medical genome-wide association studies and for forensic investigations involving suspects of unknown identity.
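The two-stage (stepwise) scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which model was used at which stage, so a toy nearest-centroid classifier stands in for the Softmax and Random Forest models. The class names, the 0/1/2 genotype coding, and the four-SNP toy data are all assumptions made for illustration.

```python
from collections import defaultdict


class NearestCentroid:
    """Toy stand-in classifier (NOT the paper's Softmax or Random Forest)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] += 1
        # Mean genotype vector per population label.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def nearest(row):
            return min(
                self.centroids_,
                key=lambda lab: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[lab])
                ),
            )
        return [nearest(row) for row in X]


class StepwiseClassifier:
    """Stage 1: Han vs Korean vs Japanese. Stage 2: S-Han vs N-Han."""

    def __init__(self, stage1, stage2, parent="Han", children=("S-Han", "N-Han")):
        self.stage1, self.stage2 = stage1, stage2
        self.parent, self.children = parent, children

    def fit(self, X, y):
        # Collapse the two Han subgroups into one label for the first stage.
        coarse = [self.parent if lab in self.children else lab for lab in y]
        self.stage1.fit(X, coarse)
        # Train the second stage on Han samples only.
        han = [(row, lab) for row, lab in zip(X, y) if lab in self.children]
        self.stage2.fit([row for row, _ in han], [lab for _, lab in han])
        return self

    def predict(self, X):
        labels = list(self.stage1.predict(X))
        han_idx = [i for i, lab in enumerate(labels) if lab == self.parent]
        if han_idx:
            fine = self.stage2.predict([X[i] for i in han_idx])
            for i, lab in zip(han_idx, fine):
                labels[i] = lab
        return labels


# Hypothetical genotypes (minor-allele counts 0/1/2) at four made-up SNPs.
X_train = [
    [0, 0, 2, 2], [0, 0, 2, 1],  # S-Han
    [0, 0, 0, 2], [0, 0, 0, 1],  # N-Han
    [2, 0, 1, 1], [2, 0, 1, 0],  # Korean
    [0, 2, 1, 1], [0, 2, 1, 0],  # Japanese
]
y_train = ["S-Han"] * 2 + ["N-Han"] * 2 + ["Korean"] * 2 + ["Japanese"] * 2

model = StepwiseClassifier(NearestCentroid(), NearestCentroid()).fit(X_train, y_train)
predictions = model.predict([[0, 0, 2, 2], [0, 0, 0, 1], [2, 0, 1, 1], [0, 2, 1, 0]])
print(predictions)  # ['S-Han', 'N-Han', 'Korean', 'Japanese']
```

Because the wrapper only requires `fit` and `predict`, either stage could in principle be replaced with a multinomial logistic-regression (softmax) or random-forest model from a library of choice. Which model belongs at which stage would have to be taken from the full paper.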
Full text: 1
Collection: 01-international
Database: MEDLINE
Main subject: Polymorphism, Single Nucleotide / Genetics, Population
Limit: Humans
Country/Region as subject: Asia
Language: En
Journal: Electrophoresis
Year: 2022
Document type: Article
Country of publication: Germany