Scalable CNN-based classification of selective sweeps using derived allele frequencies.
van den Belt, Sjoerd; Zhao, Hanqing; Alachiotis, Nikolaos.
Affiliation
  • van den Belt S; Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands.
  • Zhao H; Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands.
  • Alachiotis N; Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands.
Bioinformatics; 40(Suppl 2): ii29-ii36, 2024-09-01.
Article in English | MEDLINE | ID: mdl-39230693
ABSTRACT
MOTIVATION:

Selective sweeps can successfully be distinguished from neutral genetic data using summary statistics and likelihood-based methods that analyze single nucleotide polymorphisms (SNPs). However, these methods are sensitive to confounding factors, such as severe population bottlenecks and old migration. Using machine learning, and specifically convolutional neural networks (CNNs), new accurate classification models that are robust to confounding factors have recently been proposed. However, such methods are more computationally expensive than summary-statistic-based ones, rendering them impractical for processing large-scale genomic data. Moreover, SNP data are frequently preprocessed to improve classification accuracy, which further lengthens the already long analysis times.

RESULTS:

To this end, we propose a 1D CNN-based model, dubbed FAST-NN, that does not require any preprocessing and uses only derived allele frequencies instead of summary statistics or raw SNP data, thereby yielding a sample-size-invariant, scalable solution. We evaluated several data fusion approaches to account for the variance of the density of genetic diversity across genomic regions (a selective sweep signature), and performed an extensive neural architecture search based on a state-of-the-art reference network architecture (SweepNet). The resulting model, FAST-NN, outperforms the reference architecture by up to 12% in inference accuracy across all the challenging evolutionary scenarios with confounding factors that were evaluated. Moreover, FAST-NN is between 30× and 259× faster on a single CPU core, and between 2.0× and 6.2× faster on a GPU, when processing sample sizes between 128 and 1000 samples. Our work paves the way for the practical use of CNNs in large-scale selective sweep detection.

AVAILABILITY AND IMPLEMENTATION:

https://github.com/SjoerdvandenBelt/FAST-NN.
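
The sample-size invariance claimed in the results can be illustrated with a minimal sketch. It assumes a 0/1 haplotype matrix (rows are samples, columns are SNP sites, 1 marks the derived allele); the function name and the toy data are hypothetical, and the abstract does not describe how FAST-NN itself computes or encodes its input, so this is not the authors' implementation. It only shows that a vector of derived allele frequencies has one entry per site, however many samples are analyzed.

```python
from typing import List, Sequence


def derived_allele_frequencies(haplotypes: Sequence[Sequence[int]]) -> List[float]:
    """Per-site derived allele frequency of a 0/1 haplotype matrix.

    Hypothetical helper for illustration only: rows are sampled haplotypes,
    columns are SNP sites, and 1 marks the derived allele.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_samples = len(haplotypes)
    n_sites = len(haplotypes[0])
    if any(len(row) != n_sites for row in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    # Fraction of samples carrying the derived allele at each site.
    return [sum(row[j] for row in haplotypes) / n_samples for j in range(n_sites)]


# Toy data (assumed, not from the paper): 2 haplotypes over 4 sites.
small = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
# The same allele proportions observed in 128 haplotypes.
large = small * 64

daf_small = derived_allele_frequencies(small)
daf_large = derived_allele_frequencies(large)

# Both vectors have one entry per site, regardless of the number of samples.
print(len(daf_small), len(daf_large))
```

Because the length of the vector depends only on the number of sites, a 1D network consuming it would not need a different input shape for 128 samples than for 1000, which is consistent with the scalability argument made above.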
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer / Polymorphism, Single Nucleotide / Gene Frequency Limits: Humans Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article Country of affiliation: Netherlands Country of publication: United Kingdom