Metagenomic Geolocation Using Read Signatures.
Chappell, Timothy; Geva, Shlomo; Hogan, James M; Lovell, David; Trotman, Andrew; Perrin, Dimitri.
Affiliation
  • Chappell T; School of Computer Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, Australia.
  • Geva S; Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia.
  • Hogan JM; School of Computer Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, Australia.
  • Lovell D; School of Computer Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, Australia.
  • Trotman A; Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia.
  • Perrin D; School of Computer Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, Australia.
Front Genet ; 13: 643592, 2022.
Article in English | MEDLINE | ID: mdl-35295949
We present a novel approach to the Metagenomic Geolocation Challenge based on random projection of the sample reads from each location. This approach explores the direct use of k-mer composition to characterise samples so that we can avoid the computationally demanding step of aligning reads to available microbial reference sequences. Each variable-length read is converted into a fixed-length, k-mer-based read signature. Read signatures are then clustered into location signatures, which provide a more compact characterisation of the reads at each location. Classification is then treated as a problem in ranked retrieval of locations, where signature similarity is used as a measure of similarity in microbial composition. We evaluate our approach using the CAMDA 2020 Challenge dataset and obtain promising results based on nearest neighbour classification. The main findings of this study are that k-mer representations carry sufficient information to reveal the origin of many of the CAMDA 2020 Challenge metagenomic samples, and that this reference-free approach can be achieved with much less computation than methods that need reads to be assigned to operational taxonomic units. These advantages become clear through comparison with previously published work on the CAMDA 2019 Challenge data.
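
The pipeline described in the abstract (k-mer-based read signatures, then ranked retrieval of locations by signature similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the signature width `SIG_BITS`, the k-mer length `K`, the use of SHA-256 to derive a pseudo-random ±1 vector per k-mer, the Hamming distance, and the function names are all assumptions chosen for illustration. The clustering of read signatures into location signatures is skipped here; each location simply keeps all of its read signatures.

```python
import hashlib

SIG_BITS = 64  # hypothetical signature width (assumption, not from the paper)
K = 4          # hypothetical k-mer length (assumption, not from the paper)


def kmer_vector(kmer, bits=SIG_BITS):
    """Map a k-mer to a deterministic pseudo-random vector of +1/-1 values."""
    digest = hashlib.sha256(kmer.encode()).digest()
    value = int.from_bytes(digest[: bits // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def read_signature(read, k=K, bits=SIG_BITS):
    """Turn a variable-length read into a fixed-length binary signature.

    The +1/-1 vectors of all k-mers in the read are summed, and each
    component is then reduced to a single bit by its sign.
    """
    totals = [0] * bits
    for i in range(len(read) - k + 1):
        for j, v in enumerate(kmer_vector(read[i:i + k], bits)):
            totals[j] += v
    return tuple(1 if t > 0 else 0 for t in totals)


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_locations(query_reads, location_signatures):
    """Rank locations from most to least similar to the query sample.

    A location's score is the mean, over query reads, of the distance
    to the closest signature stored for that location (lower is better).
    """
    query_sigs = [read_signature(r) for r in query_reads]
    scores = {}
    for location, sigs in location_signatures.items():
        total = sum(min(hamming(q, s) for s in sigs) for q in query_sigs)
        scores[location] = total / len(query_sigs)
    return sorted(scores, key=scores.get)


# Toy usage with made-up reads and hypothetical location labels.
locations = {
    "site_a": [read_signature("TTTTGGGGCCCCAAAA")],
    "site_b": [read_signature("ACGTACGTACGTACGT")],
}
ranking = rank_locations(["ACGTACGTACGTACGT"], locations)
```

In this toy setup the first entry of `ranking` is the nearest-neighbour prediction. A fuller version would replace the per-location list of raw read signatures with a smaller set of cluster representatives, as the abstract describes.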

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Front Genet Year: 2022 Document type: Article Affiliation country: Australia Publication country: Switzerland
