Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics.
Thurimella, Kumar; Mohamed, Ahmed M T; Graham, Daniel B; Owens, Róisín M; La Rosa, Sabina Leanti; Plichta, Damian R; Bacallado, Sergio; Xavier, Ramnik J.
Affiliation
  • Thurimella K; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Mohamed AMT; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
  • Graham DB; Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.
  • Owens RM; School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
  • La Rosa SL; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Plichta DR; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
  • Bacallado S; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Xavier RJ; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
bioRxiv; 2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37961379
In metagenomics, the pool of uncharacterized microbial enzymes presents a challenge for functional annotation. Among these, carbohydrate-active enzymes (CAZymes) stand out due to their pivotal roles in various biological processes related to host health and nutrition. Here, we present CAZyLingua, the first tool that harnesses protein language model embeddings to build a deep learning framework that facilitates the annotation of CAZymes in metagenomic datasets. Our benchmarking results showed on average a higher F1 score (the harmonic mean of precision and recall) on the annotated genomes of Bacteroides thetaiotaomicron, Eggerthella lenta and Ruminococcus gnavus compared to the traditional sequence homology-based method in dbCAN2. We applied our tool to a paired mother/infant longitudinal dataset and revealed unannotated CAZymes linked to microbial development during infancy. When applied to metagenomic datasets derived from patients affected by fibrosis-prone diseases such as Crohn's disease and IgG4-related disease, CAZyLingua uncovered CAZymes associated with disease and healthy states. In each of these metagenomic catalogs, CAZyLingua discovered new annotations that were previously overlooked by traditional sequence homology tools. Overall, the deep learning model CAZyLingua can be applied in combination with existing tools to unravel intricate CAZyme evolutionary profiles and patterns, contributing to a more comprehensive understanding of microbial metabolic dynamics.
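The F1 score used in the benchmark combines precision and recall as their harmonic mean. A minimal sketch of that arithmetic follows; the function name `f1_score` and the counts in the usage line are hypothetical illustrations, not values or code from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true-positive, false-positive and false-negative counts.

    Hypothetical helper for illustration; not part of CAZyLingua or dbCAN2.
    """
    # Precision: fraction of predicted CAZymes that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true CAZymes that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 correct annotations, 2 spurious, 2 missed.
example = f1_score(tp=8, fp=2, fn=2)
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a tool cannot score well by maximizing recall at the expense of precision or the reverse.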

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: BioRxiv Year: 2023 Document type: Article Country of affiliation: United States Country of publication: United States
