Advancing Transcription Factor Binding Site Prediction Using DNA Breathing Dynamics and Sequence Transformers via Cross Attention.
Kabir, Anowarul; Bhattarai, Manish; Rasmussen, Kim Ø; Shehu, Amarda; Bishop, Alan R; Alexandrov, Boian; Usheva, Anny.
Affiliation
  • Kabir A; Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA.
  • Bhattarai M; Department of Computer Science, George Mason University, 4400 University Dr, 22030, VA, USA.
  • Rasmussen KØ; Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA.
  • Shehu A; Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA.
  • Bishop AR; Department of Computer Science, George Mason University, 4400 University Dr, 22030, VA, USA.
  • Alexandrov B; Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA.
  • Usheva A; Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544, NM, USA.
bioRxiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38293094
ABSTRACT
Understanding how genomic variants affect transcription factor binding and gene regulation remains a key area of research, with implications for unraveling the complex mechanisms underlying their functional effects. Our study examines the role of DNA's biophysical properties, including thermodynamic stability, shape, and flexibility, in transcription factor (TF) binding. We developed a multi-modal deep learning model that integrates these properties with DNA sequence data. Trained on in vivo ChIP-Seq (chromatin immunoprecipitation sequencing) data covering 690 TF-DNA binding events in the human genome, our model significantly improves prediction performance in over 660 binding events, with up to a 9.6% increase in AUROC over a baseline model that uses no DNA biophysical properties explicitly. We further extended our analysis to in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (SELEX) and Protein Binding Microarray (PBM) datasets, comparing our model with established frameworks. Including DNA breathing features consistently improved TF binding predictions across different cell lines in these datasets. Notably, for complex ChIP-Seq datasets, integrating DNABERT2 with a cross-attention mechanism provided greater predictive capability and insight into the mechanisms of disease-related non-coding variants found in genome-wide association studies. This work highlights the importance of DNA biophysical characteristics in TF binding and the effectiveness of multi-modal deep learning models in gene regulation studies.
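The abstract names a cross-attention mechanism for combining sequence representations with DNA breathing features but does not describe its layout. The following is a minimal sketch of generic scaled dot-product cross-attention, not the authors' implementation. It assumes, purely for illustration, that sequence token embeddings act as queries and breathing-feature vectors act as keys and values; the learned projection matrices used in a real transformer are omitted, and the names `seq_tokens` and `breathing_feats` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of vectors from one modality (assumed: sequence tokens).
    keys, values: lists of vectors from the other modality
                  (assumed: breathing features per position).
    Returns (outputs, weights); each row of weights sums to 1.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy inputs (hypothetical): 3 sequence tokens, 2 breathing-feature vectors.
seq_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
breathing_feats = [[1.0, 0.0], [0.0, 1.0]]

fused, attn = cross_attention(seq_tokens, breathing_feats, breathing_feats)
for token, w in zip(seq_tokens, attn):
    print(token, [round(x, 3) for x in w])
```

Each query ends up with a mixture of the second modality's vectors, weighted by similarity, which is the general sense in which cross-attention lets one input stream condition on another.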

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: BioRxiv Year: 2024 Document type: Article Affiliation country: United States Country of publication: United States
