CNN-Peaks: ChIP-Seq peak detection pipeline using convolutional neural networks that imitate human visual inspection.

Oh, Dongpin; Strattan, J Seth; Hur, Junho K; Bento, José; Urban, Alexander Eckehart; Song, Giltae; Cherry, J Michael

Affiliations

Oh D; School of Computer Science and Engineering, Pusan National University, Busan, 46241, South Korea.
Strattan JS; Department of Genetics, Stanford University, Stanford, 94305, USA.
Hur JK; School of Medicine, Kyung Hee University, Seoul, 02447, South Korea.
Bento J; Department of Computer Science, Boston College, Chestnut Hill, MA, 02467, USA.
Urban AE; Department of Genetics, Stanford University, Stanford, 94305, USA.
Song G; School of Computer Science and Engineering, Pusan National University, Busan, 46241, South Korea. gsong@pusan.ac.kr.
Cherry JM; Department of Genetics, Stanford University, Stanford, 94305, USA.

Sci Rep; 10(1): 7933, 2020 May 13.

Article in English | MEDLINE | ID: mdl-32404971

ABSTRACT

ChIP-seq is one of the core experimental resources for understanding genome-wide epigenetic interactions and identifying functional elements associated with diseases. Analyzing ChIP-seq data is important but computationally difficult because of irregular noise and bias at various levels. Although many peak-calling methods have been developed, current computational tools still sometimes require manual inspection of visualized data. However, the huge volume of ChIP-seq data makes it almost impossible for human researchers to uncover all the peaks manually. Recently developed convolutional neural networks (CNNs), which can achieve human-like classification accuracy, can be applied to this challenging problem. In this study, we design a novel supervised learning approach for identifying ChIP-seq peaks using CNNs and integrate it into a software pipeline called CNN-Peaks. As training data for our model, we use genomic segments that human researchers have annotated for the presence or absence of peaks. The trained model is then applied to predict peaks in previously unseen genomic segments from multiple ChIP-seq datasets, including benchmark datasets commonly used to validate peak-calling methods. We observe performance superior to that of previous methods.

Subject(s)

Chromatin Immunoprecipitation Sequencing; Computational Biology/methods; Neural Networks, Computer; Software; Algorithms; Binding Sites; Chromatin Immunoprecipitation Sequencing/methods; Databases, Nucleic Acid; Epigenesis, Genetic; Epigenomics/methods; Histones/metabolism; Humans; Nucleotide Motifs; Protein Binding; Transcription Initiation Site

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Software / Neural Networks, Computer / Computational Biology / Chromatin Immunoprecipitation Sequencing
Study type: Diagnostic_studies / Prognostic_studies
Limit: Humans
Language: English
Journal: Sci Rep
Year: 2020
Document type: Article
Country of affiliation: South Korea
Country of publication: United Kingdom