Accurate prediction of immunogenic T-cell epitopes from epitope sequences using the genetic algorithm-based ensemble learning.

Zhang, Wen; Niu, Yanqing; Zou, Hua; Luo, Longqiang; Liu, Qianchao; Wu, Weijian

Affiliation

Zhang W; School of Computer, Wuhan University, Wuhan, 430072, China; Research Institute of Shenzhen, Wuhan University, Shenzhen, 518057, China.
Niu Y; School of Mathematics and Statistics, South-central University for Nationalities, Wuhan, 430074, China.
Zou H; School of Computer, Wuhan University, Wuhan, 430072, China.
Luo L; School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China.
Liu Q; School of Computer, Wuhan University, Wuhan, 430072, China.
Wu W; School of Computer, Wuhan University, Wuhan, 430072, China.

PLoS One ; 10(5): e0128194, 2015.

Article in English | MEDLINE | ID: mdl-26020952

ABSTRACT

BACKGROUND: T-cell epitopes play an important role in the T-cell immune response and are critical components of epitope-based vaccine design. Immunogenicity is the ability to trigger an immune response. Accurate prediction of immunogenic T-cell epitopes is important for designing useful vaccines and for understanding the immune system. METHODS: In this paper, we attempt to differentiate immunogenic epitopes from non-immunogenic epitopes based on their primary structures. First, we explore a variety of sequence-derived features and analyze their relationship with epitope immunogenicity. To use these features effectively, we propose a genetic algorithm (GA)-based ensemble method that determines the optimal feature subset and builds a high-accuracy ensemble model. In the GA optimization, each chromosome represents a feature subset in the search space. For each feature subset, the selected features are used to construct base predictors, and an ensemble model is formed by averaging the outputs of the base predictors. The objective of the GA is to find the feature subset whose ensemble model achieves the best cross-validation AUC (area under the ROC curve) on the training set. RESULTS: Two datasets, 'IMMA2' and 'PAAQD', are adopted as benchmarks. Compared with the state-of-the-art methods POPI, POPISK, PAAQD and our previous method, the GA-based ensemble method performs much better, achieving an AUC of 0.846 on the IMMA2 dataset and 0.829 on the PAAQD dataset. Statistical analysis shows that the performance improvements of the GA-based ensemble method are statistically significant. CONCLUSIONS: The proposed method is a promising tool for predicting immunogenic epitopes. The source code and datasets are available in S1 File.
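
The search loop described in METHODS can be illustrated with a minimal Python sketch. This is not the authors' implementation (their source code is in S1 File); it only mirrors the outline given in the abstract: a chromosome is a bit mask over features, each selected feature gets one base predictor, the ensemble averages the base outputs, and fitness is the cross-validation AUC. Everything else is an assumption made for illustration: the toy one-feature base predictor in `fit_base`, the GA operators (tournament selection, one-point crossover, bit-flip mutation, two elites), the parameter values, and the synthetic data from `make_toy_data`.

```python
import math
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_base(column, labels):
    """Hypothetical base predictor on one feature: sigmoid of the signed
    distance from the midpoint between the two class means."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    if not pos or not neg:
        return lambda x: 0.5
    mu_p, mu_n = sum(pos) / len(pos), sum(neg) / len(neg)
    mid, gap = (mu_p + mu_n) / 2.0, mu_p - mu_n
    # Clamp the exponent so math.exp cannot overflow.
    return lambda x: 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, gap * (x - mid)))))

def cv_auc(X, y, mask, k=3):
    """k-fold cross-validated AUC of the averaged ensemble for one feature
    subset; held-out scores are pooled over folds before computing the AUC."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0  # an empty subset cannot form an ensemble
    scores = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        models = [fit_base([X[i][j] for i in train], [y[i] for i in train])
                  for j in feats]
        for i in test:
            # Ensemble output = average of the base predictor outputs.
            scores[i] = sum(m(X[i][j]) for m, j in zip(models, feats)) / len(feats)
    return auc(scores, y)

def ga_select(X, y, n_gen=15, pop_size=12, p_mut=0.1, seed=0):
    """GA over feature masks; the operators and parameters are assumptions."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda m: cv_auc(X, y, m)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(n_gen):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(ranked, 3), key=fitness)  # tournament selection
            b = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n)                    # one-point crossover
            nxt.append([bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]])
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

def make_toy_data(n=60, n_feat=6, seed=1):
    """Synthetic data: the first two features are informative, the rest noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(2.0 * label if j < 2 else 0.0, 1.0) for j in range(n_feat)]
         for label in y]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = ga_select(X, y)
    print("selected feature mask:", mask, "cross-validation AUC:", round(score, 3))
```

In a real setting, each base predictor would be a classifier trained on one sequence-derived feature representation of the epitopes, and the final ensemble would be evaluated on data not used during the GA search.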

Subject(s)

Algorithms; Epitopes, T-Lymphocyte/chemistry; Models, Genetic; Models, Immunological; Amino Acid Sequence; Computer Simulation; Datasets as Topic; Epitopes, T-Lymphocyte/immunology; Humans; Molecular Sequence Data; ROC Curve; T-Lymphocytes/chemistry; T-Lymphocytes/immunology; Vaccines, Synthetic/biosynthesis

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Models, Immunological / Epitopes, T-Lymphocyte / Models, Genetic | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: PLoS One | Journal subject: SCIENCE / MEDICINE | Year: 2015 | Document type: Article | Affiliation country: China | Country of publication: United States
