High-accuracy splice site prediction based on sequence component and position features.
Genet Mol Res; 11(3): 3432-51, 2012 Sep 25.
Article in En | MEDLINE | ID: mdl-23079837
Identification of splice sites plays a key role in the annotation of genes. Consequently, improvement of computational prediction of splice sites would be very useful. We examined the effect of the window size and the number and position of the consensus bases with a chi-square test, and then extracted the sequence multi-scale component features and the position and adjacent position relationship features of consensus sites. Then, we constructed a novel classification model using a support vector machine with the previously selected features and applied it to the Homo sapiens splice site dataset. This method greatly improved cross-validation accuracies for training sets with true and spurious splice sites of both equal and different proportions. This method was also applied to the NN269 dataset for further evaluation and independent testing. The results were superior to those obtained with previous methods, and demonstrate the stability and superiority of this method for prediction of splice sites.
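The position-screening step mentioned in the abstract can be illustrated with a minimal sketch. It assumes equal-length sequence windows containing only A, C, G and T, and it scores each window position with a Pearson chi-square statistic on a 4x2 table (base versus true/spurious label). The function name `chi_square_by_position` and the toy sequences are hypothetical and are not taken from the paper; the authors' actual feature extraction and support vector machine model are not reproduced here.

```python
from collections import Counter

BASES = "ACGT"  # assumption: windows contain only these four bases


def chi_square_by_position(true_sites, false_sites):
    """Return one Pearson chi-square statistic per window position.

    Each statistic measures how strongly the base found at that position
    is associated with the class label (true vs. spurious site).
    """
    length = len(true_sites[0])
    n_true, n_false = len(true_sites), len(false_sites)
    total = n_true + n_false
    scores = []
    for pos in range(length):
        true_counts = Counter(seq[pos] for seq in true_sites)
        false_counts = Counter(seq[pos] for seq in false_sites)
        stat = 0.0
        for base in BASES:
            row_total = true_counts[base] + false_counts[base]
            if row_total == 0:
                continue  # base never observed here; contributes nothing
            for observed, col_total in (
                (true_counts[base], n_true),
                (false_counts[base], n_false),
            ):
                expected = row_total * col_total / total
                stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


# Toy windows, invented for illustration only.
true_sites = ["AAGGTAAGT", "CAGGTGAGT", "TAGGTAAGA", "GAGGTAAGT"]
false_sites = ["AAGCTAAGT", "CTGATGACT", "TACGAAAGA", "GAGTTCAGT"]

scores = chi_square_by_position(true_sites, false_sites)
# Positions sorted from most to least discriminative.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Positions with larger statistics separate true from spurious sites more strongly, so a ranking like this could guide the choice of window size and of which positions to encode as features before training a classifier.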
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Computational Biology / RNA Splice Sites
Type of study: Prognostic_studies / Risk_factors_studies
Limits: Humans
Language: En
Journal: Genet Mol Res
Journal subject: MOLECULAR BIOLOGY / GENETICS
Publication year: 2012
Document type: Article
Affiliation country: China
Publication country: Brazil