PROSO II--a new method for protein solubility prediction.

Smialowski, Pawel; Doose, Gero; Torkler, Phillipp; Kaufmann, Stefanie; Frishman, Dmitrij

Affiliation

Smialowski P; Department of Genome Oriented Bioinformatics, Technische Universität Muenchen, Freising, Germany. pawelsm@gmail.com

FEBS J; 279(12): 2192-2200, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22536855

ABSTRACT

Many fields of science and industry depend on efficient production of active protein using heterologous expression in Escherichia coli. The solubility of proteins upon expression is dependent on their amino acid sequence. Prediction of solubility from sequence is therefore highly valuable. We present a novel machine-learning-based model called PROSO II, which makes use of new classification methods and growth in experimental data to improve the coverage and accuracy of solubility predictions. The classification algorithm is organized as a two-layered structure in which the outputs of a primary Parzen window model for sequence similarity and a logistic regression classifier of amino acid k-mer composition serve as input for a second-level logistic regression classifier. Compared with previously published research, our model is trained on five times more data (82 000 proteins) than any previous method. When tested on a separate holdout set not used at any point of method development, our server attained the best results in comparison with other currently available methods: accuracy 75.4%, Matthews correlation coefficient 0.39, sensitivity 0.731, specificity 0.759, gain (soluble) 2.263. In summary, owing to the use of cutting-edge machine learning technologies combined with the largest currently available experimental data set, the PROSO II server constitutes a substantial improvement in protein solubility prediction. PROSO II is available at http://mips.helmholtz-muenchen.de/prosoII.
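The holdout figures quoted in the abstract (accuracy, Matthews correlation coefficient, sensitivity, specificity) are standard confusion-matrix statistics. The sketch below shows how such values are commonly computed for a binary soluble/insoluble classifier. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and the "gain" statistic is left out because the abstract does not define it.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp: soluble proteins predicted soluble
    fp: insoluble proteins predicted soluble
    tn: insoluble proteins predicted insoluble
    fn: soluble proteins predicted insoluble
    """
    total = tp + fp + tn + fn
    # Matthews correlation coefficient: correlation between predicted
    # and observed classes, ranging from -1 to 1.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, total),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration; not data from the paper.
    for name, value in binary_metrics(tp=50, fp=10, tn=30, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting the Matthews correlation coefficient alongside accuracy is useful when the soluble and insoluble classes are unevenly represented, since accuracy alone can look favourable for a classifier that mostly predicts the majority class.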

Subject(s)

Artificial Intelligence; Proteins/chemistry; Proteins/classification; Solubility

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Artificial Intelligence / Proteins | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: FEBS J | Journal subject: BIOCHEMISTRY | Year: 2012 | Document type: Article | Country of affiliation: Germany | Country of publication: United Kingdom
