Search | VHL Regional Portal

GOTA: GO term annotation of biomedical literature.

Di Lena, Pietro; Domeniconi, Giacomo; Margara, Luciano; Moro, Gianluca.

BMC Bioinformatics ; 16: 346, 2015 Oct 28.

Article in English | MEDLINE | ID: mdl-26511083

ABSTRACT

BACKGROUND: Functional annotation of genes and gene products is a major challenge in the post-genomic era. Gene function curation is currently based largely on manual assignment of Gene Ontology (GO) annotations to genes using published literature. The annotation task is extremely time-consuming, so there is increasing interest in automated tools that can assist human experts. RESULTS: Here we introduce GOTA, a GO term annotator for biomedical literature. The proposed approach uses only information that is readily available from public repositories, and it is easily expandable to handle novel sources of information. We assess the classification capabilities of GOTA on a large benchmark set of publications. The overall performance is encouraging in comparison to the state of the art in multi-label classification over large taxonomies. Furthermore, the experimental tests provide some interesting insights into the potential improvement of automated annotation tools. CONCLUSIONS: GOTA implements a flexible and expandable model for GO annotation of biomedical literature. The current version of the GOTA tool is freely available at http://gota.apice.unibo.it.

Subject(s)

User-Computer Interface, Animals, Data Mining, Gene Ontology, Humans, Internet, Molecular Sequence Annotation

Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure.

Vassura, Marco; Di Lena, Pietro; Margara, Luciano; Mirto, Maria; Aloisio, Giovanni; Fariselli, Piero; Casadio, Rita.

BioData Min ; 4(1): 1, 2011 Jan 13.

Article in English | MEDLINE | ID: mdl-21232136

ABSTRACT

BACKGROUND: Present knowledge of protein structures at the atomic level derives from some 60,000 molecules. Yet the exponentially growing set of hypothetical protein sequences comprises some 10 million chains, which makes protein structure prediction one of the challenging goals of bioinformatics. In this context, the representation of proteins with contact maps is an intermediate step of fold recognition and constitutes the input of contact map predictors. However, contact map representations require fast and reliable methods to reconstruct the specific folding of the protein backbone. METHODS: In this paper, by adopting GRID technology, our algorithm for 3D reconstruction, FT-COMAR, is benchmarked on a huge set of non-redundant proteins (1716), taking random noise into consideration; this makes our computation the largest ever performed for the task at hand. RESULTS: We observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The size of the protein set also allows statistical considerations after grouping by SCOP structural class. CONCLUSIONS: Altogether, our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average of 75% of the real contacts, while only a few percent of randomly generated contacts in place of non-contacts are sufficient to hamper 3D reconstruction.
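The two kinds of noise described in the abstract, deleting real contacts and inserting random contacts in place of non-contacts, can be sketched roughly as follows. This is a minimal illustration and not the authors' benchmark code: the function name `blur_contact_map`, the convention that both fractions are taken over residue pairs above the diagonal, and the toy map are all assumptions.

```python
import random

def blur_contact_map(cmap, delete_fraction, add_fraction, seed=0):
    """Return a noisy copy of a symmetric 0/1 contact map.

    delete_fraction: share of true contacts to remove (assumed convention).
    add_fraction: share of non-contacts to turn into false contacts (assumed convention).
    """
    rng = random.Random(seed)
    n = len(cmap)
    # Work on residue pairs above the diagonal, then mirror each change.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    contacts = [p for p in pairs if cmap[p[0]][p[1]]]
    non_contacts = [p for p in pairs if not cmap[p[0]][p[1]]]
    noisy = [row[:] for row in cmap]
    for i, j in rng.sample(contacts, round(delete_fraction * len(contacts))):
        noisy[i][j] = noisy[j][i] = 0
    for i, j in rng.sample(non_contacts, round(add_fraction * len(non_contacts))):
        noisy[i][j] = noisy[j][i] = 1
    return noisy

# Hypothetical 9-residue map in which only sequence neighbours are in contact.
toy_map = [[1 if abs(i - j) <= 1 else 0 for j in range(9)] for i in range(9)]
noisy_map = blur_contact_map(toy_map, delete_fraction=0.75, add_fraction=0.25)
```

The toy fractions are chosen only so that the counts come out as whole numbers; the abstract's finding concerns much smaller rates of false contacts.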

Is there an optimal substitution matrix for contact prediction with correlated mutations?

Di Lena, Pietro; Fariselli, Piero; Margara, Luciano; Vassura, Marco; Casadio, Rita.

IEEE/ACM Trans Comput Biol Bioinform ; 8(4): 1017-28, 2011.

Article in English | MEDLINE | ID: mdl-20855922

ABSTRACT

Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. Correlation between pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In the literature, there is no justification for adopting MCLACHLAN instead of other substitution matrices. In this paper, we approach the problem of computing the optimal similarity matrix for contact prediction with correlated mutations, i.e., the similarity matrix that maximizes the accuracy of contact prediction with correlated mutations. We describe an optimization procedure, based on the gradient descent method, for computing the optimal similarity matrix and perform an extensive number of experimental tests. Our tests show that there is a large number of optimal matrices that perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in protein contact prediction is independent of the optimized similarity matrix. This suggests that the poor scoring of the correlated mutations approach may be due to the choice of the linear correlation function in evaluating correlated mutations.
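As a rough illustration of the scoring scheme the abstract refers to, the sketch below computes a Pearson correlation between two alignment columns from pairwise residue similarities. It is a simplified, unweighted version under stated assumptions: `identity_similarity` is a toy stand-in for a real substitution matrix such as MCLACHLAN, and the alignment and all function names are hypothetical.

```python
import math
from itertools import combinations

def identity_similarity(a, b):
    """Toy substitution score (assumption): 1.0 for identical residues, else 0.0."""
    return 1.0 if a == b else 0.0

def column_profile(alignment, col, similarity):
    """Similarity scores at column `col` for every pair of sequences."""
    return [similarity(s[col], t[col]) for s, t in combinations(alignment, 2)]

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def correlated_mutation(alignment, i, j, similarity=identity_similarity):
    """Correlation between columns i and j of a multiple sequence alignment."""
    return pearson(column_profile(alignment, i, similarity),
                   column_profile(alignment, j, similarity))

# Hypothetical alignment: columns 0 and 1 mutate together, column 2 does not follow them.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
score_01 = correlated_mutation(toy_alignment, 0, 1)
score_02 = correlated_mutation(toy_alignment, 0, 2)
```

Swapping `identity_similarity` for another scoring function is the point at which a different substitution matrix would enter the calculation.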

Subject(s)

Computational Biology/methods, Statistical Models, Protein Interaction Domains and Motifs, Protein Interaction Mapping/methods, Proteins/chemistry, Algorithms, Protein Databases, Mutation

Fast overlapping of protein contact maps by alignment of eigenvectors.

Di Lena, Pietro; Fariselli, Piero; Margara, Luciano; Vassura, Marco; Casadio, Rita.

Bioinformatics ; 26(18): 2250-8, 2010 Sep 15.

Article in English | MEDLINE | ID: mdl-20610612

ABSTRACT

MOTIVATION: Searching for structural similarity is a key issue in protein functional annotation. The maximum contact map overlap (CMO) is one of the possible measures of protein structure similarity. Exact and approximate methods known to optimize the CMO are computationally expensive, and this hampers their applicability to large-scale comparison of protein structures. RESULTS: In this article, we describe a heuristic algorithm (Al-Eigen) for finding a solution to the CMO problem. Our approach relies on the approximation of contact maps by eigendecomposition. We obtain good overlaps of two contact maps by computing the optimal global alignment of a few principal eigenvectors. Our algorithm is simple and fast, and its running time is independent of the number of contacts in the map. Experimental testing indicates that the algorithm is comparable to exact CMO methods in terms of overlap quality and to structural alignment methods in terms of structure similarity detection, and that it is fast enough for large-scale comparison of protein structures. Furthermore, our preliminary tests indicate that it is quite robust to noise, which makes it suitable for structural similarity detection even with noisy and incomplete contact maps. AVAILABILITY: Available at http://bioinformatics.cs.unibo.it/Al-Eigen.
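The idea of approximating a contact map by eigendecomposition can be illustrated with a small sketch. It covers only the approximation step, using a single principal eigenvector found by power iteration; it does not implement the eigenvector alignment that Al-Eigen performs. The function names and the toy map are assumptions made for this example.

```python
import math

def principal_eigenpair(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, the Rayleigh quotient v^T M v estimates the eigenvalue.
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))
    return eigenvalue, v

def rank_one_approximation(eigenvalue, v):
    """Rank-one approximation eigenvalue * v v^T of the original matrix."""
    return [[eigenvalue * a * b for b in v] for a in v]

# Hypothetical 4-residue contact map: each residue contacts itself and its chain neighbours.
toy_map = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
eigenvalue, eigenvector = principal_eigenpair(toy_map)
approx = rank_one_approximation(eigenvalue, eigenvector)
```

Adding further eigenvectors would sharpen the approximation. In practice a linear algebra library routine would normally replace the hand-written power iteration, which is used here only to keep the sketch dependency-free.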

Subject(s)

Algorithms, Proteins/chemistry, Computational Biology/methods, Protein Conformation, Proteins/physiology

A graph theoretic approach to protein structure selection.

Vassura, Marco; Margara, Luciano; Fariselli, Piero; Casadio, Rita.

Artif Intell Med ; 45(2-3): 229-37, 2009.

Article in English | MEDLINE | ID: mdl-18786818

ABSTRACT

OBJECTIVE: Protein structure prediction (PSP) aims to reconstruct the 3D structure of a given protein starting from its primary structure (chain of amino acid residues). It is a well-known fact that the 3D structure of a protein depends only on its primary structure. PSP is one of the most important and still unsolved problems in computational biology. Protein structure selection (PSS), instead of reconstructing a 3D model for the given chain, aims to select, among a given and possibly large number of 3D structures (called decoys), those that are closest (according to a given notion of distance) to the original (unknown) one. In this paper we address the PSS problem using graph theoretic techniques. METHODS AND MATERIALS: Existing methods for solving PSS make use of suitably defined energy functions, which rely heavily on the primary structure of the protein and on protein chemistry. In this paper we present a new approach to PSS that does not use knowledge of the primary structure of the protein but depends only on the graph theoretic properties of the decoy graphs (vertices represent residues, and edges represent pairs of residues whose Euclidean distance is less than or equal to a fixed threshold). RESULTS: Even though our methods rely only on approximate geometric information, experimental results show that some of the adopted graph properties score similarly to energy-based filtering functions in selecting the best decoys. CONCLUSION: Our results highlight the principal role of geometric information in PSS, setting a new starting point and filtering method for existing energy function-based techniques.
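The decoy graph defined in the abstract (residues as vertices, edges between residues within a fixed Euclidean distance) can be sketched as below. The abstract does not say which graph properties were used, so `average_degree` is only an illustrative choice. The coordinates, threshold and function names are likewise assumptions.

```python
import math
from itertools import combinations

def decoy_graph(coords, threshold):
    """Adjacency sets: vertices are residue indices, edges join residues within `threshold`."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def average_degree(graph):
    """Mean number of neighbours per vertex (illustrative property, not named in the abstract)."""
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)

# Two hypothetical decoys of the same four-residue chain: one compact, one extended.
compact = [(0, 0, 0), (3, 0, 0), (3, 3, 0), (0, 3, 0)]
extended = [(0, 0, 0), (6, 0, 0), (12, 0, 0), (18, 0, 0)]
compact_score = average_degree(decoy_graph(compact, threshold=8.0))
extended_score = average_degree(decoy_graph(extended, threshold=8.0))
```

Note that the sequence of residue types never enters the computation, which mirrors the abstract's point that the approach depends only on geometry.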

Subject(s)

Proteins/chemistry, Protein Conformation

Reconstruction of 3D structures from protein contact maps.

Vassura, Marco; Margara, Luciano; Di Lena, Pietro; Medri, Filippo; Fariselli, Piero; Casadio, Rita.

IEEE/ACM Trans Comput Biol Bioinform ; 5(3): 357-67, 2008.

Article in English | MEDLINE | ID: mdl-18670040

ABSTRACT

The prediction of protein tertiary structure solely from the residue sequence (the so-called Protein Folding Problem) is one of the most challenging problems in Structural Bioinformatics. We focus on the protein residue contact map. When this map is assigned, it is possible to reconstruct the 3D structure of the protein backbone. The general problem of recovering a set of 3D coordinates consistent with a given contact map is known as a unit-disk-graph realization problem and has recently been proven to be NP-Hard. In this paper we describe a heuristic method (COMAR) that is able to reconstruct, at an unprecedented rate (3-15 seconds), a 3D model that exactly matches the target contact map of a protein. Working with a non-redundant set of 1760 proteins, we find that the scoring efficiency of finding a 3D model very close to the protein native structure depends on the threshold value adopted to compute the protein residue contact map. Contact maps whose threshold values range from 10 to 18 Angstroms allow the reconstruction of 3D models that are very similar to the proteins' native structures.
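For readers unfamiliar with the representation, the sketch below shows how a residue contact map can be derived from 3D coordinates for a given distance threshold, which is the forward direction of the reconstruction problem that COMAR addresses. It assumes one representative point per residue (for example a C-alpha atom) and a "less than or equal to" contact rule. The coordinates and function name are hypothetical.

```python
import math

def contact_map(coords, threshold):
    """Symmetric 0/1 matrix: 1 where two residues lie within `threshold` angstroms."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical coordinates (angstroms), one point per residue, for a four-residue toy chain.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
map_10 = contact_map(toy_coords, threshold=10.0)
map_18 = contact_map(toy_coords, threshold=18.0)
```

Raising the threshold from 10 to 18 angstroms adds contacts between more distant residues, so the same structure yields a denser map.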

Subject(s)

Chemical Models, Molecular Models, Protein Interaction Mapping/methods, Proteins/chemistry, Proteins/ultrastructure, Binding Sites, Computer Simulation, Protein Binding, Protein Conformation, Protein Folding

FT-COMAR: fault tolerant three-dimensional structure reconstruction from protein contact maps.

Vassura, Marco; Margara, Luciano; Di Lena, Pietro; Medri, Filippo; Fariselli, Piero; Casadio, Rita.

Bioinformatics ; 24(10): 1313-5, 2008 May 15.

Article in English | MEDLINE | ID: mdl-18381401

ABSTRACT

Fault Tolerant Contact Map Reconstruction (FT-COMAR) is a heuristic algorithm for reconstructing the three-dimensional structure of a protein from contact maps that may be noisy and incomplete (i.e. containing unknown entries). FT-COMAR runs within minutes, allowing its application to a large number of predictions. AVAILABILITY: http://bioinformatics.cs.unibo.it/FT-COMAR

Subject(s)

Algorithms, Chemical Models, Molecular Models, Protein Interaction Mapping/methods, Proteins/chemistry, Proteins/ultrastructure, Software, Binding Sites, Computer Simulation, Protein Binding, Protein Conformation, Reproducibility of Results, Sensitivity and Specificity
