Search | BVS Regional Portal

Self-contained sequence representation: bridging the gap between bioinformatics and cheminformatics.

Chen, William L; Leland, Burton A; Durant, Joseph L; Grier, David L; Christie, Bradley D; Nourse, James G; Taylor, Keith T.

J Chem Inf Model ; 51(9): 2186-208, 2011 Sep 26.

Article in English | MEDLINE | ID: mdl-21800899

ABSTRACT

The wide application of next-generation sequencing has presented a new hurdle to bioinformatics for managing the fast-growing sequence data. The management of biomacromolecules at the chemistry level imposes an even greater challenge in cheminformatics because of the lack of a good chemical representation of biopolymers. Here we introduce the self-contained sequence representation (SCSR). SCSR combines the best features of bioinformatics and cheminformatics notations. SCSR is the first general, extensible, and comprehensive representation of biopolymers in a compressed format that retains chemistry detail. The SCSR-based high-performance exact structure and substructure searching methods (NEMA key and SSS) offer new ways to search biopolymers that complement bioinformatics approaches. The widely used chemical structure file format (molfile) has been enhanced to support SCSR. SCSR offers a solid framework for future development of new methods and systems for managing and handling sequences at the chemistry level. SCSR lays the foundation for the integration of bioinformatics and cheminformatics.
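The idea of a compressed representation that still retains chemistry detail can be illustrated with a toy sketch. This is not the SCSR molfile format described in the paper: the `TEMPLATES` dictionary, the one-letter codes, and the `expand_heavy_atoms` function are hypothetical names chosen for illustration. The sketch assumes each residue's chemistry is stored once as a template and that a sequence only references those templates, so the full composition can be rebuilt on demand.

```python
from collections import Counter

# Hypothetical template store: one entry per residue type, holding the
# heavy-atom composition of the free amino acid (hydrogens omitted).
TEMPLATES = {
    "G": Counter({"C": 2, "N": 1, "O": 2}),  # glycine, C2H5NO2
    "A": Counter({"C": 3, "N": 1, "O": 2}),  # alanine, C3H7NO2
    "S": Counter({"C": 3, "N": 1, "O": 3}),  # serine, C3H7NO3
}


def expand_heavy_atoms(sequence):
    """Rebuild the heavy-atom composition of a linear peptide.

    The sequence is the compact form (one template reference per residue);
    the chemistry comes from the shared templates.
    """
    total = Counter()
    for code in sequence:
        total.update(TEMPLATES[code])
    # Each peptide bond releases one water, i.e. one oxygen per bond.
    if len(sequence) > 1:
        total["O"] -= len(sequence) - 1
    return total


print(expand_heavy_atoms("GAS"))
```

For the tripeptide Gly-Ala-Ser (C8H15N3O5) the expansion yields 8 carbons, 3 nitrogens, and 5 oxygens, while the stored sequence stays three characters long.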

Subject(s)

Computational Biology; Biopolymers/chemistry

VET: a tool for reaction plausibility checking.

Durant, Joseph L; Leland, Burton A; Nourse, James G.

J Chem Inf Model ; 46(2): 762-6, 2006.

Article in English | MEDLINE | ID: mdl-16563007

ABSTRACT

Production of chemical reaction databases is a multistep process, with the possibility of errors at each of these steps. VET is a tool developed to trap errors in the chemical reactions identified as a part of this process. VET has been designed to minimize the acceptance of incorrect reactions, while still supporting various common practices in reaction depiction, including unbalanced reactions, suppressed components, and reactions with alternative products. We discuss the assumptions made in its construction, a general overview of its structure, and some performance characteristics.
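One simple ingredient of a plausibility check is an atom balance that tolerates suppressed components. The sketch below is not VET's algorithm, which the abstract does not specify. It only assumes that reactants and products are given as simple molecular formulas (no brackets or charges) and reports the elements whose counts differ between the two sides. The function names `parse_formula` and `atom_imbalance` are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def atom_imbalance(reactants, products):
    """Return {element: product count - reactant count} for unbalanced elements."""
    left = Counter()
    for formula in reactants:
        left.update(parse_formula(formula))
    right = Counter()
    for formula in products:
        right.update(parse_formula(formula))
    return {
        element: right[element] - left[element]
        for element in set(left) | set(right)
        if right[element] != left[element]
    }


# Esterification drawn without the water by-product (a suppressed component).
print(atom_imbalance(["C2H4O2", "C2H6O"], ["C4H8O2"]))
```

In this example the product side is short by two hydrogens and one oxygen, which is exactly one water molecule. A checker following the design goals in the abstract could accept such a deficit as a suppressed component rather than reject the reaction outright.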

Reoptimization of MDL keys for use in drug discovery.

Durant, Joseph L; Leland, Burton A; Henry, Douglas R; Nourse, James G.

J Chem Inf Comput Sci ; 42(6): 1273-80, 2002.

Article in English | MEDLINE | ID: mdl-12444722

ABSTRACT

For a number of years MDL products have exposed both 166 bit and 960 bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets which are reoptimized for use in molecular similarity. Classification performance for a test data set of 957 compounds was increased from 0.65 for the 166 bit keyset and 0.67 for the 960 bit keyset to 0.71 for a surprisal S/N pruned keyset containing 208 bits and 0.71 for a genetic algorithm optimized keyset containing 548 bits. We present an overview of the underlying technology supporting the definition of descriptors and the encoding of these descriptors into keysets. This technology allows definition of descriptors as combinations of atom properties, bond properties, and atomic neighborhoods at various topological separations as well as supporting a number of custom descriptors. These descriptors can then be used to set one or more bits in a keyset. We constructed various keysets and optimized their performance in clustering bioactive substances. Performance was measured using methodology developed by Briem and Lessel. "Directed pruning" was carried out by eliminating bits from the keysets on the basis of random selection, values of the surprisal of the bit, or values of the surprisal S/N ratio of the bit. The random pruning experiment highlighted the insensitivity of keyset performance for keyset lengths of more than 1000 bits. Contrary to initial expectations, pruning on the basis of the surprisal values of the various bits resulted in keysets which underperformed those resulting from random pruning. In contrast, pruning on the basis of the surprisal S/N ratio was found to yield keysets which performed better than those resulting from random pruning. We also explored the use of genetic algorithms in the selection of optimal keysets. Once more the performance was only a weak function of keyset size, and the optimizations failed to identify a single globally optimal keyset. Instead multiple, equally optimal keysets could be produced which had relatively low overlap of the descriptors they encoded.
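The pruning experiments can be pictured with a small sketch under stated assumptions. Keysets are modeled here as Python sets of on-bit indices. Similarity is computed with the Tanimoto coefficient, a common choice that the abstract does not name. Surprisal is taken in its standard information-theoretic form, -log2(p), where p is the fraction of compounds with the bit set. The abstract does not define the surprisal S/N ratio, so it is not reproduced here. All function names are hypothetical.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto coefficient between two keysets given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bit_surprisal(keysets, n_bits):
    """Surprisal log2(1/p) per bit, where p is the fraction of keysets with it set."""
    n = len(keysets)
    result = {}
    for bit in range(n_bits):
        p = sum(1 for keyset in keysets if bit in keyset) / n
        result[bit] = math.inf if p == 0 else math.log2(1 / p)
    return result


def random_keep(n_bits, n_keep, seed=0):
    """Pick the bits that survive random pruning (seeded for repeatability)."""
    return set(random.Random(seed).sample(range(n_bits), n_keep))


def prune(keysets, keep_bits):
    """Drop every bit that is not in keep_bits from each keyset."""
    return [keyset & keep_bits for keyset in keysets]


keysets = [{0, 1}, {0, 2}, {0, 1, 3}, {0}]
print(bit_surprisal(keysets, 5))
print(prune(keysets, random_keep(5, 3)))
```

A bit set in every compound (bit 0 above) has zero surprisal and cannot distinguish compounds, while a bit that is never set has infinite surprisal and is equally uninformative. That intuition is consistent with the reported finding that ranking by raw surprisal alone did not beat random pruning.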

Subject(s)

Computational Biology/methods; Drug Evaluation, Preclinical/methods; Algorithms; Databases, Factual; Genetics; Pattern Recognition, Automated; Software; Structure-Activity Relationship
