Results 1 - 15 of 15
1.
Article in English | MEDLINE | ID: mdl-31024904

ABSTRACT

Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories share compelling similarities to the early days of the Internet. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than costly "retrofitted" mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.

2.
Bioinformatics ; 29(6): 797-8, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23361326

ABSTRACT

MOTIVATION: BLAST remains one of the most widely used tools in computational biology. The rate at which new sequence data is available continues to grow exponentially, driving the emergence of new fields of biological research. At the same time, multicore systems and conventional clusters are more accessible. ScalaBLAST has been designed to run on conventional multiprocessor systems with an eye to extreme parallelism, enabling parallel BLAST calculations using >16 000 processing cores with a portable, robust, fault-resilient design that introduces little to no overhead with respect to serial BLAST.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Computational Biology/methods
3.
J Virol ; 85(19): 10154-66, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21795337

ABSTRACT

The dengue viruses (DENVs) exist as numerous genetic strains that are grouped into four antigenically distinct serotypes. DENV strains from each serotype can cause severe disease and threaten public health in tropical and subtropical regions worldwide. No licensed antiviral agent to treat DENV infections is currently available, and there is an acute need for the development of novel therapeutics. We found that a synthetic small interfering RNA (siRNA) (DC-3) targeting the highly conserved 5' cyclization sequence (5'CS) region of the DENV genome reduced, by more than 100-fold, the titers of representative strains from each DENV serotype in vitro. To determine if DC-3 siRNA could inhibit DENV in vivo, an "in vivo-ready" version of DC-3 was synthesized and tested against DENV-2 by using a mouse model of antibody-dependent enhancement of infection (ADE)-induced disease. Compared with the rapid weight loss and 5-day average survival time of the control groups, mice receiving the DC-3 siRNA had an average survival time of 15 days and showed little weight loss for approximately 12 days. DC-3-treated mice also contained significantly less virus than control groups in several tissues at various time points postinfection. These results suggest that exogenously introduced siRNA combined with the endogenous RNA interference processing machinery has the capacity to prevent severe dengue disease. Overall, the data indicate that DC-3 siRNA represents a useful research reagent and has potential as a novel approach to therapeutic intervention against the genetically diverse dengue viruses.


Subject(s)
Antiviral Agents/administration & dosage , Antiviral Agents/pharmacology , Dengue Virus/drug effects , Dengue/drug therapy , RNA, Small Interfering/administration & dosage , RNA, Small Interfering/pharmacology , Animals , Antibody-Dependent Enhancement , Biological Products/administration & dosage , Biological Products/pharmacology , Body Weight , Cell Culture Techniques , Chlorocebus aethiops , Conserved Sequence , Dengue/pathology , Dengue/virology , Dengue Virus/genetics , Disease Models, Animal , Humans , Mice , RNA, Small Interfering/genetics , Rodent Diseases/drug therapy , Rodent Diseases/pathology , Rodent Diseases/virology , Survival Analysis
4.
Mol Biosyst ; 7(8): 2407-18, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21698331

ABSTRACT

Systems biology attempts to reconcile large amounts of disparate data with existing knowledge to provide models of functioning biological systems. The cyanobacterium Cyanothece sp. ATCC 51142 is an excellent candidate for such systems biology studies because: (i) it displays tight functional regulation between photosynthesis and nitrogen fixation; (ii) it has robust cyclic patterns at the genetic, protein and metabolomic levels; and (iii) it has potential applications for bioenergy production and carbon sequestration. We have represented the transcriptomic data from Cyanothece 51142 under diurnal light/dark cycles as a high-level functional abstraction and describe development of a predictive in silico model of diurnal and circadian behavior in terms of regulatory and metabolic processes in this organism. We show that incorporating network topology into the model improves performance in terms of our ability to explain the behavior of the system under new conditions. The model presented robustly describes transcriptomic behavior of Cyanothece 51142 under different cyclic and non-cyclic growth conditions, and represents a significant advance in the understanding of gene regulation in this important organism.


Subject(s)
Cyanothece/genetics , Models, Genetic , Transcription, Genetic , Cell Line , Cluster Analysis , Computer Simulation , Cyanothece/metabolism , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Nitrogenase/genetics , Nitrogenase/metabolism , Reproducibility of Results , Ribulose-Bisphosphate Carboxylase/genetics , Ribulose-Bisphosphate Carboxylase/metabolism , Systems Biology/methods
5.
Infect Immun ; 79(1): 23-32, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20974833

ABSTRACT

In this review, we provide an overview of the methods employed in four recent studies that described novel methods for computational prediction of secreted effectors from type III and IV secretion systems in Gram-negative bacteria. We present the results of these studies in terms of performance at accurately predicting secreted effectors and similarities found between secretion signals that may reflect biologically relevant features for recognition. We discuss the Web-based tools for secreted effector prediction described in these studies and announce the availability of our tool, the SIEVE server (http://www.sysbep.org/sieve). Finally, we assess the accuracies of the three type III effector prediction methods on a small set of proteins not known prior to the development of these tools that we recently discovered and validated using both experimental and computational approaches. Our comparison shows that all methods use similar approaches and, in general, arrive at similar conclusions. We discuss the possibility of an order-dependent motif in the secretion signal, which was a point of disagreement in the studies. Our results show that there may be classes of effectors in which the signal has a loosely defined motif and others in which secretion is dependent only on compositional biases. Computational prediction of secreted effectors from protein sequences represents an important step toward better understanding the interaction between pathogens and hosts.


Subject(s)
Bacterial Proteins/metabolism , Computational Biology/methods , Gram-Negative Bacteria/metabolism , Bacterial Proteins/classification , Bacterial Proteins/genetics , Databases, Protein , Gene Expression Regulation, Bacterial/physiology , Gram-Negative Bacteria/genetics
6.
Bioinformatics ; 26(13): 1677-83, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20568665

ABSTRACT

MOTIVATION: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic). RESULTS: We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of approximately 0.83 with an SD of <0.038. Furthermore, we demonstrate that these results are achievable with a small set of 13 variables and can achieve high proteome coverage. AVAILABILITY: http://omics.pnl.gov/software/STEPP.php CONTACT: bj@pnl.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Peptides/isolation & purification , Proteomics/methods , Mass Spectrometry , Peptides/chemistry , Salmonella typhimurium/chemistry , Shewanella/chemistry , Yersinia pestis/chemistry
7.
BMC Bioinformatics ; 11: 145, 2010 Mar 19.
Article in English | MEDLINE | ID: mdl-20302613

ABSTRACT

BACKGROUND: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection. RESULTS: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost. CONCLUSIONS: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Sequence Homology, Amino Acid , Databases, Protein , Pattern Recognition, Automated
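Illustrative sketch for entry 7 (not taken from the paper): the abstract describes scoring 4-mers by a physicochemical property and normalizing against the null distribution of that property over all possible 4-mers. The Python below shows one way this could look. The choice of Kyte-Doolittle hydropathy as the property, the z-score normalization, the bin edges, and the names `kmer_scores`, `null_stats`, and `feature_vector` are all assumptions made for illustration; the abstract does not specify them.

```python
from functools import lru_cache
from itertools import product
from statistics import fmean, pstdev

# Kyte-Doolittle hydropathy values, used here as an assumed example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kmer_scores(seq, k=4):
    """Sum the property over every overlapping k-mer of the sequence."""
    return [sum(KD[a] for a in seq[i:i + k]) for i in range(len(seq) - k + 1)]


@lru_cache(maxsize=None)
def null_stats(k=4):
    """Mean and standard deviation of the k-mer score over all 20**k k-mers."""
    sums = [sum(combo) for combo in product(KD.values(), repeat=k)]
    return fmean(sums), pstdev(sums)


def feature_vector(seq, k=4, edges=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Histogram of null-normalized k-mer scores, as fractions summing to 1."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    mu, sd = null_stats(k)
    counts = [0] * (len(edges) + 1)
    for score in kmer_scores(seq, k):
        z = (score - mu) / sd
        counts[sum(z > e for e in edges)] += 1  # index of the bin containing z
    total = sum(counts)
    return [c / total for c in counts]
```

Every sequence maps to a vector of the same length regardless of its own length, which is what lets a fixed-dimension classifier such as an SVM consume it. A real implementation would presumably repeat this for several properties and concatenate the results.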
8.
Comput Biol Chem ; 32(6): 458-61, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18722814

ABSTRACT

Due to the exponential growth of sequenced genomes, the need to quickly provide accurate annotation for existing and new sequences is paramount to facilitate biological research. Current sequence comparison approaches fail to detect homologous relationships when sequence similarity is low. Support vector machine (SVM) algorithms approach this problem by transforming all proteins into a feature space of equal dimension based on protein properties, such as sequence similarity scores against a basis set of proteins or motifs. This multivariate representation of the protein space is then used to build a classifier specific to a pre-defined protein family. However, this approach is not well suited to large-scale annotation. We have developed an SVM approach that formulates remote homology as a single classifier that answers the pairwise comparison problem by integrating the two feature vectors for a pair of sequences into a single vector representation that can be used to build a classifier that separates sequence pairs into homologs and non-homologs. This pairwise SVM approach significantly improves the task of remote homology detection on the benchmark dataset, quantified as the area under the receiver operating characteristic curve; 0.97 versus 0.73 and 0.70 for PSI-BLAST and Basic Local Alignment Search Tool (BLAST), respectively.


Subject(s)
Algorithms , Sequence Homology, Amino Acid , Databases, Protein
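Illustrative sketch for entry 8 (not taken from the paper): the abstract describes merging the feature vectors of two sequences into one pair vector and evaluating the classifier by the area under the ROC curve. The Python below sketches both steps. The particular merge used here (element-wise absolute difference followed by element-wise product) is an assumption chosen because it is order-independent; the abstract does not say how the vectors are integrated. The names `pair_vector` and `roc_auc` are hypothetical.

```python
def pair_vector(a, b):
    """Merge two equal-length feature vectors into one symmetric pair vector.

    Assumed scheme: |a_i - b_i| for every feature, then a_i * b_i for every
    feature, so that pair_vector(a, b) == pair_vector(b, a).
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    prods = [x * y for x, y in zip(a, b)]
    return diffs + prods


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half).

    labels use 1 for homolog pairs and 0 for non-homolog pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pair vectors would be the training input to a single binary classifier, and `roc_auc` applied to its decision scores yields the kind of summary statistic the abstract reports. The pairwise loop is quadratic in the number of examples, so a rank-based formulation would be preferable on large benchmarks.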
9.
Bioinformatics ; 24(13): 1503-9, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18453551

ABSTRACT

MOTIVATION: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic). RESULTS: We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of 0.8 with an SD of <0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables and can achieve high proteome coverage. AVAILABILITY: http://omics.pnl.gov/software/STEPP.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Artificial Intelligence , Bacterial Proteins/chemistry , Pattern Recognition, Automated/methods , Peptide Mapping/methods , Peptides/chemistry , Proteome/chemistry , Computer Simulation , Models, Chemical , Proteomics/methods , Reproducibility of Results , Sensitivity and Specificity
10.
Bioinformatics ; 24(6): 783-90, 2008 Mar 15.
Article in English | MEDLINE | ID: mdl-18245127

ABSTRACT

MOTIVATION: As the amount of biological sequence data continues to grow exponentially we face the increasing challenge of assigning function to this enormous molecular 'parts list'. The most popular approaches to this challenge make use of the simplifying assumption that similar functional molecules, or proteins, sometimes have similar composition, or sequence. However, these algorithms often fail to identify remote homologs (proteins with similar function but dissimilar sequence) which often are a significant fraction of the total homolog collection for a given sequence. We introduce a Support Vector Machine (SVM)-based tool to detect homology using semi-supervised iterative learning (SVM-HUSTLE) that identifies significantly more remote homologs than current state-of-the-art sequence or cluster-based methods. As opposed to building profiles or position specific scoring matrices, SVM-HUSTLE builds an SVM classifier for a query sequence by training on a collection of representative high-confidence training sets, recruits additional sequences and assigns a statistical measure of homology between a pair of sequences. SVM-HUSTLE combines principles of semi-supervised learning theory with statistical sampling to create many concurrent classifiers to iteratively detect and refine, on-the-fly, patterns indicating homology. RESULTS: When compared against existing methods for identifying protein homologs (BLAST, PSI-BLAST, COMPASS, PROF_SIM, RANKPROP and their variants) on two different benchmark datasets SVM-HUSTLE significantly outperforms each of the above methods using the most stringent ROC(1) statistic with P-values less than 1e-20. SVM-HUSTLE also yields results comparable to HHSearch but at a substantially reduced computational cost since we do not require the construction of HMMs. AVAILABILITY: The software executable to run SVM-HUSTLE can be downloaded from http://www.sysbio.org/sysbio/networkbio/svm_hustle


Subject(s)
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Molecular Sequence Data , Software
11.
Bioinformatics ; 23(13): 1705-7, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17483503

ABSTRACT

UNLABELLED: The visual Platform for Proteomics Peptide and Protein data exploration (PQuad) is a multi-resolution environment that visually integrates genomic and proteomic data for prokaryotic systems, overlays categorical annotation and compares differential expression experiments. PQuad requires Java 1.5 and has been tested to run across different operating systems. AVAILABILITY: http://ncrr.pnl.gov/software.


Subject(s)
Algorithms , Bacterial Physiological Phenomena , Computer Graphics , Gene Expression Profiling/methods , Proteome/physiology , Software , User-Computer Interface , Systems Integration
12.
Comput Biol Chem ; 31(2): 138-42, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17416337

ABSTRACT

A significant challenge in homology detection is to identify sequences that share a common evolutionary ancestor, despite significant primary sequence divergence. Remote homologs will often have less than 30% sequence identity, yet still retain common structural and functional properties. We demonstrate a novel method for identifying remote homologs using a support vector machine (SVM) classifier trained by fusing sequence similarity scores and subcellular location prediction. SVMs have been shown to perform well in a variety of applications where binary classification of data is the goal. At the same time, data fusion methods have been shown to be highly effective in enhancing discriminative power of data. Combining these two approaches in the application SVM-SimLoc resulted in identification of significantly more remote homologs (p-value<0.006) than using either sequence similarity or subcellular location independently.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Intracellular Space/metabolism , Pattern Recognition, Automated/methods , Proteins/chemistry , Structural Homology, Protein , Algorithms , Eukaryotic Cells , Models, Biological , Proteins/metabolism , Sequence Alignment
13.
Comput Biol Chem ; 29(6): 440-3, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16290168

ABSTRACT

Biopolymer sequence comparison to identify evolutionarily related proteins, or homologs, is one of the most common tasks in bioinformatics. Support vector machines (SVMs) represent a new approach to the problem in which statistical learning theory is employed to classify proteins into families, thus identifying homologous relationships. Current SVM approaches have been shown to outperform iterative profile methods, such as PSI-BLAST, for protein homology classification. In this study, we demonstrate that the utilization of a Bayesian alignment score, which accounts for the uncertainty of all possible alignments, in the SVM construction improves sensitivity compared to the traditional dynamic programming implementation over a benchmark dataset consisting of 54 unique protein families. The SVM-BALSA algorithm returns a higher area under the receiver operating characteristic (ROC) curves for 37 of the 54 families and achieves an improved overall performance curve at a significance level of 0.07.


Subject(s)
Sequence Alignment , Sequence Homology, Amino Acid , Bayes Theorem , Proteins/chemistry
14.
J Proteome Res ; 4(5): 1687-98, 2005.
Article in English | MEDLINE | ID: mdl-16212422

ABSTRACT

We evaluate statistical models used in two-hypothesis tests for identifying peptides from tandem mass spectrometry data. The null hypothesis H(0), that a peptide matches a spectrum by chance, requires information on the probability of by-chance matches between peptide fragments and peaks in the spectrum. Likewise, the alternate hypothesis H(A), that the spectrum is due to a particular peptide, requires probabilities that the peptide fragments would indeed be observed if it was the causative agent. We compare models for these probabilities by determining the identification rates produced by the models using an independent data set. The initial models use different probabilities depending on fragment ion type, but uniform probabilities for each ion type across all of the labile bonds along the backbone. More sophisticated models for probabilities under both H(A) and H(0) are introduced that do not assume uniform probabilities for each ion type. In addition, the performance of these models using a standard likelihood model is compared to an information theory approach derived from the likelihood model. Also, a simple but effective model for incorporating peak intensities is described. Finally, a support-vector machine is used to discriminate between correct and incorrect identifications based on multiple characteristics of the scoring functions. The results are shown to reduce the misidentification rate significantly when compared to a benchmark cross-correlation based approach.


Subject(s)
Proteome , Proteomics/methods , Databases, Protein , Deinococcus/metabolism , Likelihood Functions , Mass Spectrometry , Models, Statistical , Peptides/chemistry , Probability , ROC Curve
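Illustrative sketch for entry 14 (not taken from the paper): the abstract describes a two-hypothesis test in which H(A) assigns a probability that each predicted fragment is observed and H(0) assigns a probability that it matches a peak by chance, with the simplest models using one probability per fragment ion type. Under the added assumption that fragments are independent, the evidence can be summarized as a log-likelihood ratio, sketched below in Python. The function name `log_likelihood_ratio` and all probability values in the usage note are hypothetical.

```python
import math


def log_likelihood_ratio(fragments, p_alt, p_null):
    """Log-likelihood ratio of H(A) over H(0) for one peptide-spectrum pair.

    fragments: list of (ion_type, matched) tuples, one per predicted fragment,
               where matched is True if a peak was found for that fragment.
    p_alt:     dict ion_type -> probability the fragment is observed under H(A).
    p_null:    dict ion_type -> probability of a by-chance match under H(0).
    Fragments are treated as independent, which is a simplifying assumption.
    """
    llr = 0.0
    for ion_type, matched in fragments:
        pa, p0 = p_alt[ion_type], p_null[ion_type]
        if matched:
            llr += math.log(pa / p0)
        else:
            llr += math.log((1.0 - pa) / (1.0 - p0))
    return llr
```

For example, with assumed values `p_alt = {"b": 0.6, "y": 0.7}` and `p_null = {"b": 0.1, "y": 0.1}`, a peptide whose fragments are all matched gets a positive score and one with no matches gets a negative score. The more sophisticated models mentioned in the abstract would replace the per-ion-type constants with probabilities that vary along the backbone.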
15.
J Cardiovasc Electrophysiol ; 13(11): 1131-40, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12475105

ABSTRACT

INTRODUCTION: A rapidly activating delayed rectifier potassium current (I(Kr)) is known to have an important role in determining the properties of spontaneous pacing in enzymatically isolated rabbit sinoatrial node (SAN) cells. The functional characteristics of I(Kr) are conferred by its dependence on time, voltage, and external potassium. The aim of this study was to develop a rigorous mathematical representation for I(Kr) based on experimental findings and to investigate the role of I(Kr) in the automaticity and intercellular communication of SAN cells. METHODS AND RESULTS: A Markov model was developed using available experimental data for I(Kr) in rabbit SAN. The dependence of I(Kr) on external potassium, [K+]o, was incorporated using data from both in vitro preparations and results from heterologous expression experiments for this ether-a-go-go related gene product. Our simulation results show the following. (1) I(Kr) is the dominant repolarizing current in rabbit SAN cells. (2) Deactivation of I(Kr) contributes to the net current change during the early diastolic depolarization phase. (3) Inward rectification of I(Kr) results in a decrease in membrane resistance during repolarization relative to plateau. (4) The complex [K+]o dependence of I(Kr) confers [K+]o insensitivity on isolated cells, which may account for the sensitivity of pacing rate to elevated [K+]o at the tissue level. CONCLUSION: Model results show that I(Kr) mediates diastolic depolarization by the kinetics of its decay and by lowering resistance during late repolarization. In elevated [K+]o, increased chord conductance is balanced by the changes in kinetics and voltage dependence of I(Kr) so that the pacing rate of single cells may be more [K+]o insensitive than expected. In addition, elevated [K+]o increases I(Kr) magnitude during repolarization but lowers resistance, so current flow through gap junctions is less able to hyperpolarize pacing cells.


Subject(s)
Models, Cardiovascular , Potassium Channels, Voltage-Gated , Potassium Channels/physiology , Sinoatrial Node/metabolism , Action Potentials , Animals , Computer Simulation , Delayed Rectifier Potassium Channels , Diastole , Electrophysiology , Markov Chains , Patch-Clamp Techniques , Rabbits
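Illustrative sketch for entry 15 (not taken from the paper): the abstract describes a Markov model of I(Kr) that depends on time, voltage, and external potassium. The Python below is a deliberately reduced two-state (closed/open) version intended only to show the general shape of such a model. The rate expressions, the conductance value, the 140 mM internal potassium concentration, and the function names are all assumptions; the published model has more states and a more complex [K+]o dependence, whereas here [K+]o enters only through the Nernst potential.

```python
import math


def rates(v_mV):
    """Hypothetical voltage-dependent opening (alpha) and closing (beta) rates, per ms."""
    alpha = 0.01 * math.exp(v_mV / 25.0)
    beta = 0.005 * math.exp(-v_mV / 25.0)
    return alpha, beta


def open_probability(v_mV, t_ms, dt=0.01, o0=0.0):
    """Forward-Euler integration of dO/dt = alpha*(1 - O) - beta*O at fixed voltage."""
    alpha, beta = rates(v_mV)
    o = o0
    for _ in range(int(round(t_ms / dt))):
        o += dt * (alpha * (1.0 - o) - beta * o)
    return o


def nernst_k(k_out_mM, k_in_mM=140.0, temp_K=310.0):
    """Potassium reversal potential in mV from the Nernst equation."""
    gas_constant = 8.314  # J/(mol*K)
    faraday = 96485.0     # C/mol
    return 1000.0 * gas_constant * temp_K / faraday * math.log(k_out_mM / k_in_mM)


def current(v_mV, open_prob, k_out_mM, g_max=1.0):
    """Ohmic current: conductance times open probability times driving force."""
    return g_max * open_prob * (v_mV - nernst_k(k_out_mM))
```

At a fixed voltage the open probability relaxes toward alpha / (alpha + beta), and raising `k_out_mM` makes the reversal potential less negative, which reduces the outward driving force at a given voltage. Capturing the effects listed in the abstract, such as inward rectification, would require additional states and parameters fitted to experimental data.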