1.
Sensors (Basel) ; 22(11)2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35684856

ABSTRACT

Optimal control of an engine's combustion process ensures lower emissions and fuel consumption as well as high efficiency. Combustion parameters such as the peak firing pressure (PFP) and the crank angle (CA) corresponding to 50% of mass fraction burned (MFB50) are essential for a closed-loop control strategy. These parameters are derived from the measured in-cylinder pressure, which is typically obtained with intrusive pressure sensors (PSs). These sensors are costly and their durability is uncertain. To overcome these issues, the potential of using a virtual sensor based on the vibration signals acquired by a knock sensor (KS) for control of the combustion process is investigated. The present work introduces a data-driven approach in which a signal-processing technique, the discrete wavelet transform (DWT), is used as a preprocessing step to extract informative features for regression of the selected combustion parameters with extreme gradient boosting (XGBoost) models. The methodology is applied to data from two different spark-ignited, single-cylinder gas engines. Finally, an analysis is presented that identifies the important features based on the model's decisions.

2.
PLoS One ; 17(4): e0267275, 2022.
Article in English | MEDLINE | ID: mdl-35436321

ABSTRACT

INTRODUCTION: The automatic classification of lymphoma lesions in PET is a main topic of ongoing research. An automatic algorithm would enable the swift evaluation of PET parameters, such as texture and heterogeneity markers, with respect to their prognostic value for patient outcome in large datasets. Moreover, the determination of the metabolic tumor volume would be facilitated. The aim of our study was the development and evaluation of an automatic algorithm for segmentation and classification of lymphoma lesions in PET. METHODS: Pre-treatment PET scans from 60 Hodgkin lymphoma patients from the EuroNet-PHL-C1 trial were evaluated. A watershed algorithm was used for segmentation. For standardization of the scan length, an automatic cropping algorithm was developed. All segmented volumes were manually classified into one of 14 categories. The random forest method and nested cross-validation were used for automatic classification and evaluation. RESULTS: Overall, 853 volumes were segmented and classified. 203/246 tumor lesions and 554/607 non-tumor volumes were classified correctly by the automatic algorithm, corresponding to a sensitivity of 83%, a specificity of 91%, a positive predictive value of 79% and a negative predictive value of 93%. In 44/60 (73%) patients, all tumor lesions were correctly classified. In ten of the 16 patients with misclassified tumor lesions, only one false-negative tumor lesion occurred. The automatic classification of focal gastrointestinal uptake, brown fat tissue and composed volumes consisting of more than one tissue was challenging. CONCLUSION: Our algorithm, trained on a small number of patients and on PET information only, performed well and is suitable for automatic lymphoma classification.
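
The four reported percentages can be reproduced from the counts given in the RESULTS section. The sketch below is only an arithmetic check, not the authors' evaluation code; the function name `classification_metrics` is a hypothetical helper, and the false-negative and false-positive counts are derived by subtraction from the totals in the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # correctly classified tumor lesions
        "specificity": tn / (tn + fp),  # correctly classified non-tumor volumes
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts from the abstract: 203 of 246 tumor lesions and
# 554 of 607 non-tumor volumes were classified correctly.
tp, fn = 203, 246 - 203
tn, fp = 554, 607 - 554

metrics = classification_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Rounded to whole percentages, this yields 83%, 91%, 79% and 93%, in agreement with the values stated in the abstract.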


Subject(s)
Hodgkin Disease; Lymphoma; Fluorodeoxyglucose F18/metabolism; Hodgkin Disease/diagnostic imaging; Hodgkin Disease/pathology; Humans; Lymphoma/diagnostic imaging; Positron-Emission Tomography; Radiopharmaceuticals; Tumor Burden
3.
BMC Bioinformatics ; 20(1): 376, 2019 Jul 05.
Article in English | MEDLINE | ID: mdl-31277571

ABSTRACT

BACKGROUND: Molecule identification is a crucial step in metabolomics and environmental sciences. Besides in silico fragmentation, as performed by MetFrag, machine learning and statistical methods have also evolved, showing an improvement in molecule annotation based on MS/MS data. In this work we present a new statistical scoring method where annotations of m/z fragment peaks to fragment-structures are learned in a training step. Based on a Bayesian model, two additional scoring terms are integrated into the new MetFrag2.4.5 and evaluated on the test data set of the CASMI 2016 contest. RESULTS: The results on the 87 MS/MS spectra from positive and negative mode show a substantial improvement compared to submissions made by the former MetFrag approach. Top1 rankings increased from 5 to 21 and Top10 rankings from 39 to 55, both showing higher values than for CSI:IOKR, the winner of the CASMI 2016 contest. For the negative mode spectra, MetFrag's statistical scoring outperforms all other participants that submitted results for this type of spectra. CONCLUSIONS: This study shows how statistical learning can improve molecular structure identification based on MS/MS data compared with the same method using combinatorial in silico fragmentation only. MetFrag2.4.5 performs better than the other participating approaches, especially in negative mode.


Subject(s)
Metabolomics/methods; Tandem Mass Spectrometry/methods; Bayes Theorem; Computer Simulation; Molecular Structure
4.
Genome Biol ; 20(1): 9, 2019 Jan 10.
Article in English | MEDLINE | ID: mdl-30630522

ABSTRACT

Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the "ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge" in 2017. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of predicted peaks for a total of 31 transcription factors in 22 primary cell types and tissues and a user-friendly version of our approach, Catchitt, for download.


Subject(s)
Cells/metabolism; Genomics/methods; Transcription Factors/metabolism; Humans
5.
BMC Bioinformatics ; 16: 387, 2015 Nov 17.
Article in English | MEDLINE | ID: mdl-26577052

ABSTRACT

BACKGROUND: For three decades, sequence logos have been the de facto standard for the visualization of sequence motifs in biology and bioinformatics. Reasons for this success story are their simplicity and clarity. The number of inferred and published motifs grows with the number of data sets and motif extraction algorithms. Hence, it becomes more and more important to perceive differences between motifs. However, motif differences are hard to detect from individual sequence logos in the case of multiple motifs for one transcription factor, highly similar binding motifs of different transcription factors, or multiple motifs for one protein domain. RESULTS: Here, we present DiffLogo, a freely available, extensible, and user-friendly R package for visualizing motif differences. DiffLogo is capable of showing differences between DNA motifs as well as protein motifs in a pair-wise manner, resulting in publication-ready figures. In the case of more than two motifs, DiffLogo is capable of visualizing pair-wise differences in a tabular form. Here, the motifs are ordered by similarity, and the difference logos are colored for clarity. We demonstrate the benefit of DiffLogo on CTCF motifs from different human cell lines, on E-box motifs of three basic helix-loop-helix transcription factors as examples for the comparison of DNA motifs, and on F-box domains from three different families as an example for the comparison of protein motifs. CONCLUSIONS: DiffLogo provides an intuitive visualization of motif differences. It enables the illustration and investigation of differences between highly similar motifs such as binding patterns of transcription factors for different cell types, treatments, and algorithmic approaches.


Subject(s)
Algorithms; Amino Acid Motifs/genetics; Basic Helix-Loop-Helix Transcription Factors/genetics; Computer Graphics; Nucleotide Motifs/genetics; Sequence Analysis, DNA/methods; Software; CCCTC-Binding Factor; Computational Biology/methods; Humans; Protein Structure, Tertiary; Repressor Proteins/genetics; Tumor Cells, Cultured
6.
J Neurosci Methods ; 225: 1-12, 2014 Mar 30.
Article in English | MEDLINE | ID: mdl-24457055

ABSTRACT

The function of complex networks in the nervous system relies on the proper formation of neuronal contacts and their remodeling. To decipher the molecular mechanisms underlying these processes, it is essential to establish unbiased automated tools allowing the correlation of neurite morphology and the subcellular distribution of molecules by quantitative means. We developed NeuronAnalyzer2D, a plugin for ImageJ, which allows the extraction of neuronal cell morphologies from two-dimensional high-resolution images, and in particular their correlation with protein profiles determined by indirect immunostaining of primary neurons. The prominent feature of our approach is the ability to extract subcellular distributions of distinct biomolecules along neurites. To extract the complete areas of neurons, required for this analysis, we employ active contours with a new distance-based energy. For locating the structural parts of neurons and various morphological parameters we adopt a wavelet-based approach. The presented approach is able to extract distinctive profiles of several proteins and reports detailed morphology measurements on neurites. We compare the detected neurons from NeuronAnalyzer2D with those obtained by NeuriteTracer and Vaa3D-Neuron, two popular tools for automatic neurite tracing. The distinctive profiles extracted for several proteins, for example the mRNA binding protein ZBP1, and a comparative evaluation of the neuron segmentation results demonstrate the high quality of the quantitative data and its practical utility for biomedical analyses.


Subject(s)
Image Processing, Computer-Assisted/methods; Models, Theoretical; Neurons/cytology; Neurons/metabolism; Algorithms; Animals; Humans; Software
7.
PLoS One ; 9(1): e85629, 2014.
Article in English | MEDLINE | ID: mdl-24465627

ABSTRACT

The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3' end.
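
The independence assumption of the position weight matrix (PWM) model mentioned above can be made concrete with a small sketch. It is illustrative only: the binding sites are made-up toy data, the helper names `build_pwm` and `log_likelihood` are hypothetical, and the pseudocount of 1 is an arbitrary choice. The inhomogeneous parsimonious Markov models studied in the paper are not implemented here.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Estimate one independent nucleotide distribution per motif position."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in alphabet})
    return pwm


def log_likelihood(pwm, seq):
    """Independence assumption: the log-probability is a sum over positions."""
    return sum(math.log(column[base]) for column, base in zip(pwm, seq))


# Toy aligned binding sites (invented for illustration, not CTCF data).
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
pwm = build_pwm(sites)

print(log_likelihood(pwm, "ACGT"))  # sequence close to the consensus
print(log_likelihood(pwm, "TTTT"))  # poorer match, lower log-likelihood
```

Because each position contributes its own term, a PWM cannot express that a nucleotide at one position makes a particular nucleotide at another position more or less likely, which is the kind of dependency the abstract investigates.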


Subject(s)
Algorithms; DNA-Binding Proteins/genetics; Models, Genetic; Nucleotide Motifs/genetics; Repressor Proteins/genetics; Base Sequence; Binding Sites/genetics; CCCTC-Binding Factor; Cell Line; Cells, Cultured; DNA-Binding Proteins/metabolism; HeLa Cells; Hep G2 Cells; Humans; K562 Cells; MCF-7 Cells; Markov Chains; Protein Binding; Repressor Proteins/metabolism
8.
Bioinformatics ; 29(22): 2931-2, 2013 Nov 15.
Article in English | MEDLINE | ID: mdl-23995255

ABSTRACT

SUMMARY: Transcription activator-like effector nucleases (TALENs) have become an accepted tool for targeted mutagenesis, but undesired off-targets remain an important issue. We present TALENoffer, a novel tool for the genome-wide prediction of TALEN off-targets. We show that TALENoffer successfully predicts known off-targets of engineered TALENs and yields a competitive runtime, scanning complete mammalian genomes within a few minutes. AVAILABILITY: TALENoffer is available as a command line program from http://www.jstacs.de/index.php/TALENoffer and as a Galaxy server at http://galaxy.informatik.uni-halle.de. CONTACT: grau@informatik.uni-halle.de


Subject(s)
Endodeoxyribonucleases/metabolism; Software; Animals; DNA-Binding Proteins/metabolism; Genome; Models, Statistical; Mutagenesis; Protein Engineering
9.
Nucleic Acids Res ; 41(21): e197, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24057214

ABSTRACT

De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.


Subject(s)
Chromatin Immunoprecipitation/methods; DNA/chemistry; High-Throughput Nucleotide Sequencing/methods; Protein Array Analysis/methods; Sequence Analysis, DNA/methods; Humans; Nucleotide Motifs; Software
10.
PLoS Comput Biol ; 9(3): e1002962, 2013.
Article in English | MEDLINE | ID: mdl-23526890

ABSTRACT

Transcription activator-like (TAL) effectors are injected into host plant cells by Xanthomonas bacteria to function as transcriptional activators for the benefit of the pathogen. The DNA binding domain of TAL effectors is composed of conserved amino acid repeat structures containing repeat-variable diresidues (RVDs) that determine DNA binding specificity. In this paper, we present TALgetter, a new approach for predicting TAL effector target sites based on a statistical model. In contrast to previous approaches, the parameters of TALgetter are estimated from training data computationally. We demonstrate that TALgetter successfully predicts known TAL effector target sites and often yields a greater number of predictions that are consistent with up-regulation in gene expression microarrays than an existing approach, Target Finder of the TALE-NT suite. We study the binding specificities estimated by TALgetter and confirm that different RVDs are differently important for transcriptional activation. In subsequent studies, the predictions of TALgetter indicate a previously unreported positional preference of TAL effector target sites relative to the transcription start site. In addition, several TAL effectors are predicted to bind to the TATA-box, which might constitute one general mode of transcriptional activation by TAL effectors. Scrutinizing the predicted target sites of TALgetter, we propose several novel TAL effector virulence targets in rice and sweet orange. TAL-mediated induction of the candidates is supported by gene expression microarrays. Validity of these targets is also supported by functional analogy to known TAL effector targets, by an over-representation of TAL effector targets with similar function, or by a biological function related to pathogen infection. Hence, these predicted TAL effector virulence targets are promising candidates for studying the virulence function of TAL effectors. TALgetter is implemented as part of the open-source Java library Jstacs, and is freely available as a web-application and a command line program.


Subject(s)
Bacterial Proteins/chemistry; DNA-Binding Proteins/chemistry; Gene Expression Regulation, Plant; Transcription Factors/chemistry; Amino Acid Sequence; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Computational Biology/methods; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Plant Diseases/genetics; Plant Diseases/microbiology; Protein Binding; Reproducibility of Results; Transcription Factors/genetics; Transcription Factors/metabolism; Xanthomonas/genetics; Xanthomonas/pathogenicity
11.
J Bioinform Comput Biol ; 11(1): 1340006, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23427988

ABSTRACT

DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use.


Subject(s)
DNA-Binding Proteins/genetics; DNA/genetics; Software; Transcription Factors/genetics; Amino Acid Motifs; Binding Sites; Protein Binding
12.
Mol Biol Rep ; 39(1): 761-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21559832

ABSTRACT

Cancer/testis antigens (CTA) are a heterogeneous group of antigens that are expressed preferentially in tumor cells and testis. Based on this definition the human membrane-associated phospholipase A1 beta (lipase family member I, LIPI) has been identified as CTA. The high homology of LIPI and the membrane-associated phospholipase A1 alpha (lipase family member H, LIPH) suggests that both genes are derived from a common ancestor by gene duplication. In contrast to human LIPI, human LIPH is expressed in several tissues. LIPI sequences have only been identified in mammals. Here, we describe the identification of LIPI in non-mammalian vertebrates. Based on the conserved genomic organization of LIPI and LIPH we identified sequences for both lipases in birds and fishes. In all vertebrates the LIPI locus is neighbored by a member of the RNA binding motif (RBM) family, RBM11. By sequencing of reverse transcriptase-polymerase chain reaction products we determined the sequences of LIPI and LIPH messenger RNA from broilers. We found that the sequence homology between LIPI and LIPH is much higher in non-mammalian species than in mammals. In addition, we found broad expression of LIPI in broilers, resembling the expression profile of LIPH. Our data suggest that LIPI is a CTA only in mammalian species and that the unique sequence features of the mammalian LIPI/RBM11 locus have evolved together with the CTA-like expression pattern of LIPI.


Subject(s)
Antigens, Neoplasm/genetics; Chickens/genetics; Isoenzymes/genetics; Phospholipases A1/genetics; Animals; Base Sequence; DNA Primers/genetics; Evolution, Molecular; Gene Expression Profiling; Molecular Sequence Data; RNA-Binding Proteins/genetics; Reverse Transcriptase Polymerase Chain Reaction; Sequence Analysis, DNA; Sequence Homology; Species Specificity
13.
PLoS Comput Biol ; 7(2): e1001070, 2011 Feb 10.
Article in English | MEDLINE | ID: mdl-21347314

ABSTRACT

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. 
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.


Subject(s)
Transcription Factors/metabolism; Animals; Arabidopsis/drug effects; Arabidopsis/genetics; Arabidopsis/metabolism; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Binding Sites/genetics; Computational Biology; DNA, Plant/genetics; DNA, Plant/metabolism; Databases, Genetic; Genes, Plant/drug effects; Humans; Indoleacetic Acids/pharmacology; Models, Genetic; Models, Statistical; Promoter Regions, Genetic
14.
Methods Mol Biol ; 674: 97-119, 2010.
Article in English | MEDLINE | ID: mdl-20827588

ABSTRACT

Many different computer programs for the prediction of transcription factor binding sites have been developed over the last decades. These programs differ from each other by pursuing different objectives and by taking into account different sources of information. For methods based on statistical approaches, these programs differ at an elementary level from each other by the statistical models used for individual binding sites and flanking sequences and by the learning principles employed for estimating the model parameters. According to our experience, both the models and the learning principles should be chosen with great care, depending on the specific task at hand, but many existing programs do not allow the user to choose them freely. Hence, we developed Jstacs, an object-oriented Java framework for sequence analysis, which allows the user to combine different statistical models and different learning principles in a modular manner with little effort. In this chapter we explain how Jstacs can be used for the recognition of transcription factor binding sites.


Subject(s)
Computational Biology/methods; Transcription Factors/metabolism; Base Sequence; Binding Sites; Humans; Likelihood Functions; Promoter Regions, Genetic/genetics; Receptors, Steroid/metabolism; Reproducibility of Results; Software
15.
BMC Bioinformatics ; 11: 149, 2010 Mar 22.
Article in English | MEDLINE | ID: mdl-20307305

ABSTRACT

BACKGROUND: One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goal of improving their performance and deepening our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While in many comparative studies different learning principles or different statistical models have been compared, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, possibly leading to questionable conclusions. RESULTS: With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly-used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites. CONCLUSIONS: We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different priors has been neglected. The derived prior is implemented in the open-source library Jstacs to enable easy application to comparative studies of different learning principles in the field of sequence analysis.


Subject(s)
Sequence Analysis, DNA/methods; Bayes Theorem; Binding Sites; DNA/chemistry; Markov Chains; Pattern Recognition, Automated/methods; Sequence Alignment/methods
16.
BMC Bioinformatics ; 11: 98, 2010 Feb 22.
Article in English | MEDLINE | ID: mdl-20175896

ABSTRACT

BACKGROUND: The recognition of functional binding sites in genomic DNA remains one of the fundamental challenges of genome research. During the last decades, a plethora of different and well-adapted models has been developed, but little attention has been paid to the development of different and similarly well-adapted learning principles. Only recently has it been noticed that discriminative learning principles can be superior to generative ones in diverse bioinformatics applications, too. RESULTS: Here, we propose a generalization of generative and discriminative learning principles containing the maximum likelihood, maximum a posteriori, maximum conditional likelihood, maximum supervised posterior, generative-discriminative trade-off, and penalized generative-discriminative trade-off learning principles as special cases, and we illustrate its efficacy for the recognition of vertebrate transcription factor binding sites. CONCLUSIONS: We find that the proposed learning principle helps to improve the recognition of transcription factor binding sites, enabling better computational approaches for extracting as much information as possible from valuable wet-lab data. We make all implementations available in the open-source library Jstacs so that this learning principle can be easily applied to other classification problems in the field of genome and epigenome analysis.
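
For orientation, the four classical special cases named above can be written in their standard textbook form. The notation is an assumption made here and not taken from the paper: labeled sequences $(x_n, c_n)$ for $n = 1, \dots, N$, model parameters $\theta$, and a prior density $q(\theta)$. The trade-off principles and the paper's unified objective are not reproduced.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{ML}}  &= \operatorname*{arg\,max}_{\theta} \sum_{n=1}^{N} \log P(x_n, c_n \mid \theta)
  && \text{(generative)} \\
\hat{\theta}_{\mathrm{MAP}} &= \operatorname*{arg\,max}_{\theta} \Big[ \sum_{n=1}^{N} \log P(x_n, c_n \mid \theta) + \log q(\theta) \Big]
  && \text{(generative, with prior)} \\
\hat{\theta}_{\mathrm{MCL}} &= \operatorname*{arg\,max}_{\theta} \sum_{n=1}^{N} \log P(c_n \mid x_n, \theta)
  && \text{(discriminative)} \\
\hat{\theta}_{\mathrm{MSP}} &= \operatorname*{arg\,max}_{\theta} \Big[ \sum_{n=1}^{N} \log P(c_n \mid x_n, \theta) + \log q(\theta) \Big]
  && \text{(discriminative, with prior)}
\end{align*}
```

The generative principles maximize the joint likelihood of sequences and class labels, whereas the discriminative ones maximize only the conditional likelihood of the labels given the sequences; adding $\log q(\theta)$ turns each into its Bayesian counterpart.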


Subject(s)
Information Storage and Retrieval/methods; Algorithms; DNA/chemistry; DNA/metabolism; Discriminant Analysis; Genome; Genomics; Likelihood Functions; Pattern Recognition, Automated
17.
Electrophoresis ; 30(23): 4137-48, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19960478

ABSTRACT

Proteomics is a rapidly growing field of modern biology. Since quantitative data of proteins involved in dynamic processes of living organisms are essential for understanding the basics of life, techniques like 2-DE and related procedures for automatic data interpretation are at the heart of this research field. They are strongly required to enable analysis and interpretation of the growing amount of available data. Analyzing and interpreting gel image data usually requires the comparison of gels from different experiments and, thus, a prior registration of gels. This can be accomplished using featureless, feature-based or hybrid registration approaches combining both techniques. Recently, the latter have shown high performance, and there is little doubt that, in general, robust and reliable features are an essential ingredient and valuable source of information for high-quality image registration. In this paper we provide a thorough overview and elaborate analysis of the capabilities of available feature detectors for gel image registration. In particular, a detailed and extensive comparative study is presented that includes common spot-specific detectors as well as image-content independent detectors that had not previously been applied to the task of gel image registration. The study incorporates tests on several thousand synthetically deformed images from different experimental conditions. As a result it provides valuable quantitative data allowing for direct objective comparisons of various detectors, and is well suited to guide the design of new registration algorithms.


Subject(s)
Algorithms; Electrophoresis, Gel, Two-Dimensional/methods; Image Processing, Computer-Assisted/methods; Proteomics/methods; Databases, Factual; Reproducibility of Results
18.
J Bioinform Comput Biol ; 5(2B): 561-77, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17636862

ABSTRACT

Variable order Markov models and variable order Bayesian trees have been proposed for the recognition of cis-regulatory elements, and it has been demonstrated that they outperform traditional models such as position weight matrices, Markov models, and Bayesian trees for the recognition of binding sites in prokaryotes. Here, we study to what degree variable order models can improve the recognition of eukaryotic cis-regulatory elements. We find that variable order models can improve the recognition of binding sites of all the studied transcription factors. To ease a systematic evaluation of different model combinations based on problem-specific data sets, and to allow genomic scans of cis-regulatory elements based on fixed and variable order Markov models and Bayesian trees, we provide the VOMBATserver to the public community.


Subject(s)
Algorithms; Chromosome Mapping/methods; Models, Genetic; Regulatory Elements, Transcriptional/genetics; Sequence Analysis, DNA/methods; Software; Transcription Factors/genetics; Bayes Theorem; Computer Simulation; Markov Chains; Models, Statistical; Pattern Recognition, Automated/methods
19.
J Clin Monit Comput ; 21(4): 219-26, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17520331

ABSTRACT

OBJECTIVE: In intraoperative analysis of electromyographic (EMG) signals for monitoring purposes, baseline artefacts frequently pose considerable problems. Since artefact sources in the operating room can only be reduced to a limited degree, signal-processing methods are needed to correct the registered data online without major changes to the relevant data itself. We describe a method for baseline correction based on the discrete wavelet transform (DWT) and evaluate its performance compared to commonly used digital filters. METHODS: EMG data from 10 patients who underwent removal of acoustic neuromas were processed. Effectiveness, preservation of relevant EMG patterns and processing speed of a DWT-based correction method were assessed and compared to a range of commonly used Butterworth, Resistor-Capacitor and Gaussian filters. RESULTS: Butterworth and DWT filters showed better performance regarding artefact correction and pattern preservation compared to Resistor-Capacitor and Gaussian filters. Assuming equal weighting of both characteristics, DWT outperformed the other methods: where the Butterworth, Resistor-Capacitor and Gaussian filters provided good pattern preservation, their effectiveness was low, and vice versa, whereas DWT baseline correction at level 6 performed well in both characteristics. CONCLUSIONS: The DWT method allows reliable and efficient intraoperative baseline correction in real time. It is superior to commonly used methods and may be crucial for intraoperative analysis of EMG data, for example for intraoperative assessment of facial nerve function.
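
The abstract does not state which wavelet was used or how the baseline was removed. The following is only a minimal sketch of one common way to do DWT baseline correction, under two loudly stated assumptions: a Haar wavelet, and removal of the baseline by zeroing the level-6 approximation coefficients before reconstruction. The function names (`haar_step`, `haar_inverse_step`, `remove_baseline`) are hypothetical and not taken from the paper.

```python
import math


def haar_step(x):
    """One Haar analysis level: return (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One Haar synthesis level: rebuild the finer signal from both halves."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def remove_baseline(signal, level=6):
    """Suppress slow baseline drift (assumed approach, not the paper's code).

    Decomposes the signal to `level`, zeroes the coarsest approximation
    coefficients (the slowly varying part), and reconstructs from the
    detail coefficients only.
    """
    if len(signal) == 0 or len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be a positive multiple of 2**level")
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.append(detail)
    approx = [0.0] * len(approx)  # discard the estimated baseline
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx
```

With the Haar wavelet, zeroing the level-6 approximation is equivalent to subtracting the mean of each block of 64 samples, so faster EMG activity is kept while a constant or slowly drifting offset is removed. A smoother wavelet would give a smoother baseline estimate; that choice is left open here.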


Subject(s)
Electromyography/methods; Monitoring, Intraoperative/methods; Electromyography/statistics & numerical data; Facial Nerve/physiopathology; Humans; Monitoring, Intraoperative/statistics & numerical data; Neuroma, Acoustic/physiopathology; Neuroma, Acoustic/surgery; Signal Processing, Computer-Assisted
20.
Nucleic Acids Res ; 34(Web Server issue): W529-33, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845064

ABSTRACT

Variable order Markov models and variable order Bayesian trees have been proposed for the recognition of transcription factor binding sites, and it has been demonstrated that they outperform traditional models, such as position weight matrices, Markov models and Bayesian trees. We develop a web server for the recognition of DNA binding sites based on variable order Markov models and variable order Bayesian trees, offering the following functionality: (i) given datasets with annotated binding sites and genomic background sequences, variable order Markov models and variable order Bayesian trees can be trained; (ii) given a set of trained models, putative DNA binding sites can be predicted in a given set of genomic sequences; and (iii) given a dataset with annotated binding sites and a dataset with genomic background sequences, cross-validation experiments for different model combinations with different parameter settings can be performed. Several of the offered services are computationally demanding, such as genome-wide predictions of DNA binding sites in mammalian genomes or sets of 10(4)-fold cross-validation experiments for different model combinations based on problem-specific data sets. To execute these jobs and to serve multiple users at the same time, the web server is attached to a Linux cluster with 150 processors. VOMBAT is available at http://pdw-24.ipk-gatersleben.de:8080/VOMBAT/.


Subject(s)
Genomics/methods; Regulatory Elements, Transcriptional; Software; Transcription Factors/metabolism; Algorithms; Bayes Theorem; Binding Sites; Internet; Markov Chains; User-Computer Interface