Results 1 - 6 of 6
1.
IEEE Trans Pattern Anal Mach Intell ; 37(2): 321-33, 2015 Feb.
Article in English | MEDLINE | ID: mdl-26353244

ABSTRACT

We introduce the four-parameter IBP compound Dirichlet process (ICDP), a stochastic process that generates sparse non-negative vectors with a potentially unbounded number of entries. If we repeatedly sample from the ICDP we can generate sparse matrices with an infinite number of columns and power-law characteristics. We apply the four-parameter ICDP to sparse nonparametric topic modelling to account for the very large number of topics present in large text corpora and the power-law distribution of the vocabulary of natural languages. The model, which we call latent IBP compound Dirichlet allocation (LIDA), allows for power-law distributions both in the number of topics summarising the documents and in the number of words defining each topic. It can be interpreted as a sparse variant of the hierarchical Pitman-Yor process when applied to topic modelling. We derive an efficient and simple collapsed Gibbs sampler closely related to the collapsed Gibbs sampler of latent Dirichlet allocation (LDA), making the model applicable in a wide range of domains. Our nonparametric Bayesian topic model compares favourably to the widely used hierarchical Dirichlet process and its heavy-tailed version, the hierarchical Pitman-Yor process, on benchmark corpora. Experiments demonstrate that accounting for the power-law distribution of real data is beneficial and that sparsity provides more interpretable results.
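
To make the idea of "sparse non-negative vectors with a potentially unbounded number of entries" concrete, here is a minimal sketch. It is not the four-parameter ICDP of the paper: it assumes the standard one-parameter Indian buffet process to pick each row's support and then places Dirichlet weights on that support, so it does not reproduce the power-law behaviour. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_rows, alpha, rng):
    """One-parameter Indian buffet process (an assumption, not the paper's process).

    Returns one set of active column indices per row.
    """
    column_counts = []  # number of earlier rows that use each column
    rows = []
    for i in range(1, num_rows + 1):
        # Reuse an existing column with probability (times used so far) / i.
        active = {k for k, m in enumerate(column_counts) if rng.random() < m / i}
        # Each row also opens Poisson(alpha / i) brand-new columns.
        for _ in range(sample_poisson(alpha / i, rng)):
            column_counts.append(0)
            active.add(len(column_counts) - 1)
        for k in active:
            column_counts[k] += 1
        rows.append(active)
    return rows


def sparse_dirichlet(active, concentration, rng):
    """Dirichlet weights on the active columns only; every other entry is implicitly zero."""
    gammas = {k: rng.gammavariate(concentration, 1.0) for k in active}
    total = sum(gammas.values())
    return {k: g / total for k, g in gammas.items()} if total > 0 else {}


rng = random.Random(0)
supports = sample_ibp(num_rows=5, alpha=3.0, rng=rng)
vectors = [sparse_dirichlet(s, concentration=0.5, rng=rng) for s in supports]
```

Each entry of `vectors` is a dictionary from column index to weight, so the number of columns never has to be fixed in advance.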

2.
BMC Bioinformatics ; 10: 365, 2009 Oct 30.
Article in English | MEDLINE | ID: mdl-19878545

ABSTRACT

BACKGROUND: Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) are measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental effort, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. RESULTS: We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision of 56% and a recall of 65%. CONCLUSION: We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.
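
The evaluation criterion quoted above (hot spots defined by ΔΔG ≥ 2 kcal/mol, scored by precision and recall) can be sketched as follows. This only illustrates the metric; the function name is hypothetical and the numbers are made up for illustration, not taken from the study.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, the cutoff quoted in the abstract


def precision_recall(measured_ddg, predicted_hot):
    """Precision and recall of hot spot predictions against measured ddG values."""
    actual_hot = [ddg >= HOT_SPOT_THRESHOLD for ddg in measured_ddg]
    pairs = list(zip(actual_hot, predicted_hot))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up values for illustration only, not data from the study.
measured = [2.5, 0.3, 3.1, 1.9, 2.0, 0.0]
predicted = [True, True, False, False, True, False]
precision, recall = precision_recall(measured, predicted)
```

Precision is the fraction of predicted hot spots that are real; recall is the fraction of real hot spots that were predicted.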


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Alanine/chemistry , Artificial Intelligence , Binding Sites , Databases, Protein , Hydrogen Bonding , Proteins/metabolism , Static Electricity , Thermodynamics
3.
Bioinformatics ; 25(10): 1280-6, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19279066

ABSTRACT

MOTIVATION: Stress response in cells is often mediated by quick activation of transcription factors (TFs). Given the difficulty in experimentally assaying TF activities, several statistical approaches have been proposed to infer them from microarray time courses. However, these approaches often rely on prior assumptions which rule out the rapid responses observed during stress response. RESULTS: We present a novel statistical model to infer how TFs mediate stress response in cells. The model is based on the assumption that sensory TFs quickly transit between active and inactive states. We therefore model mRNA production using a bistable dynamical system whose behaviour is described by a system of differential equations driven by a latent stochastic process. We assume the stochastic process to be a two-state continuous time jump process, and devise both an exact solution for the inference problem and an efficient approximate algorithm. We evaluate the method on both simulated data and real data describing Escherichia coli's response to sudden oxygen starvation. This highlights both the accuracy of the proposed method and its potential for generating novel hypotheses and testable predictions. AVAILABILITY: MATLAB and C++ code used in the article can be downloaded from http://www.dcs.shef.ac.uk/~guido/.
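
As a rough illustration of a differential equation driven by a two-state continuous-time jump process, the sketch below simulates forward in time only; it does no inference and is not the authors' MATLAB or C++ code. The linear form dx/dt = basal + gain * mu(t) - decay * x, the Euler discretisation, and every parameter name and value are assumptions made for this example.

```python
import random


def simulate_switched_mrna(t_end, dt, rate_on, rate_off,
                           basal, gain, decay, x0, rng):
    """Euler simulation of dx/dt = basal + gain * mu(t) - decay * x.

    mu(t) is a 0/1 jump process that switches on at rate_on and off at rate_off.
    The model form is an assumption for illustration, not taken from the article.
    """
    mu, x, t = 0, x0, 0.0
    states, levels = [mu], [x]
    while t < t_end:
        # Over a short step the switch flips with probability of about rate * dt.
        flip_rate = rate_on if mu == 0 else rate_off
        if rng.random() < flip_rate * dt:
            mu = 1 - mu
        x += dt * (basal + gain * mu - decay * x)
        t += dt
        states.append(mu)
        levels.append(x)
    return states, levels


rng = random.Random(42)
states, levels = simulate_switched_mrna(
    t_end=10.0, dt=0.01, rate_on=0.5, rate_off=0.5,
    basal=0.5, gain=2.0, decay=1.0, x0=0.5, rng=rng,
)
```

With these settings the mRNA level relaxes towards basal / decay while the switch is off and towards (basal + gain) / decay while it is on, which is the bistable behaviour the abstract refers to.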


Subject(s)
Computational Biology/methods , Models, Statistical , Stress, Physiological/genetics , Algorithms , Escherichia coli/metabolism , Gene Expression Profiling , Oxygen/metabolism , RNA, Messenger/metabolism , Transcription Factors/metabolism
4.
Neural Comput ; 21(3): 786-92, 2009 Mar.
Article in English | MEDLINE | ID: mdl-18785854

ABSTRACT

The variational approximation of posterior distributions by multivariate gaussians has been much less popular in the machine learning community compared to the corresponding approximation by factorizing distributions. This is for a good reason: the gaussian approximation is in general plagued by an O(N^2) number of variational parameters to be optimized, N being the number of random variables. In this letter, we discuss the relationship between the Laplace and the variational approximation, and we show that for models with gaussian priors and factorizing likelihoods, the number of variational parameters is actually O(N). The approach is applied to gaussian process regression with nongaussian likelihoods.
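
The gap between O(N^2) and O(N) parameters is easy to quantify. In the sketch below, the count for a general gaussian (a mean vector plus a symmetric covariance matrix) is exact; the reduced count of 2N is only an assumed example of a linear parameterization, since the abstract states the order O(N) but not the constant. Function names are hypothetical.

```python
def full_gaussian_param_count(n):
    """Free parameters of a general n-dimensional gaussian: mean plus symmetric covariance."""
    return n + n * (n + 1) // 2


def reduced_param_count(n):
    """Assumed linear parameterization: two parameters per variable (illustrative only)."""
    return 2 * n


for n in (10, 100, 1000):
    print(n, full_gaussian_param_count(n), reduced_param_count(n))
```

For N = 1000 the general gaussian already needs 501500 parameters, against 2000 under the assumed linear scheme.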


Subject(s)
Artificial Intelligence , Normal Distribution , Humans
5.
Neural Netw ; 20(1): 129-38, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17011164

ABSTRACT

A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.
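
The latent indicator and latent scale variables mentioned above come from the standard representation of a Student-t distribution as a gaussian whose precision is scaled by a Gamma-distributed variable. The sketch below uses that representation to draw samples from a univariate Student-t mixture; it illustrates the generative model only, not the variational learning algorithm, and all names and parameter values are hypothetical.

```python
import math
import random


def sample_student_t_mixture(weights, means, scales, dofs, rng):
    """Draw one point from a univariate Student-t mixture via its latent variables."""
    # Latent indicator: which mixture component generated the point.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Latent scale: u ~ Gamma(shape=dof/2, rate=dof/2).
    # random.gammavariate takes a scale parameter, hence 2/dof.
    u = rng.gammavariate(dofs[k] / 2.0, 2.0 / dofs[k])
    # Given u, the point is gaussian with standard deviation scales[k] / sqrt(u).
    x = rng.gauss(means[k], scales[k] / math.sqrt(u))
    return x, k, u


rng = random.Random(1)
samples = [
    sample_student_t_mixture(
        weights=[0.7, 0.3], means=[0.0, 5.0],
        scales=[1.0, 0.5], dofs=[3.0, 10.0], rng=rng,
    )
    for _ in range(1000)
]
```

Small values of `u` inflate the variance of a single draw, which is how the heavy tails that absorb outliers arise.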


Subject(s)
Bayes Theorem , Cluster Analysis , Robotics , Algorithms , Humans , Learning , Normal Distribution
6.
Artif Intell Med ; 32(3): 183-94, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15531150

ABSTRACT

Within the framework of the OPTIVIP project, an optic nerve based visual prosthesis is developed in order to restore partial vision to the blind. One of the main challenges is to understand, decode and model the physiological process linking the stimulating parameters to the visual sensations produced in the visual field of a blind volunteer. We propose to use adaptive neural techniques. Two prediction models are investigated. The first one is a grey-box model exploiting the neurophysiological knowledge available up to now. It combines a neurophysiological model with artificial neural networks, such as multi-layer perceptrons and radial basis function networks, in order to predict the features of the visual perceptions. The second model is entirely of the black-box type. We show that both models provide satisfactory prediction tools and achieve similar prediction accuracies. Moreover, we demonstrate that significant improvement (25%) was gained with respect to linear statistical methods, suggesting that the biological process is strongly non-linear.


Subject(s)
Blindness/rehabilitation , Models, Psychological , Neural Networks, Computer , Prostheses and Implants , Visual Perception , Forecasting , Humans , Optic Nerve/pathology , Optic Nerve/physiology