Search | BVS Regional Portal

1.

Efficient enumeration and visualization of helix-coil ensembles.

Hughes, Roy G; Zhao, Shiwen; Oas, Terrence G; Schmidler, Scott C.

Biophys J ; 123(3): 317-333, 2024 Feb 06.

Article in English | MEDLINE | ID: mdl-38158653

ABSTRACT

Helix-coil models are routinely used to interpret circular dichroism data of helical peptides or predict the helicity of naturally occurring and designed polypeptides. However, a helix-coil model contains significantly more information than mean helicity alone, as it defines the entire ensemble (the equilibrium population of every possible helix-coil configuration) for a given sequence. Many desirable quantities of this ensemble are either not obtained as ensemble averages or are not available using standard helicity-averaging calculations. Enumeration of the entire ensemble can allow calculation of a wider set of ensemble properties, but the exponential size of the configuration space typically renders this intractable. We present an algorithm that efficiently approximates the helix-coil ensemble to arbitrary accuracy by sequentially generating a list of the M most highly populated configurations in descending order of population. Truncating this list of (configuration, population) pairs at a desired accuracy provides an approximating sub-ensemble. We demonstrate several uses of this approach for providing insight into helix-coil ensembles and folding mechanisms, including landscape visualization.

Subject(s)

Peptides , Peptides/chemistry , Circular Dichroism
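To make the idea of a truncated sub-ensemble concrete, the sketch below enumerates every configuration of a short homopolymer under a simple Zimm-Bragg weighting (assumed convention: coil residue 1, helical residue `s`, with an extra nucleation factor `sigma` for each residue that starts a helix), ranks configurations by population, and stops once the kept populations sum to at least `1 - eps`. This is a brute-force illustration, not the efficient sequential algorithm described in the abstract; its cost grows as 2^n, which is exactly the intractability the paper addresses. Function names and parameter values are hypothetical.

```python
import itertools


def zimm_bragg_weight(config, s, sigma):
    """Statistical weight of one configuration string of 'h'/'c' characters
    under an assumed homopolymer Zimm-Bragg convention."""
    weight, prev = 1.0, "c"
    for state in config:
        if state == "h":
            # Helix after coil pays the nucleation penalty sigma.
            weight *= s * (sigma if prev == "c" else 1.0)
        prev = state
    return weight


def top_configurations(n, s, sigma, eps):
    """Brute force: rank all 2**n configurations by population and keep the
    shortest prefix whose cumulative population reaches 1 - eps."""
    configs = ["".join(c) for c in itertools.product("hc", repeat=n)]
    weights = [zimm_bragg_weight(c, s, sigma) for c in configs]
    z = sum(weights)  # partition function
    ranked = sorted(zip(configs, (w / z for w in weights)),
                    key=lambda pair: pair[1], reverse=True)
    kept, coverage = [], 0.0
    for config, pop in ranked:
        kept.append((config, pop))
        coverage += pop
        if coverage >= 1.0 - eps:
            break
    return kept, coverage


if __name__ == "__main__":
    sub_ensemble, coverage = top_configurations(n=12, s=1.3, sigma=0.003, eps=0.01)
    print(f"{len(sub_ensemble)} of {2 ** 12} configurations cover {coverage:.3f}")
    for config, pop in sub_ensemble[:3]:
        print(config, round(pop, 4))
```

Because populations are sorted in descending order, the kept prefix is the smallest set of configurations that reaches the requested coverage.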

2.

Computing the inducibility of B cell lineages under a context-dependent model of affinity maturation: Applications to sequential vaccine design.

Mathews, Joseph; Van Itallie, Elizabeth; Wiehe, Kevin; Schmidler, Scott C.

bioRxiv ; 2023 Oct 17.

Article in English | MEDLINE | ID: mdl-37905016

ABSTRACT

A key challenge in B cell lineage-based vaccine design is understanding the inducibility of target neutralizing antibodies. We approach this problem through the use of detailed stochastic modeling of the somatic hypermutation process that occurs during affinity maturation. Under such a model, sequence mutation rates are context-dependent, rendering standard probability calculations for sequence evolution intractable. We develop an algorithmic approach to rapid, accurate approximation of key marginal sequence likelihoods required to inform modern sequential vaccine design strategies. These calculated probabilities are used to define an inducibility index for selecting among potential targets for immunogen design. We apply this approach to the problem of choosing targets for the design of boosting immunogens aimed at elicitation of the HIV broadly-neutralizing antibody DH270min11.

3.

Efficient Enumeration and Visualization of Helix-coil Ensembles.

Schmidler, Scott C; Hughes, Roy Gene; Oas, Terrence G; Zhao, Shiwen.

bioRxiv ; 2023 Sep 17.

Article in English | MEDLINE | ID: mdl-37745350

ABSTRACT

Helix-coil models are routinely used to interpret CD data of helical peptides or predict the helicity of naturally occurring and designed polypeptides. However, a helix-coil model contains significantly more information than mean helicity alone, as it defines the entire ensemble (the equilibrium population of every possible helix-coil configuration) for a given sequence. Many desirable quantities of this ensemble are either not obtained as ensemble averages or are not available using standard helicity-averaging calculations. Enumeration of the entire ensemble can allow calculation of a wider set of ensemble properties, but the exponential size of the configuration space typically renders this intractable. We present an algorithm that efficiently approximates the helix-coil ensemble to arbitrary accuracy by sequentially generating a list of the M most highly populated configurations in descending order of population. Truncating this list of (configuration, population) pairs at a desired accuracy provides an approximating sub-ensemble. We demonstrate several uses of this approach for providing insight into helix-coil ensembles and folding mechanisms, including landscape visualization.

4.

A Bayesian non-parametric mixed-effects model of microbial growth curves.

Tonner, Peter D; Darnell, Cynthia L; Bushell, Francesca M L; Lund, Peter A; Schmid, Amy K; Schmidler, Scott C.

PLoS Comput Biol ; 16(10): e1008366, 2020 Oct.

Article in English | MEDLINE | ID: mdl-33104703

ABSTRACT

Substantive changes in gene expression, metabolism, and the proteome are manifested in overall changes in microbial population growth. Quantifying how microbes grow is therefore fundamental to areas such as genetics, bioengineering, and food safety. Traditional parametric growth curve models capture the population growth behavior through a set of summarizing parameters. However, estimation of these parameters from data is confounded by random effects such as experimental variability, batch effects or differences in experimental material. A systematic statistical method to identify and correct for such confounding effects in population growth data is not currently available. Further, our previous work has demonstrated that parametric models are insufficient to explain and predict microbial response under non-standard growth conditions. Here we develop a hierarchical Bayesian non-parametric model of population growth that identifies the latent growth behavior and response to perturbation, while simultaneously correcting for random effects in the data. This model enables more accurate estimates of the biological effect of interest, while better accounting for the uncertainty due to technical variation. Additionally, modeling hierarchical variation provides estimates of the relative impact of various confounding effects on measured population growth.

Subject(s)

Bacteria/growth & development , Models, Biological , Systems Biology/methods , Bacteria/metabolism , Bayes Theorem , Statistics, Nonparametric
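As a rough illustration of the non-parametric ingredient, the sketch below computes a Gaussian process posterior mean and variance for a single growth curve using a squared-exponential kernel. It deliberately omits the hierarchical random effects that are central to the paper; the kernel, hyperparameter values, synthetic logistic curve, and function names are all assumptions for illustration.

```python
import numpy as np


def rbf_kernel(a, b, variance=1.0, lengthscale=1.0):
    # Squared-exponential covariance between two vectors of time points.
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(t_obs, y_obs, t_new, noise=1e-3, variance=1.0, lengthscale=1.0):
    """Posterior mean and pointwise variance of a zero-mean GP at t_new,
    given observations y_obs at t_obs with Gaussian noise variance `noise`."""
    k_oo = rbf_kernel(t_obs, t_obs, variance, lengthscale) + noise * np.eye(len(t_obs))
    k_no = rbf_kernel(t_new, t_obs, variance, lengthscale)
    chol = np.linalg.cholesky(k_oo)  # k_oo = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_no @ alpha
    v = np.linalg.solve(chol, k_no.T)
    # Prior variance on the diagonal is `variance` for this kernel.
    var = variance - np.sum(v ** 2, axis=0)
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 10.0, 25)
    # Synthetic logistic growth (arbitrary units) with small measurement noise.
    y = 1.0 / (1.0 + np.exp(-(t - 5.0))) + 0.01 * rng.standard_normal(t.size)
    t_grid = np.linspace(0.0, 10.0, 101)
    mean, var = gp_posterior(t, y, t_grid, noise=1e-3, lengthscale=1.5)
    print(f"t=5: mean {mean[50]:.3f}, sd {np.sqrt(var[50]):.3f}")
```

No parametric growth form (logistic, Gompertz, and so on) is imposed; the curve shape is learned from the data through the kernel.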

5.

Incorporating Nearest-Neighbor Site Dependence into Protein Evolution Models.

Larson, Gary; Thorne, Jeffrey L; Schmidler, Scott.

J Comput Biol ; 27(3): 361-375, 2020 Mar.

Article in English | MEDLINE | ID: mdl-32053390

ABSTRACT

Evolutionary models of proteins are widely used for statistical sequence alignment and inference of homology and phylogeny. However, the vast majority of these models rely on an unrealistic assumption of independent evolution between sites. Here we focus on the related problem of protein structure alignment, a classic tool of computational biology that is widely used to identify structural and functional similarity and to infer homology among proteins. A site-independent statistical model for protein structural evolution has previously been introduced and shown to significantly improve alignments and phylogenetic inferences compared with approaches that utilize only amino acid sequence information. Here we extend this model to account for correlated evolutionary drift among neighboring amino acid positions. The result is a spatiotemporal model of protein structure evolution, described by a multivariate diffusion process convolved with a spatial birth-death process. This extended site-dependent model (SDM) comes with little additional computational cost or analytical complexity compared with the site-independent model (SIM). We demonstrate that this SDM yields a significant reduction of bias in estimated evolutionary distances and helps further improve phylogenetic tree reconstruction. We also develop a simple model of site-dependent sequence evolution, which we use to demonstrate the bias resulting from the application of standard site-independent sequence evolution models.

Subject(s)

Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Evolution, Molecular , Models, Statistical , Sequence Alignment , Sequence Analysis, Protein , Structural Homology, Protein

6.

Systematic Discovery of Archaeal Transcription Factor Functions in Regulatory Networks through Quantitative Phenotyping Analysis.

Darnell, Cynthia L; Tonner, Peter D; Gulli, Jordan G; Schmidler, Scott C; Schmid, Amy K.

mSystems ; 2(5), 2017.

Article in English | MEDLINE | ID: mdl-28951888

ABSTRACT

Gene regulatory networks (GRNs) are critical for dynamic transcriptional responses to environmental stress. However, the mechanisms by which GRN regulation adjusts physiology to enable stress survival remain unclear. Here we investigate the functions of transcription factors (TFs) within the global GRN of the stress-tolerant archaeal microorganism Halobacterium salinarum. We measured growth phenotypes of a panel of TF deletion mutants in high temporal resolution under heat shock, oxidative stress, and low-salinity conditions. To quantitate the noncanonical functional forms of the growth trajectories observed for these mutants, we developed a novel modeling framework based on Gaussian process regression and functional analysis of variance (FANOVA). We employ unique statistical tests to determine the significance of differential growth relative to the growth of the control strain. This analysis recapitulated known TF functions, revealed novel functions, and identified surprising secondary functions for characterized TFs. Strikingly, we observed that the majority of the TFs studied were required for growth under multiple stress conditions, pinpointing regulatory connections between the conditions tested. Correlations between quantitative phenotype trajectories of mutants are predictive of TF-TF connections within the GRN. These phenotypes are strongly concordant with predictions from statistical GRN models inferred from gene expression data alone. With genome-wide and targeted data sets, we provide detailed functional validation of novel TFs required for extreme oxidative stress and heat shock survival. Together, results presented in this study suggest that many TFs function under multiple conditions, thereby revealing high interconnectivity within the GRN and identifying the specific TFs required for communication between networks responding to disparate stressors. 
IMPORTANCE To ensure survival in the face of stress, microorganisms employ inducible damage repair pathways regulated by extensive and complex gene networks. Many archaea, microorganisms of the third domain of life, persist under extremes of temperature, salinity, and pH and under other conditions. In order to understand the cause-effect relationships between the dynamic function of the stress network and ultimate physiological consequences, this study characterized the physiological role of nearly one-third of all regulatory proteins known as transcription factors (TFs) in an archaeal organism. Using a unique quantitative phenotyping approach, we discovered functions for many novel TFs and revealed important secondary functions for known TFs. Surprisingly, many TFs are required for resisting multiple stressors, suggesting cross-regulation of stress responses. Through extensive validation experiments, we map the physiological roles of these novel TFs in stress response back to their position in the regulatory network wiring. This study advances understanding of the mechanisms underlying how microorganisms resist extreme stress. Given the generality of the methods employed, we expect that this study will enable future studies on how regulatory networks adjust cellular physiology in a diversity of organisms.

7.

Drivers of Inter-individual Variation in Dengue Viral Load Dynamics.

Ben-Shachar, Rotem; Schmidler, Scott; Koelle, Katia.

PLoS Comput Biol ; 12(11): e1005194, 2016 Nov.

Article in English | MEDLINE | ID: mdl-27855153

ABSTRACT

Dengue is a vector-borne viral disease of humans that endemically circulates in many tropical and subtropical regions worldwide. Infection with dengue can result in a range of disease outcomes. A considerable amount of research has sought to improve our understanding of this variation in disease outcomes and to identify predictors of severe disease. Contributing to this research, patterns of viral load in dengue-infected patients have been quantified, with analyses indicating that peak viral load levels, rates of viral load decline, and time to peak viremia are useful predictors of severe disease. Here, we take a complementary approach to understanding patterns of clinical manifestation and inter-individual variation in viral load dynamics. Specifically, we statistically fit mathematical within-host models of dengue to individual-level viral load data to test virological and immunological hypotheses explaining inter-individual variation in dengue viral load. We choose between alternative models using model selection criteria to determine which hypotheses are best supported by the data. We first show that the cellular immune response plays an important role in regulating viral load in secondary dengue infections. We then provide statistical support for the process of antibody-dependent enhancement (but not original antigenic sin) in the development of severe disease in secondary dengue infections. Finally, we show statistical support for serotype-specific differences in viral infectivity rates, with infectivity rates of dengue serotypes 2 and 3 exceeding those of serotype 1. These results contribute to our understanding of dengue viral load patterns and their relationship to the development of severe dengue disease. They further have implications for understanding how dengue transmissibility may depend on the immune status of infected individuals and the identity of the infecting serotype.

Subject(s)

Dengue Virus/isolation & purification , Dengue Virus/physiology , Dengue/epidemiology , Dengue/virology , Models, Statistical , Viral Load/statistics & numerical data , Adolescent , Adult , Computer Simulation , Dengue/diagnosis , Dengue Virus/classification , Female , Humans , Male , Middle Aged , Prevalence , Reproducibility of Results , Risk Factors , Sensitivity and Specificity , Species Specificity , Vietnam/epidemiology , Young Adult
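The abstract does not give the model equations. As a hypothetical baseline, the sketch below integrates a standard target-cell-limited within-host model (uninfected cells `T`, infected cells `I`, free virus `V`) with fixed-step fourth-order Runge-Kutta, and defines the Akaike information criterion as one example of a model selection criterion. The paper's models add immune-response and serotype-specific terms that are not shown here; parameter values and function names are illustrative assumptions, not fitted estimates.

```python
import numpy as np


def tcl_rhs(state, beta, delta, p, c):
    # Assumed target-cell-limited model:
    # dT/dt = -beta T V, dI/dt = beta T V - delta I, dV/dt = p I - c V
    t_cells, infected, virus = state
    return np.array([-beta * t_cells * virus,
                     beta * t_cells * virus - delta * infected,
                     p * infected - c * virus])


def simulate(state0, params, t_end, dt=0.01):
    """Fixed-step RK4 integration; returns time points and states (rows)."""
    n_steps = int(round(t_end / dt))
    states = np.empty((n_steps + 1, 3))
    states[0] = state0
    for k in range(n_steps):
        y = states[k]
        k1 = tcl_rhs(y, *params)
        k2 = tcl_rhs(y + 0.5 * dt * k1, *params)
        k3 = tcl_rhs(y + 0.5 * dt * k2, *params)
        k4 = tcl_rhs(y + dt * k3, *params)
        states[k + 1] = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return np.linspace(0.0, n_steps * dt, n_steps + 1), states


def aic(log_likelihood, n_params):
    # Akaike information criterion: lower values indicate better-supported models.
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    times, states = simulate(np.array([1.0, 0.0, 1e-3]), (1.0, 1.0, 10.0, 2.0), t_end=30.0)
    v = states[:, 2]
    print(f"peak viral load {v.max():.3f} at t = {times[v.argmax()]:.2f}")
```

In a fitting workflow, each candidate model would be fitted to patient viral load data, and criteria such as `aic` would then be compared across models.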

8.

Tree Topology Estimation.

Estrada, Rolando; Tomasi, Carlo; Schmidler, Scott C; Farsiu, Sina.

IEEE Trans Pattern Anal Mach Intell ; 37(8): 1688-701, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26353004

ABSTRACT

Tree-like structures are fundamental in nature, and it is often useful to reconstruct the topology of a tree (what connects to what) from a two-dimensional image of it. However, the projected branches often cross in the image: the tree projects to a planar graph, and the inverse problem of reconstructing the topology of the tree from that of the graph is ill-posed. We regularize this problem with a generative, parametric tree-growth model. Under this model, reconstruction is possible in linear time if one knows the direction of each edge in the graph (which edge endpoint is closer to the root of the tree), but becomes NP-hard if the directions are not known. For the latter case, we present a heuristic search algorithm to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image. Experimental results on retinal vessel, plant root, and synthetic tree data sets show that our methodology is both accurate and efficient.

Subject(s)

Artificial Intelligence , Imaging, Three-Dimensional/methods , Algorithms , Databases, Factual , Humans , Lightning , Retinal Vessels/anatomy & histology , Stochastic Processes , Trees

9.

Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure.

Herman, Joseph L; Challis, Christopher J; Novák, Ádám; Hein, Jotun; Schmidler, Scott C.

Mol Biol Evol ; 31(9): 2251-66, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24899668

ABSTRACT

For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence-structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set.

Subject(s)

Bayes Theorem , Computational Biology/methods , Globins/chemistry , Hemoglobins/chemistry , Myoglobin/chemistry , Animals , Cytoglobin , Globins/genetics , Hemoglobins/genetics , Humans , Markov Chains , Models, Molecular , Mutation , Myoglobin/genetics , Phylogeny , Protein Conformation , Sequence Alignment , Sequence Analysis, Protein

10.

Ligand concentration regulates the pathways of coupled protein folding and binding.

Daniels, Kyle G; Tonthat, Nam K; McClure, David R; Chang, Yu-Chu; Liu, Xin; Schumacher, Maria A; Fierke, Carol A; Schmidler, Scott C; Oas, Terrence G.

J Am Chem Soc ; 136(3): 822-5, 2014 Jan 22.

Article in English | MEDLINE | ID: mdl-24364358

ABSTRACT

Coupled ligand binding and conformational change play a central role in biological regulation. Ligands often regulate protein function by modulating conformational dynamics, yet the order in which binding and conformational change occur is often hotly debated. Here we show that the "conformational selection versus induced fit" distinction on which this debate is based is a false dichotomy because the mechanism depends on ligand concentration. Using the binding of pyrophosphate (PPi) to Bacillus subtilis RNase P protein as a model, we show that coupled reactions are best understood as a change in flux between competing pathways with distinct orders of binding and conformational change. The degree of partitioning through each pathway depends strongly on PPi concentration, with ligand binding redistributing the conformational ensemble toward the folded state by both increasing folding rates and decreasing unfolding rates. These results indicate that ligand binding induces marked and varied changes in protein conformational dynamics, and that the order of binding and conformational change is ligand concentration dependent.

Subject(s)

Diphosphates/metabolism , Protein Folding , Ribonuclease P/chemistry , Ribonuclease P/metabolism , Amino Acid Substitution , Bacillus subtilis/enzymology , Ligands , Models, Molecular , Protein Binding , Protein Conformation , Ribonuclease P/genetics

11.

Bayesian protein structure alignment.

Rodriguez, Abel; Schmidler, Scott C.

Ann Appl Stat ; 8(4): 2068-2095, 2014.

Article in English | MEDLINE | ID: mdl-26925188

ABSTRACT

The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
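One way to see how least-squares superposition connects to the probabilistic view: with a fixed residue correspondence, isotropic Gaussian coordinate errors, and flat priors on rotation and translation, the MAP rigid-body transformation is the least-squares one, which the Kabsch SVD solution computes. The sketch below assumes that simplified setting; it does not sample alignments or gap parameters as the full Bayesian model does, and the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(x, y):
    """Rotation r and translation t minimizing sum_i ||r @ x_i + t - y_i||^2
    for paired coordinate arrays x and y of shape (n, 3)."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    h = xc.T @ yc                             # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # flip sign to exclude reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = y_mean - r @ x_mean
    return r, t


def rmsd(a, b):
    # Root-mean-square deviation between paired coordinate sets.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 3))
    y = x[:, [1, 0, 2]] * np.array([1.0, -1.0, 1.0]) + 0.05 * rng.normal(size=(10, 3))
    r, t = kabsch_superpose(x, y)
    print(f"RMSD after superposition: {rmsd(x @ r.T + t, y):.3f}")
```

Under the Bayesian framework the correspondence itself is uncertain, so a step like this would sit inside a larger sampler rather than being the whole method.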

12.

A stochastic evolutionary model for protein structure alignment and phylogeny.

Challis, Christopher J; Schmidler, Scott C.

Mol Biol Evol ; 29(11): 3575-87, 2012 Nov.

Article in English | MEDLINE | ID: mdl-22723302

ABSTRACT

We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein-Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution.

Subject(s)

Evolution, Molecular , Models, Genetic , Phylogeny , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Animals , Computer Simulation , Genetic Variation , Hemoglobins/chemistry , Hemoglobins/genetics , Humans , Phycocyanin/chemistry , Phycocyanin/genetics , Rhodophyta/chemistry , Sequence Alignment , Stochastic Processes
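The Ornstein-Uhlenbeck process named in the abstract has a Gaussian transition density: given X at one time, X after a gap Δ is normal with mean μ + (X − μ)e^(−θΔ) and variance σ²(1 − e^(−2θΔ))/(2θ). The sketch below uses this to simulate coordinates exactly at arbitrary time points. Treating each coordinate independently, and the names and parameter values, are assumptions for illustration; this is not the paper's joint sequence-structure model with indels.

```python
import numpy as np


def simulate_ou(x0, mu, theta, sigma, times, rng):
    """Exact simulation of dX = theta * (mu - X) dt + sigma * dW at `times`.
    Acts elementwise, so x0 may be an array of independent coordinates."""
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for dt in np.diff(times):
        decay = np.exp(-theta * dt)
        mean = mu + (x - mu) * decay
        sd = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
        x = mean + sd * rng.standard_normal(x.shape)
        path.append(x.copy())
    return np.array(path)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical backbone coordinates drifting around a mean structure at 0.
    path = simulate_ou(np.array([1.5, -0.7, 2.0]), 0.0, 0.5, 0.3,
                       np.array([0.0, 0.5, 1.0, 5.0]), rng)
    print(path.round(3))
```

Because the mean reverts toward μ, the stationary variance σ²/(2θ) is finite, which keeps simulated structures from drifting arbitrarily far apart over long evolutionary times.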

13.

Bayesian model search and multilevel inference for SNP association studies.

Wilson, Melanie A; Iversen, Edwin S; Clyde, Merlise A; Schmidler, Scott C; Schildkraut, Joellen M.

Ann Appl Stat ; 4(3): 1342-1364, 2010 Sep 01.

Article in English | MEDLINE | ID: mdl-21179394

ABSTRACT

Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.

14.

Preserving the Boltzmann ensemble in replica-exchange molecular dynamics.

Cooke, Ben; Schmidler, Scott C.

J Chem Phys ; 129(16): 164112, 2008 Oct 28.

Article in English | MEDLINE | ID: mdl-19045252

ABSTRACT

We consider the convergence behavior of replica-exchange molecular dynamics (REMD) [Sugita and Okamoto, Chem. Phys. Lett. 314, 141 (1999)] based on properties of the numerical integrators in the underlying isothermal molecular dynamics (MD) simulations. We show that a variety of deterministic algorithms favored by molecular dynamics practitioners for constant-temperature simulation of biomolecules fail to be either measure-invariant or irreducible, and are therefore not ergodic. We then show that REMD using these algorithms also fails to be ergodic. As a result, the entire configuration space may not be explored even in an infinitely long simulation, and the simulation may not converge to the desired equilibrium Boltzmann ensemble. Moreover, our analysis shows that for initial configurations with unfavorable energy, it may be impossible for the system to reach a region surrounding the minimum energy configuration. We demonstrate these failures of REMD algorithms for three small systems: a Gaussian distribution (simple harmonic oscillator dynamics), a bimodal mixture of Gaussians distribution, and the alanine dipeptide. Examination of the resulting phase plots and equilibrium configuration densities indicates significant errors in the ensemble generated by REMD simulation. We describe a simple modification to address these failures based on a stochastic hybrid Monte Carlo correction, and prove that the modified scheme is ergodic.

Subject(s)

Algorithms , Alanine/chemistry , Dipeptides/chemistry , Markov Chains , Models, Molecular , Monte Carlo Method , Protein Conformation , Temperature , Thermodynamics
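The remedy named in the abstract is a stochastic hybrid Monte Carlo correction. As a generic illustration of that idea (not the authors' REMD implementation), the sketch below runs hybrid (Hamiltonian) Monte Carlo on the abstract's simplest test case, a standard Gaussian target with harmonic-oscillator dynamics. Momenta are redrawn at random every iteration, and a Metropolis accept/reject step corrects the leapfrog integrator's energy error. Step size, trajectory length, starting point, and names are assumptions.

```python
import numpy as np


def hmc_gaussian(n_samples, step=0.2, n_leapfrog=8, seed=0):
    """Hybrid Monte Carlo for a standard normal target, U(x) = x**2 / 2."""
    rng = np.random.default_rng(seed)

    def grad_u(q):
        return q  # derivative of the harmonic potential

    def hamiltonian(q, p):
        return 0.5 * q ** 2 + 0.5 * p ** 2

    x = 3.0  # deliberately start away from the mode
    samples = np.empty(n_samples)
    accepted = 0
    for i in range(n_samples):
        p = rng.standard_normal()  # stochastic momentum refresh
        q_new, p_new = x, p
        # Leapfrog: half momentum step, alternating full steps, half momentum step.
        p_new -= 0.5 * step * grad_u(q_new)
        for _ in range(n_leapfrog - 1):
            q_new += step * p_new
            p_new -= step * grad_u(q_new)
        q_new += step * p_new
        p_new -= 0.5 * step * grad_u(q_new)
        # Metropolis correction for the integrator's energy error.
        if np.log(rng.uniform()) < hamiltonian(x, p) - hamiltonian(q_new, p_new):
            x = q_new
            accepted += 1
        samples[i] = x
    return samples, accepted / n_samples


if __name__ == "__main__":
    samples, rate = hmc_gaussian(20000)
    kept = samples[1000:]  # discard burn-in
    print(f"mean {kept.mean():.3f}, variance {kept.var():.3f}, acceptance {rate:.3f}")
```

The trajectory length `step * n_leapfrog` matters. In continuous time a harmonic trajectory of length π maps x to -x whatever momentum was drawn, so the chain would only visit ±x0; a length near π/2 (1.6 here) mixes well, and randomizing the trajectory length is a common safeguard.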

15.

Intergenic and genic sequence lengths have opposite relationships with respect to gene expression.

Colinas, Juliette; Schmidler, Scott C; Bohrer, Gil; Iordanov, Borislav; Benfey, Philip N.

PLoS One ; 3(11): e3670, 2008.

Article in English | MEDLINE | ID: mdl-18989364

ABSTRACT

Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length, interpreted as evidence of selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.

Subject(s)

Arabidopsis/genetics , DNA, Intergenic/genetics , Gene Expression Regulation, Plant , Epigenesis, Genetic , Gene Expression Profiling , Genetic Variation , Genome, Plant , RNA, Messenger/genetics

16.

Statistical prediction and molecular dynamics simulation.

Cooke, Ben; Schmidler, Scott C.

Biophys J ; 95(10): 4497-511, 2008 Nov 15.

Article in English | MEDLINE | ID: mdl-18676654

ABSTRACT

We describe a statistical approach to the validation and improvement of molecular dynamics simulations of macromolecules. We emphasize the use of molecular dynamics simulations to calculate thermodynamic quantities that may be compared to experimental measurements, and the use of a common set of energetic parameters across multiple distinct molecules. We briefly review relevant results from the theory of stochastic processes and discuss monitoring convergence to equilibrium, obtaining confidence intervals for summary statistics corresponding to measured quantities, and an approach to validation and improvement of simulations based on out-of-sample prediction. We apply these methods to replica exchange molecular dynamics simulations of a set of eight helical peptides under the AMBER potential using implicit solvent. We evaluate the ability of these simulations to quantitatively reproduce experimental helicity measurements obtained by circular dichroism. In addition, we introduce notions of statistical predictive estimation for force-field parameter refinement. We perform a sensitivity analysis to identify key parameters of the potential, and introduce Bayesian updating of these parameters. We demonstrate the effect of parameter updating applied to the internal dielectric constant parameter on the out-of-sample prediction accuracy as measured by cross-validation.

Subject(s)

Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Computer Simulation , Models, Statistical , Protein Conformation , Protein Folding
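Confidence intervals for summary statistics of a simulation must account for autocorrelation between successive frames. One standard approach, shown here as a sketch rather than the paper's exact procedure, is the method of non-overlapping batch means. A synthetic AR(1) series stands in for an MD observable such as helicity; the batch count, series parameters, and function names are assumptions.

```python
import numpy as np


def batch_means_se(series, n_batches=30):
    """Standard error of the mean of a correlated series via non-overlapping batch means."""
    x = np.asarray(series, dtype=float)
    batch_size = len(x) // n_batches
    trimmed = x[: batch_size * n_batches]  # drop the remainder so batches are equal
    means = trimmed.reshape(n_batches, batch_size).mean(axis=1)
    return float(means.std(ddof=1) / np.sqrt(n_batches))


def ar1_series(n, phi, rng):
    # Stationary AR(1) trace: x[i] = phi * x[i-1] + standard normal noise.
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1.0 - phi ** 2)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.standard_normal()
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = ar1_series(60000, 0.9, rng)
    se = batch_means_se(x)
    naive = x.std(ddof=1) / np.sqrt(len(x))
    print(f"95% CI: {x.mean():.3f} +/- {1.96 * se:.3f} (naive +/- {1.96 * naive:.3f})")
```

Batches must be much longer than the correlation time of the trace; otherwise batch means stay correlated and the interval is still too narrow.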

17.

Hydration and conformational mechanics of single, end-tethered elastin-like polypeptides.

Valiaev, Alexei; Lim, Dong Woo; Schmidler, Scott; Clark, Robert L; Chilkoti, Ashutosh; Zauscher, Stefan.

J Am Chem Soc ; 130(33): 10939-46, 2008 Aug 20.

Article in English | MEDLINE | ID: mdl-18646848

ABSTRACT

We investigated the effect of temperature, ionic strength, solvent polarity, and type of guest residue on the force-extension behavior of single, end-tethered elastin-like polypeptides (ELPs), using single molecule force spectroscopy (SMFS). ELPs are stimulus-responsive polypeptides that contain repeats of the pentapeptide Val-Pro-Gly-Xaa-Gly (VPGXG), where Xaa is a guest residue that can be any amino acid with the exception of proline. We fitted the force-extension data with a freely jointed chain (FJC) model, which allowed us to resolve small differences in the effective Kuhn segment length distributions that largely arise from differences in the hydrophobic hydration behavior of ELPs. Our results agree qualitatively with predictions from recent molecular dynamics simulations and demonstrate that hydrophobic hydration modulates the molecular elasticity of ELPs. Furthermore, our results show that SMFS, when combined with our approach for data analysis, can be used to study the subtleties of polypeptide-water interactions and thus provides a basis for the study of hydrophobic hydration in intrinsically unstructured biomacromolecules.

Subject(s)

Peptides/chemistry , Amino Acids/chemistry , Computer Simulation , Elasticity , Hydrophobic and Hydrophilic Interactions , Microscopy, Atomic Force , Models, Chemical , Molecular Weight , Osmolar Concentration , Protein Conformation , Solvents/chemistry , Temperature , Water/chemistry
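For reference, the inextensible freely jointed chain relates mean extension to applied force through the Langevin function: x(F) = N b [coth(F b / kT) - kT / (F b)], where b is the Kuhn length and N the number of segments. The sketch below only evaluates this relation; the fitting of Kuhn-length distributions described in the abstract is not shown, and the function names and example values are assumptions.

```python
import numpy as np


def langevin(u):
    # L(u) = coth(u) - 1/u; use the series u/3 - u**3/45 near zero to avoid 0/0.
    u = np.asarray(u, dtype=float)
    small = np.abs(u) < 1e-4
    safe = np.where(small, 1.0, u)  # placeholder value keeps the unused branch finite
    return np.where(small, u / 3.0 - u ** 3 / 45.0, 1.0 / np.tanh(safe) - 1.0 / safe)


def fjc_extension(force, kuhn_length, n_segments, kT=4.11e-21):
    """Mean extension of an inextensible freely jointed chain.
    Units: force in N, lengths in m, kT in J (4.11e-21 J is about kT at 298 K)."""
    return n_segments * kuhn_length * langevin(force * kuhn_length / kT)


if __name__ == "__main__":
    forces = np.array([1e-12, 10e-12, 100e-12])  # 1, 10, 100 pN
    # Hypothetical chain: 200 segments with a 0.8 nm Kuhn length.
    print(fjc_extension(forces, 0.8e-9, 200))
```

At low force the model reduces to a linear spring, x ≈ N b² F / (3 kT); at high force the extension approaches the contour length N b, so differences in b show up mostly in the shape of the curve at intermediate forces.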

18.

Statistical estimation of statistical mechanical models: helix-coil theory and peptide helicity prediction.

Schmidler, Scott C; Lucas, Joseph E; Oas, Terrence G.

J Comput Biol ; 14(10): 1287-310, 2007 Dec.

Article in English | MEDLINE | ID: mdl-18047425

ABSTRACT

Analysis of biopolymer sequences and structures generally adopts one of two approaches: use of detailed biophysical theoretical models of the system with experimentally-determined parameters, or largely empirical statistical models obtained by extracting parameters from large datasets. In this work, we demonstrate a merger of these two approaches using Bayesian statistics. We adopt a common biophysical model for local protein folding and peptide configuration, the helix-coil model. The parameters of this model are estimated by statistical fitting to a large dataset, using prior distributions based on experimental data. L1-norm shrinkage priors are applied to induce sparsity among the estimated parameters, resulting in a significantly simplified model. Formal statistical procedures for evaluating support in the data for previously proposed model extensions are presented. We demonstrate the advantages of this approach including improved prediction accuracy and quantification of prediction uncertainty, and discuss opportunities for statistical design of experiments. Our approach yields a 39% improvement in mean-squared predictive error over the current best algorithm for this problem. In the process we also provide an efficient recursive algorithm for exact calculation of ensemble helicity including sidechain interactions, and derive an explicit relation between homo- and heteropolymer helix-coil theories and Markov chains and (non-standard) hidden Markov models respectively, which has not appeared in the literature previously.

Subject(s)

Models, Statistical , Peptides/chemistry , Protein Structure, Secondary , Hydrogen-Ion Concentration , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Reproducibility of Results , Sequence Analysis, Protein , Temperature
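The abstract mentions an exact recursive algorithm for ensemble helicity and a link between helix-coil theory and Markov chains. A minimal sketch of that idea for a homopolymer Zimm-Bragg chain is a two-state forward-backward recursion over residues; once normalized, the same transition weights define a Markov chain. This sketch omits the side-chain interactions the paper includes. The weighting convention (coil 1, helix after coil `sigma * s`, helix after helix `s`) and the function names are assumptions, and the brute-force function checks the recursion on a short chain.

```python
import itertools

import numpy as np


def helicity_forward_backward(n, s, sigma):
    """Per-residue helix probabilities and partition function Z for an
    n-residue homopolymer (state 0 = coil, 1 = helix)."""
    w = np.array([[1.0, sigma * s],   # from coil: to coil, to helix (nucleation)
                  [1.0, s]])          # from helix: to coil, to helix (propagation)
    fwd = np.empty((n, 2))
    bwd = np.empty((n, 2))
    fwd[0] = [1.0, sigma * s]         # first residue treated as if preceded by coil
    for i in range(1, n):
        fwd[i] = fwd[i - 1] @ w
    bwd[-1] = 1.0
    for i in range(n - 2, -1, -1):
        bwd[i] = w @ bwd[i + 1]
    z = fwd[-1].sum()
    return fwd[:, 1] * bwd[:, 1] / z, z


def helicity_brute_force(n, s, sigma):
    # Explicit sum over all 2**n configurations with the same weights.
    total, helix = 0.0, np.zeros(n)
    for config in itertools.product((0, 1), repeat=n):
        weight, prev = 1.0, 0
        for state in config:
            if state == 1:
                weight *= s * (sigma if prev == 0 else 1.0)
            prev = state
        total += weight
        helix += weight * np.array(config)
    return helix / total, total


if __name__ == "__main__":
    probs, z = helicity_forward_backward(20, 1.3, 0.003)
    print(f"mean helicity {probs.mean():.3f}, Z = {z:.4f}")
```

The recursion costs O(n) instead of O(2^n). For long chains the forward and backward vectors should be rescaled at each step to avoid overflow.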
