Results 1 - 19 of 19
1.
Biochim Biophys Acta Proteins Proteom ; 1865(1): 43-54, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27718363

ABSTRACT

Therapeutic protein kinase inhibitors are designed on the basis of kinase structures. Here, we define intrinsically disordered regions (IDRs) in structurally hybrid kinases. We reveal that 65% of kinases have an IDR adjacent to their kinase domain (KD). These IDRs are evolutionarily more conserved than IDRs distant to KDs. Strikingly, 36 kinases have adjacent IDRs extending into their KDs, defining a unique structural and functional subset of the kinome. Functional network analysis of this subset of the kinome uncovered FAK1 as topologically the most connected hub kinase. We find that the KD-flanking IDR of FAK1 is more conserved and undergoes more post-translational modifications than other IDRs. It preferentially interacts with proteins regulating scaffolding and kinase activity, which contribute to cytoskeletal remodeling. In summary, spatially and evolutionarily conserved IDRs in kinases may influence their functions, which can be exploited for targeted therapies in diseases including those that involve aberrant cytoskeletal remodeling.


Subject(s)
Cytoskeleton/metabolism; Focal Adhesion Kinase 1/chemistry; Cytoskeleton/enzymology; Focal Adhesion Kinase 1/metabolism; Intrinsically Disordered Proteins/chemistry; Intrinsically Disordered Proteins/metabolism; Protein Conformation; Protein Processing, Post-Translational
2.
Data Brief ; 10: 315-324, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28004021

ABSTRACT

We present data on the evolution of intrinsically disordered regions (IDRs) taking into account the entire human protein kinome. The evolutionary data of the IDRs with respect to the kinase domains (KDs) and kinases as a whole protein (WP) are reported. Further, we report the post-translational modifications of FAK1 IDRs and their contribution to cytoskeletal remodeling. We also report the data to build a protein-protein interaction (PPI) network of primary and secondary FAK1-interacting hybrid proteins. Detailed analysis of the data and its effect on FAK1-related functions have been described in "Structural pliability adjacent to the kinase domain highlights contribution of FAK1 IDRs to cytoskeletal remodeling" (Kathiriya et al., 2016) [1].

3.
PLoS Genet ; 9(2): e1003280, 2013.
Article in English | MEDLINE | ID: mdl-23468640

ABSTRACT

Expansions of trinucleotide CAG/CTG repeats in somatic tissues are thought to contribute to ongoing disease progression through an affected individual's life with Huntington's disease or myotonic dystrophy. Broad ranges of repeat instability arise between individuals with expanded repeats, suggesting the existence of modifiers of repeat instability. Mice with expanded CAG/CTG repeats show variable levels of instability depending upon mouse strain. However, to date the genetic modifiers underlying these differences have not been identified. We show that in liver and striatum the R6/1 Huntington's disease (HD) (CAG)∼100 transgene, when present in a congenic C57BL/6J (B6) background, incurred expansion-biased repeat mutations, whereas the repeat was stable in a congenic BALB/cByJ (CBy) background. Reciprocal congenic mice revealed the Msh3 gene as the determinant for the differences in repeat instability. Expansion bias was observed in congenic mice homozygous for the B6 Msh3 gene on a CBy background, while the CAG tract was stabilized in congenics homozygous for the CBy Msh3 gene on a B6 background. The CAG stabilization was as dramatic as genetic deficiency of Msh2. The B6 and CBy Msh3 genes had identical promoters but differed in coding regions and showed strikingly different protein levels. B6 MSH3 variant protein is highly expressed and associated with CAG expansions, while the CBy MSH3 variant protein is expressed at barely detectable levels, associating with CAG stability. The DHFR protein, which is divergently transcribed from a promoter shared by the Msh3 gene, did not show varied levels between mouse strains. Thus, naturally occurring MSH3 protein polymorphisms are modifiers of CAG repeat instability, likely through variable MSH3 protein stability. 
Since evidence supports that somatic CAG instability is a modifier and predictor of disease, our data are consistent with the hypothesis that variable levels of CAG instability associated with polymorphisms of DNA repair genes may have prognostic implications for various repeat-associated diseases.


Subject(s)
Huntington Disease/genetics; Proteins/genetics; Trinucleotide Repeat Expansion/genetics; Trinucleotide Repeats/genetics; Animals; Corpus Striatum/metabolism; Disease Models, Animal; Genomic Instability; Humans; Mice; MutS Homolog 3 Protein; Myotonic Dystrophy/genetics; Myotonic Dystrophy/metabolism; Neostriatum/metabolism; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Polymorphism, Genetic; Protein Stability
4.
Mol Biol Evol ; 30(2): 332-46, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22977115

ABSTRACT

Protein interaction networks play central roles in biological systems, from simple metabolic pathways through complex programs permitting the development of organisms. Multicellularity could only have arisen from a careful orchestration of cellular and molecular roles and responsibilities, all properly controlled and regulated. Disease reflects a breakdown of this organismal homeostasis. To better understand the evolution of interactions whose dysfunction may be contributing factors to disease, we derived the human protein coevolution network using our MatrixMatchMaker algorithm and using the Orthologous MAtrix project (OMA) database as a source for protein orthologs from 103 eukaryotic genomes. We annotated the coevolution network using protein-protein interaction data, many functional data sources, and we explored the evolutionary rates and dates of emergence of the proteins in our data set. Strikingly, clustering based only on the topology of the coevolution network partitions it into two subnetworks, one generally representing ancient eukaryotic functions and the other functions more recently acquired during animal evolution. That latter subnetwork is enriched for proteins with roles in cell-cell communication, the control of cell division, and related multicellular functions. Further annotation using data from genetic disease databases and cancer genome sequences strongly implicates these proteins in both ciliopathies and cancer. The enrichment for such disease markers in the animal network suggests a functional link between these coevolving proteins. Genetic validation corroborates the recruitment of ancient cilia in the evolution of multicellularity.


Subject(s)
Biological Evolution; Cell Communication/physiology; Proteins/genetics; Proteins/metabolism; Animals; Ciliary Motility Disorders/genetics; Ciliary Motility Disorders/metabolism; Cluster Analysis; Databases, Protein; Female; Gene Expression; Humans; Male; Mutation; Neoplasms/genetics; Neoplasms/metabolism; Protein Binding; Protein Interaction Mapping; Protein Interaction Maps
5.
Cell ; 150(5): 1068-81, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22939629

ABSTRACT

Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.


Subject(s)
Multiprotein Complexes/analysis; Protein Interaction Maps; Proteins/chemistry; Proteomics/methods; Humans; Tandem Mass Spectrometry
6.
Methods Mol Biol ; 781: 237-56, 2011.
Article in English | MEDLINE | ID: mdl-21877284

ABSTRACT

Bioinformatic methods to predict protein-protein interactions (PPI) via coevolutionary analysis have positioned themselves to compete alongside established in vitro methods, despite a lack of understanding for the underlying molecular mechanisms of the coevolutionary process. Investigating the alignment of coevolutionary predictions of PPI with experimental data can focus the effective scope of prediction and lead to better accuracies. A new rate-based coevolutionary method, MMM, preferentially finds obligate interacting proteins that form complexes, conforming to results from studies based on coimmunoprecipitation coupled with mass spectrometry. Using gold-standard databases as a benchmark for accuracy, MMM surpasses methods based on abundance ratios, suggesting that correlated evolutionary rates may yet be better than coexpression at predicting interacting proteins. At the level of protein domains, coevolution is difficult to detect, even with MMM, except when considering small-scale experimental data involving proteins with multiple domains. Overall, these findings confirm that coevolutionary methods can be confidently used in predicting PPI, either independently or as drivers of coimmunoprecipitation experiments.


Subject(s)
Biological Evolution; Computational Biology; Protein Interaction Mapping/methods; Proteins/chemistry; Proteins/metabolism; Algorithms; Immunoprecipitation; Phylogeny; Protein Binding
7.
Biochem Cell Biol ; 88(2): 185-94, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20453921

ABSTRACT

GroEL is a chaperone thought of as essential for bacterial life. However, some species of Mollicutes are missing GroEL. We use phylogenetic analysis to show that the presence of GroEL is polyphyletic among the Mollicutes, and that there is evidence for lateral gene transfer of GroEL to Mycoplasma penetrans from the Proteobacteria. Furthermore, we propose that the presence of GroEL in Mycoplasma may be required for invasion of host tissue, suggesting that GroEL may act as an adhesin-invasin.


Subject(s)
Chaperonin 60/genetics; Chaperonin 60/metabolism; Tenericutes/genetics; Tenericutes/metabolism; Chaperonin 60/chemistry; Phylogeny; Tenericutes/chemistry
8.
Proteins ; 78(3): 548-58, 2010 Feb 15.
Article in English | MEDLINE | ID: mdl-19768681

ABSTRACT

Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we were interested in seeing whether covarying sites were clustered, and whether this could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in an MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD (http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally nearby in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.


Subject(s)
DNA Mutational Analysis/methods; Models, Genetic; Proteins/chemistry; Proteins/genetics; Amino Acid Sequence; Binding Sites; Cluster Analysis; Conserved Sequence; Genetic Variation; Models, Molecular; Phylogeny; Proteins/metabolism; Sequence Alignment
9.
Microb Biotechnol ; 3(6): 677-90, 2010 Nov.
Article in English | MEDLINE | ID: mdl-21255363

ABSTRACT

One hundred and seventy-one genes encoding potential esterases from 11 bacterial genomes were cloned and overexpressed in Escherichia coli; 74 of the clones produced soluble proteins. All 74 soluble proteins were purified and screened for esterase activity; 36 proteins showed carboxyl esterase activity on short-chain esters, 17 demonstrated arylesterase activity, while 38 proteins did not exhibit any activity towards the test substrates. Esterases from Rhodopseudomonas palustris (RpEST-1, RpEST-2 and RpEST-3), Pseudomonas putida (PpEST-1, PpEST-2 and PpEST-3), Pseudomonas aeruginosa (PaEST-1) and Streptomyces avermitilis (SavEST-1) were selected for detailed biochemical characterization. All of the enzymes showed optimal activity at neutral or alkaline pH, and the half-life of each enzyme at 50°C ranged from < 5 min to over 5 h. PpEST-3, RpEST-1 and RpEST-2 demonstrated the highest specific activity with pNP-esters; these enzymes were also among the most stable at 50°C and in the presence of detergents, polar and non-polar organic solvents, and imidazolium ionic liquids. Accordingly, these enzymes are particularly interesting targets for subsequent application trials. Finally, biochemical and bioinformatic analyses were compared to reveal sequence features that could be correlated to enzymes with arylesterase activity, facilitating subsequent searches for new esterases in microbial genome sequences.


Subject(s)
Bacteria/enzymology; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Carboxylic Ester Hydrolases/genetics; Carboxylic Ester Hydrolases/metabolism; Genome, Bacterial; Bacterial Proteins/chemistry; Bacterial Proteins/isolation & purification; Carboxylic Ester Hydrolases/chemistry; Carboxylic Ester Hydrolases/isolation & purification; Computational Biology; Enzyme Stability; Hydrogen-Ion Concentration; Substrate Specificity; Temperature
10.
Genome Res ; 19(10): 1861-71, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19696150

ABSTRACT

Coevolution maintains interactions between phenotypic traits through the process of reciprocal natural selection. Detecting molecular coevolution can expose functional interactions between molecules in the cell, generating insights into biological processes, pathways, and the networks of interactions important for cellular function. Prediction of interaction partners from different protein families exploits the property that interacting proteins can follow similar patterns and relative rates of evolution. Current methods for detecting coevolution based on the similarity of phylogenetic trees or evolutionary distance matrices have, however, been limited by requiring coevolution over the entire evolutionary history considered and are inaccurate in the presence of paralogous copies. We present a novel method for determining coevolving protein partners by finding the largest common submatrix in a given pair of distance matrices, with the size of the largest common submatrix measuring the strength of coevolution. This approach permits us to consider matrices of different size and scale, to find lineage-specific coevolution, and to predict multiple interaction partners. We used MatrixMatchMaker to predict protein-protein interactions in the human genome. We show that proteins that are known to interact physically are more strongly coevolving than proteins that simply belong to the same biochemical pathway. The human coevolution network is highly connected, suggesting many more protein-protein interactions than are currently known from high-throughput and other experimental evidence. These most strongly coevolving proteins suggest interactions that have been maintained over long periods of evolutionary time, and that are thus likely to be of fundamental importance to cellular function.


Subject(s)
Evolution, Molecular; Gene Regulatory Networks/genetics; Proteins/genetics; Calibration; Computational Biology/methods; Databases, Protein; Forecasting; Genetic Variation; Humans; Metabolic Networks and Pathways/genetics; Phylogeny; Protein Binding/genetics; Protein Interaction Domains and Motifs/genetics; Proteins/metabolism; Sensitivity and Specificity; Sequence Analysis, Protein/methods; Sequence Analysis, Protein/standards; Software/standards
11.
Biomol Eng ; 24(3): 321-6, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17502167

ABSTRACT

RNA sequences can form structures which are conserved throughout evolution and the question of aligning two RNA secondary structures has been extensively studied. Most of the previous alignment algorithms require the input of gap opening and gap extension penalty parameters. The choice of appropriate parameter values is controversial as there is little biological information to guide their assignment. In this paper, we present an algorithm which circumvents this problem. Instead of finding an optimal alignment with a predefined gap opening penalty, the algorithm finds the optimal alignment with an exact number of aligned blocks.


Subject(s)
Algorithms; RNA/chemistry; RNA/genetics; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Base Sequence; Molecular Sequence Data; Nucleic Acid Conformation; Sequence Homology, Nucleic Acid
12.
Bioinformatics ; 23(10): 1195-202, 2007 May 15.
Article in English | MEDLINE | ID: mdl-17392329

ABSTRACT

MOTIVATION: With hundreds of completely sequenced microbial genomes available, and advancements in DNA microarray technology, the detection of genes in microbial communities consisting of hundreds of thousands of sequences may be possible. The existing strategies developed for DNA probe design, geared toward identifying specific sequences, are not suitable due to the lack of coverage, flexibility and efficiency necessary for applications in metagenomics. METHODS: ProDesign is a tool developed for the selection of oligonucleotide probes to detect members of gene families present in environmental samples. Gene family-specific probe sequences are generated based on specific and shared words, which are found with the spaced seed hashing algorithm. To detect more sequences, those sharing some common words are re-clustered into new families, then probes specific for the new families are generated. RESULTS: The program is very flexible in that it can be used for designing probes for detecting many gene families simultaneously and specifically in one or more genomes. Neither the length nor the melting temperature of the probes needs to be predefined. We have found that ProDesign provides more flexibility, coverage and speed than other software programs used in the selection of probes for genomic and gene family arrays. AVAILABILITY: ProDesign is licensed free of charge to academic users. ProDesign and Supplementary Material can be obtained by contacting the authors. A web server for ProDesign is available at http://www.uhnresearch.ca/labs/tillier/ProDesign/ProDesign.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Computational Biology/methods; Multigene Family; Oligonucleotide Probes/genetics; Bacteria/genetics; Genome, Bacterial; Microarray Analysis; Oligonucleotide Array Sequence Analysis; Software
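
The ProDesign abstract above mentions finding shared words with spaced seed hashing. As a rough illustration of the general idea only, the Python sketch below extracts spaced words from each sequence and keeps those present in every family member. The seed pattern "1101", the function names, and the toy sequences are all assumptions made for this example; the abstract does not describe ProDesign's actual seeds or data structures.

```python
from collections import defaultdict

def spaced_words(seq, seed="1101"):
    """Yield (position, word) per window, keeping only the seed's '1' positions."""
    keep = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(seq) - len(seed) + 1):
        window = seq[start:start + len(seed)]
        yield start, "".join(window[i] for i in keep)

def shared_words(family, seed="1101"):
    """Return the spaced words that occur in every sequence of the family."""
    index = defaultdict(set)  # hash table: word -> names of sequences containing it
    for name, seq in family.items():
        for _, word in spaced_words(seq, seed):
            index[word].add(name)
    return {word for word, names in index.items() if len(names) == len(family)}

# Hypothetical toy family; the two sequences differ at positions 2 and 7.
family = {"geneA": "ACGTACGT", "geneB": "ACTTACGA"}
common = shared_words(family)
print(sorted(common))
```

Because the third position of the seed is a "don't care", the windows ACGT and ACTT hash to the same word, which is how a spaced seed tolerates a mismatch that a contiguous word of the same span would not.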
13.
Evol Bioinform Online ; 2: 77-90, 2007 Jan 14.
Article in English | MEDLINE | ID: mdl-19455203

ABSTRACT

In comparative genomic studies, syntenic groups of homologous sequence in the same order have been used as supplementary information that can be used in helping to determine the orthology of the compared sequences. The assumption is that orthologous gene copies are more likely to share the same genome positions and share the same gene neighbors. In this study we have defined positional homologs as those that also have homologous neighboring genes and we investigated the usefulness of this distinction for bacterial comparative genomics. We considered the identification of positionally homologous gene pairs in bacterial genomes using protein and DNA sequence level alignments and found that the positional homologs had on average relatively lower rates of substitution at the DNA level (synonymous substitutions) than duplicate homologs in different genomic locations, regardless of the level of protein sequence divergence (measured with non-synonymous substitution rate). Since gene order conservation can indicate accuracy of orthology assignments, we also considered the effect of imposing certain alignment quality requirements on the sensitivity and specificity of identification of protein pairs by BLAST and FASTA when neighboring information is not available and in comparisons where gene order is not conserved. We found that the addition of a stringency filter based on the second best hits was an efficient way to remove dubious ortholog identifications in BLAST and FASTA analyses. Gene order conservation and DNA sequence homology are useful to consider in comparative genomic studies as they may indicate different orthology assignments than protein sequence homology alone.
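
The definition of positional homologs in the abstract above (homologs that also have homologous neighboring genes) can be sketched in a few lines of Python. This is a simplified illustration under several assumptions that are not from the paper: homology is given as precomputed family labels, a single shared neighbor family is enough, and each genome is treated as linear even though bacterial chromosomes are usually circular.

```python
def neighbor_families(genome, i):
    """Families of the genes immediately left and right of position i."""
    return {genome[j][1] for j in (i - 1, i + 1) if 0 <= j < len(genome)}

def positional_homologs(genome_a, genome_b):
    """Pairs of same-family genes that also share at least one neighbor family.

    Each genome is an ordered list of (gene_name, family_label) tuples.
    """
    hits = []
    for i, (name_a, fam_a) in enumerate(genome_a):
        for j, (name_b, fam_b) in enumerate(genome_b):
            same_family = fam_a == fam_b
            shared = neighbor_families(genome_a, i) & neighbor_families(genome_b, j)
            if same_family and shared:
                hits.append((name_a, name_b))
    return hits

# Hypothetical gene orders; b1 and b4 are duplicate homologs of a2 (family F2).
genome_a = [("a1", "F1"), ("a2", "F2"), ("a3", "F3")]
genome_b = [("b1", "F2"), ("b2", "F3"), ("b3", "F9"), ("b4", "F2")]
pairs = positional_homologs(genome_a, genome_b)
print(pairs)
```

In this toy case a2 is paired with b1 but not with b4, because only b1 sits next to an F3 gene; b4 plays the role of a duplicate homolog in a different genomic location.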

14.
BMC Bioinformatics ; 7: 471, 2006 Oct 24.
Article in English | MEDLINE | ID: mdl-17062146

ABSTRACT

BACKGROUND: There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs. RESULTS: We tested nine of the most often used protein alignment programs and compared their results using sequences generated with the simulation software Simprot which creates known alignments under realistic and controlled evolutionary scenarios. We have simulated more than 30,000 alignment sets using various evolutionary histories in order to define strengths and weaknesses of each program tested. We found that alignment accuracy is extremely dependent on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE and the results relative to BAliBASE- and Simprot-generated data sets were consistent in most cases. CONCLUSION: Our results indicate that employing Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for alignment accuracy assessment. Simprot also allows for a quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.


Subject(s)
Amino Acid Sequence; Proteins/chemistry; Sequence Alignment/methods; Software; Computational Biology; Computer Simulation; Databases, Protein; Gene Deletion; Mutation; Protein Conformation; Proteins/genetics; Sequence Alignment/standards
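
When the true alignment is known, as with the simulated data described in the abstract above, a program's output can be scored against it. The abstract does not say which accuracy measure was used, so the Python sketch below shows one common choice, a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names and toy alignments are assumptions for illustration.

```python
from itertools import combinations

def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column.

    Each residue is identified as (row_index, ungapped_position_in_that_row).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for row, seq in enumerate(alignment):
            if seq[col] != "-":
                present.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(present, 2))
    return pairs

def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)

# Hypothetical two-sequence example: the test alignment misplaces one gap.
reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
score = sum_of_pairs_score(test, reference)
print(score)
```

Here three of the four reference pairs are recovered, so the score is 0.75; an alignment identical to the reference scores 1.0.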
15.
Proteins ; 63(4): 822-31, 2006 Jun 01.
Article in English | MEDLINE | ID: mdl-16634043

ABSTRACT

Approaches for the determination of interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.


Subject(s)
Evolution, Molecular; Proteins/genetics; Proteins/metabolism; Computer Simulation; Phylogeny; Protein Binding; Proteins/chemistry; Software
16.
BMC Bioinformatics ; 6: 236, 2005 Sep 27.
Article in English | MEDLINE | ID: mdl-16188037

ABSTRACT

BACKGROUND: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have been extensively useful in sequence analysis and for the computer simulation of artificial sequence data sets. RESULTS: We have developed a new method of simulating protein sequence evolution, including insertion and deletion (indel) events in addition to amino-acid substitutions. The simulation generates both the simulated sequence family and a true sequence alignment that captures the evolutionary relationships between amino acids from different sequences. Our statistical model for indel evolution is based on the empirical indel distribution determined by Qian and Goldstein. We have parameterized this distribution so that it applies to sequences diverged by varying evolutionary times and generalized it to provide flexibility in simulation conditions. Our method uses a Monte-Carlo simulation strategy, and has been implemented in a C++ program named Simprot. CONCLUSION: Simprot will be useful for testing methods of analysis of protein sequence families particularly alignment methods, phylogenetic tree building, detection of recombination and horizontal gene transfer, and homology detection, where knowing the true course of sequence evolution is essential.


Subject(s)
Computer Simulation; Evolution, Molecular; Models, Genetic; Sequence Analysis, Protein/methods; Software; Amino Acid Substitution; Models, Statistical; Monte Carlo Method; Phylogeny; Selection, Genetic; Software Design
17.
Mol Biol Evol ; 21(3): 419-27, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14660689

ABSTRACT

Empirical models of substitution are often used in protein sequence analysis because the large alphabet of amino acids requires that many parameters be estimated in all but the simplest parametric models. When information about structure is used in the analysis of substitutions in structured RNA, a similar situation occurs. The number of parameters necessary to adequately describe the substitution process increases in order to model the substitution of paired bases. We have developed a method to obtain substitution rate matrices empirically from RNA alignments that include structural information in the form of base pairs. Our data consisted of alignments from the European Ribosomal RNA Database of Bacterial and Eukaryotic Small Subunit and Large Subunit Ribosomal RNA (Wuyts et al. 2001. Nucleic Acids Res. 29:175-177; Wuyts et al. 2002. Nucleic Acids Res. 30:183-185). Using secondary structural information, we converted each sequence in the alignments into a sequence over a 20-symbol code: one symbol for each of the four individual bases, and one symbol for each of the 16 ordered pairs. Substitutions in the coded sequences are defined in the natural way, as observed changes between two sequences at any particular site. For given ranges (windows) of sequence divergence, we obtained substitution frequency matrices for the coded sequences. Using a technique originally developed for modeling amino acid substitutions (Veerassamy, Smith, and Tillier. 2003. J. Comput. Biol. 10:997-1010), we were able to estimate the actual evolutionary distance for each window. The actual evolutionary distances were used to derive instantaneous rate matrices, and from these we selected a universal rate matrix. The universal rate matrices were incorporated into the Phylip Software package (Felsenstein 2002. http://evolution.genetics.washington.edu/phylip.html), and we analyzed the ribosomal RNA alignments using both distance and maximum likelihood methods. 
The empirical substitution models performed well on simulated data, and produced reasonable evolutionary trees for 16S ribosomal RNA sequences from sequenced Bacterial genomes. Empirical models have the advantage of being easily implemented, and the fact that the code consists of 20 symbols makes the models easily incorporated into existing programs for protein sequence analysis. In addition, the models are useful for simulating the evolution of RNA sequence and structure simultaneously.


Subject(s)
Amino Acid Substitution; Models, Genetic; RNA, Ribosomal/genetics; Sequence Alignment/methods; Animals; Computer Simulation; Databases, Nucleic Acid; Evolution, Molecular; Likelihood Functions; Phylogeny
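
The 20-symbol recoding described in the abstract above (four symbols for unpaired bases, sixteen for ordered base pairs) can be sketched as follows. Several details here are assumptions made for illustration and are not taken from the paper: the structure is supplied in dot-bracket notation, the 20 symbols are borrowed from the one-letter amino acid alphabet in an arbitrary order, and each paired position is coded by the ordered pair (its own base, its partner's base).

```python
BASES = "ACGU"
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 letters; this assignment is hypothetical

# Four symbols for unpaired bases, sixteen for ordered (base, partner) pairs.
SINGLE = {b: ALPHABET[i] for i, b in enumerate(BASES)}
PAIRED = {(x, y): ALPHABET[4 + 4 * i + j]
          for i, x in enumerate(BASES) for j, y in enumerate(BASES)}

def partners(structure):
    """Map each paired position to its partner from a dot-bracket string."""
    stack, partner = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opener = stack.pop()
            partner[opener], partner[pos] = pos, opener
    return partner

def encode(seq, structure):
    """Recode an RNA sequence over the 20-symbol alphabet."""
    partner = partners(structure)
    coded = []
    for pos, base in enumerate(seq):
        if pos in partner:
            coded.append(PAIRED[(base, seq[partner[pos]])])
        else:
            coded.append(SINGLE[base])
    return "".join(coded)

# Hypothetical hairpin: two G-C pairs closing a three-base loop.
coded = encode("GGAAACC", "((...))")
print(coded)
```

Because the recoded sequence uses a 20-letter alphabet, it has the same shape as a protein sequence, which fits the abstract's remark that such models slot easily into existing protein analysis programs.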
18.
Bioinformatics ; 19(6): 750-5, 2003 Apr 12.
Article in English | MEDLINE | ID: mdl-12691987

ABSTRACT

MOTIVATION: Multiple sequence alignments of homologous proteins are useful for inferring their phylogenetic history and to reveal functionally important regions in the proteins. Functional constraints may lead to co-variation of two or more amino acids in the sequence, such that a substitution at one site is accompanied by compensatory substitutions at another site. It is not sufficient to find the statistical correlations between sites in the alignment because these may be the result of several undetermined causes. In particular, phylogenetic clustering will lead to many strong correlations. RESULTS: A procedure is developed to detect statistical correlations stemming from functional interaction by removing the strong phylogenetic signal that leads to the correlations of each site with many others in the sequence. Our method relies upon the accuracy of the alignment but it does not require any assumptions about the phylogeny or the substitution process. The effectiveness of the method was verified using computer simulations and then applied to predict functional interactions between amino acids in the Pfam database of alignments.


Subject(s)
Algorithms; Models, Molecular; Phylogeny; Proteins/chemistry; Proteins/classification; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Models, Statistical; Molecular Sequence Data; Protein Conformation; Protein Structure, Secondary; Quality Control; Sequence Homology, Amino Acid
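
The abstract above starts from raw statistical correlations between alignment columns and then removes the phylogenetic component. The Python sketch below shows only that starting point, using mutual information as one standard way to quantify covariation between two columns. It is not the authors' procedure and makes no correction for phylogeny; the function names and toy alignment are assumptions.

```python
from collections import Counter
from math import log2

def column(alignment, i):
    """Residues found at column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum((c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
               for (x, y), c in count_ab.items())

# Hypothetical alignment: columns 0 and 1 change together, column 2 is invariant.
alignment = ["AKL", "AKL", "DEL", "DEL"]
mi_covarying = mutual_information(column(alignment, 0), column(alignment, 1))
mi_constant = mutual_information(column(alignment, 0), column(alignment, 2))
print(mi_covarying, mi_constant)
```

The toy alignment also shows why the raw score is not sufficient: the four sequences fall into two identical groups, so the high score for columns 0 and 1 could reflect shared ancestry rather than a functional interaction, which is the confounding effect the abstract sets out to remove.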
19.
J Comput Biol ; 10(6): 997-1010, 2003.
Article in English | MEDLINE | ID: mdl-14980022

ABSTRACT

Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for protein sequence alignments, they do not describe an evolutionary model. BLOSUM matrices do not permit the estimation of the actual number of amino acid substitutions between sequences by correcting for multiple hits. The method presented here uses the Blocks database of protein alignments, along with the additivity of evolutionary distances, to approximate the amino acid substitution probabilities as a function of actual evolutionary distance. The PMB (Probability Matrix from Blocks) defines a new evolutionary model for protein evolution that can be used for evolutionary analyses of protein sequences. Our model is directly derived from, and thus compatible with, the BLOSUM matrices. The model has the additional advantage of being easily implemented.


Subject(s)
Amino Acid Substitution; Evolution, Molecular; Models, Genetic; Probability; Proteins/chemistry; Computational Biology; Databases, Protein