Results 1 - 9 of 9
1.
J Cell Biochem ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37992221

ABSTRACT

This paper discusses the properties of proteins and their relations in the interactomes of selected subsets of the SARS-CoV-2 proteome: the membrane protein, the nonstructural proteins, and, finally, the full proteome. Protein disorder according to several measures, liquid-liquid phase separation probabilities, and protein node degrees in the interaction networks were singled out as the features of interest. Additionally, viral interactomes were combined with the interactome of human lung tissue to examine whether the new connections in the resulting viral-host interactome are linked to protein disorder. Correlation analysis shows that there is no clear relationship between the raw features of interest, whereas there is a positive correlation between a protein's disorder and the mean disorder of its neighborhood. There are also indications that highly connected viral hubs tend to be, on average, more ordered than proteins with a small number of connections. This is in contrast to previous similar studies conducted on eukaryotic interactomes and possibly raises new questions in research on viral interactomes.
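The neighborhood comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy graph, the disorder scores, and the helper names `neighborhood_mean` and `pearson` are all assumptions made for illustration, and Pearson correlation is used only as one plausible choice of correlation measure.

```python
from math import sqrt


def neighborhood_mean(graph, score):
    """Mean score over each node's direct neighbors (isolated nodes are skipped)."""
    return {
        node: sum(score[n] for n in nbrs) / len(nbrs)
        for node, nbrs in graph.items()
        if nbrs
    }


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical interaction network (adjacency lists) and per-protein disorder scores.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
disorder = {"A": 0.2, "B": 0.3, "C": 0.7, "D": 0.8}

nbr_mean = neighborhood_mean(graph, disorder)
degree = {node: len(nbrs) for node, nbrs in graph.items()}

nodes = sorted(nbr_mean)
r = pearson([disorder[n] for n in nodes], [nbr_mean[n] for n in nodes])
print(f"correlation(disorder, neighborhood mean disorder) = {r:.3f}")
```

The same `pearson` helper could be applied to `degree` and `disorder` to probe the hub observation; on real interactomes a rank-based measure may be more appropriate.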

2.
Int J Biol Macromol ; 167: 446-456, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33278435

ABSTRACT

The correlation of molecular function and protein intrinsic disorder is an important aspect of understanding the relationship between function, sequence and structure. This research was inspired by the statistical correlation evaluation method described by Xie et al. (J Proteome Res 6 (2007) 1882-1898, reference study), where the authors analyzed the relationship between structure and function of proteins from the Swiss-Prot database, with functions described by Swiss-Prot function keywords. In this research, we investigated whether the conclusions from the reference study hold for another dataset with richer functional annotation. We used the CAFA3 challenge training dataset, where function is described with terms from Gene Ontology (GO terms). In order to compare the results with the previous work, we associated the GO terms with the corresponding Swiss-Prot function keywords. The results were compared with the reference study by first repeating the analysis with Swiss-Prot function keywords and then with GO terms. We used the PONDR VSL2b disorder predictor to label over 66,000 CAFA3 proteins as putatively disordered or ordered. Out of 186 Swiss-Prot keywords (belonging to the molecular function type) with more than 20 annotated proteins, we found 47 to be highly order related and 44 highly disorder related. Using the same dataset and annotation constraints, out of 1781 GO terms (belonging to the molecular function type), we found 746 to be highly order related and 564 highly disorder related. GO term results are presented as interactive graphs displaying the complex hierarchical structure of Gene Ontology. Comparison of the two functional annotations, GO and Swiss-Prot keywords, showed consistent results in cases when it was possible to map a Swiss-Prot keyword to a corresponding GO term. Because of the small number of such cases, we propose a new method for deriving the missing mappings between Swiss-Prot keywords and GO terms with the highest likelihood by measuring similarity (Jaccard index) between sets of proteins annotated by different functions. Comparison with results from the reference study revealed a prevalence of binding-related functions (disorder related) in the current dataset even though the same functions were not present in previous results.
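The Jaccard-based mapping idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword and GO identifiers, the protein accessions, and the function names `jaccard` and `best_go_match` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_go_match(keyword_proteins, go_to_proteins):
    """Return the (GO term, similarity) pair with the highest Jaccard index."""
    return max(
        ((term, jaccard(keyword_proteins, proteins))
         for term, proteins in go_to_proteins.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical annotation sets: proteins carrying one keyword, and proteins per GO term.
keyword_proteins = {"P1", "P2", "P3", "P4"}
go_to_proteins = {
    "GO:hypothetical-1": {"P1", "P2", "P3"},
    "GO:hypothetical-2": {"P4", "P5"},
}

term, similarity = best_go_match(keyword_proteins, go_to_proteins)
print(term, similarity)
```

In practice a similarity threshold would likely be needed so that keywords with no good counterpart are left unmapped; the abstract does not state which threshold, if any, was used.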


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Databases, Protein , Molecular Sequence Annotation , Protein Conformation , Protein Unfolding , Sequence Analysis, Protein
3.
BMC Bioinformatics ; 19(1): 158, 2018 04 25.
Article in English | MEDLINE | ID: mdl-29699482

ABSTRACT

BACKGROUND: In the last decade and a half it has been firmly established that a large number of proteins do not adopt a well-defined (ordered) structure under physiological conditions. Such intrinsically disordered proteins (IDPs) and intrinsically disordered (protein) regions (IDRs) are involved in essential cell processes through two basic mechanisms: the entropic chain mechanism which is responsible for rapid fluctuations among many alternative conformations, and molecular recognition via short recognition elements that bind to other molecules. IDPs possess a high adaptive potential and there is special interest in investigating their involvement in organism evolution. RESULTS: We analyzed 2554 Bacterial and 139 Archaeal proteomes, with a total of 8,455,194 proteins for disorder content and its implications for adaptation of organisms, using three disorder predictors and three measures. Along with other findings, we revealed that for all three predictors and all three measures (1) Bacteria exhibit significantly more disorder than Archaea; (2) plasmid-encoded proteins contain considerably more IDRs than proteins encoded on chromosomes (or whole genomes) in both prokaryote superkingdoms; (3) plasmid proteins are significantly more disordered than chromosomal proteins only in the group of proteins with no COG category assigned; (4) antitoxin proteins in comparison to other proteins, are the most disordered (almost double) in both Bacterial and Archaeal proteomes; (5) plasmidal proteins are more disordered than chromosomal proteins in Bacterial antitoxins and toxin-unclassified proteins, but have almost the same disorder content in toxin proteins. CONCLUSION: Our results suggest that while disorder content depends on genome and proteome characteristics, it is more influenced by functional engagements than by gene location (on chromosome or plasmid).


Subject(s)
Archaea/genetics , Archaeal Proteins/chemistry , Bacteria/genetics , Bacterial Proteins/chemistry , Intrinsically Disordered Proteins/chemistry , Plasmids/metabolism , Chromosomes, Archaeal/metabolism , Chromosomes, Bacterial/metabolism , Proteome/metabolism , Toxins, Biological/chemistry
4.
Int J Data Min Bioinform ; 7(2): 196-213, 2013.
Article in English | MEDLINE | ID: mdl-23777176

ABSTRACT

To associate phenotypic characteristics of an organism to molecules encoded by its genome, there is a need for well-structured genotype and phenotype data. We use a novel method for extracting data on phenotype and genotype characteristics of microorganisms from text. As a resource, we use an encyclopedia of microorganisms, which holds phenotypic and genotypic data and create a structured, flexible data resource, which can be exported to a range of database formats, containing genotype and phenotype data for 2412 species and 873 genera of microbes. This data source has great potential as a resource for future biological research on genotype-phenotype associations. In this paper, we focus on describing the structure and content of the resulting database and on evaluating the method used for extracting the data. We conclude that the resulting database can be used as a reliable complementary resource for research into genotype-phenotype association.


Subject(s)
Data Mining/methods , Databases, Genetic , Encyclopedias as Topic , Genetic Association Studies , Bacteria/genetics
5.
BMC Bioinformatics ; 12: 66, 2011 Mar 02.
Article in English | MEDLINE | ID: mdl-21366926

ABSTRACT

BACKGROUND: A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3D structure or contain regions that do not possess a well-defined 3D structure. It has also been proven that a protein's disorder content is related to its function. We have performed an exhaustive analysis and comparison of the disorder content of proteins from prokaryotic organisms (i.e., superkingdoms Archaea and Bacteria) with respect to the functional categories they belong to, i.e., Clusters of Orthologous Groups of proteins (COGs) and groups of COGs: Cellular processes (Cp), Information storage and processing (Isp), Metabolism (Me) and Poorly characterized (Pc). We also analyzed the disorder content of proteins with respect to various genomic, metabolic and ecological characteristics of the organism they belong to. We used correlations and association rule mining in order to identify the most confident associations between specific modalities of the characteristics considered and disorder content. RESULTS: Bacteria are shown to have a somewhat higher level of protein disorder than archaea, except for proteins in the Me functional group. It is demonstrated that the Isp and Cp functional groups in particular (specifically the COGs L, repair function, and N, cell motility and secretion) possess the highest disorder content, while Me proteins, in general, possess the lowest. Disorder fractions have been confirmed to have the lowest level for the so-called order-promoting amino acids and the highest level for the so-called disorder promoters. For each pair of organism characteristics, specific modalities are identified with the maximum disorder proteins in the corresponding organisms, e.g., high genome size-high GC content organisms, facultative anaerobic-low GC content organisms, aerobic-high genome size organisms, etc. 
Maximum disorder in archaea is observed for high GC content-low genome size organisms, high GC content-facultative anaerobic or aquatic or mesophilic organisms, etc. Maximum disorder in bacteria is observed for high GC content-high genome size organisms, high genome size-aerobic organisms, etc. Some of the most reliable association rules mined establish relationships between high GC content and high protein disorder, medium GC content and both medium and low protein disorder, anaerobic organisms and medium protein disorder, Gammaproteobacteria and low protein disorder, etc. A web site Prokaryote Disorder Database has been designed and implemented at the address http://bioinfo.matf.bg.ac.rs/disorder, which contains complete results of the analysis of protein disorder performed for 296 prokaryotic completely sequenced genomes. CONCLUSIONS: Exhaustive disorder analysis has been performed by functional classes of proteins, for a larger dataset of prokaryotic organisms than previously done. Results obtained are well correlated to those previously published, with some extension in the range of disorder level and clear distinction between functional classes of proteins. Wide correlation and association analysis between protein disorder and genomic and ecological characteristics has been performed for the first time. The results obtained give insight into multi-relationships among the characteristics and protein disorder. Such analysis provides for better understanding of the evolutionary process and may be useful for taxon determination. The main drawback of the approach is the fact that the disorder considered has been predicted and not experimentally established.
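The notion of a "confident" association rule used in this abstract can be illustrated with the standard support and confidence measures. The sketch below rests on assumptions: the organism records and labels such as `GC=high` are invented toy data, and the abstract does not specify which mining algorithm or thresholds were used.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (0.0 if antecedent never occurs)."""
    s_antecedent = support(records, antecedent)
    if s_antecedent == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / s_antecedent


# Hypothetical organisms, each described by discretized characteristics.
records = [
    {"GC=high", "disorder=high"},
    {"GC=high", "disorder=high"},
    {"GC=high", "disorder=medium"},
    {"GC=medium", "disorder=medium"},
    {"GC=medium", "disorder=low"},
]

rule_support = support(records, {"GC=high", "disorder=high"})
rule_confidence = confidence(records, {"GC=high"}, {"disorder=high"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```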


Subject(s)
Archaeal Proteins/analysis , Bacterial Proteins/analysis , Computational Biology/methods , Amino Acids/analysis , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/chemistry , Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/chemistry , Base Composition , Cluster Analysis , Databases, Protein , Genomics/methods , Internet , Protein Conformation , Proteome/analysis
6.
Comput Methods Programs Biomed ; 93(3): 241-56, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19101056

ABSTRACT

The paper presents a novel, n-gram-based method for analysis of bacterial genome segments known as genomic islands (GIs). Identification of GIs in bacterial genomes is an important task since many of them represent inserts that may contribute to bacterial evolution and pathogenesis. In order to characterize and distinguish GIs from the rest of the genome, binary classification of islands based on n-gram frequency distribution has been performed. It consists of testing the agreement of the islands' n-gram frequency distributions with those of the complete genome and the backbone sequence. In addition, a statistic based on the maximal order Markov model is used to identify significantly overrepresented and underrepresented n-grams in islands. The results may be used as a basis for Zipf-like analysis suggesting that some of the n-grams are overrepresented in a subset of islands and underrepresented in the backbone, or vice versa, thus complementing the binary classification. The method is applied to strain-specific regions in the Escherichia coli O157:H7 EDL933 genome (O-islands), resulting in two groups of O-islands with different n-gram characteristics. It refines a characterization based on other compositional features such as G+C content and codon usage, and may help in identification of GIs, and also in research and development of adequate drugs targeting virulence genes in them.
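Comparing n-gram frequency distributions, as described in this abstract, can be sketched in a few lines. The abstract does not name the agreement test used, so the total variation distance below is a stand-in chosen for simplicity; the sequences and the function names `ngram_frequencies` and `total_variation` are likewise hypothetical.

```python
from collections import Counter


def ngram_frequencies(seq, n):
    """Relative frequencies of all overlapping n-grams in `seq`."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two frequency dictionaries (0 = identical)."""
    grams = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in grams)


# Toy sequences standing in for a genome backbone and a candidate island.
backbone = "ACGTACGTACGT"
island = "GGGGCCCC"

backbone_freq = ngram_frequencies(backbone, 2)
island_freq = ngram_frequencies(island, 2)
distance = total_variation(backbone_freq, island_freq)
print(f"2-gram distance between island and backbone: {distance:.3f}")
```

A large distance would mark the island as compositionally atypical; a real analysis would replace this with a proper statistical test and a longer n.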


Subject(s)
Computational Biology/methods , Genome, Bacterial , Genomic Islands , Models, Statistical , Base Composition/genetics , Base Sequence/genetics , Codon/analysis , Escherichia coli O157/genetics , Gene Transfer, Horizontal , Genome, Bacterial/genetics , Genomics/methods , Markov Chains , Molecular Sequence Data
7.
J Biomed Inform ; 41(6): 936-43, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18448392

ABSTRACT

There are two approaches to identifying genomic and pathogenesis islands (GI/PAIs) in bacterial genomes: the compositional and the functional, based on DNA or protein level composition and gene function, respectively. We applied n-gram analysis in addition to other compositional features, combined them by union and intersection, and defined two measures for evaluating the results: recall and precision. Using the best criteria (by training on the Escherichia coli O157:H7 EDL933 genome), we predicted GIs for 14 Enterobacteriaceae family members and for 21 randomly selected bacterial genomes. These predictions were compared with results obtained from HGT DB (based on the compositional approach) and PAI DB (based on the combined approach). The results obtained show that intersecting n-grams with other compositional features improves relative precision by up to 10% in the case of HGT DB and up to 60% in the case of PAI DB. In addition, it was demonstrated that the union of all compositional features results in maximum recall (up to 37%). Thus, the application of n-gram analysis alongside existing or newly developed methods may improve the prediction of GI/PAIs.
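The effect of combining predictors by union and intersection can be illustrated with the textbook definitions of precision and recall. The paper's own measures (including "relative precision") may be defined differently; the gene identifiers and set names below are invented for the example.

```python
def precision_recall(predicted, reference):
    """Standard precision and recall of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gene indices flagged by two compositional features,
# and a hypothetical reference set of island genes.
ngram_hits = {1, 2, 3, 4, 5, 6}
gc_hits = {4, 5, 6, 7, 8}
reference = {5, 6, 7, 9}

p_inter, r_inter = precision_recall(ngram_hits & gc_hits, reference)
p_union, r_union = precision_recall(ngram_hits | gc_hits, reference)
print(f"intersection: precision={p_inter:.2f} recall={r_inter:.2f}")
print(f"union:        precision={p_union:.2f} recall={r_union:.2f}")
```

On this toy data the intersection is the more precise combination and the union has the higher recall, which mirrors the trade-off the abstract reports.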


Subject(s)
Genome, Bacterial , Escherichia coli O157/genetics
8.
Genomics Proteomics Bioinformatics ; 3(1): 18-35, 2005 Feb.
Article in English | MEDLINE | ID: mdl-16144519

ABSTRACT

A dataset of 103 SARS-CoV isolates (101 human patients and 2 palm civets) was investigated with respect to different aspects of genome polymorphism and isolate classification. The number and the distribution of single nucleotide variations (SNVs) and insertions and deletions, with respect to a "profile", were determined and discussed ("profile" being a sequence containing the most represented letter per position). Distribution of substitution categories per codon positions, as well as synonymous and non-synonymous substitutions in coding regions of annotated isolates, was determined, along with amino acid (a.a.) property changes. Similar analysis was performed for the spike (S) protein in all the isolates (55 of them being predicted for the first time). The ratio Ka/Ks confirmed that the S gene was subjected to Darwinian selection during virus transmission from animals to humans. Isolates from the dataset were classified according to genome polymorphism and genotypes. Genome polymorphism yields two groups, one with a small number of SNVs and another with a large number of SNVs, with up to four subgroups with respect to insertions and deletions. We identified three basic nine-locus genotypes: TTTT/TTCGG, CGCC/TTCAT, and TGCC/TTCGT, with four subgenotypes. Both classifications proposed are in accordance with the new insights into possible epidemiological spread, both in space and time.
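The "profile" defined in this abstract (the most represented letter per position) and the resulting SNV positions can be sketched as below. The sketch assumes the isolates are already aligned to equal length, reports 0-based positions, and uses invented toy sequences; the function names are hypothetical.

```python
from collections import Counter


def profile(aligned):
    """Most frequent letter in each column of equal-length aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def snv_positions(seq, prof):
    """0-based positions where `seq` differs from the profile, ignoring gap characters."""
    return [
        i for i, (a, b) in enumerate(zip(seq, prof))
        if a != b and a != "-" and b != "-"
    ]


# Toy alignment of three hypothetical isolates.
isolates = ["ACGT", "ACGA", "TCGT"]

prof = profile(isolates)
variations = [snv_positions(seq, prof) for seq in isolates]
print(prof, variations)
```

Columns with tied counts are resolved by first occurrence here; the abstract does not say how ties were handled in the original study.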


Subject(s)
Computational Biology , Genetic Variation , Genome , Polymorphism, Genetic/genetics , Severe Acute Respiratory Syndrome/genetics , Severe acute respiratory syndrome-related coronavirus/genetics , Viverridae/genetics , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Mutation , Phylogeny , Sequence Deletion , Sequence Homology, Amino Acid , Taiwan
9.
BMC Bioinformatics ; 5: 65, 2004 May 25.
Article in English | MEDLINE | ID: mdl-15161495

ABSTRACT

BACKGROUND: We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphism (SNP), insertions and deletions, and second, to group them according to sequence similarity, eventually pointing to phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphism such as insertions or deletions and the number and positions of SNPs. RESULTS: The nucleotide structure of all 38 isolates is presented. Based on insertions and deletions and dissimilarity due to SNPs, the dataset of all the isolates has been qualitatively classified into three groups each having their own subgroups. These are the A-group with "regular" isolates (no insertions / deletions except for 5' and 3' ends), the B-group of isolates with "long insertions", and the C-group of isolates with "many individual" insertions and deletions. The isolate with the smallest average number of SNPs, compared to other isolates, has been identified (TWH). The density distribution of SNPs, insertions and deletions for each group or subgroup, as well as cumulatively for all the isolates is also presented, along with the gene map for TWH. Since individual SNPs may have occurred at random, positions corresponding to multiple SNPs (occurring in two or more isolates) are identified and presented. This result revises some previous results of a similar type. Amino acid changes caused by multiple SNPs are also identified (for the annotated sequences, as well as presupposed amino acid changes for non-annotated ones). Exact SNP positions for the isolates in each group or subgroup are presented. Finally, a phylogenetic tree for the SARS-CoV isolates has been produced using the CLUSTALW program, showing high compatibility with former qualitative classification. 
CONCLUSIONS: The comparative study of SARS-CoV isolates provides essential information on genome polymorphism and indicates strain differences and variant evolution. It may help with the development of effective treatment.


Subject(s)
Computational Biology/methods , Genome, Viral , Polymorphism, Genetic/genetics , Severe acute respiratory syndrome-related coronavirus/genetics , Amino Acid Sequence/genetics , DNA, Viral/genetics , Mutagenesis, Insertional/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Deletion/genetics , Viral Proteins/chemistry