Results 1 - 16 of 16
1.
PLoS One ; 15(11): e0237205, 2020.
Article in English | MEDLINE | ID: mdl-33156862

ABSTRACT

Determination of metagenome composition is still one of the most interesting problems of bioinformatics. It involves a wide range of mathematical methods, from probabilistic models of combinatorics to cluster analysis and pattern recognition techniques. The successful advance of rapid sequencing methods and fast and precise metagenome analysis will increase the diagnostic value of healthy or pathological human metagenomes. The article presents the theoretical foundations of the algorithm for calculating the number of different genomes in the medium under study. The approach is based on analysis of the compositional spectra of subsequently sequenced samples of the medium. Its essential feature is using random fluctuations in the bacteria number in different samples of the same metagenome. The possibility of effective implementation of the algorithm in the presence of data errors is also discussed. In the work, the algorithm of a metagenome evaluation is described, including the estimation of the genome number and the identification of the genomes with known compositional spectra. It should be emphasized that evaluating the genome number in a metagenome can always be helpful, regardless of the metagenome separation techniques, such as clustering the sequencing results or marker analysis.


Subject(s)
Algorithms , Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Metagenome , Sequence Analysis, DNA/methods , Humans , Phylogeny
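
The abstract above does not spell out the algorithm, so the following is only a sketch under a stated assumption: if each sample's compositional spectrum is an abundance-weighted sum of the spectra of the genomes present, and abundances fluctuate randomly between samples, then the number of genomes can be read off as the rank of the matrix of sample spectra. The function names, the toy spectra, and the tolerance `tol` are all hypothetical, and real data with errors would need a noise-aware rank estimate.

```python
def numerical_rank(rows, tol=1e-9):
    """Rank of a matrix (list of equal-length rows) via Gaussian elimination."""
    m = [[float(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Partial pivoting: pick the unused row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is numerically zero below the current rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
    return rank


def mix_spectra(genome_spectra, abundances):
    """Spectrum of one sample as an abundance-weighted sum of genome spectra."""
    n_words = len(genome_spectra[0])
    return [sum(a * g[k] for a, g in zip(abundances, genome_spectra))
            for k in range(n_words)]


# Hypothetical word counts for three genomes over four words.
genome_spectra = [[5, 2, 2, 1], [1, 4, 1, 4], [5, 5, 8, 2]]
# Hypothetical abundances of the three genomes in five samples.
sample_abundances = [[3, 1, 2], [1, 4, 1], [2, 2, 5], [5, 1, 1], [1, 1, 1]]
samples = [mix_spectra(genome_spectra, a) for a in sample_abundances]
estimated_genomes = numerical_rank(samples)
```

In this toy setting five samples are enough to recover three genomes; with fewer samples than genomes the rank can only give a lower bound.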
2.
BMC Genomics ; 15: 252, 2014 Mar 31.
Article in English | MEDLINE | ID: mdl-24684786

ABSTRACT

BACKGROUND: In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA), to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., its coding and non-coding parts, the latter being much more abundant in the genome than the former. RESULTS: We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation. CONCLUSIONS: There may be a tendency for genes that are involved in the same biological process, complex or organelle to use the same OP, even at a distance of ~100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., reflected in over-represented oligonucleotide "words") may be important and may have been protected, to some extent, in the course of evolution.


Subject(s)
Genetic Heterogeneity , Genome, Human , Genomics , Animals , Base Composition , Evolution, Molecular , Genes , Genetic Variation , Genome, Mitochondrial , Humans , Multigene Family , Segmental Duplications, Genomic
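
The first two steps described in the abstract above (cutting a genome into fixed-size segments and assigning each to one of five GC-content groups) can be sketched as follows. The abstract does not give the group boundaries, so `GC_BOUNDARIES` holds placeholder thresholds chosen only for illustration, and all function names are hypothetical.

```python
from bisect import bisect_right


def split_into_segments(sequence, size):
    """Non-overlapping segments of exactly `size` bases (a short tail is dropped)."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def gc_content(segment):
    """Fraction of G and C among unambiguous bases, or None if there are none."""
    segment = segment.upper()
    gc = segment.count("G") + segment.count("C")
    at = segment.count("A") + segment.count("T")
    return gc / (gc + at) if gc + at else None


# Placeholder boundaries between five GC groups (an assumption, not from the abstract).
GC_BOUNDARIES = (0.37, 0.41, 0.46, 0.53)


def gc_group(gc, boundaries=GC_BOUNDARIES):
    """Index (0 to 4) of the GC group that a GC fraction falls into."""
    return bisect_right(boundaries, gc)


# Toy example with 8-base segments standing in for 100 kb segments.
toy_sequence = "ATATATATGCGCGCGC"
groups = [gc_group(gc_content(s)) for s in split_into_segments(toy_sequence, 8)]
```

Each segment would then be characterized by its k-mer spectrum within its GC group, which is the step the study uses to separate organization patterns from plain base composition.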
3.
Biomed Res Int ; 2013: 472163, 2013.
Article in English | MEDLINE | ID: mdl-24371824

ABSTRACT

Ancestral sequence reconstruction is a well-known problem in molecular evolution. The problem presented in this study is inspired by sequence reconstruction, but instead of leaf-associated sequences we consider only their lengths. We call this problem ancestral gene length reconstruction. It is the problem of finding an optimal labeling that minimizes the sum of the edge lengths, where the input consists of a tree and nonnegative integers associated with the corresponding leaves of the tree. In this paper we give a linear-time algorithm to solve the problem on binary trees for the Manhattan cost function s(v, w) = |π(v) - π(w)|.


Subject(s)
Conserved Sequence/genetics , Evolution, Molecular , Models, Theoretical , Algorithms , Sequence Analysis, DNA
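
The abstract above does not describe the paper's algorithm, so the sketch below uses a swapped-in, standard technique for this kind of problem: the interval method of Wagner parsimony for ordered characters. A bottom-up pass computes, for every internal node, the interval of labels that is optimal for its subtree; a top-down pass then labels each node with the value in its interval closest to its parent's label. Each node is visited a constant number of times, so the running time is linear in the tree size. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def reconstruct_lengths(tree):
    """Label internal nodes of a binary tree to minimize the sum of |parent - child|.

    A tree is either an int (leaf length) or a pair (left_subtree, right_subtree).
    Returns (total_cost, labeled_tree); a labeled internal node is
    (label, labeled_left, labeled_right) and a labeled leaf is its int.
    """
    def up(node):
        # Returns (low, high, subtree_cost, children) for the optimal label interval.
        if isinstance(node, int):
            return (node, node, 0, None)
        left, right = up(node[0]), up(node[1])
        low, high = max(left[0], right[0]), min(left[1], right[1])
        gap = 0
        if low > high:
            # Child intervals are disjoint: any label in the gap is optimal,
            # and crossing the gap costs its width.
            low, high = high, low
            gap = high - low
        return (low, high, left[2] + right[2] + gap, (left, right))

    def down(info, parent_label):
        low, high, _, children = info
        label = min(max(parent_label, low), high)  # closest point of the interval
        if children is None:
            return label
        return (label, down(children[0], label), down(children[1], label))

    root = up(tree)
    return root[2], down(root, root[0])


def labeling_cost(labeled):
    """Sum of |label(v) - label(w)| over all edges of a labeled tree."""
    if isinstance(labeled, int):
        return 0
    label, left, right = labeled
    top = lambda t: t if isinstance(t, int) else t[0]
    return (abs(label - top(left)) + abs(label - top(right))
            + labeling_cost(left) + labeling_cost(right))


example_tree = ((1, 5), (9, 9))
total_cost, labeled_tree = reconstruct_lengths(example_tree)
```

The recursion is kept for readability; for very deep trees an explicit stack would avoid Python's recursion limit.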
4.
PLoS One ; 7(2): e32076, 2012.
Article in English | MEDLINE | ID: mdl-22384143

ABSTRACT

Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter, GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.


Subject(s)
Genome , Algorithms , Animals , Base Sequence , Biodiversity , Chromosomes/ultrastructure , CpG Islands , Fishes , Genetic Variation , Humans , Mammals/metabolism , Models, Genetic , Models, Statistical , Molecular Sequence Data , Oligonucleotides/genetics , Sequence Analysis, DNA , Software , Vertebrates
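
The core operation named in the abstract above, scoring occurrences of fixed-length oligonucleotides, can be illustrated with a minimal exact-match k-mer spectrum. This is a simplified sketch, not the study's scoring scheme; the function name and the choice to skip windows containing non-ACGT characters are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(sequence, k):
    """Relative frequency of every k-mer over A, C, G, T in the sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other characters (for example N) are not in `words`
    # and therefore do not contribute to the total.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


spectrum = kmer_spectrum("ACGTACGT", 2)
```

Comparing such spectra between regions, or between a natural sequence and its reshuffled version, is the kind of measurement the heterogeneity scores in the abstract build on.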
5.
Chromosome Res ; 14(3): 307-17, 2006.
Article in English | MEDLINE | ID: mdl-16628501

ABSTRACT

Data are presented on the intra- and interspecific differences/similarities in chromosomal patterns of Ac-like elements (hAT family) in ecologically contrasted populations of three Triticeae species - Aegilops speltoides, Triticum urartu, and Hordeum spontaneum. Application of original computer software made it possible to precisely map transposon clusters and to link them to known chromosomal markers (rDNA sites, centromeres, and heterochromatin regions). From our data we can specify the most visible features of Ac-like elements chromosomal distribution: preferential concentration in chromosomal proximal regions; high percentage of clusters on the border between euchromatin and heterochromatin; complementary chromosomal arrangement towards En/Spm transposons (CACTA); population-specific insertions into centromeres; more differences in total cluster numbers between populations of self-pollinated species than between populations of cross-pollinated species. The application of statistical simulation (Resampling) method to analysis of data indicates that ecology may play a certain role in dynamics of Ac-like elements. Comparison of real Ayala distances, as well as real chromosomal distribution of Ac-like elements in populations of two species with different mating systems with the same but randomly simulated parameters, revealed that non-random population structure in the Mediterranean floral zone suffers and becomes chaotic in the Irano-Turanian zone.


Subject(s)
Chromosomes, Plant , Diploidy , Edible Grain/genetics , Genetics, Population , Retroelements , Base Sequence , Chromosome Mapping , DNA, Plant , Edible Grain/classification , Genome, Plant , In Situ Hybridization , RNA, Plant/genetics , RNA, Ribosomal/genetics , RNA, Ribosomal, 5S/genetics , Species Specificity
6.
Biosystems ; 81(3): 208-22, 2005 Sep.
Article in English | MEDLINE | ID: mdl-15936870

ABSTRACT

With the availability of genome sequences, the possibility of new phylogenetic reconstructions arises in order to reveal genomic relationships among organisms. According to the compositional-spectra (CS) approach proposed in our previous studies, any genomic sequence can be characterized by a distribution of frequencies of imperfect matching of words (oligonucleotides). In the current application of CS-analysis, we attempted to analyze the cluster structure of genomes across life. It appeared that compositional spectra show a clear three-group clustering of the compared prokaryotic and eukaryotic genomes. Unexpectedly, this grouping seriously differs from the classical Universal Tree of Life structure represented by common kingdoms known as Eubacteria, Archaebacteria, and Eukarya. The revealed CS-clustering displays high stability, putatively reflecting its objective nature, and still enigmatic biological significance that may result from convergent evolution driven by ecological selection. We believe that our approach provides a new and wider (compared to traditional methods) perspective of extracting genomic information of high evolutionary relevance.


Subject(s)
Classification/methods , Genome/genetics , Genomics/methods , Oligonucleotides/genetics , Phylogeny , Base Composition , Base Sequence/genetics , Cluster Analysis , Computational Biology/methods , Species Specificity
7.
J Mol Evol ; 59(4): 520-7, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15638463

ABSTRACT

The phenomenon of overlapping of various sequence messages in genomes is a puzzle for evolutionary theoreticians, geneticists, and sequence researchers. The overlapping is possible due to degeneracy of the messages, in particular, degeneracy of codons. It is often observed in organisms with a limited size of genome, possessing polymerases of low fidelity. The most accepted view considers the overlapping as a mechanism to increase the amount of information per unit length. Here we present a model that suggests direct evolutionary advantage of the message overlapping. Two opposing drives are considered: (a) reduction in the amount of vulnerable points when the overlapping of two messages involves common critical points and (b) cumulative compromising cost of coexistence of messages at the same site. Over a broad range of conditions the reduction of the target size prevails, thus making the overlapping of messages advantageous.


Subject(s)
Codon/genetics , Evolution, Molecular , Models, Genetic , Animals , Base Sequence , Genome, Bacterial , Genome, Viral , Humans , Likelihood Functions , Molecular Sequence Data , Mutation , Sequence Alignment
8.
Proc Natl Acad Sci U S A ; 100(25): 14970-5, 2003 Dec 09.
Article in English | MEDLINE | ID: mdl-14645702

ABSTRACT

We have found that genomic diversity is generally positively correlated with abiotic and biotic stress levels (1-3). However, beyond a high-threshold level of stress, the diversity declines to a few adapted genotypes. The Dead Sea is the harshest planetary hypersaline environment (340 g·liter⁻¹ total dissolved salts, approximately 10 times sea water). Hence, the Dead Sea is an excellent natural laboratory for testing the "rise and fall" pattern of genetic diversity with stress proposed in this article. Here, we examined genomic diversity of the ascomycete fungus Aspergillus versicolor from saline, nonsaline, and hypersaline Dead Sea environments. We screened the coding and noncoding genomes of A. versicolor isolates by using >600 AFLP (amplified fragment length polymorphism) markers (equal to loci). Genomic diversity was positively correlated with stress, culminating in the Dead Sea surface but dropped drastically in 50- to 280-m-deep seawater. The genomic diversity pattern paralleled the pattern of sexual reproduction of fungal species across the same southward gradient of increasing stress in Israel. This parallel may suggest that diversity and sex are intertwined intimately according to the rise and fall pattern and adaptively selected by natural selection in fungal genome evolution. Future large-scale verification in micromycetes will define further the trajectories of diversity and sex in the rise and fall pattern.


Subject(s)
Biological Evolution , Fungi/physiology , Climate , DNA/chemistry , DNA/metabolism , Environment , Genetic Variation , Genotype , Israel , Phenotype , Phylogeny , Polymorphism, Genetic , Salts/chemistry , Temperature , Time Factors
9.
J Biomol Struct Dyn ; 21(3): 317-25, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14616028

ABSTRACT

Analysis of crystallized protein structures suggests that globular proteins are organized as consecutively connected units of 25-35 residues. These units are closed loops, that is, returns of the polypeptide chain trajectory to a close contact with itself. This universal feature of apparently polymer-statistical nature is a basis for a principally novel view on the globular proteins as loop fold structures. The same unit size has been detected in protein sequences translated from complete prokaryotic genomes by positional autocorrelation analysis, which strongly indicates the evolutionary connection of the units. The units are further characterized by prototype sequences matching to their numerous derivatives in the translated genomes. The matches to the five strongest prokaryotic prototypes and three prototypes of C. elegans are identified in the sequences of crystallized proteins, and their structures analyzed. Corresponding segments of the polypeptide chains in the majority of cases form closed loops, though the evolutionary fate of every prototype element is shown to be rather diverse. The loop ends can then be separated by sequence-wise distant segments and stabilized by the spatial interactions in the context of the overall globular structure. The units belong to a presumably limited spectrum of the sequence prototypes, the full repertoire of which would constitute a proteomic code.


Subject(s)
Proteins/chemistry , Proteome , Proteomics/methods , Amino Acid Motifs , Amino Acid Sequence , Animals , Caenorhabditis elegans , Crystallography, X-Ray , Escherichia coli/metabolism , Models, Statistical , Molecular Sequence Data , Peptides/chemistry , Polymers/chemistry , Protein Conformation , Protein Folding , Sequence Homology, Amino Acid
10.
J Biomol Struct Dyn ; 21(3): 327-39, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14616029

ABSTRACT

Recent sequence analysis of complete prokaryotic proteomes suggests that in early evolutionary stages proteins were rather small, of the size 25-35 amino acids. Corroborating evidence comes from protein crystal data, which indicate this size for closed loops, the universal structural units of globular proteins. In the latest development we were able to derive and structurally characterize several sequence/structure prototypes apparently representing early protein units. Structurally the prototypes appear as closed loops stabilized by end-to-end van der Waals interactions. While nearly standard in size the loops are highly diverse in terms of their secondary structure. A presentation of the protein as an assembly of descendants of the prototypes, the first of its kind, is described in detail here. The sequence and structure of the ATP-binding subunit of histidine permease of S. typhimurium is shown to contain several modified copies of different prototype elements, closed loops, and, thus, can be spelled as: x-PI-x-PIV-PVI-PII-PVII-x, where PI-PVII are the prototype elements. This study sets up the basic principles for the sequence/structure prototype spelling of globular proteins.


Subject(s)
Proteins/chemistry , Proteomics/methods , ATP-Binding Cassette Transporters/chemistry , Adenosine Triphosphate/chemistry , Amino Acid Sequence , Amino Acid Transport Systems, Basic/chemistry , Bacterial Proteins/chemistry , Crystallography, X-Ray , Models, Molecular , Models, Statistical , Molecular Sequence Data , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , Proteome , Salmonella typhimurium/enzymology
11.
Genet Sel Evol ; 35(5): 533-57, 2003.
Article in English | MEDLINE | ID: mdl-12939204

ABSTRACT

In a project on the biodiversity of chickens funded by the European Commission (EC), eight laboratories collaborated to assess the genetic variation within and between 52 populations from a wide range of chicken types. Twenty-two di-nucleotide microsatellite markers were used to genotype DNA pools of 50 birds from each population. The polymorphism measures for the average, the least polymorphic population (inbred C line) and the most polymorphic population (Gallus gallus spadiceus) were, respectively, as follows: number of alleles per locus, per population: 3.5, 1.3 and 5.2; average gene diversity across markers: 0.47, 0.05 and 0.64; and proportion of polymorphic markers: 0.91, 0.25 and 1.0. These were in good agreement with the breeding history of the populations. For instance, unselected populations were found to be more polymorphic than selected breeds such as layers. Thus DNA pools are effective in the preliminary assessment of genetic variation of populations and markers. Mean genetic distance indicates the extent to which a given population shares its genetic diversity with that of the whole tested gene pool and is a useful criterion for conservation of diversity. The distribution of population-specific (private) alleles and the amount of genetic variation shared among populations supports the hypothesis that the red jungle fowl is the main progenitor of the domesticated chicken.


Subject(s)
Chickens/genetics , Genetic Variation , Microsatellite Repeats , Animals , Data Interpretation, Statistical , Genetics, Population , Mutation , Polymorphism, Genetic
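
The three polymorphism measures reported in the abstract above (alleles per locus, average gene diversity across markers, and proportion of polymorphic markers) can be computed from per-marker allele frequencies. The abstract does not give formulas, so this sketch assumes the common definition of gene diversity as one minus the sum of squared allele frequencies; the function names and the toy frequencies are hypothetical.

```python
def gene_diversity(allele_frequencies):
    """Gene diversity of one marker: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_frequencies)


def summarize_population(markers):
    """Summary measures for one population.

    `markers` is a list with one entry per marker; each entry lists the
    frequencies of the alleles observed at that marker (summing to 1).
    """
    n = len(markers)
    alleles_per_marker = [sum(1 for p in m if p > 0) for m in markers]
    return {
        "mean_alleles_per_locus": sum(alleles_per_marker) / n,
        "mean_gene_diversity": sum(gene_diversity(m) for m in markers) / n,
        "proportion_polymorphic": sum(1 for a in alleles_per_marker if a > 1) / n,
    }


# Toy allele frequencies for four markers in one hypothetical population.
toy_markers = [[0.5, 0.5], [1.0], [0.25, 0.25, 0.5], [0.9, 0.1]]
summary = summarize_population(toy_markers)
```

With pooled DNA, the frequencies would come from allele signal intensities in the pool rather than from individual genotypes, which is one reason the abstract frames pools as a preliminary assessment.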
12.
Acta Biotheor ; 51(2): 73-89, 2003.
Article in English | MEDLINE | ID: mdl-12870770

ABSTRACT

We introduce a novel, linguistic-like method of genome analysis. We propose a natural approach to characterizing genomic sequences based on occurrences of fixed length words from a predefined, sufficiently large set of words (strings over the alphabet [A, C, G, T]). A measure based on this approach is called compositional spectrum and is actually a histogram of imperfect word occurrences. Our results assert that the compositional spectrum is an overall characteristic of a long sequence i.e., a complete genome or an uninterrupted part of a chromosome. This attribute is manifested in the similarity of spectra obtained on different stretches of the same genome, and simultaneously in a broad range of dissimilarities between spectral representations of different genomes. High flexibility characterizes this approach due to imperfect matching and as a result sets of relatively long words can be considered. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.


Subject(s)
DNA/genetics , Genomics , Sequence Analysis, DNA/methods , Statistics as Topic/methods , Algorithms , Animals , Archaea/genetics , Base Composition , Base Sequence , Chromosomes/genetics , DNA/chemistry , Eubacterium/genetics , Eukaryotic Cells , Humans , Linguistics , Models, Genetic
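
The compositional spectrum described in the abstract above is a histogram of imperfect occurrences of words from a predefined set. A minimal sketch, assuming that "imperfect" means a Hamming distance of at most `max_mismatches` between a word and a sequence window, could look like this. The word set, the mismatch threshold, and the function names are illustrative assumptions, and the brute-force scan is far too slow for whole genomes.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(sequence, words, max_mismatches):
    """Count, for each word, the windows matching it with few enough mismatches."""
    word_length = len(words[0])
    counts = [0] * len(words)
    for i in range(len(sequence) - word_length + 1):
        window = sequence[i:i + word_length]
        for j, word in enumerate(words):
            if hamming_distance(window, word) <= max_mismatches:
                counts[j] += 1
    return counts


# Hypothetical two-word set; the method calls for a sufficiently large set.
toy_words = ["ACGT", "TTTT"]
toy_counts = compositional_spectrum("ACGTACGA", toy_words, max_mismatches=1)
```

Allowing mismatches is what lets relatively long words still occur often enough to give a stable histogram, which is the flexibility the abstract highlights.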
13.
J Theor Biol ; 221(4): 625-38, 2003 Apr 21.
Article in English | MEDLINE | ID: mdl-12713945

ABSTRACT

A mathematical approach to interactions between genotypes and phenotypes in a multilocus multiallele population is developed. No a priori information on a fitness function is required. In particular, some structural definitions of epistasis and the position effect are given in terms of a decomposition of phenotypical structures. On this base a distance to the additive non-epistasis is introduced and an explicit formula for it is obtained. A class of phenotypical structures including multilocus dominance is described in terms of directed graphs. The evolutionary equations are adjusted to a fitness function compatible with a phenotypical structure. Some results on the finiteness of the equilibria set are presented.


Subject(s)
Epistasis, Genetic , Models, Genetic , Phenotype , Selection, Genetic , Alleles , Animals , Evolution, Molecular , Genotype
14.
Genome ; 45(6): 1216-29, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12502268

ABSTRACT

Genetic diversity at 38 microsatellite (short sequence repeats (SSRs)) loci was studied in a sample of 54 plants representing a natural population of wild barley, Hordeum spontaneum, at the Neve Yaar microsite in Israel. Wild barley at the microsite was organized in a mosaic pattern over an area of 3180 m² in the open Tabor oak forest, which was subdivided into four microniches: (i) sun-rock (11 genotypes), (ii) sun-soil (18 genotypes), (iii) shade-soil (11 genotypes), and (iv) shade-rock (14 genotypes). Fifty-four genotypes were tested for ecological-genetic microniche correlates. Analysis of 36 loci showed that allele distributions at SSR loci were nonrandom but structured by ecological stresses (climatic and edaphic). Sixteen (45.7%) of 35 polymorphic loci varied significantly (p < 0.05) in allele frequencies among the microniches. Significant genetic divergence and diversity were found among the four subpopulations. The soil and shade subpopulations showed higher genetic diversities at SSR loci than the rock and sun subpopulations, and the lowest genetic diversity was observed in the sun-rock subpopulation, in contrast with the previous allozyme and RAPD studies. On average, of 36 loci, 88.75% of the total genetic diversity exists within the four microniches, while 11.25% exists between the microniches. In a permutation test, G_ST was lower for 4999 out of 5000 randomized data sets (p < 0.001) when compared with real data (0.1125). The highest genetic distance was between shade-soil and sun-rock (D = 0.222). Our results suggest that diversifying natural selection may act upon some regulatory regions, resulting in adaptive SSR divergence. Fixation of some loci (GMS61, GMS1, and EBMAC824) at a specific microniche seems to suggest directional selection. The pattern of other SSR loci suggests the operation of balancing selection. SSRs may be either direct targets of selection or markers of selected haplotypes (selective sweep).


Subject(s)
Hordeum/genetics , Microsatellite Repeats/genetics , Mosaicism , Alleles , Biological Evolution , Cluster Analysis , Discriminant Analysis , Israel , Polymerase Chain Reaction
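
The permutation test mentioned in the abstract above compares the observed differentiation among microniches with the differentiation obtained after randomly reassigning plants to microniches. The sketch below is a single-locus simplification under stated assumptions: each plant carries one allele label (a rough stand-in for a highly selfing species), G_ST is computed as (H_T - H_S) / H_T with unweighted subpopulation means, and the data and function names are hypothetical. The study itself averaged over many loci.

```python
import random
from collections import Counter


def allele_diversity(alleles):
    """1 minus the sum of squared allele frequencies in a list of allele labels."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def g_st(subpopulations):
    """G_ST = (H_T - H_S) / H_T for one locus, with equal weight per subpopulation."""
    k = len(subpopulations)
    h_s = sum(allele_diversity(p) for p in subpopulations) / k
    all_alleles = {a for p in subpopulations for a in p}
    mean_freq = {a: sum(p.count(a) / len(p) for p in subpopulations) / k
                 for a in all_alleles}
    h_t = 1.0 - sum(f * f for f in mean_freq.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def permutation_test(subpopulations, n_permutations=1000, seed=0):
    """Count the shuffled data sets whose G_ST is lower than the observed value."""
    rng = random.Random(seed)
    observed = g_st(subpopulations)
    pooled = [a for p in subpopulations for a in p]
    sizes = [len(p) for p in subpopulations]
    n_lower = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # keep the original subpopulation sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if g_st(shuffled) < observed:
            n_lower += 1
    return observed, n_lower


# Two hypothetical microniches with contrasting allele frequencies.
toy_niches = [["a"] * 8 + ["b"] * 2, ["a"] * 2 + ["b"] * 8]
observed_gst, n_lower = permutation_test(toy_niches, n_permutations=200, seed=1)
```

If nearly all shuffled data sets give a lower G_ST than the real one, as reported in the abstract, the observed structure is unlikely to be a sampling artifact.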
15.
Protein Eng ; 15(12): 955-7, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12601134

ABSTRACT

It has recently been discovered that globular proteins are universally built from standard loop-n-lock units of about 30 amino acid residues. The hypothesis has been put forward on the loop stage in the protein evolution when the units were autonomous. Later they joined together making longer chains. One would expect that the early individual loop-n-lock elements might still be detected in modern protein sequences as remnants of the hypothetical 30-residue sequence prototypes. Among several strong sequence motifs, extracted from protein sequences of 23 complete bacterial proteomes, one 32-residue prototype was studied here in detail. Numerous sequence segments related to the prototype are identified in the crystal structures of proteins of a PDB_SELECT database. Analysis of the respective chain trajectories for the cases with different degrees of sequence conservation confirms that the majority of the segments correspond to the closed loops. In the evolutionary diversification of the prototypes the secondary structure yields first, while the sequence is still moderately conserved. The last feature to go is the chain return property. Apparently, the opening of the loops would severely destabilize the protein fold, which explains their conservation.


Subject(s)
Proteins/chemistry , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Databases, Protein , Evolution, Molecular , Models, Molecular , Protein Conformation , Protein Folding , Structural Homology, Protein
16.
Evolution ; 50(4): 1432-1441, 1996 Aug.
Article in English | MEDLINE | ID: mdl-28565726

ABSTRACT

The subject of this paper is polymorphism maintenance due to stabilizing selection with a moving optimum. It was shown that in case of two-locus additive control of the selected trait, global polymorphism is possible only when the geometric mean fitnesses of double homozygotes averaged over the period are lower than that of the single heterozygotes and of the double heterozygote (with a multiplier [1 - r]^p, which depends on recombination rate r and period length p). But local stability of polymorphism cannot be excluded even if geometric mean fitnesses of all double homozygotes are higher than that of all heterozygotes. We proved that for logarithmically convex fitness functions, cyclical changes of the optimum cannot help in polymorphism maintenance in case of additive control of the selected trait by two equal loci. However, within the same class of fitness functions, nonequal gene action and/or dominance effect for one or both loci may lead to local polymorphism stability with large enough polymorphism attracting domain. The higher the intensity of selection and the closer the linkage between selected loci, the larger is this domain. Note that even simple cyclical selection could result in two forms of polymorphic limiting behavior: (a) usually expected forced cycle with a period equal to that of environmental changes; and (b) "supercycles," nondamping auto-oscillations with a period comprising hundreds of forced oscillation periods.
