Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 10 de 10
Filtrar
Más filtros











Base de datos
Intervalo de año de publicación
1.
J Comput Biol ; 30(1): 41-51, 2023 01.
Artículo en Inglés | MEDLINE | ID: mdl-35575710

RESUMEN

The simplified partial digest problem (SPDP) models an effective and robust method for the building of a physical map using restriction site analysis. The best known algorithm requires O(n2n) time, using O(n2n) working space. The high complexities in time and space impede its application to genomes of a large number of sites. This article gives two new algorithms. The first improves the time by a factor of O(n) and significantly reduces the space to O(n2). The second improves both the time and space to O(n1.52n/2). Extensive experiments are conducted on real genomes. For instances that can be solved by the best known algorithm, the new algorithms achieve a speedup of up to 4000 times; in addition, due to the reduction in space, the new algorithms can solve many more instances. Experiments also reveal the following advantage of the SPDP method: almost every instance has at most four feasible solutions and for an instance that does not contain any pair of symmetric restriction sites, in all observed examples, the solution is unique.


Asunto(s)
Algoritmos , Genoma , Mapeo Cromosómico , Secuencia de Bases
2.
Artículo en Inglés | MEDLINE | ID: mdl-31217125

RESUMEN

The maximum agreement subtree method determines the consensus of a collection of phylogenetic trees by identifying maximum cardinality subsets of leaves for which all input trees agree. The trees induced by these maximum cardinality subsets are maximum agreement subtrees (MASTs). A single MAST may be misleading, since there can exist two MASTs which share almost no leaves; nevertheless, it may be impossible to inspect all MASTs, since the number of MASTs can be exponential in the number of leaves. To overcome this drawback, Swenson et al. suggested to further summarize the information common to all MASTs by their intersection, which is called the kernel agreement subtree (KAST). The construction of the KAST is the focus of this paper. Swenson et al. had an O(kn3 + n4 + nd + 1) time algorithm for computing the KAST of k trees on n leaves, in which at least one tree has maximum degree d. In this paper, an O(kn3 + nd)-time algorithm is presented. We demonstrate the efficiency of our algorithm on simulated trees as well as on ribosomal RNA alignments, where trees with 13,000 taxa took only hours to process, whereas the previous algorithm did not terminate after a week of computation.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Filogenia , Alineación de Secuencia/métodos , Secuencia de Consenso , Genes de ARNr/genética , Análisis de Secuencia de ARN/métodos
3.
Artículo en Inglés | MEDLINE | ID: mdl-29993953

RESUMEN

Tree comparison metrics are an important tool for the study of phylogenetic trees. Path-difference distances measure the dissimilarity between two phylogenetic trees (on the same set of taxa) by comparing their path-length vectors. Various norms can be applied to this distance. Three important examples are the $l_{1}\text{-},\;l_{2}\text{-}$l1-,l2-, and $l_{{\infty }}$l∞-norms. The previous best algorithms for computing path-difference distances all have $O(n^{2})$O(n2) running time. In this paper, we show how to compute the $l_{1}$l1-norm path-difference distance in $O(n\;{\log}^{2}\;n)$O(nlog2n) time and how to compute the $l_{2}$l2- and $l_{{\infty }}$l∞-norm path-difference distances in $O(n\;{\log}\;n)$O(nlogn) time. By extending the presented algorithms, we also show that the $l_{p}$lp-norm path-difference distance can be computed in $O(pn\;{\log}^{2}\;n)$O(pnlog2n) time for any positive integer $p$p. In addition, when the integer $p$p is even, we show that the distance can be computed in $O(p^{2}n\;{\log}\;n)$O(p2nlogn) time as well.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Filogenia
4.
Artículo en Inglés | MEDLINE | ID: mdl-27959819

RESUMEN

The focus of this paper is the frequent gene team problem. Given a quorum parameter µ and a set of m genomes, the problem is to find gene teams that occur in at least µ of the given genomes. In this paper, a new algorithm is presented. Previous solutions are efficient only when µ is small. Unlike previous solutions, the presented algorithm does not rely on examining every combination of µ genomes. Its time complexity is independent of µ. Under some realistic assumptions, the practical running time is estimated to be , where n is the maximum length of the input genomes. Experiments showed that the presented algorithm is extremely efficient. For any µ, it takes less than 1 second to process 100 bacterial genomes and takes only 10 minutes to process 2,000 genomes. The presented algorithm can be used as an effective tool for large scale genome analyses.


Asunto(s)
Algoritmos , Genómica/métodos , Genoma Bacteriano/genética , Modelos Genéticos , Familia de Multigenes/genética
5.
Artículo en Inglés | MEDLINE | ID: mdl-26353380

RESUMEN

The problem of finding all reversals that take a permutation one step closer to a target permutation is called the all sorting reversals problem (the ASR problem). For this problem, Siepel had an O(n (3))-time algorithm. Most complications of his algorithm stem from some peculiar structures called bad components. Since bad components are very rare in both real and simulated data, it is practical to study the ASR problem with no bad components. For the ASR problem with no bad components, Swenson et al. gave an O (n(2))-time algorithm. Very recently, Swenson found that their algorithm does not always work. In this paper, a new algorithm is presented for the ASR problem with no bad components. The time complexity is O(n(2)) in the worst case and is linear in the size of input and output in practice.


Asunto(s)
Algoritmos , Reordenamiento Génico/genética , Genoma/genética , Genómica/métodos
6.
Artículo en Inglés | MEDLINE | ID: mdl-26355514

RESUMEN

An important model of a conserved gene cluster is called the gene team model, in which a chromosome is defined to be a permutation of distinct genes and a gene team is defined to be a set of genes that appear in two or more species, with the distance between adjacent genes in the team for each chromosome always no more than a certain threshold δ. A gene team tree is a succinct way to represent all gene teams for every possible value of δ. The previous fastest algorithm for constructing a gene team tree of two chromosomes requires O(n lg n lglg n) time, which was given by Wang and Lin. Its bottleneck is a problem called the maximum-gap problem. In this paper, by presenting an improved algorithm for the maximum-gap problem, we reduce the upper bound of the gene team tree problem to O(n lg n α(n)). Since α grows extremely slowly, this result is almost as efficient as the current best upper bound, O(n lg n), for finding the gene teams of a fixed δ value. Our new algorithm is very efficient from both the theoretical and practical points of view. Wang and Lin's gene-team-tree algorithm can be extended to k chromosomes with complexity O(kn lg n lglg n). Similarly, our improved algorithm for the maximum-gap problem reduces this running time to O(kn lg n α(n)). In addition, it also provides new upper bounds for the gene team tree problem on general sequences, in which multiple copies of the same gene are allowed.


Asunto(s)
Algoritmos , Genómica/métodos , Modelos Genéticos , Familia de Multigenes/genética
7.
Artículo en Inglés | MEDLINE | ID: mdl-22282907

RESUMEN

Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A famous model to capture the essential biological features of a conserved gene cluster is called the gene-team model. The problem of finding the gene teams of two general sequences is the focus of this paper. For this problem, He and Goldwasser had an efficient algorithm that requires O(mn) time using O(m + n) working space, where m and n are, respectively, the numbers of genes in the two given sequences. In this paper, a new efficient algorithm is presented. Assume m ≤ n. Let C = Σ(α)(∈)(Σ) o(1)(α)o(2)(α), where Σ is the set of distinct genes, and o(1)(α) and o(2)(α) are, respectively, the numbers of copies of α in the two given sequences. Our new algorithm requires O(min{C lg n, mn}) time using O(m + n) working space. As compared with He and Goldwasser's algorithm, our new algorithm is more practical, as C is likely to be much smaller than mn in practice. In addition, our new algorithm is output sensitive. Its running time is O(lg n) times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of k general sequences in O(k C lg (n(1)n(2). . .n(k)) time, where n(i) is the number of genes in the ith input sequence.


Asunto(s)
Algoritmos , Secuencia Conservada , Familia de Multigenes , Animales , Genoma , Humanos , Modelos Genéticos , Alineación de Secuencia
8.
Artículo en Inglés | MEDLINE | ID: mdl-21844635

RESUMEN

The focus of this paper is the problem of finding all nested common intervals of two general sequences. Depending on the treatment one wants to apply to duplicate genes, Blin et al. introduced three models to define nested common intervals of two sequences: the uniqueness, the free-inclusion, and the bijection models. We consider all the three models. For the uniqueness and the bijection models, we give O(n + N(out))-time algorithms, where N(out) denotes the size of the output. For the free-inclusion model, we give an O(n(1+ε) + N(out))-time algorithm, where ε > 0 is an arbitrarily small constant. We also present an upper bound on the size of the output for each model. For the uniqueness and the free-inclusion models, we show that N(out) = O(n2). Let C = Σ(g∈Γ) o1(g)o2(g), where Γ is the set of distinct genes, and o1(g) and o2(g) are, respectively, the numbers of copies of gene g in the two given sequences. For the bijection model, we show that N(out) = O(Cn). In this paper, we also study the problem of finding all approximate nested common intervals of two sequences on the bijection model. An O(δn + N(out))-time algorithm is presented, where δ denotes the maximum number of allowed gaps. In addition, we show that for this problem N(out) is O(δn3).


Asunto(s)
Algoritmos , Genómica/métodos , Familia de Multigenes/genética , Análisis de Secuencia de ADN
9.
Artículo en Inglés | MEDLINE | ID: mdl-21116042

RESUMEN

A gene team is a set of genes that appear in two or more species, possibly in a different order yet with the distance between adjacent genes in the team for each chromosome always no more than a certain threshold δ. A gene team tree is a succinct way to represent all gene teams for every possible value of δ. In this paper, improved algorithms are presented for the problem of finding the gene teams of two chromosomes and the problem of constructing a gene team tree of two chromosomes. For the problem of finding gene teams, Beal et al. had an O(n lg2 n)-time algorithm. Our improved algorithm requires O(n lg t) time, where t ≤ n is the number of gene teams. For the problem of constructing a gene team tree, Zhang and Leong had an O(n lg2 n)-time algorithm. Our improved algorithm requires O(n lg n lglg n) time. Similar to Beal et al.'s gene team algorithm and Zhang and Leong's gene team tree algorithm, our improved algorithms can be extended to k chromosomes with the time complexities increased only by a factor of k.


Asunto(s)
Algoritmos , Genómica/métodos , Modelos Genéticos , Familia de Multigenes/genética , Animales , Bases de Datos Genéticas , Genoma , Humanos , Ratones
10.
Artículo en Inglés | MEDLINE | ID: mdl-18451437

RESUMEN

Let A be a sequence of n real numbers, L(1) and L(2) be two integers such that L(1) < or = L(2) , and R(1) and R(2) be two real numbers such that R(1) < or = R(2). An interval of A is feasible if its length is between L(1) and L(2) and its average is between R(1) and R(2). In this paper, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. The problems are motivated from the problem of locating CpG islands in biomolecular sequences. In this paper, we firstly show that all the problems have Omega (n log n)-time lower bound in the comparison model. Then, we use geometric approaches to design optimal algorithms for the problems. All the presented algorithms run in an on-line manner and use O(n) space.


Asunto(s)
Algoritmos , ADN/química , ADN/genética , Composición de Base , Biología Computacional , Islas de CpG , Secuencia Rica en GC , Modelos Genéticos , Análisis de Secuencia de ADN/estadística & datos numéricos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA