Search | VHL Regional Portal

Rubert, Diego P; Martinez, Fábio V; Braga, Marília D V.

Algorithms Mol Biol ; 16(1): 4, 2021 May 10.

Article in English | MEDLINE | ID: mdl-33971908

ABSTRACT

BACKGROUND: A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform a given genome into another given genome. The traditional approaches in this area are family-based, i.e., require the classification of DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which are able to compute distances in polynomial time, restrict the families to occur at most once in each genome. In contrast, the distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J Comput Biol 28:410-431, 2021) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but must maximize a matching of the genes in each multifamily, in order to prevent the free lunch artifact that would otherwise let empty or almost empty matchings give smaller distances. RESULTS: In this paper, we adopt the alternative family-free setting that, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance that takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its bigger search space, our ILP seems to be boosted by a reduction of the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original one by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.

Analysis of local genome rearrangement improves resolution of ancestral genomic maps in plants.

Rubert, Diego P; Martinez, Fábio V; Stoye, Jens; Doerr, Daniel.

BMC Genomics ; 21(Suppl 2): 273, 2020 Apr 16.

Article in English | MEDLINE | ID: mdl-32299356

ABSTRACT

BACKGROUND: Computationally inferred ancestral genomes play an important role in many areas of genome research. We present an improved workflow for the reconstruction from highly diverged genomes such as those of plants. RESULTS: Our work relies on an established workflow in the reconstruction of ancestral plants, but improves several steps of this process. Instead of using gene annotations for inferring the genome content of the ancestral sequence, we identify genomic markers through a process called genome segmentation. This enables us to reconstruct the ancestral genome from hundreds of thousands of markers rather than the tens of thousands of annotated genes. We also introduce the concept of local genome rearrangement, through which we refine syntenic blocks before they are used in the reconstruction of contiguous ancestral regions. With the enhanced workflow at hand, we reconstruct the ancestral genome of eudicots, a major sub-clade of flowering plants, using whole genome sequences of five modern plants. CONCLUSIONS: Our reconstructed genome is highly detailed, yet its layout agrees well with that reported in Badouin et al. (2017). Using local genome rearrangement, not only the marker-based, but also the gene-based reconstruction of the eudicot ancestor exhibited increased genome content, evidencing the power of this novel concept.

Subject(s)

Chromosome Mapping/methods, Genomics/methods, Magnoliopsida/genetics, Computer Simulation, Molecular Evolution, Gene Order, Plant Genome, Genetic Models, Phylogeny, Synteny/genetics

Computing the family-free DCJ similarity.

Rubert, Diego P; Hoshino, Edna A; Braga, Marília D V; Stoye, Jens; Martinez, Fábio V.

BMC Bioinformatics ; 19(Suppl 6): 152, 2018 May 08.

Article in English | MEDLINE | ID: mdl-29745861

ABSTRACT

BACKGROUND: The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. RESULTS: We propose an exact ILP algorithm to solve the family-free DCJ similarity problem, then we show its APX-hardness and present four combinatorial heuristics with computational experiments comparing their results to the ILP. CONCLUSIONS: We show that the family-free DCJ similarity can be computed in reasonable time, although for larger genomes it is necessary to resort to heuristics. This provides a basis for further studies on the applicability and model refinement of family-free whole genome similarity measures.

Subject(s)

Genetic Models, Phylogeny, Algorithms, Animals, Computer Simulation, Genetic Databases, Genome, Genomics, Heuristics, Humans, Mice, Rats

On the family-free DCJ distance and similarity.

Martinez, Fábio V; Feijão, Pedro; Braga, Marília D V; Stoye, Jens.

Algorithms Mol Biol ; 10: 13, 2015.

Article in English | MEDLINE | ID: mdl-25859276

ABSTRACT

Structural variation in genomes can be revealed by many (dis)similarity measures. Rearrangement operations, such as the so-called double-cut-and-join (DCJ), are large-scale mutations that can create complex changes and produce such variations in genomes. A basic task in comparative genomics is to find the rearrangement distance between two given genomes, i.e., the minimum number of rearrangement operations that transform one given genome into another one. In a family-based setting, genes are grouped into gene families and efficient algorithms have already been presented to compute the DCJ distance between two given genomes. In this work we propose the problem of computing the DCJ distance of two given genomes without prior gene family assignment, directly using the pairwise similarities between genes. We prove that this new family-free DCJ distance problem is APX-hard and provide an integer linear program for its solution. We also study a family-free DCJ similarity and prove that its computation is NP-hard.
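As an illustration of the family-based case mentioned in the abstract above, the following is a minimal sketch, not taken from the paper, of the DCJ distance in the simplest setting: both genomes consist only of circular chromosomes and contain exactly the same genes, each once. It relies on the standard result that, in this setting, the distance equals n minus c, where n is the number of genes and c is the number of cycles in the adjacency graph of the two genomes. The input encoding (lists of signed integers) and the function names `adjacencies` and `dcj_distance_circular` are assumptions made for this example; the family-free problems studied in the listed papers are harder and are not covered by it.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour.

    `genome` is a list of circular chromosomes; each chromosome is a list of
    signed integers (hypothetical encoding: -g is gene g in reverse orientation).
    """
    partner = {}
    for chrom in genome:
        for i, a in enumerate(chrom):
            b = chrom[(i + 1) % len(chrom)]  # next gene, wrapping around
            right_of_a = (abs(a), "h" if a > 0 else "t")
            left_of_b = (abs(b), "t" if b > 0 else "h")
            partner[right_of_a] = left_of_b
            partner[left_of_b] = right_of_a
    if len(partner) != 2 * sum(len(chrom) for chrom in genome):
        raise ValueError("each gene must occur exactly once in a genome")
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for circular genomes with identical, unique genes."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("both genomes must contain the same genes")
    n = len(adj_a) // 2  # two extremities per gene
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating one adjacency of A and one of B.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return n - cycles


# One inversion of gene 2, and one fission of a circular chromosome.
print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))
print(dcj_distance_circular([[1, 2, 3, 4]], [[1, 2], [3, 4]]))
```

Linear chromosomes, unequal gene content, and genes with several candidate partners (the family-free setting) all require extensions beyond this sketch.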
