RESUMO
Genome rearrangement events are widely used to estimate a minimum-size sequence of mutations capable of transforming a genome into another. The length of this sequence is called distance, and determining it is the main goal in genome rearrangement distance problems. Problems in the genome rearrangement field differ regarding the set of rearrangement events allowed and the genome representation. In this work, we consider the scenario where the genomes share the same set of genes, gene orientation is known or unknown, and intergenic regions (structures between a pair of genes and at the extremities of the genome) are taken into account. We use two models, the first model allows only conservative events (reversals and moves), and the second model includes non-conservative events (insertions and deletions) in the intergenic regions. We show that both models result in NP-hard problems no matter if gene orientation is known or unknown. When the information regarding the orientation of genes is available, we present for both models an approximation algorithm with a factor of 2. For the scenario where this information is unavailable, we propose a 4-approximation algorithm for both models.
Assuntos
Rearranjo Gênico , Modelos Genéticos , DNA Intergênico/genética , Genoma , Mutação , AlgoritmosRESUMO
In the field of comparative genomics, one way of comparing two genomes is through the analysis of how they distinguish themselves based on a set of mutations called rearrangement events. When considering that genomes undergo different types of rearrangements, it can be assumed that some events are more common than others. To model this assumption, one can assign different weights to different events, where common events tend to cost less than others. However, this approach, called weighted, does not guarantee that the rearrangement assumed to be the most frequent will be also the most frequently returned by proposed algorithms. To overcome this issue, we investigate a new problem where we seek the shortest sequence of rearrangement events able to transform one genome into the other, with a restriction regarding the proportion between the events returned. Here, we consider two rearrangement events: reversal, that inverts the order and the orientation of the genes inside a segment of the genome, and transposition, that moves a segment of the genome to another position. We show the complexity of this problem for any desired proportion considering scenarios where the orientation of the genes is known or unknown. We also develop an approximation algorithm with a constant approximation factor for each scenario and, in particular, we describe an improved (asymptotic) approximation algorithm for the case where the gene orientation is known. At last, we present the experimental tests comparing the proposed algorithms with others from the literature for the version of the problem without the proportion restriction.
Assuntos
Genoma , Genômica , Algoritmos , Rearranjo Gênico , Modelos Genéticos , MutaçãoRESUMO
During the evolutionary process, genomes are affected by various genome rearrangements, that is, events that modify large stretches of the genetic material. In the literature, a large number of models have been proposed to estimate the number of events that occurred during evolution; most of them represent a genome as an ordered sequence of genes, and, in particular, disregard the genetic material between consecutive genes. However, recent studies showed that taking into account the genetic material between consecutive genes can enhance evolutionary distance estimations. Reversal and transposition are genome rearrangements that have been widely studied in the literature. A reversal inverts a (contiguous) segment of the genome, while a transposition swaps the positions of two consecutive segments. Genomes also undergo nonconservative events (events that alter the amount of genetic material) such as insertions and deletions, in which genetic material from intergenic regions of the genome is inserted or deleted, respectively. In this article, we study a genome rearrangement model that considers both gene order and sizes of intergenic regions. We investigate the reversal distance, and also the reversal and transposition distance between two genomes in two scenarios: with and without nonconservative events. We show that these problems are NP-hard and we present constant ratio approximation algorithms for all of them. More precisely, we provide a 4-approximation algorithm for the reversal distance, both in the conservative and nonconservative versions. For the reversal and transposition distance, we provide a 4.5-approximation algorithm, both in the conservative and nonconservative versions. We also perform experimental tests to verify the behavior of our algorithms, as well as to compare the practical and theoretical results. We finally extend our study to scenarios in which events have different costs, and we present constant ratio approximation algorithms for each scenario.
RESUMO
In comparative genomics, rearrangements are mutations that affect a stretch of DNA sequences. Reversals and transpositions are well-known rearrangements, and each has a vast literature. The reversal and transposition distance, that is, the minimum number of reversals and transpositions needed to transform one genome into another is a relevant evolutionary distance. The problem of computing this distance when genomes are represented by permutations was proposed >20 years ago and received the name of sorting by reversals and transpositions problem. It has been the focus of a number of studies, but the computational complexity has remained open until now. We hereby solve this question and prove that it is NP-hard no matter whether genomes are represented by signed or unsigned permutations. In addition, we prove that a usual generalization of this problem, which assigns weights wρ for reversals and wτ for transpositions, is also NP-hard as long as wτ/wρ ≤ 1.5 for both signed and unsigned permutations.