Search | BVS Regional Portal

Efficient gene orthology inference via large-scale rearrangements.

Rubert, Diego P; Braga, Marília D V.

Algorithms Mol Biol ; 18(1): 14, 2023 Sep 28.

Article in English | MEDLINE | ID: mdl-37770945

ABSTRACT

BACKGROUND: We recently developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. It then runs pairwise ILP comparisons to compute optimal gene matchings, which, by taking the similarities into account, minimize the weighted rearrangement distance between the analyzed genomes (an NP-hard problem). In the final step, the gene matchings are integrated into gene families. The ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, which increases the search space exponentially. RESULTS: In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering the linear segments, based on their gene content intersections, into m subsets, whose ends are capped independently. Furthermore, in each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and although the resulting gene matchings may be sub-optimal rather than optimal, the heuristic works very well in practice in terms of both speed and the quality of the computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates, the version with heuristic capping reports orthologies that are very similar to those computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed in hundreds or even thousands of contigs, obtaining gene families that are very similar to [Formula: see text] families. Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with [Formula: see text], than the gene families computed by other inference tools. We added a post-processing step that refines our ambiguous families (those with more than one gene per genome) with the aid of the [Formula: see text] algorithm, further improving the accuracy of our results. Our approach is implemented as a pipeline incorporating the pre-computation of gene similarities and the post-processing refinement of ambiguous families with [Formula: see text]. Both the original version with optimal capping and the new version with heuristic capping can be downloaded, together with their detailed documentation, at https://gitlab.ub.uni-bielefeld.de/gi/FFGC or as a Conda package at https://anaconda.org/bioconda/ffgc .
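The clustering idea behind the heuristic capping can be illustrated with a minimal sketch. It assumes that two linear segments count as content-related when at least one gene of one segment is similar to a gene of the other, and that the subsets are the connected components of that relation. The abstract does not spell out the exact criterion, so this is an illustration under those assumptions and not the FFGC implementation. The function name `cluster_segments`, the segment identifiers, and the input layout are hypothetical.

```python
from collections import defaultdict


def cluster_segments(segments, similar_pairs):
    """Group linear segments into subsets of content-related segments.

    segments: dict mapping a segment id to the set of gene ids it contains.
    similar_pairs: iterable of (gene_a, gene_b) pairs considered similar.
    Returns a list of sets of segment ids (connected components).

    Assumption (not from the source): segments are related when they
    share at least one pair of similar genes.
    """
    # Map each gene to the segment that holds it.
    gene_to_segment = {g: s for s, genes in segments.items() for g in genes}

    # Union-find structure over segment ids.
    parent = {s: s for s in segments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every similar gene pair links the two segments that contain the genes.
    for gene_a, gene_b in similar_pairs:
        seg_a = gene_to_segment.get(gene_a)
        seg_b = gene_to_segment.get(gene_b)
        if seg_a is not None and seg_b is not None:
            union(seg_a, seg_b)

    clusters = defaultdict(set)
    for s in segments:
        clusters[find(s)].add(s)
    return list(clusters.values())


# Toy input: two genomes (A and B), each with one chromosome and one contig.
segments = {
    "A.chr1": {"a1", "a2"},
    "A.ctg7": {"a3"},
    "B.chr1": {"b1", "b2"},
    "B.ctg3": {"b3"},
}
similar_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
clusters = cluster_segments(segments, similar_pairs)
```

In this toy input the segments fall into two subsets, so the ends of `A.chr1` would only be capped against the ends of `B.chr1`, and likewise for the two contigs. When all segments end up in a single subset, the clustering alone brings no reduction, which matches the caveat that m is not guaranteed to be much bigger than one.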

The potential of family-free rearrangements towards gene orthology inference.

Rubert, Diego P; Doerr, Daniel; Braga, Marília D V.

J Bioinform Comput Biol ; 19(6): 2140014, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34775922

ABSTRACT

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, no prior classification of genes into families and no further restrictions on the genomes are imposed. Given two genomes, the ILP computes an optimal matching of the genes, simultaneously taking into account local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in a second step. Our approach is implemented as a pipeline incorporating the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results with experiments on both simulated and real data.
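The second step, integrating pairwise matchings into gene families, can be pictured with a small sketch. It assumes the simplest possible integration rule: genes are nodes, every matched pair is an edge, and each connected component is one family. The abstract does not state the rule the authors use, so this is only one plausible reading. The function name `build_families` and the `(genome, gene)` tuples are hypothetical.

```python
from collections import defaultdict, deque


def build_families(matchings):
    """Merge pairwise gene matchings into families.

    matchings: iterable of ((genome, gene), (genome, gene)) matched pairs
    collected from all pairwise genome comparisons.
    Returns a list of sets of (genome, gene) tuples.

    Assumption (not from the source): a family is a connected component
    of the graph whose edges are the matched pairs.
    """
    adjacency = defaultdict(set)
    for u, v in matchings:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    families = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        seen.add(start)
        queue = deque([start])
        family = set()
        while queue:
            node = queue.popleft()
            family.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        families.append(family)
    return families


# Toy matchings from three pairwise comparisons.
matchings = [
    (("human", "h1"), ("chimp", "c1")),
    (("chimp", "c1"), ("gorilla", "g1")),
    (("human", "h2"), ("chimp", "c2")),
]
families = build_families(matchings)
```

Here `h1`, `c1`, and `g1` end up in one family even though `h1` and `g1` were never matched directly, which shows why a component-based rule can produce families that are not complete cliques.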

Subjects

Genome, Genetic Models, Algorithms, Gene Rearrangement, Genomics, Humans

Computing the family-free DCJ similarity.

Rubert, Diego P; Hoshino, Edna A; Braga, Marília D V; Stoye, Jens; Martinez, Fábio V.

BMC Bioinformatics ; 19(Suppl 6): 152, 2018 May 8.

Article in English | MEDLINE | ID: mdl-29745861

ABSTRACT

BACKGROUND: Genomic similarity is a large-scale measure for comparing two given genomes. In this work, we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. RESULTS: We propose an exact ILP algorithm to solve the family-free DCJ similarity problem, show that the problem is APX-hard, and present four combinatorial heuristics, with computational experiments comparing their results to those of the ILP. CONCLUSIONS: We show that the family-free DCJ similarity can be computed in reasonable time, although for larger genomes it is necessary to resort to heuristics. This provides a basis for further studies on the applicability and model refinement of family-free whole genome similarity measures.

Subjects

Genetic Models, Phylogeny, Algorithms, Animals, Computer Simulation, Genetic Databases, Genome, Genomics, Heuristics, Humans, Mice, Rats
