ABSTRACT
Meiotic recombination is a crucial cellular process and one of the major drivers of the evolution and adaptation of species. In plant breeding, crossing is used to introduce genetic variation among individuals and populations. Although several approaches have been developed to predict recombination rates in different species, they cannot estimate the outcome of a cross between two specific accessions. This paper builds on the hypothesis that chromosomal recombination correlates positively with a measure of sequence identity. It presents a model that uses sequence identity, combined with other features derived from a genome alignment (including the number of variants, inversions, absent bases, and CentO sequences), to predict local chromosomal recombination in rice. Model performance is validated on an inter-subspecific indica x japonica cross, using 212 recombinant inbred lines. Across chromosomes, the average correlation between experimental and predicted rates is about 0.8. By characterizing how recombination rates vary along the chromosomes, the proposed model can help breeding programs increase the chances of creating novel allele combinations and, more generally, introduce new varieties that combine desirable traits. It can be part of a modern toolkit that breeders use to reduce the cost and duration of crossing experiments.
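The central hypothesis (local recombination rises with sequence identity between the parental genomes) and the reported evaluation metric can be illustrated with a small Python sketch. It assumes, hypothetically, that identity is the fraction of identical, non-gap columns in fixed-size windows of a pairwise alignment, and that performance is the mean over chromosomes of the Pearson correlation between observed and predicted per-window rates. The function names are illustrative, and this is not the authors' model, which also draws on variants, inversions, absent bases, and CentO sequences.

```python
from statistics import mean


def window_identity(ref: str, qry: str, window: int) -> list[float]:
    """Per-window sequence identity between two rows of a pairwise alignment.

    Assumption for illustration: identity is the share of columns in which both
    rows carry the same base; '-' marks a gap (an absent base) and never counts
    as a match. The last window may be shorter than `window`.
    """
    if len(ref) != len(qry):
        raise ValueError("aligned rows must have equal length")
    scores = []
    for start in range(0, len(ref), window):
        cols = list(zip(ref[start:start + window], qry[start:start + window]))
        matches = sum(1 for a, b in cols if a == b and a != "-")
        scores.append(matches / len(cols))
    return scores


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def mean_chromosome_correlation(observed: dict[str, list[float]],
                                predicted: dict[str, list[float]]) -> float:
    """Hypothetical evaluation helper: average, over chromosomes, of the
    correlation between observed and predicted per-window recombination rates."""
    return mean(pearson(observed[c], predicted[c]) for c in observed)
```

For example, `window_identity("ACGTACGT", "ACGTAC-A", 4)` returns `[1.0, 0.5]`: the first window matches at every column, while the second matches only at its first two.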
Subjects
Oryza, Plant Breeding, Humans, Genome, Chromosomes/genetics, Homologous Recombination, Phenotype, Oryza/genetics
ABSTRACT
Functional enrichment analysis is a cornerstone of bioinformatics because it makes it possible to extract functional information from a gene list. Different tools are available to compare gene ontology (GO) terms; they rely on the directed acyclic graph structure of GO or on content-based algorithms, approaches that are time-consuming and require a priori information about GO terms. However, quantitative procedures for comparing GO terms across gene lists and species are lacking. Here we present a computational procedure, implemented in R, to infer functional information through comparative strategies. GOCompare provides a framework for functional comparative genomics starting from comparable lists of GO terms. The program takes functional enrichment analysis (FEA) results and applies graph theory to identify statistically relevant GO terms for both GO categories and the analyzed species. GOCompare thus uncovers new functional information that complements current FEA approaches and extends their use to a comparative perspective. To test the approach, GO terms were obtained for a list of aluminum tolerance-associated genes in Oryza sativa subsp. japonica and their orthologues in Arabidopsis thaliana. GOCompare detected functional similarities related to reactive oxygen species and ion binding, which are common molecular mechanisms by which plants tolerate aluminum toxicity. The R package therefore performed well on complex datasets, allowing researchers to formulate hypotheses that might explain a biological process from a functional perspective and to narrow down the options when designing wet-lab experiments.
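The comparative idea can be sketched by treating species and enriched GO terms as the two node sets of a bipartite graph: a term's degree is the number of species in which it is enriched, and terms connected to every species are shared. As a stand-in for statistical relevance, the sketch uses a one-sided hypergeometric test on the overlap of two species' term sets. This is an assumption for illustration, written in Python rather than R; GOCompare's own graph statistics and API may differ, and the function names and toy term sets are hypothetical.

```python
from collections import Counter
from math import comb


def shared_terms(enriched: dict[str, set[str]]) -> list[str]:
    """GO terms linked to every species in the species-term bipartite graph.

    `enriched` maps a species name to the set of GO terms found enriched for it
    by a prior functional enrichment analysis (hypothetical input format).
    """
    # Degree of each term node = number of species nodes it is connected to.
    degree = Counter(term for terms in enriched.values() for term in terms)
    return sorted(term for term, d in degree.items() if d == len(enriched))


def overlap_pvalue(universe_size: int, a: set[str], b: set[str]) -> float:
    """One-sided hypergeometric P(X >= |a & b|) for the overlap of two term sets.

    Null model (an assumption): `b` is a random draw of len(b) terms from a
    universe of `universe_size` terms, of which len(a) belong to `a`.
    """
    N, K, n, k = universe_size, len(a), len(b), len(a & b)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy term sets for illustration only; not data from the study.
    enriched = {
        "Oryza sativa": {"response to oxidative stress", "ion binding",
                         "cell wall organization"},
        "Arabidopsis thaliana": {"response to oxidative stress", "ion binding"},
    }
    print(shared_terms(enriched))
    print(overlap_pvalue(50, *enriched.values()))
```

The degree count is the simplest graph statistic that separates shared from species-specific terms; a real analysis would also correct the p-values for multiple testing.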
Subjects
Aluminum, Arabidopsis, Genomics/methods, Computational Biology/methods, Algorithms, Gene Ontology, Arabidopsis/genetics
ABSTRACT
BACKGROUND: This paper proposes a workflow to identify genes that respond to specific treatments in plants. The workflow takes as input RNA sequencing read counts and phenotypic data for different genotypes, measured under control and treatment conditions, and outputs a reduced group of genes marked as relevant for the treatment response. Technically, the proposed approach is both a generalization and an extension of WGCNA. It aims to identify specific modules of overlapping communities underlying the gene co-expression network. Modules are detected with Hierarchical Link Clustering, and they can capture the overlapping nature of the regulatory domains that generate co-expression. LASSO regression is then used to analyze the phenotypic response of modules to the treatment. RESULTS: The workflow is applied to rice (Oryza sativa), a major food source known to be highly sensitive to salt stress. It identifies 19 rice genes that appear relevant to the salt-stress response. They are distributed across 6 modules: 3 modules of 3 genes each are associated with shoot K content, 2 modules of 3 genes are associated with shoot biomass, and 1 module of 4 genes is associated with root biomass. These genes are candidate targets for improving salinity tolerance in rice. CONCLUSIONS: The paper introduces a more effective framework for reducing the search space of target genes that respond to a specific treatment. It facilitates experimental validation by restricting efforts to a smaller subset of genes of high potential relevance.
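Hierarchical Link Clustering groups links (edges) rather than genes, so a gene whose links fall into different clusters ends up in several modules. For two links e_ik and e_jk that share gene k, the unweighted link similarity is the Jaccard index of the inclusive neighborhoods of i and j (each gene together with its neighbors). The Python sketch below assumes an unweighted co-expression graph and cuts the single-linkage dendrogram at a fixed similarity threshold, rather than choosing the cut by partition density as in the original method; `link_communities` is a hypothetical name, and this is not the authors' implementation.

```python
from itertools import combinations


def link_communities(edges, threshold: float) -> list[list[str]]:
    """Overlapping gene modules from single-linkage clustering of links.

    `edges` are (u, v) pairs of an unweighted co-expression graph without
    self-loops. Two links sharing a gene are merged when their similarity is at
    least `threshold` (a fixed cut, assumed here for simplicity).
    """
    edges = [tuple(sorted(e)) for e in edges]
    # Inclusive neighborhood n+(x): gene x together with its neighbors.
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)

    # Union-find over link indices; its components are the single-linkage clusters.
    parent = list(range(len(edges)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Only links that share a gene k are compared.
    incident = {}
    for idx, (u, v) in enumerate(edges):
        incident.setdefault(u, []).append(idx)
        incident.setdefault(v, []).append(idx)

    for k, idxs in incident.items():
        for a, b in combinations(idxs, 2):
            # i and j are the endpoints of links e_ik and e_jk other than k.
            i = edges[a][0] if edges[a][1] == k else edges[a][1]
            j = edges[b][0] if edges[b][1] == k else edges[b][1]
            sim = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
            if sim >= threshold:
                parent[find(a)] = find(b)

    # Each link cluster becomes a module of the genes it touches; genes may repeat.
    modules = {}
    for idx, (u, v) in enumerate(edges):
        modules.setdefault(find(idx), set()).update((u, v))
    return sorted(sorted(m) for m in modules.values())
```

On two triangles that share gene C (links A-B, B-C, A-C, C-D, D-E, C-E), a threshold of 0.5 yields the modules `["A", "B", "C"]` and `["C", "D", "E"]`, with C in both, which is the kind of overlap that node-partitioning methods cannot express.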