ABSTRACT
BACKGROUND: Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. It then runs pairwise ILP comparisons to compute optimal gene matchings, which take the similarities into account and minimize the weighted rearrangement distance between the analyzed genomes (an NP-hard problem). In the final step, the gene matchings are integrated into gene families. The ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, which increases the search space exponentially. RESULTS: In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering the linear segments, based on their gene content intersections, into [Formula: see text] subsets, whose ends are capped independently. Furthermore, in each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and although the resulting gene matchings may be sub-optimal rather than optimal, the heuristic works very well in practice in terms of both speed and the quality of the computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates, the version with heuristic capping reports orthologies that are very similar to those computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed in hundreds or even thousands of contigs, obtaining gene families that are very similar to [Formula: see text] families. Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with [Formula: see text], when compared to gene families computed by other inference tools. We added a post-processing step that refines, with the aid of the [Formula: see text] algorithm, our ambiguous families (those with more than one gene per genome), further improving the accuracy of our results. Our approach is implemented as a pipeline incorporating the pre-computation of gene similarities and the post-processing refinement of ambiguous families with [Formula: see text]. Both the original version with optimal capping and the new modified version with heuristic capping can be downloaded, together with their detailed documentation, at https://gitlab.ub.uni-bielefeld.de/gi/FFGC or as a Conda package at https://anaconda.org/bioconda/ffgc .
ABSTRACT
The most common way to calculate the rearrangement distance between two genomes is to use the size of a minimum-length sequence of rearrangements that transforms one of the two given genomes into the other, where the genomes are represented as permutations using only their gene order, based on the assumption that the genomes have the same gene content. With the advance of research in genome rearrangements, new works extended the classical models by either considering genomes with different gene content (unbalanced genomes) or adding more genomic characteristics to the mathematical representation of the genomes, such as the distribution of intergenic region sizes. In this work, we study the Reversal, Transposition, and Indel (Insertion and Deletion) Distance using intergenic information, which allows comparing unbalanced genomes because indels are included in the rearrangement model (i.e., the set of rearrangements allowed when we compute the distance). For the particular case of transpositions and indels on unbalanced genomes, we present a 4-approximation algorithm, improving a previous 4.5-approximation. This algorithm is extended to deal with gene orientation while maintaining the 4-approximation factor for the Reversal, Transposition, and Indel Distance on unbalanced genomes. Furthermore, we evaluate the proposed algorithms using experiments on simulated data.
Subjects
Gene Rearrangement , Genetic Models , Genome/genetics , Genomics , INDEL Mutation , Algorithms
ABSTRACT
Boana, the third largest genus of Hylinae, has cryptic morphological species. The potential applicability of β-fibrinogen intron 7 (FGBI7) is explored to propose a robust phylogeny of Boana. The phylogenetic potential of FGBI7 was evaluated using maximum parsimony, MrBayes, and maximum likelihood analysis. Comparison of polymorphic sites and topologies obtained with concatenated analysis of FGBI7 and other nuclear genes (CXCR4, CXCR4, RHO, SIAH1, TYR, and 28S) allowed evaluation of the phylogenetic signal of FGBI7. Mean evolutionary rates were calculated using the sequences of the mitochondrial genes ND1 and CYTB available for Boana in GenBank. Dating of Boana and some of its groups was performed using the RelTime method with secondary calibration. FGBI7 analysis revealed high values at informative sites for parsimony. The absolute values of the mean evolutionary rate were higher for mitochondrial genes than for FGBI7. Dating of congruent Boana groups for ND1, CYTB, and FGBI7 revealed closer values between mitochondrial genes and slightly different values from those of FGBI7. Divergence times of basal groups tended to be overestimated when mtDNA was used and were more accurate when nDNA was used. Although there is evidence of phylogenetic potential arising from concatenation of specific genes, FGBI7 provides well-resolved independent gene trees. These results lead to a paradigm for linking data in phylogenomics that focuses on the uniqueness of species histories and ignores the multiplicities of individual gene histories.
ABSTRACT
Double digest restriction-site associated DNA sequencing (ddRADseq) technology combines reduced genome representation, obtained by digestion with two restriction enzymes, with next generation sequencing (NGS) to obtain thousands of markers (SNPs, SSRs, and InDels) and to genotype tens to hundreds of samples simultaneously. In this chapter, we describe a 96-plex derived ddRADseq protocol that can be set up to obtain different depths of coverage per locus and can be applied to model and non-model plant species.
Subjects
Genome , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis/methods , Genotype , Base Sequence , High-Throughput Nucleotide Sequencing/methods , Technology , Single Nucleotide Polymorphism
ABSTRACT
A genotyping by sequencing (GBS) approach was used to analyze the organization of genetic diversity in V. pubescens and V. chilensis. GBS identified 4675 and 4451 SNPs/INDELs in two papaya species. The cultivated orchards of V. pubescens exhibited scarce genetic diversity and low but significant genetic differentiation. The neutrality test yielded a negative and significant result, suggesting that V. pubescens suffered a selective sweep or a rapid expansion after a bottleneck during domestication. In contrast, V. chilensis exhibited a high level of genetic diversity. The genetic differentiation among the populations was slight, but it was possible to distinguish the two genetic groups. The neutrality test indicated no evidence that natural selection and genetic drift affect the natural population of V. chilensis. Using the Carica papaya genome as a reference, we identified critical SNPs/INDELs associated with putative genes. Most of the identified genes are related to stress responses (salt and nematode) and vegetative and reproductive development. These results will be helpful for future breeding and conservation programs of the Caricaceae family.
ABSTRACT
In the comparative genomics field, one way to infer the evolutionary distance between two organisms of related species is by finding the minimum number of large-scale mutations, called genome rearrangements, that transform one genome into the other. This number is referred to as the rearrangement distance. Since problems in this area emerged in the mid-1990s, several genome rearrangements have been proposed. Rearrangements that do not alter the genome content are called conservative, and this group includes the following: the reversal, which inverts a segment of the genome; the transposition, which exchanges two consecutive segments; and the double cut and join, which cuts two different pairs of adjacent blocks and joins them differently. Seminal works compared genomes sharing the same set of conserved blocks, but more recently researchers have started looking at genomes with unequal gene content by allowing nonconservative rearrangements such as insertion and deletion (jointly called indel). The transposition distance and the transposition and indel distance are both NP-hard. We investigate the transposition and indel distance and present a structure called labeled cycle graph, representing an instance of rearrangement distance problems for genomes with unequal gene content. This structure is used to devise a lower bound and a 2-approximation algorithm for the transposition and indel distance.
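As a small illustration of the rearrangements named in this abstract, the sketch below applies a reversal, a transposition, and an indel to a genome stored as a Python list of block labels. The function names and the 0-based index conventions are assumptions made for this example, not notation from the paper, and the code only applies single operations; it does not compute any distance.

```python
def reversal(genome, i, j):
    """Invert the segment genome[i..j] (0-based, inclusive)."""
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def transposition(genome, i, j, k):
    """Exchange the consecutive segments genome[i:j] and genome[j:k]."""
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def deletion(genome, i, j):
    """Remove the segment genome[i..j] (non-conservative)."""
    return genome[:i] + genome[j + 1:]


def insertion(genome, i, segment):
    """Insert a list of blocks before position i (non-conservative)."""
    return genome[:i] + list(segment) + genome[i:]


genome = [1, 2, 3, 4, 5, 6]
print(reversal(genome, 1, 3))          # [1, 4, 3, 2, 5, 6]
print(transposition(genome, 1, 3, 5))  # [1, 4, 5, 2, 3, 6]
print(deletion(genome, 2, 3))          # [1, 2, 5, 6]
print(insertion(genome, 2, ["x"]))     # [1, 2, 'x', 3, 4, 5, 6]
```

The reversal and the transposition keep the multiset of blocks unchanged (conservative), whereas the deletion and the insertion change the genome content, which is what makes comparison of genomes with unequal gene content possible.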
Subjects
Genome , INDEL Mutation , Algorithms , Gene Rearrangement , Genomics , Genetic Models
ABSTRACT
Immigrants from diverse origins have arrived in Paraguay and produced important demographic changes in a territory initially inhabited by indigenous Guarani. Few studies have been performed to estimate the proportion of Native ancestry that is still preserved in Paraguay and the role of females and males in admixture processes. Therefore, 548 individuals from eastern Paraguay were genotyped for three marker sets: mtDNA, Y-SNPs and autosomal AIM-InDels. Genetic homogeneity was found between departments for each set of markers, supported by the demographic data collected, which showed that only 43% of the individuals have the same birthplace as their parents. The results show sex-biased intermarriage, with higher maternal than paternal Native American ancestry. Within the native mtDNA lineages in Paraguay (87.2% of the total), most haplogroups have a broad distribution across the subcontinent, and only a few are concentrated around the Paraná River basin. The frequency distribution of the European paternal lineages in Paraguay (92.2% of the total) showed a major contribution from the Iberian region. In addition to the remaining legacy of the colonial period, the joint analysis of the different types of markers included in this study revealed the impact of post-war migrations on the current genetic background of Paraguay.
Subjects
Human Migration , Pedigree , Single Nucleotide Polymorphism , Population/genetics , Human Y Chromosomes/genetics , Mitochondrial DNA/genetics , Molecular Evolution , Female , Humans , Male , Microsatellite Repeats , Paraguay , Racial Groups/genetics
ABSTRACT
Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither a prior classification of genes into families nor further restrictions on the genomes are imposed. Given two genomes, the ILP computes an optimal matching of the genes, simultaneously taking into account local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in the second step. Our approach is implemented as a pipeline incorporating the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results with experiments on both simulated and real data.
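The abstract does not say how the pairwise matchings are integrated into gene families. Purely as an illustration, the sketch below assumes the simplest possible rule: two genes belong to the same family whenever a chain of matched pairs connects them (connected components, computed with union-find). The function name `build_families`, the `(genome, gene)` tuples, and the toy input are all hypothetical and are not part of the FFGC pipeline.

```python
from collections import defaultdict


def build_families(matched_pairs):
    """Group genes into families as connected components of the matching graph.

    matched_pairs: iterable of (gene_a, gene_b), each gene a (genome, gene_id) tuple.
    Returns a sorted list of families, each a sorted list of genes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    return sorted(sorted(family) for family in families.values())


# Hypothetical pairwise matchings among three genomes A, B, C.
pairs = [
    (("A", "a1"), ("B", "b1")),
    (("B", "b1"), ("C", "c1")),
    (("A", "a2"), ("C", "c2")),
]
for family in build_families(pairs):
    print(family)
```

Under this assumption a single spurious match can merge two unrelated families, so a real integration step may well apply stricter criteria than plain connectivity.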
Subjects
Genome , Genetic Models , Algorithms , Gene Rearrangement , Genomics , Humans
ABSTRACT
Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition leaves at least two important questions open: first, which features are extracted from genomes to create a useful mathematical model, and second, which types of genome rearrangement events should be represented. Regarding the first question, seminal works in the genome rearrangement field used only gene order to represent genomes as permutations of integer numbers, neglecting many important aspects such as gene duplication, intergenic regions, and complex interactions between genes. Regarding the second question, some rearrangement events, such as reversals and transpositions, are widely studied. In this paper, we address the first question with a model that takes into account gene order and the number of nucleotides in intergenic regions. In addition, we consider events of reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation for reversals, transpositions, and indels.
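To make the idea of a model with gene order plus intergenic sizes concrete, here is a minimal sketch. It assumes one common formalization in which a genome with n genes carries n + 1 intergenic regions and a reversal cuts inside the two regions flanking the reversed segment; the abstract itself does not give the formal definition, so the function name, the parameters `x` and `y`, and the 0-based indexing are assumptions of this example.

```python
def intergenic_reversal(genes, inter, i, j, x, y):
    """Reverse genes[i..j] in a genome with intergenic region sizes.

    genes: list of n gene labels.
    inter: list of n + 1 sizes; inter[k] lies just before genes[k],
           inter[n] lies after the last gene.
    x: nucleotides of inter[i] left on the left side of the first cut.
    y: nucleotides of inter[j + 1] left on the left side of the second cut.
    """
    if not 0 <= i <= j < len(genes) or len(inter) != len(genes) + 1:
        raise ValueError("invalid indices or intergenic list length")
    if not (0 <= x <= inter[i] and 0 <= y <= inter[j + 1]):
        raise ValueError("cut positions must lie inside the intergenic regions")
    new_genes = genes[:i] + genes[i:j + 1][::-1] + genes[j + 1:]
    new_inter = (
        inter[:i]
        + [x + y]                                  # left remainders are joined
        + inter[i + 1:j + 1][::-1]                 # inner regions travel with the segment
        + [(inter[i] - x) + (inter[j + 1] - y)]    # right remainders are joined
        + inter[j + 2:]
    )
    return new_genes, new_inter


genes = [1, 2, 3, 4]
inter = [5, 4, 1, 6, 7]
new_genes, new_inter = intergenic_reversal(genes, inter, 1, 2, x=1, y=2)
print(new_genes)  # [1, 3, 2, 4]
print(new_inter)  # [5, 3, 1, 7, 7]
```

Because a reversal is conservative, the total number of intergenic nucleotides is the same before and after (23 in this toy genome); only an indel would change it.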
Subjects
Genome , Genetic Models , Algorithms , Intergenic DNA/genetics , Gene Rearrangement , Genomics
ABSTRACT
OBJECTIVES: According to demographic history, Ecuador has experienced shifts in its Native American populations caused by European colonization and the African slave trade. The continuous admixture events among Europeans, Native Americans, and Africans occurred differently in each region of the country, producing a stratified population. Thus, the aim of this study was to investigate the level of genetic substructure in the Ecuadorian Mestizo population. MATERIALS AND METHODS: A total of 377 male and 209 female samples were genotyped for two sets of X-chromosomal markers (32 X-Indels and 12 X-STRs). Population analyses included Hardy-Weinberg equilibrium tests, LD analysis, PCA, pairwise FST values, and AMOVA. RESULTS: Significant levels of LD were observed between markers separated by distances of less than 1 cM, as well as between markers separated by distances varying from 10.891 to 163.53 cM. Among Ecuadorian regions, Amazonia showed the highest average R² value. DISCUSSION: When X-chromosomal and autosomal differentiation values were compared, a sex-biased admixture between European men and Native American and African women was revealed, as well as between African men and Native American women. Moreover, a distinct Native American ancestry was discernible in the Amazonian population, in addition to sex-biased gene flow between Amazonia and the Andes and Pacific coast regions. Overall, these results underline the importance of integrating X chromosome information to achieve a more comprehensive view of the genetic and demographic histories of South American admixed populations.
Subjects
Genetic Variation/genetics , Population Genetics/methods , South American Indians/genetics , Physical Anthropology , Human X Chromosomes/genetics , Ecuador , Female , Humans , INDEL Mutation/genetics , Linkage Disequilibrium/genetics , Male , Microsatellite Repeats/genetics
ABSTRACT
The rearrangement distance is a well-known problem in the field of comparative genomics. Given two genomes, the rearrangement distance is the minimum number of rearrangements, drawn from a set of allowed rearrangements (the rearrangement model), that transforms one genome into the other. In rearrangement distance problems, a genome is modeled as a string, where each element represents a conserved region within the two genomes. When the orientation of the genes is known, it is represented by (plus or minus) signs assigned to the elements of the string. Two of the most studied rearrangements are reversals, which invert a segment of the genome, and transpositions, which exchange the relative positions of two adjacent segments of the genome. The first works in genome rearrangements assumed that the genomes being compared had the same genetic material and that rearrangement events were restricted to reversals, transpositions, or both. El-Mabrouk extended the reversal model on signed strings to include the operations of insertion and deletion of segments in the genome, which allowed the comparison of genomes with different genetic material. Other studies also addressed this problem and, recently, it was proved to be solvable in polynomial time by Willing et al. For unsigned strings, results are still lacking. In this study we prove that computing the rearrangement distance for the following models is NP-hard: reversals and indels on unsigned strings; transpositions and indels on unsigned strings; and reversals, transpositions, and indels on signed and unsigned strings. Along with the NP-hardness proofs, we present a 2-approximation algorithm for reversals on unsigned strings and 3-approximation algorithms for the other models.
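The difference between signed and unsigned strings can be shown in a few lines. In the sketch below, a signed genome is a list of non-zero integers whose sign encodes orientation; a signed reversal inverts the order of a segment and flips every sign in it, while the unsigned version only inverts the order. The function names and indexing are illustrative assumptions, not taken from the paper.

```python
def signed_reversal(genome, i, j):
    """Reverse genome[i..j] (0-based, inclusive) and flip the orientation of each element."""
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def unsigned_reversal(genome, i, j):
    """Reverse genome[i..j] when orientation is unknown (no signs to flip)."""
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


signed = [+1, -3, -2, +4]
print(signed_reversal(signed, 1, 2))      # [1, 2, 3, 4]

unsigned = [1, 3, 2, 4]
print(unsigned_reversal(unsigned, 1, 2))  # [1, 2, 3, 4]
```

Applying the same signed reversal twice restores the original string, since both the order and the signs are flipped back.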
Subjects
Gene Rearrangement/genetics , Genome/genetics , INDEL Mutation/genetics , Algorithms , Genomics/methods , Genetic Models
ABSTRACT
BACKGROUND: Evolutionary changes in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) include indels in non-structural, structural, and accessory open reading frames (ORFs) or genes. OBJECTIVES: We track indels in accessory ORFs to infer evolutionary gene patterns and epidemiological links between outbreaks. METHODS: Genomes from coronavirus disease 2019 (COVID-19) case-patients were Illumina sequenced using ARTIC_V3. The assembled genomes were analysed to detect substitutions and indels. FINDINGS: We report the emergence and spread of a unique 4-nucleotide deletion in the accessory ORF6, an interesting gene with immune modulation activity. The deletion in ORF6 removes one unit of a repeat made of two 4-nucleotide units, which shows that directly repeated sequences in the SARS-CoV-2 genome are associated with indels, even outside the context of extended repeat regions. The 4-nucleotide deletion produces a frameshifting change that results in a protein with two inserted amino acids, increasing the coding information of this accessory ORF. Epidemiological and genomic data indicate that the deletion variant has a single common ancestor and was initially detected in a health care outbreak and later in other COVID-19 cases, establishing a transmission cluster in the Uruguayan population. MAIN CONCLUSIONS: Our findings provide evidence for the origin and spread of deletion variants and emphasise the importance of indels in epidemiological studies, including differentiating consecutive outbreaks occurring in the same health facility.
ABSTRACT
Colombia, located in the north of the South American subcontinent, is a country of great interest for population genetic studies given its high ethnic and cultural diversity, represented by the admixed population, 102 indigenous peoples, and populations of African descent. In this study, an analysis of genetic structure and ancestry was performed based on 46 ancestry informative INDEL markers (AIM-INDELs), considering the genealogical and demographic variables of 451 unrelated individuals belonging to nine Native American, two African American, and four multiple ancestry populations. Measures of genetic diversity, ancestry components, and genetic substructure were analyzed to build a population model typical of the northernmost part of the South American continent. The model suggests three types of populations: Native American, African American, and multiple ancestry. The results support hypotheses posed by other authors on issues such as the peopling of South America and the existence of two types of Native American ancestry. This last finding could be crucial for future research on the peopling of Colombia and South America, in that a single origin of all indigenous communities should not be assumed. It would then be necessary to consider other events that could explain their genetic variability and complexity throughout the continent.
Subjects
Black People , Population Genetics/methods , South American Indians , Physical Anthropology , Black People/genetics , Black People/statistics & numerical data , Colombia , Genetic Variation/genetics , Humans , INDEL Mutation/genetics , South American Indians/genetics , South American Indians/statistics & numerical data , South America
ABSTRACT
Fruit weight (FW) and shelf life (SL) are important traits in commercial fresh market tomatoes. A tomato RIL population was developed by antagonistic and divergent selection for both traits from an interspecific cross between the Solanum lycopersicum L. cv. "Caimanta" and the S. pimpinellifolium L. accession "LA0722". The objective of this work was to evaluate phenotypic and genetic components for FW and SL. Phenotypic data from RILs were collected during 3-year trials. Sixteen SSRs, 62 InDels developed based on the genome sequences of "Caimanta" and "LA0722", and four functional markers for fruit size genes were used. FW and SL had significant genetic variability, and both traits showed a genotype by year interaction. Genome-wide molecular characterization of the population demonstrated that it is genetically structured according to FW. Marker data were used to study changes in allelic frequencies at loci between the phenotypic extreme groups of RILs for FW and SL. Twenty-four markers were associated with FW; the LC gene on chromosome 2 and six other markers on chromosomes 1, 2, 6, and 11 presented the most significant associations. Finally, we report three new genomic regions located on chromosomes 9, 10, and 12 that underlie SL in tomato.
ABSTRACT
Atypical situations arise in the routine resolution of paternity cases and pose challenges that require additional genetic systems and non-standard methods. We report a paternity case presenting three alleged father (AF)-child incompatibilities for the markers TPOX, D2S441, and the indel locus B02 (11/11 vs 8/8; 14/14 vs 10/10; 2/2 vs 1/1, respectively). Considering the presence of mutations/null alleles, the residual paternity indexes (PI) obtained with 23 autosomal short tandem repeats (STRs) and 38 indels suggest that the AF is the father (PI = 1.94e+11). Although the presence of a few incompatibilities could also imply paternity of the AF's brother, this hypothesis was less probable (PI = 3.20e+9) (W = 98.4 vs 1.6%, respectively). The inclusion of 23 Y-STR loci confirmed the paternity relationship in this case (global PI = 6.08e+15). However, the two multistep STR incompatibilities and the one indel incompatibility allow the mutation hypothesis to be discarded. On the other hand, the confirmation of the homozygous STR genotypes with two different human identification kits and the low probability of finding three null alleles (3.10e-8) allow the null allele hypothesis to be rejected. Conversely, the child's homozygous genotype for maternal alleles in four markers located on the p and q arms of chromosome 2 (TPOX, D2S441, D2S1338, and B02) suggests that maternal uniparental isodisomy better explains the relationship despite the presence of three paternal incompatibilities. In brief, when multiple incompatibilities are observed in paternity testing, the chromosomal location of the excluding loci and the use of additional genetic systems can be crucial to reach reliable kinship conclusions.
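The abstract does not state how the W values were obtained. The short check below assumes that W is the posterior probability of each hypothesis when the two competing hypotheses (alleged father vs. his brother) are given equal prior probability and their paternity indexes are normalized against each other; under that assumption the reported 98.4% and 1.6% are reproduced. The function name and dictionary keys are illustrative only.

```python
def posterior_probabilities(indexes):
    """Normalize likelihood-ratio style indexes into posteriors, assuming equal priors."""
    total = sum(indexes.values())
    return {hypothesis: value / total for hypothesis, value in indexes.items()}


# Paternity indexes quoted in the abstract.
pi_values = {
    "alleged father": 1.94e11,
    "brother of alleged father": 3.20e9,
}

for hypothesis, w in posterior_probabilities(pi_values).items():
    print(f"{hypothesis}: W = {w:.1%}")
# alleged father: W = 98.4%
# brother of alleged father: W = 1.6%
```

With unequal priors each index would first be multiplied by its prior before normalizing, so the percentages depend on that choice.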
Subjects
Human Chromosome Pair 2/genetics , Paternity , Uniparental Disomy , Human Y Chromosomes , Female , Genotype , Homozygote , Humans , INDEL Mutation , Male , Microsatellite Repeats
ABSTRACT
BACKGROUND: Hereditary angioedema (HAE) is a rare genetic disorder mainly caused by mutations in the SERPING1 gene, which lead to a deficit of C1 inhibitor (C1-INH). In approximately 10% of the cases, HAE with C1-INH deficiency (C1-INH-HAE) is caused by large gene rearrangements, which are not detected by Sanger sequencing. Here we present the exon quantification technique (EQT), a molecular diagnostic test for the detection of large genetic rearrangements in SERPING1 that maps the exact size and location of the deletion caused by the recombination of Alu elements. EQT analysis was performed on total DNA extracted from the blood of patients belonging to two Brazilian families with a medical history of HAE, low plasma levels of C4 and C1-INH, and no pathogenic alteration in SERPING1 detected by Sanger sequencing. RESULTS: Two large deletions were found, one of 1356 bp and one of 1804 bp, which resulted from recombination of two Alu elements present in introns 3 and 4 of the gene. CONCLUSION: These results showed that the EQT could be used as a simple, rapid, and efficient diagnostic test for the analysis of large deletions and insertions involving SERPING1 that are otherwise not detected by Sanger sequencing, serving as a support technique for the molecular diagnosis of HAE.
Subjects
Alu Elements , Hereditary Angioedemas/genetics , Chromosome Mapping , Complement C1 Inhibitor Protein/genetics , Gene Order , Sequence Deletion , Hereditary Angioedemas/blood , Brazil , Complement C4 , Exons , Genetic Loci , Humans , Introns
ABSTRACT
Brucella canis is a facultative intracellular pathogen responsible for canine brucellosis, a zoonotic disease that affects dogs and humans. In dogs it causes abortions and reproductive failure; in humans it produces non-specific symptoms. In 2005 the presence of B. canis was demonstrated in Antioquia (Colombia), and the strains found were identified as type 2. Whole-genome sequencing of a field strain named Brucella canis str. Oliveri revealed species-specific indels, which led us to investigate the genomic characteristics of the isolated B. canis strains and to establish their phylogenetic relationships and the divergence time of the Oliveri strain. Conventional PCR and sequencing were performed on 30 field strains to identify 5 indels recognized in B. canis str. Oliveri; DNA from Brucella suis, Brucella melitensis, and vaccine strains of Brucella abortus was used as control. The field strains studied shared 4 of the 5 indels of the Oliveri strain, indicating that more than one strain of B. canis is circulating in the region. Phylogenetic analysis was performed with 24 Brucella strains using concatenated sequences of species marker genes. The molecular clock hypothesis was tested, and Tajima's relative rate test was also performed. These analyses showed that the Oliveri strain, like the other B. canis strains analyzed, diverged from B. suis. The molecular clock hypothesis was rejected among Brucella species, and a similar evolutionary rate and genetic distance were found among the B. canis strains.
Subjects
Animals , Dogs , Female , Humans , Pregnancy , Phylogeny , Genetic Variation , Brucella canis , Brucella abortus , Brucellosis/veterinary , Zoonoses , Brucella melitensis , Brucella canis/isolation & purification , Brucella canis/genetics
ABSTRACT
Alu insertions, INDELs, and SNPs in the X chromosome can be useful not only for revealing relationships among populations but also for identification purposes. We present data on 10 Alu insertions, 5 INDELs, and 15 SNPs of the X chromosome from three cities in north-eastern Argentina in order to gain insight into the genetic diversity of the X chromosome within this region of the country. A total of 198 unrelated individuals from the cities of Posadas, Corrientes, and Eldorado were genotyped for the Alu insertions Ya5DP62, Yb8DP49, Ya5DP3, Ya5NBC37, Ya5DP77, Ya5NBC491, Ya5DP4, Ya5DP13, Yb8NBC634, and Yb8NBC102, for the Indels MID193, MID1705, MID3754, MID3756, and MID1540, and for the SNPs rs6639398, rs5986751, rs5964206, rs9781645, rs2209420, rs1299087, rs318173, rs933315, rs1991961, rs4825889, rs1781116, rs1937193, rs1781104, rs149910, and rs652. No deviations from Hardy-Weinberg equilibrium were observed for Posadas and Corrientes. However, Eldorado showed significant values, and it was found to have an internal substructure with two groups of different origin, one showing higher similarity with European countries and the other more similar to Posadas and Corrientes. Pairwise Fst genetic distances emerged for some markers among the studied populations and also between our data and those from other countries and continents. Of particular interest, Alu insertions showed the most differences and could be of use in ancestry studies for these populations, while INDEL and SNP variation was informative for differentiation within the country.
ABSTRACT
Bacteria are important sources of cellulases with various industrial and biotechnological applications. In view of this, a non-hemolytic bacterial strain, tolerant to various environmental pollutants (heavy metals and organic solvents) and showing a high cellulolytic index (7.89), was isolated from cattle shed soil and identified as Bacillus sp. SV1 (99.27% pairwise similarity with Bacillus korlensis). The extracellular cellulases showed endoglucanase, total cellulase, and β-glucosidase activities. Cellulase production was induced in the presence of cellulose (3.3 times CMCase, 2.9 times FPase, and 2.1 times β-glucosidase) and enhanced (115.1% CMCase) by low-cost corn steep solids. An in silico investigation using endoglucanase (EC 3.2.1.4) protein sequences of three Bacillus spp. as queries revealed their similarities with members of nine bacterial phyla and with Eukaryota (represented by Arthropoda and Nematoda), and also highlighted convergent and divergent evolution from other enzymes with different substrate specificities [(1,3)-linked beta-d-glucans, xylan, and chitosan]. Characteristic conserved signature indels were observed among members of Actinobacteria (7 aa insert) and Firmicutes (9 aa insert) that served as a potential tool in support of their relatedness in phylogenetic trees.
Subjects
Animals , Cattle , Bacillus/enzymology , Cellulase/genetics , Cellulase/metabolism , Molecular Evolution , Bacillus/growth & development , Bacillus/isolation & purification , Cellulose/metabolism , Computational Biology , Feces/microbiology , Bacterial Gene Expression Regulation , Enzymologic Gene Expression Regulation , INDEL Mutation , DNA Sequence Analysis , Sequence Homology , Substrate Specificity , Zea mays/metabolism