RESUMEN
The level of genetic diversity in a population is inversely proportional to the linkage disequilibrium (LD) between individual single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs), leading to lower predictive ability of genomic breeding values (GEBVs) in high genetically diverse populations. Haplotype-based predictions could outperform individual SNP predictions by better capturing the LD between SNP and QTL. Therefore, we aimed to evaluate the accuracy and bias of individual-SNP- and haplotype-based genomic predictions under the single-step-genomic best linear unbiased prediction (ssGBLUP) approach in genetically diverse populations. We simulated purebred and composite sheep populations using literature parameters for moderate and low heritability traits. The haplotypes were created based on LD thresholds of 0.1, 0.3, and 0.6. Pseudo-SNPs from unique haplotype alleles were used to create the genomic relationship matrix ( G ) in the ssGBLUP analyses. Alternative scenarios were compared in which the pseudo-SNPs were combined with non-LD clustered SNPs, only pseudo-SNPs, or haplotypes fitted in a second G (two relationship matrices). The GEBV accuracies for the moderate heritability-trait scenarios fitting individual SNPs ranged from 0.41 to 0.55 and with haplotypes from 0.17 to 0.54 in the most (Ne â 450) and less (Ne < 200) genetically diverse populations, respectively, and the bias fitting individual SNPs or haplotypes ranged between -0.14 and -0.08 and from -0.62 to -0.08, respectively. For the low heritability-trait scenarios, the GEBV accuracies fitting individual SNPs ranged from 0.24 to 0.32, and for fitting haplotypes, it ranged from 0.11 to 0.32 in the more (Ne â 250) and less (Ne â 100) genetically diverse populations, respectively, and the bias ranged between -0.36 and -0.32 and from -0.78 to -0.33 fitting individual SNPs or haplotypes, respectively. The lowest accuracies and largest biases were observed fitting only pseudo-SNPs from blocks constructed with an LD threshold of 0.3 (p < 0.05), whereas the best results were obtained using only SNPs or the combination of independent SNPs and pseudo-SNPs in one or two G matrices, in both heritability levels and all populations regardless of the level of genetic diversity. In summary, haplotype-based models did not improve the performance of genomic predictions in genetically diverse populations.
RESUMEN
The agricultural and forestry productivity of Mediterranean ecosystems is strongly threatened by the adverse effects of climate change, including an increase in severe droughts and changes in rainfall distribution. In the present study, we performed a genome-wide association study (GWAS) to identify single-nucleotide polymorphisms (SNPs) and haplotype blocks associated with the growth and wood quality of Eucalyptus cladocalyx, a tree species suitable for low-rainfall sites. The study was conducted in a progeny-provenance trial established in an arid site with Mediterranean patterns located in the southern Atacama Desert, Chile. A total of 87 SNPs and 3 haplotype blocks were significantly associated with the 6 traits under study (tree height, diameter at breast height, slenderness coefficient, first bifurcation height, stem straightness, and pilodyn penetration). In addition, 11 loci were identified as pleiotropic through Bayesian multivariate regression and were mainly associated with wood hardness, height, and diameter. In general, the GWAS revealed associations with genes related to primary metabolism and biosynthesis of cell wall components. Additionally, associations coinciding with stress response genes, such as GEM-related 5 and prohibitin-3, were detected. The findings of this study provide valuable information regarding genetic control of morphological traits related to adaptation to arid environments.
RESUMEN
ABSTRACT: Empirical patterns of linkage disequilibrium (LD) can be used to increase the statistical power of genetic mapping. This study was carried out with the objective of verifying the efficacy of factor analysis (AF) applied to data sets of molecular markers of the SNP type, in order to identify linkage groups and haplotypes blocks. The SNPs data set used was derived from a simulation process of an F2 population, containing 2000 marks with information of 500 individuals. The estimation of the factorial loadings of FA was made in two ways, considering the matrix of distances between the markers (A) and considering the correlation matrix (R). The number of factors (k) to be used was established based on the graph scree-plot and based on the proportion of the total variance explained. Results indicated that matrices A and R lead to similar results. Based on the scree-plot we considered k equal to 10 and the factors interpreted as being representative of the bonding groups. The second criterion led to a number of factors equal to 50, and the factors interpreted as being representative of the haplotypes blocks. This showed the potential of the technique, making it possible to obtain results applicable to any type of population, helping or corroborating the interpretation of genomic studies. The study demonstrated that AF was able to identify patterns of association between markers, identifying subgroups of markers that reflect factor binding groups and also linkage disequilibrium groups.
RESUMO: Padrões empíricos de desequilíbrio de ligação (LD) podem ser utilizados para aumentar o poder estatístico do mapeamento genético. Este trabalho foi realizado com o objetivo de verificar a eficácia da análise de fatores (AF) aplicada a conjuntos de dados de marcadores moleculares do tipo SNP, visando identificar grupos de ligação e blocos de haplótipos. O conjunto de dados SNPs utilizado foi oriundo de um processo de simulação de uma população F2, contendo 2000 marcas com informações de 500 indivíduos. A estimação das cargas fatoriais (loadings) da AF foi feita de duas formas, considerando a matriz de distâncias entre os marcadores (A) e considerando a matriz de correlação (R). O número de fatores (k) a ser utilizado foi estabelecido com base no gráfico scree-plot e com base na proporção da variância total explicada. Os resultados indicam que as matrizes A e R conduzem a resultados similares. Com base no scree-plot considerou-se k igual a 10 e os fatores interpretados como sendo representativos dos grupos de ligação. O segundo critério conduziu a um número de fatores igual a 50, e os fatores interpretados como sendo representativos dos blocos de haplótipos. Isto mostra o potencial da técnica que permite obter resultados aplicáveis a qualquer tipo de população, corroborando a interpretação de estudos genômicos. O trabalho demonstrou que a AF foi capaz de identificar padrões de associação entre marcadores, identificando subgrupos de marcadores que refletem grupos de ligação fatorial e também grupos de desequilíbrio de ligação.
RESUMEN
Empirical patterns of linkage disequilibrium (LD) can be used to increase the statistical power of genetic mapping. This study was carried out with the objective of verifying the efficacy of factor analysis (AF) applied to data sets of molecular markers of the SNP type, in order to identify linkage groups and haplotypes blocks. The SNPs data set used was derived from a simulation process of an F2 population, containing 2000 marks with information of 500 individuals. The estimation of the factorial loadings of FA was made in two ways, considering the matrix of distances between the markers (A) and considering the correlation matrix (R). The number of factors (k) to be used was established based on the graph scree-plot and based on the proportion of the total variance explained. Results indicated that matrices A and R lead to similar results. Based on the scree-plot we considered k equal to 10 and the factors interpreted as being representative of the bonding groups. The second criterion led to a number of factors equal to 50, and the factors interpreted as being representative of the haplotypes blocks. This showed the potential of the technique, making it possible to obtain results applicable to any type of population, helping or corroborating the interpretation of genomic studies. The study demonstrated that AF was able to identify patterns of association between markers, identifying subgroups of markers that reflect factor binding groups and also linkage disequilibrium groups.(AU)
Padrões empíricos de desequilíbrio de ligação (LD) podem ser utilizados para aumentar o poder estatístico do mapeamento genético. Este trabalho foi realizado com o objetivo de verificar a eficácia da análise de fatores (AF) aplicada a conjuntos de dados de marcadores moleculares do tipo SNP, visando identificar grupos de ligação e blocos de haplótipos. O conjunto de dados SNPs utilizado foi oriundo de um processo de simulação de uma população F2, contendo 2000 marcas com informações de 500 indivíduos. A estimação das cargas fatoriais (loadings) da AF foi feita de duas formas, considerando a matriz de distâncias entre os marcadores (A) e considerando a matriz de correlação (R). O número de fatores (k) a ser utilizado foi estabelecido com base no gráfico scree-plot e com base na proporção da variância total explicada. Os resultados indicam que as matrizes A e R conduzem a resultados similares. Com base no scree-plot considerou-se k igual a 10 e os fatores interpretados como sendo representativos dos grupos de ligação. O segundo critério conduziu a um número de fatores igual a 50, e os fatores interpretados como sendo representativos dos blocos de haplótipos. Isto mostra o potencial da técnica que permite obter resultados aplicáveis a qualquer tipo de população, corroborando a interpretação de estudos genômicos. O trabalho demonstrou que a AF foi capaz de identificar padrões de associação entre marcadores, identificando subgrupos de marcadores que refletem grupos de ligação fatorial e também grupos de desequilíbrio de ligação.(AU)
Asunto(s)
Técnicas Genéticas , Marcadores GenéticosRESUMEN
We untangled key regions of the genetic architecture of grain yield (GY) in CIMMYT spring bread wheat by conducting a haplotype-based, genome-wide association study (GWAS), together with an investigation of epistatic interactions using seven large sets of elite yield trials (EYTs) consisting of a total of 6,461 advanced breeding lines. These lines were phenotyped under irrigated and stress environments in seven growing seasons (2011-2018) and genotyped with genotyping-by-sequencing markers. Genome-wide 519 haplotype blocks were constructed, using a linkage disequilibrium-based approach covering 14,036 Mb in the wheat genome. Haplotype-based GWAS identified 7, 4, 10, and 15 stable (significant in three or more EYTs) associations in irrigated (I), mild drought (MD), severe drought (SD), and heat stress (HS) testing environments, respectively. Considering all EYTs and the four testing environments together, 30 stable associations were deciphered with seven hotspots identified on chromosomes 1A, 1B, 2B, 4A, 5B, 6B, and 7B, where multiple haplotype blocks were associated with GY. Epistatic interactions contributed significantly to the genetic architecture of GY, explaining variation of 3.5-21.1%, 3.7-14.7%, 3.5-20.6%, and 4.4- 23.1% in I, MD, SD, and HS environments, respectively. Our results revealed the intricate genetic architecture of GY, controlled by both main and epistatic effects. The importance of these results for practical applications in the CIMMYT breeding program is discussed.
RESUMEN
Eucalyptus globulus (Labill.) is one of the most important cultivated eucalypts in temperate and subtropical regions and has been successfully subjected to intensive breeding. In this study, Bayesian genomic models that include the effects of haplotype and single nucleotide polymorphisms (SNP) were assessed to predict quantitative traits related to wood quality and tree growth in a 6-year-old breeding population. To this end, the following markers were considered: (a) ~14 K SNP markers (SNP), (b) ~3 K haplotypes (HAP), and (c) haplotypes and SNPs that were not assigned to a haplotype (HAP-SNP). Predictive ability values (PA) were dependent on the genomic prediction models and markers. On average, Bayesian ridge regression (BRR) and Bayes C had the highest PA for the majority of traits. Notably, genomic models that included the haplotype effect (either HAP or HAP-SNP) significantly increased the PA of low-heritability traits. For instance, BRR based on HAP had the highest PA (0.58) for stem straightness. Consistently, the heritability estimates from genomic models were higher than the pedigree-based estimates for these traits. The results provide additional perspectives for the implementation of genomic selection in Eucalyptus breeding programs, which could be especially beneficial for improving traits with low heritability.
RESUMEN
Researchers have made great advances into the development and application of genomic approaches for common beans, creating opportunities to driving more real and applicable strategies for sustainable management of the genetic resource towards plant breeding. This work provides useful polymorphic single-nucleotide polymorphisms (SNPs) for high-throughput common bean genotyping developed by RAD (restriction site-associated DNA) sequencing. The RAD tags were generated from DNA pooled from 12 common bean genotypes, including breeding lines of different gene pools and market classes. The aligned sequences identified 23,748 putative RAD-SNPs, of which 3357 were adequate for genotyping; 1032 RAD-SNPs with the highest ADT (assay design tool) score are presented in this article. The RAD-SNPs were structurally annotated in different coding (47.00 %) and non-coding (53.00 %) sequence components of genes. A subset of 384 RAD-SNPs with broad genome distribution was used to genotype a diverse panel of 95 common bean germplasms and revealed a successful amplification rate of 96.6 %, showing 73 % of polymorphic SNPs within the Andean group and 83 % in the Mesoamerican group. A slightly increased He (0.161, n = 21) value was estimated for the Andean gene pool, compared to the Mesoamerican group (0.156, n = 74). For the linkage disequilibrium (LD) analysis, from a group of 580 SNPs (289 RAD-SNPs and 291 BARC-SNPs) genotyped for the same set of genotypes, 70.2 % were in LD, decreasing to 0.10 %in the Andean group and 0.77 % in the Mesoamerican group. Haplotype patterns spanning 310 Mb of the genome (60 %) were characterized in samples from different origins. However, the haplotype frameworks were under-represented for the Andean (7.85 %) and Mesoamerican (5.55 %) gene pools separately. In conclusion, RAD sequencing allowed the discovery of hundreds of useful SNPs for broad genetic analysis of common bean germplasm. From now, this approach provides an excellent panel of molecular tools for whole genome analysis, allowing integrating and better exploring the common bean breeding practices.