Results 1 - 20 of 602
1.
Genome Biol ; 25(1): 253, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358801

ABSTRACT

In this work, we extend vcfdist to be the first variant call benchmarking tool to jointly evaluate phased single-nucleotide polymorphisms (SNPs), small insertions/deletions (INDELs), and structural variants (SVs) for the whole genome. First, we find that a joint evaluation of small and structural variants uniformly reduces measured errors for SNPs (-28.9%), INDELs (-19.3%), and SVs (-52.4%) across three datasets. vcfdist also corrects a common flaw in phasing evaluations, reducing measured flip errors by over 50%. Lastly, we show that vcfdist is more accurate than previously published works and on par with the newest approaches while providing improved result interpretability.


Subjects
Benchmarking, INDEL Mutation, Single Nucleotide Polymorphism, Software, Humans, Genomic Structural Variation, Human Genome
2.
Sci Rep ; 14(1): 22774, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39354004

ABSTRACT

While significant strides have been made in understanding pharmacogenetics (PGx) and gene-drug interactions, there remains limited characterization of population-level PGx variation. This study aims to comprehensively profile global star alleles (haplotype patterns) and phenotype frequencies in 58 pharmacogenes associated with drug absorption, distribution, metabolism, and excretion. PyPGx, a star-allele calling tool, was employed to identify star alleles within high-coverage whole genome sequencing (WGS) data from the 1000 Genomes Project (N = 2504; 26 global populations). This process involved detecting structural variants (SVs), such as gene deletions, duplications, hybrids, as well as single nucleotide variants and insertion-deletion variants. The majority of our PyPGx calls for star alleles and phenotype frequencies aligned with the Pharmacogenomics Knowledge Base, although notable population-specific frequencies differed at least twofold. Validation efforts confirmed known SVs while uncovering several novel SVs currently undefined as star alleles. Additionally, we identified 210 small nucleotide variants associated with severe functional consequences that are not defined as star alleles. The study serves as a valuable resource, providing updated population-level star allele and phenotype frequencies while incorporating SVs. It also highlights the burgeoning potential of cost-effective WGS for PGx genotyping, offering invaluable insights to improve tailored drug therapies across diverse populations.


Subjects
Alleles, Pharmacogenetics, Whole Genome Sequencing, Humans, Whole Genome Sequencing/methods, Pharmacogenetics/methods, Gene Frequency, Single Nucleotide Polymorphism, Human Genome, Phenotype, Haplotypes, Genomic Structural Variation, Pharmacogenomic Testing/methods, Human Genome Project
3.
Bioinformatics ; 40(Suppl 2): ii11-ii19, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39230689

ABSTRACT

MOTIVATION: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs. RESULTS: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
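The genotyping step described in this abstract (genotype-specific probability distributions over read-pair properties, then picking the most likely genotype) can be illustrated with a minimal maximum-likelihood sketch. This is not the authors' implementation: the two read-pair categories, the probabilities, and all function names are hypothetical placeholders chosen only to show the idea.

```python
import math

# Hypothetical per-genotype probabilities of observing each read-pair category.
# Real distributions would be computed from a variant description.
DISTRIBUTIONS = {
    "0/0": {"ref_like": 0.98, "var_like": 0.02},
    "0/1": {"ref_like": 0.50, "var_like": 0.50},
    "1/1": {"ref_like": 0.02, "var_like": 0.98},
}


def most_likely_genotype(observations, distributions):
    """Return the genotype with the highest log-likelihood, plus all scores."""
    scores = {
        genotype: sum(math.log(probs[obs]) for obs in observations)
        for genotype, probs in distributions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Six reference-like and five variant-like read pairs (made-up data).
observed = ["ref_like"] * 6 + ["var_like"] * 5
best, scores = most_likely_genotype(observed, DISTRIBUTIONS)
print(best)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow when many read pairs are observed.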


Subjects
Human Genome, Genomic Structural Variation, DNA Sequence Analysis, Software, Humans, DNA Sequence Analysis/methods, Genotype, Genotyping Techniques/methods, Algorithms, High-Throughput Nucleotide Sequencing/methods, Genomics/methods
4.
Nat Commun ; 15(1): 8007, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39266513

ABSTRACT

Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals ( https://github.com/WHops/NAHRwhals ), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.


Subjects
Human Genome, Genomic Structural Variation, Hominidae, Humans, Animals, Hominidae/genetics, Human Genome/genetics, Genomics/methods, Haplotypes
5.
Genome Med ; 16(1): 113, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39300495

ABSTRACT

BACKGROUND: Structural variations (SVs) are key genetic contributors to neurodevelopmental disorders (NDDs). Exome sequencing (ES), the current first-line tool for genetic testing of NDDs, falls short in SV detection. This diagnostic gap is being actively addressed by new methods such as optical genome mapping (OGM). METHODS: This study evaluated the utility of combining OGM and RNA-seq in the detection and interpretation of SVs in ES-negative NDDs. OGM was performed in 43 patients with NDDs with inconclusive ES results. Candidate SVs were selected based on disease association and pathogenicity evaluation, and further validated or reconstructed by alternative methods, including long-read sequencing for a complex rearrangement event. RNA-seq was performed on blood samples from patients with candidate SVs to facilitate interpretation of pathogenicity. RESULTS: OGM detected four candidate SVs, and RNA-seq confirmed the pathogenicity of three SVs in the patient cohort. This combined approach solved three cases: two cases with de novo SVs in genes associated with autosomal dominant NDDs, including a deletion encompassing the promoter and 5'UTR of MBD5 and an intragenic duplication of PAFAH1B1, and a third case possessing an intragenic duplication in trans with a pathogenic single-nucleotide variant of PLA2G6, associated with autosomal recessive NDDs. The expression alteration of the affected genes and the tandem positioning of two intragenic duplications were confirmed by RNA-seq. In the fourth case, OGM detected a complex rearrangement involving chromosomes 2 and 6, much more complex than the de novo t(2:6)(q13;q15) indicated by conventional cytogenetic analysis. Reconstruction showed that 17 segments of 6q15 spanning 9.3 Mb were disarranged and joined 2q11.2, with four breakpoints detected in the 5' and 3' non-coding region of the NDD-associated gene SYNCRIP.
RNA-seq revealed largely preserved SYNCRIP expression, leaving the pathogenicity of this complex rearrangement event uncertain. CONCLUSIONS: SVs in ES-negative NDDs can be identified by OGM, which is particularly useful for SVs in non-coding regions not covered by ES. OGM helps to construct complex SVs and provides information on the location and orientation of duplications, which is crucial for pathogenicity interpretation. The integration of RNA-seq facilitates the interpretation of the functional consequences of SVs at the transcriptional level. These findings demonstrate the utility and feasibility of combining OGM and RNA-seq in ES-negative cases with NDDs.


Subjects
Chromosome Mapping, Neurodevelopmental Disorders, RNA-Seq, Humans, Neurodevelopmental Disorders/genetics, Neurodevelopmental Disorders/diagnosis, Male, Female, Child, Exome Sequencing, Genomic Structural Variation, Preschool Child
6.
BMC Genomics ; 25(1): 903, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39350025

ABSTRACT

BACKGROUND: Structural variants (SVs) such as deletions, duplications, and insertions are known to contribute to phenotypic variation but remain challenging to identify and genotype. A more complete, accessible, and assessable collection of SVs will assist efforts to study SV function in cattle and to incorporate SV genotyping into animal evaluation. RESULTS: In this work we produced a large and deeply characterized collection of SVs in Holstein cattle using two popular SV callers (Manta and Smoove) and publicly available Illumina whole-genome sequence (WGS) read sets from 310 samples (290 male, 20 female, mean 20X coverage). Manta and Smoove identified 31 K and 68 K SVs, respectively. In total the SVs cover 5% (Manta) and 6% (Smoove) of the reference genome, in contrast to the 1% impacted by SNPs and indels. SV genotypes from each caller were confirmed to accurately recapitulate animal relationships estimated using WGS SNP genotypes from the same dataset, with Manta genotypes outperforming Smoove, and deletions outperforming duplications. To support efforts to link the SVs to phenotypic variation, overlapping and tag SNPs were identified for each SV, using genotype sets extracted from the WGS results corresponding to two bovine SNP chips (BovineSNP50 and BovineHD). 9% (Manta) and 11% (Smoove) of the SVs were found to have overlapping BovineHD panel SNPs, while 21% (Manta) and 9% (Smoove) have BovineHD panel tag SNPs. A custom interactive database ( https://svdb-dc.pslab.ca ) containing the identified sequence variants with extensive annotations, gene feature information, and BAM file content for all SVs was created to enable the evaluation and prioritization of SVs for further study. 
Illustrative examples involving the genes POPDC3, ORM1, G2E3, FANCI, TFB1M, FOXC2, N4BP2, GSTA3, and COPA show how this resource can be used to find well-supported genic SVs, determine SV breakpoints, design genotyping approaches, and identify processed pseudogenes masquerading as deletions. CONCLUSIONS: The resources developed through this study can be used to explore sequence variation in Holstein cattle and to develop strategies for studying SVs of interest. The lack of overlapping and tag SNPs from commonly used SNP chips for most of the SVs suggests that other genotyping approaches will be needed (for example direct genotyping) to understand their potential contributions to phenotype. The included SV genotype assessments point to challenges in characterizing SVs, especially duplications, using short-read data and support ongoing efforts to better characterize cattle genomes through long-read sequencing. Lastly, the identification of previously known functional SVs and additional CDS-overlapping SVs supports the phenotypic relevance of this dataset.


Subjects
Genotype, Single Nucleotide Polymorphism, Animals, Cattle/genetics, Female, Whole Genome Sequencing, Male, Genomic Structural Variation, Genetic Databases, Phenotype, Genome, Genomics/methods
7.
BMC Bioinformatics ; 25(1): 315, 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39342151

ABSTRACT

BACKGROUND: Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT: In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS: FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .


Subjects
Software, Humans, Deep Learning, Genomic Structural Variation, DNA Sequence Analysis/methods, Algorithms, High-Throughput Nucleotide Sequencing/methods
8.
Bioinformatics ; 40(9), 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39240375

ABSTRACT

MOTIVATION: Structural variants (SVs) play an important role in genetic research and precision medicine. As existing SV detection methods usually produce a substantial number of false positive calls, approaches to filter the detection results are needed. RESULTS: We developed a novel deep learning-based SV filtering tool, CSV-Filter, for both short and long reads. CSV-Filter uses a novel multi-level grayscale image encoding method based on CIGAR strings of the alignment results and employs image augmentation techniques to improve SV feature extraction. CSV-Filter also utilizes self-supervised learning networks for transfer as classification models, and employs mixed-precision operations to accelerate training. The experiments showed that the integration of CSV-Filter with popular SV detection tools could considerably reduce false positive SVs for short and long reads, while leaving true positive SVs almost unchanged. Compared with DeepSVFilter, an SV filtering tool for short reads, CSV-Filter could recognize more false positive calls and support long reads as an additional feature. AVAILABILITY AND IMPLEMENTATION: https://github.com/xzyschumacher/CSV-Filter.
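To make the idea of encoding CIGAR strings as grayscale pixels concrete, the sketch below parses a CIGAR string and maps each operation to a grey level. The abstract does not describe CSV-Filter's actual multi-level encoding, so the `GREY` mapping, the one-pixel-per-operation-unit layout, and the function names are assumptions made purely for illustration.

```python
import re

# CIGAR operations as defined in the SAM format: a length followed by one op code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Hypothetical grey levels (0-255) per operation; not the tool's real encoding.
GREY = {"M": 255, "=": 255, "X": 200, "I": 150, "D": 100,
        "S": 50, "H": 50, "N": 0, "P": 0}


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings with characters the pattern did not consume (e.g. "*").
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def cigar_to_row(cigar):
    """Expand a CIGAR string into one grey value per operation unit."""
    row = []
    for length, op in parse_cigar(cigar):
        row.extend([GREY[op]] * length)
    return row


print(parse_cigar("3M1I2D"))
print(cigar_to_row("2M1D"))
```

Stacking one such row per aligned read would give a two-dimensional image of a candidate region; a realistic encoder would also need to place each read at its reference coordinate, which this sketch omits.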


Subjects
Deep Learning, Humans, Software, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Algorithms, Genomic Structural Variation
9.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39297879

ABSTRACT

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. 
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
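The F1 score, precision, and recall (sensitivity) named in this abstract follow standard definitions, sketched below. The counts in the example are hypothetical and are not taken from the study; they only illustrate that a caller with 100% precision can still have an F1 score below 1 when recall is imperfect.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: no false positives, but 20 true SVs missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=0, fn=20)
print(precision, recall, f1)
```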


Subjects
Human Genome, Genomic Structural Variation, Software, Humans, Whole Genome Sequencing/methods, Algorithms, Genomics/methods, Computational Biology/methods, Genetic Variation
10.
J Bone Miner Res ; 39(10): 1474-1485, 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39167757

ABSTRACT

Osteoporosis, characterized by low BMD, is a highly heritable metabolic bone disorder. Although single nucleotide variations (SNVs) have been extensively studied, they explain only a fraction of BMD heritability. Although genomic structural variations (SVs) are large-scale genomic alterations that contribute to genetic diversity in shaping phenotypic variations, the role of SVs in osteoporosis susceptibility remains poorly understood. This study aims to identify and prioritize genes that harbor BMD-related SVs. We performed whole genome sequencing on 4982 subjects from the Louisiana Osteoporosis Study. To obtain high-confidence SVs, the detection of SVs was performed using an ensemble approach. The SVs were tested for association with BMD variation at the hip (HIP), femoral neck (FNK), and lumbar spine (SPN), respectively. Additionally, we conducted co-occurrence analysis using multi-omics approaches to prioritize the identified genes based on their functional importance. Stratification was employed to explore the sex- and ethnicity-specific effects. We identified significant SV-BMD associations: 125 for FNK-BMD, 99 for SPN-BMD, and 83 for HIP-BMD. We observed SVs that were commonly associated with both FNK and HIP BMDs in our combined and stratified analyses. These SVs explain 13.3% to 19.1% of BMD variation. Novel bone-related genes emerged, including LINC02370, ZNF family genes, and ZDHHC family genes. Additionally, FMN2, carrying BMD-related deletions, showed associations with FNK or HIP BMDs, with sex-specific effects. The co-occurrence analysis prioritized an RNA gene LINC00494 and ZNF family genes positively associated with BMDs at different skeletal sites. Two potential causal genes, IBSP and SPP1, for osteoporosis were also identified. Our study uncovers new insights into genetic factors influencing BMD through SV analysis. We highlight BMD-related SVs, revealing a mix of shared and specific genetic influences across skeletal sites and gender or ethnicity. 
These findings suggest potential roles in osteoporosis pathophysiology, opening avenues for further research and therapeutic targets.


Subjects
Bone Density, Osteoporosis, Humans, Bone Density/genetics, Osteoporosis/genetics, Female, Male, Louisiana/epidemiology, Middle Aged, Cohort Studies, Genomic Structural Variation, Aged, Ethnicity/genetics, Adult
11.
Curr Opin Genet Dev ; 88: 102240, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39121701

ABSTRACT

Advances in sequencing technologies have enabled the comparison of high-quality genomes of diverse primate species, revealing vast amounts of divergence due to structural variation. Given their large size, structural variants (SVs) can simultaneously alter the function and regulation of multiple genes. Studies estimate that collectively more than 3.5% of the genome is divergent in humans versus other great apes, impacting thousands of genes. Functional genomics and gene-editing tools in various model systems recently emerged as an exciting frontier - investigating the wide-ranging impacts of SVs on molecular, cellular, and systems-level phenotypes. This review examines existing research and identifies future directions to broaden our understanding of the functional roles of SVs on phenotypic innovations and diversity impacting uniquely human features, ranging from cognition to metabolic adaptations.


Subjects
Molecular Evolution, Human Genome, Humans, Animals, Human Genome/genetics, Genomics, Genomic Structural Variation/genetics, Phenotype, Hominidae/genetics, Biological Evolution, Gene Editing
12.
Virulence ; 15(1): 2382762, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39092797

ABSTRACT

African swine fever (ASF) is a rapidly fatal viral haemorrhagic fever in Chinese domestic pigs. Although very high mortality is observed in pig farms after an ASF outbreak, clinically healthy and antibody-positive pigs are found in those farms, and the virus is rarely detected in these pigs. The ability of pigs to resist ASF viral infection may be modulated by host genetic variations. However, the genetic basis of the resistance of domestic pigs against ASF remains unclear. We generated a comprehensive set of structural variations (SVs) in Chinese indigenous Xiang pigs with ASF-resistant (Xiang-R) and ASF-susceptible (Xiang-S) phenotypes using whole-genome resequencing. A total of 53,589 nonredundant SVs were identified, with an average of 25,656 SVs per individual in the Xiang pig genome, including insertion, deletion, inversion and duplication variations. The Xiang-R group harboured more SVs than the Xiang-S group. The fixation index (FST) was calculated at each SV locus from the resequencing data to reveal genetic differences between the two populations. We identified 2,414 population-stratified SVs and annotated 1,152 Ensembl genes (including 986 protein-coding genes), in which 1,326 SVs might disturb the structure and expression of the Ensembl genes. Those protein-coding genes were mainly enriched in the Wnt, Hippo, and calcium signalling pathways. Other important pathways associated with ASF viral infection were also identified, such as the endocytosis, apoptosis, focal adhesion, Fc gamma R-mediated phagocytosis, junction, NOD-like receptor, PI3K-Akt, and c-type lectin receptor signalling pathways. Finally, we identified 135 candidate adaptive genes overlapping 166 SVs that were involved in virus entry and virus-host cell interactions. That some population-stratified SV regions were also detected as selective sweep signals further supports a role for genetic variation in pig resistance to ASF.
The research indicates that SVs play an important role in the evolutionary processes of Xiang pig adaptation to ASF infection.
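The abstract does not state which FST estimator was used. As a minimal illustration only, the sketch below computes Wright's FST at a single biallelic locus for two equally weighted populations, using FST = (HT - HS) / HT, where HT is the expected heterozygosity of the pooled population and HS the mean within-population heterozygosity. The function name and the example allele frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele (for example, the SV allele)
    in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


# Made-up frequencies of an SV allele in a resistant and a susceptible group.
print(fst_biallelic(0.8, 0.2))
```

Values near 0 indicate similar allele frequencies in both groups, and values near 1 indicate near-complete differentiation. Sample-size-corrected estimators such as Weir and Cockerham's are commonly preferred for real data.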


Subjects
African Swine Fever Virus, African Swine Fever, Animals, African Swine Fever/virology, African Swine Fever/genetics, Swine, African Swine Fever Virus/genetics, Disease Resistance/genetics, Genetic Variation, Genome/genetics, Whole Genome Sequencing, Genomic Structural Variation, China, Sus scrofa
13.
Nat Commun ; 15(1): 6956, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138168

ABSTRACT

Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.


Subjects
Diploidy, Human Genome, Genomic Structural Variation, Single Nucleotide Polymorphism, Humans, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Haplotypes
14.
Am J Hum Genet ; 111(8): 1524-1543, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39053458

ABSTRACT

Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.


Subjects
Gene Expression Regulation, Humans, RNA Sequence Analysis, Genetic Variation, Genomic Structural Variation/genetics, Transcriptome/genetics, Blood Donors
15.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38980375

ABSTRACT

Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.


Subjects
Algorithms, Humans, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Genomics/methods, Genomic Structural Variation, Software
16.
Genome Biol ; 25(1): 188, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39010145

ABSTRACT

BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking . CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
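The abstract reports that combining multiple pipelines can enhance performance but does not say how calls were combined. One common approach, shown here only as an assumed illustration, keeps a call from one pipeline when a call of the same type from another pipeline overlaps it reciprocally by at least some fraction. The `SV` record, the 0.5 threshold, and the function names are hypothetical, and the interval logic suits deletions or duplications rather than insertions, which have no length in reference coordinates.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions, or 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus(calls_a, calls_b, min_ro=0.5):
    """Keep calls from the first set that are supported by a call in the second set."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)]


# Made-up calls from two pipelines.
pipeline_1 = [SV("chr1", 100, 200, "DEL"), SV("chr2", 500, 900, "DUP")]
pipeline_2 = [SV("chr1", 120, 210, "DEL")]
print(consensus(pipeline_1, pipeline_2))
```

Requiring support from a second pipeline generally trades recall for precision; taking the union of calls instead would do the opposite.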


Subjects
High-Throughput Nucleotide Sequencing, Humans, High-Throughput Nucleotide Sequencing/methods, Genomic Structural Variation, Software, DNA Sequence Analysis/methods
17.
Curr Opin Genet Dev ; 87: 102233, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39042999

ABSTRACT

Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.


Subjects
Genomics, Primates, Telomere, Humans, Animals, Primates/genetics, Telomere/genetics, Genomics/methods, Human Genome, Genome/genetics, Molecular Evolution, Genomic Structural Variation/genetics
18.
Genes (Basel) ; 15(7), 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39062704

ABSTRACT

The identification of structural variants (SVs) in genomic data represents an ongoing challenge because of difficulties in reliable SV calling leading to reduced sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios, who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation using the Integrative Genomics Viewer with either or both of the Illumina and ONT raw sequence. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when applying the Sniffles2 caller, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.
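The percentages quoted in this abstract can be reproduced directly from the counts it reports. The short check below does only that arithmetic; the helper name `percent` is an illustrative choice.

```python
def percent(numerator, denominator):
    """Return a proportion as a whole-number percentage."""
    return round(100 * numerator / denominator)


# Precision (positive predictive value) of Bionano OGM calls: verified / total.
ppv = percent(222, 234)

# Sensitivity in the Illumina dataset, by variant type.
illumina_del = percent(115, 134)
illumina_ins = percent(13, 58)

# Sensitivity in the ONT dataset with Sniffles2, by variant type.
ont_del = percent(36, 40)
ont_ins = percent(17, 23)

print(ppv, illumina_del, illumina_ins, ont_del, ont_ins)
```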


Subjects
Benchmarking , Nanopore Sequencing , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Nanopore Sequencing/methods , Benchmarking/methods , Genomic Structural Variation/genetics , Chromosome Mapping/methods , Genome, Human/genetics , Genomics/methods , Software , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Female , Nanopores , Male , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
19.
Cell Genom ; 4(7): 100590, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38908378

ABSTRACT

The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.


Subjects
Haplotypes , Humans , Haplotypes/genetics , Comparative Genomic Hybridization , Genomic Structural Variation/genetics , Genome, Human/genetics , Gene Duplication/genetics
20.
Methods Mol Biol ; 2825: 39-65, 2024.
Article in English | MEDLINE | ID: mdl-38913302

ABSTRACT

Based on classical karyotyping, structural genome variations (SVs) have generally been considered to be either "simple" (with one or two breakpoints) or "complex" (with more than two breakpoints). Studying the breakpoints of SVs at nucleotide resolution revealed additional, subtle structural variations, such that even "simple" SVs turned out to be "complex." Genome-wide sequencing methods, such as fosmid and paired-end mapping, short-read and long-read whole genome sequencing, and single-molecule optical mapping, also indicated that the number of SVs per individual was considerably larger than expected from karyotyping and high-resolution chromosomal array-based studies. Interestingly, SVs were detected in studies of cohorts of individuals without clinical phenotypes. The common denominator of all SVs appears to be a failure to accurately repair DNA double-strand breaks (DSBs) or to halt cell cycle progression if DSBs persist. This review discusses the various DSB response mechanisms during the mitotic cell cycle and during meiosis and their regulation. Emphasis is given to the molecular mechanisms involved in the formation of translocations, deletions, duplications, and inversions during or shortly after meiosis I. Recently, CRISPR-Cas9 studies have provided unexpected insights into the formation of translocations and chromothripsis by both breakage-fusion-bridge and micronucleus-dependent mechanisms.


Subjects
DNA Breaks, Double-Stranded , Genomic Structural Variation , Humans , Meiosis/genetics , Karyotyping/methods , CRISPR-Cas Systems , Animals