Results 1 - 20 of 144
1.
Methods Mol Biol ; 2846: 263-283, 2024.
Article in English | MEDLINE | ID: mdl-39141241

ABSTRACT

Chromatin endogenous cleavage coupled with high-throughput sequencing (ChEC-seq) is a method for profiling protein-DNA interactions that detects binding locations in vivo, requires neither antibodies nor fixation, and provides genome-wide coverage at near-nucleotide resolution. The core of this method is a fusion of the target protein to micrococcal nuclease (MNase); when triggered by calcium exposure, the fusion cuts DNA at the protein's binding sites, generating small DNA fragments that can be readily separated from the rest of the genome and sequenced. Improvements since the original protocol have made the method easier, lowered its costs, and multiplied its throughput, enabling a scale and resolution of experiments not available with traditional methods such as ChIP-seq. This protocol describes each step, from the initial creation and verification of the MNase-tagged yeast strains, through ChEC MNase activation and small-fragment purification, to sequencing library preparation. It also briefly covers the bioinformatic steps needed to create meaningful genome-wide binding profiles.
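The bioinformatic step of turning sequenced fragments into a binding profile can start from MNase cleavage positions. The Python sketch below is illustrative only, not the chapter's pipeline: it assumes paired-end fragments have already been aligned and reduced to (chromosome, start, end) tuples in 0-based, half-open coordinates, treats both fragment ends as cut sites, and optionally keeps only short fragments. The function names and the length cut-off are hypothetical.

```python
from collections import Counter


def cut_site_counts(fragments, max_len=None):
    """Count MNase cleavage events per (chromosome, position).

    fragments: iterable of (chrom, start, end) tuples, 0-based half-open.
    Both fragment ends are counted as cut sites. If max_len is given,
    fragments longer than max_len bp are skipped (hypothetical size filter).
    """
    counts = Counter()
    for chrom, start, end in fragments:
        if max_len is not None and end - start > max_len:
            continue
        counts[(chrom, start)] += 1    # left cut site
        counts[(chrom, end - 1)] += 1  # right cut site (last base of fragment)
    return counts


def counts_per_million(counts):
    """Scale raw cut counts so libraries of different depth are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1e6 / total for site, n in counts.items()}


fragments = [("chrI", 100, 150), ("chrI", 100, 180), ("chrII", 5, 500)]
profile = cut_site_counts(fragments, max_len=200)
print(profile[("chrI", 100)])  # 2
```

Whether to count one or both fragment ends, and which size window to keep, are analysis choices; the chapter itself should be consulted for the settings it recommends.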


Subject(s)
Genome, Fungal , High-Throughput Nucleotide Sequencing , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , High-Throughput Nucleotide Sequencing/methods , Chromatin/genetics , Chromatin/metabolism , Binding Sites , Sequence Analysis, DNA/methods , Micrococcal Nuclease/metabolism , Micrococcal Nuclease/genetics , Computational Biology/methods
2.
mSystems ; 9(9): e0016024, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39105591

ABSTRACT

As antimicrobial resistance (AMR) surveillance shifts to genomics, ensuring the quality of whole-genome sequencing (WGS) data produced across laboratories is critical. Participation in genomic proficiency tests (GPTs) not only increases individual laboratories' WGS capacity but also provides a unique opportunity to improve species-specific thresholds for WGS quality control (QC) through repeated resequencing of distinct isolates. Here, we present the results of the EU Reference Laboratory for Antimicrobial Resistance (EURL-AR) network GPTs of 2021 and 2022, which included 25 EU national reference laboratories (NRLs). A total of 392 genomes from 12 AMR bacteria were evaluated based on WGS QC metrics. Two percent (n = 9) of the data were excluded due to contamination, and 11% (n = 41) of the remaining genomes were identified as outliers in at least one QC metric and excluded from the computation of the adjusted QC thresholds (AQTs). Two groups of correlated QC metrics were identified through linear regression. Eight percent (n = 28) of the submitted genomes, from 11 laboratories, failed one or more of the AQTs. However, only three laboratories (12%) were identified as underperformers, failing AQTs for uncorrelated QC metrics in at least two genomes. Finally, new species-specific thresholds for "N50" and "number of contigs > 200 bp" are presented for guidance in routine laboratory QC. Continued participation of NRLs in GPTs will reveal WGS workflow flaws and improve AMR surveillance data. GPT data will continue to contribute to the development of reliable species-specific thresholds for routine WGS QC, standardizing sequencing data QC and ensuring inter- and intranational laboratory comparability. IMPORTANCE: Illumina next-generation sequencing is an integral part of antimicrobial resistance (AMR) surveillance and the most widely used whole-genome sequencing (WGS) platform.
The high throughput, relatively low cost, high discriminatory power, and rapid turnaround time of WGS compared with classical biochemical methods mean that the technology will likely remain a fundamental tool in AMR surveillance and public health. In this study, we present the current level of WGS capacity among national reference laboratories in the EU Reference Laboratory for AMR network, summarizing the methodology applied and statistically evaluating the quality of the sequence data obtained. These findings provide the basis for setting new and revised thresholds for quality metrics used in routine WGS, which have previously been defined arbitrarily. In addition, underperforming participants are identified and encouraged to evaluate their workflows to produce reliable results.
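The two QC metrics singled out above can be computed directly from an assembly's contig lengths. The Python sketch below is a minimal illustration, assuming contig lengths are already available as a list of integers; the pass/fail helper and its thresholds are hypothetical placeholders, not the species-specific values reported by the study.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L together cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def contigs_longer_than(contig_lengths, min_len=200):
    """Number of contigs strictly longer than min_len bp."""
    return sum(1 for length in contig_lengths if length > min_len)


def passes_qc(contig_lengths, min_n50, max_contigs):
    """Hypothetical check: both thresholds are caller-supplied placeholders."""
    return (n50(contig_lengths) >= min_n50
            and contigs_longer_than(contig_lengths) <= max_contigs)


lengths = [400_000, 300_000, 200_000, 100_000, 150, 120]
print(n50(lengths), contigs_longer_than(lengths))  # 300000 4
```

Sorting in descending order and stopping at the first contig that brings the running total to half the assembly length is the usual way N50 is defined; very short contigs (here 150 and 120 bp) still count toward the total length but not toward "contigs > 200 bp".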


Subject(s)
Drug Resistance, Bacterial , European Union , Genome, Bacterial , Laboratory Proficiency Testing , Whole Genome Sequencing , Whole Genome Sequencing/standards , Drug Resistance, Bacterial/genetics , Humans , Quality Control , Bacteria/genetics , Bacteria/drug effects , Anti-Bacterial Agents/pharmacology , Laboratories/standards , Species Specificity
3.
Microbiol Res ; 288: 127867, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39163716

ABSTRACT

BACKGROUND: Enterobacter species are part of the normal human gut microflora and persist in a diverse range of other environmental niches. They have become important opportunistic nosocomial pathogens known to harbour plasmid-mediated multi-class antimicrobial resistance (AMR) determinants. Global AMR surveillance of Enterobacterales isolates shows that the genus is second only to Klebsiella in the frequency of carbapenem resistance. Enterobacter taxonomy is confusing, and standard species identification methods are largely inaccurate or insufficient. There are currently 27 named species and a total of 46 taxa in the genus, distinguishable by average nucleotide identity (ANI) calculation between pairs of genomic sequences. Here we describe an Enterobacter strain, ECC3473, isolated from the wastewater of an Australian hospital, whose species could be determined neither by standard methods nor by ribosomal RNA gene multi-locus typing. AIM: To characterise ECC3473 in terms of phenotypic and genotypic antimicrobial resistance, biochemical characteristics, and taxonomy, and to determine the global distribution of the novel species to which it belongs. METHODS: Standard broth dilution and disk diffusion were used to determine phenotypic AMR. The strain's complete genome, including plasmids, was obtained by long- and short-read sequencing followed by a novel long/short-read hybrid assembly and polishing, and the genomic basis of AMR was determined. Phylogenomic analysis and quantitative measures of relatedness (ANI, digital DNA-DNA hybridisation, and difference in G+C content) were used to study the taxonomic relationship between ECC3473 and Enterobacter type strains. NCBI and PubMLST databases and the literature were searched for additional members of the novel species to determine its global distribution. RESULTS: ECC3473 is one of 21 strains isolated globally that belong to a novel Enterobacter species, for which the name Enterobacter adelaidei sp. nov. is proposed.
The novel species was found to be resilient in its capacity to persist in contaminated water and adaptable in its ability to accumulate multiple transmissible AMR determinants. CONCLUSION: E. adelaidei sp. nov. may become increasingly important to the dissemination of AMR.


Subject(s)
Drug Resistance, Multiple, Bacterial , Enterobacter , Genome, Bacterial , Hospitals , Phylogeny , Wastewater , Wastewater/microbiology , Enterobacter/genetics , Enterobacter/isolation & purification , Enterobacter/classification , Enterobacter/drug effects , Australia , Drug Resistance, Multiple, Bacterial/genetics , Humans , Anti-Bacterial Agents/pharmacology , Plasmids/genetics , Microbial Sensitivity Tests , Multilocus Sequence Typing , RNA, Ribosomal, 16S/genetics , DNA, Bacterial/genetics
4.
Front Vet Sci ; 11: 1443855, 2024.
Article in English | MEDLINE | ID: mdl-39144078

ABSTRACT

Introduction: Spillover events of Mycoplasma ovipneumoniae have devastating effects on wild sheep populations. Multilocus sequence typing (MLST) is used to monitor spillover events and the spread of M. ovipneumoniae between sheep populations. Most studies involving the typing of M. ovipneumoniae have used Sanger sequencing. However, this technology is time-consuming and expensive, and it is not well suited to efficient batch sample processing. Methods: Our study aimed to develop and validate an MLST workflow for typing M. ovipneumoniae using multiplex polymerase chain reaction (PCR) and Nanopore Rapid Barcoding sequencing. We compared the workflow with Nanopore Native Barcoding library preparation and Illumina MiSeq amplicon protocols to determine the most accurate and cost-effective method for sequencing multiplex amplicons. A multiplex PCR was optimized for four housekeeping genes of M. ovipneumoniae using archived DNA samples (N = 68) from nasal swabs. Results: Sequences recovered with Nanopore Rapid Barcoding correctly identified all MLST types, with the shortest total workflow time and the lowest cost per sample compared with the Nanopore Native Barcoding and Illumina MiSeq methods. Discussion: Our proposed workflow is a convenient and effective method for strain typing of M. ovipneumoniae and can be applied to other bacterial MLST schemes. The workflow is suitable for diagnostic settings, where reduced hands-on time, low cost, and multiplexing capability are important.
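Once each amplicon has been sequenced and a consensus built, MLST typing reduces to two lookups: consensus sequence to allele number per locus, then allele profile to sequence type (ST). The Python sketch below illustrates that logic under stated assumptions; the locus names, allele table, and profiles are hypothetical placeholders, exact matching is a simplification, and it is not the authors' workflow.

```python
# Hypothetical MLST scheme: locus names, allele sequences and sequence types (STs)
# are placeholders, not the published M. ovipneumoniae scheme.
LOCI = ("locusA", "locusB", "locusC", "locusD")
PROFILES = {
    (1, 1, 2, 1): "ST1",
    (1, 3, 2, 4): "ST2",
    (2, 1, 5, 1): "ST3",
}


def call_allele(consensus, allele_db):
    """Return the allele number whose sequence exactly matches the consensus, else None."""
    consensus = consensus.upper()
    for number, sequence in allele_db.items():
        if sequence.upper() == consensus:
            return number
    return None  # novel or incomplete allele


def call_sequence_type(allele_calls):
    """Map {locus: allele number} to an ST; None if a locus is missing or the profile is novel."""
    if any(allele_calls.get(locus) is None for locus in LOCI):
        return None
    return PROFILES.get(tuple(allele_calls[locus] for locus in LOCI))


calls = {"locusA": 1, "locusB": 3, "locusC": 2, "locusD": 4}
print(call_sequence_type(calls))  # ST2
```

Returning None instead of guessing keeps novel alleles and profiles visible, so they can be submitted for assignment rather than silently mistyped.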

5.
Methods Mol Biol ; 2833: 161-183, 2024.
Article in English | MEDLINE | ID: mdl-38949710

ABSTRACT

Outbreaks are a risk to public health, particularly when pathogenic, hypervirulent, and/or multidrug-resistant organisms (MDROs) are involved. In a hospital setting, vulnerable populations such as immunosuppressed patients, intensive care patients, and neonates are most at risk. Rapid and accurate outbreak detection is essential for implementing effective interventions in clinical areas to control and stop further transmission. Advances in whole-genome sequencing (WGS) have lowered costs, increased capacity, and improved the reproducibility of results. WGS now has the potential to revolutionize the investigation and management of outbreaks, replacing conventional genotyping and other discrimination systems. Here, we outline specific procedures and protocols for implementing WGS in the investigation of outbreaks in healthcare settings.


Subject(s)
Disease Outbreaks , Genomics , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genomics/methods , Genome, Bacterial
6.
Genes (Basel) ; 15(7)2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39062704

ABSTRACT

The identification of structural variants (SVs) in genomic data remains challenging because reliable SV calling is difficult, which reduces sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individually visualising the raw Illumina and/or ONT sequence in the Integrative Genomics Viewer. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied by variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor with the original Sniffles variant caller (48% overall) but improved substantially with Sniffles2 (36/40, 90%, for deletions and 17/23, 74%, for insertions). In summary, we show that the precision of OGM is very high. In addition, with the Sniffles2 caller, SV calling from ONT long-read data is more sensitive than Illumina sequencing for most SV types.
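The precision and sensitivity figures quoted above are simple ratios over the verified call set. The short Python check below reproduces them from the counts given in the abstract; it introduces no new data, and the variable names are illustrative.

```python
# Counts taken from the abstract above; each metric is a simple ratio.
verified_calls, total_calls = 222, 234
ppv = verified_calls / total_calls  # precision (positive predictive value) of OGM calls

# Sensitivity = true SVs recovered by a caller / true SVs of that type.
sensitivity = {
    "Illumina, deletions": 115 / 134,
    "Illumina, insertions": 13 / 58,
    "ONT + Sniffles2, deletions": 36 / 40,
    "ONT + Sniffles2, insertions": 17 / 23,
}

print(f"OGM precision: {ppv:.0%}")  # OGM precision: 95%
for label, value in sensitivity.items():
    print(f"{label}: {value:.0%}")
```

The deletion and insertion denominators (134 and 58) do not add up to 222 because the truth set also contains other SV types.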


Subject(s)
Benchmarking , Nanopore Sequencing , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Nanopore Sequencing/methods , Benchmarking/methods , Genomic Structural Variation/genetics , Chromosome Mapping/methods , Genome, Human/genetics , Genomics/methods , Software , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Female , Nanopores , Male , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
7.
PeerJ ; 12: e17748, 2024.
Article in English | MEDLINE | ID: mdl-39076774

ABSTRACT

Background: Tandem duplication (TD) is a common and important type of structural variation in the human genome. TDs have been shown to play an essential role in many diseases, including cancer. However, TDs are difficult to detect accurately because of the uneven distribution of reads and the inherent complexity of next-generation sequencing (NGS) data. Methods: This article proposes a method called DTDHM (detection of tandem duplications based on hybrid methods), which uses NGS data to detect TDs in a single sample. DTDHM builds a pipeline that integrates read depth (RD), split read (SR), and paired-end mapping (PEM) signals. To address the imbalance between normal and abnormal samples, DTDHM uses the K-nearest neighbor (KNN) algorithm for multi-feature classification prediction. Qualified split reads and discordant reads are then extracted and analyzed to localize variation sites accurately. This article compares DTDHM with three other methods on 450 simulated datasets and five real datasets. Results: Across the 450 simulated samples, DTDHM consistently achieved the highest F1-score. The average F1-scores of DTDHM, SVIM, TARDIS, and TIDDIT were 80.0%, 56.2%, 43.4%, and 67.1%, respectively. The F1-score of DTDHM varied within a small range, making its detection the most stable, and was 1.2 times that of the second-best method. Most of DTDHM's boundary biases fluctuated around 20 bp, and its boundary accuracy was better than that of TARDIS and TIDDIT. In the real-data experiments, five real sequencing samples (NA19238, NA19239, NA19240, HG00266, and NA12891) were used to test DTDHM. DTDHM had the highest overlap density score (ODS) and F1-score of the four methods. Conclusions: Compared with the other three methods, DTDHM achieved excellent results in terms of sensitivity, precision, F1-score, and boundary bias.
These results indicate that DTDHM can be used as a reliable tool for detecting TDs from NGS data, especially for samples with low coverage depth or low tumor purity.
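The F1-score used to rank the tools is the harmonic mean of precision and recall. The Python sketch below shows that definition and, using only the averages stated in the abstract, checks that the "1.2 times" figure is consistent with the ratio of DTDHM's average F1-score to that of the next-best tool, TIDDIT (80.0 / 67.1 ≈ 1.19). The helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f1_score(precision, recall)


# Average F1-scores (%) reported above for the 450 simulated datasets.
reported = {"DTDHM": 80.0, "SVIM": 56.2, "TARDIS": 43.4, "TIDDIT": 67.1}
best, runner_up = sorted(reported, key=reported.get, reverse=True)[:2]
print(best, runner_up, round(reported[best] / reported[runner_up], 2))  # DTDHM TIDDIT 1.19
```

Because the harmonic mean is dominated by the smaller of the two values, a caller cannot reach a high F1-score by maximizing only precision or only recall.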


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Humans , Genome, Human/genetics , Tandem Repeat Sequences/genetics
8.
Methods Mol Biol ; 2822: 245-262, 2024.
Article in English | MEDLINE | ID: mdl-38907923

ABSTRACT

RNA sequencing (RNA-Seq) has emerged as a powerful and versatile tool for the comprehensive analysis of transcriptomes and has been widely used to investigate gene expression, copy number variation, and alternative splicing and to discover novel transcripts. This chapter outlines the methodology for conducting short-read RNA-Seq, from RNA enrichment through library preparation and sequencing. Throughout the chapter, practical tips and best practices are provided to help researchers optimize each step of the RNA-Seq workflow. Multiple quality control steps throughout the workflow that are critical for obtaining high-quality RNA-Seq data are also discussed.


Subject(s)
RNA-Seq , Humans , RNA-Seq/methods , Gene Expression Profiling/methods , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Library , High-Throughput Nucleotide Sequencing/methods , Quality Control , RNA/genetics , Workflow , Software , Alternative Splicing/genetics , Computational Biology/methods
9.
bioRxiv ; 2024 May 21.
Article in English | MEDLINE | ID: mdl-38826378

ABSTRACT

The extremely high level of genetic polymorphism within the human major histocompatibility complex (MHC) limits the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly of both whole-genome sequencing and target-capture short-read data in large population cohorts. To date, no other self-contained tool exists for generating de novo MHC assemblies from short-read data. MHConstructor facilitates widespread access to high-quality, alignment-free MHC sequence analysis.

11.
Article in English | MEDLINE | ID: mdl-38862430

ABSTRACT

Tandem duplication (TD) is a major type of structural variation (SV) that plays an important role in novel gene formation and human diseases. However, most modern SV detection methods often miss TDs or misclassify them as insertions because they lack specialized handling of TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD can detect TDs across a wide size range at single-nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.


Subject(s)
Software , Humans , K562 Cells , Gene Duplication , Tandem Repeat Sequences/genetics , Algorithms
12.
Genes (Basel) ; 15(6)2024 May 22.
Article in English | MEDLINE | ID: mdl-38927591

ABSTRACT

Glycogen synthase kinase-3β (GSK3β) not only plays a crucial role in regulating sperm maturation but is also pivotal in orchestrating the acrosome reaction. Here, we integrated single-molecule long-read and short-read sequencing to comprehensively examine GSK3β expression patterns in adult Diannan small-ear pig (DSE) testes. We identified ENSSSCT00000039364 as the most important GSK3β transcript and obtained its full-length coding sequence (CDS), spanning 1263 bp. Gene structure analysis located GSK3β on pig chromosome 13, with 12 exons. Protein structure analysis showed that GSK3β consisted of 420 amino acids and contained PKc-like conserved domains. Phylogenetic analysis underscored the evolutionary conservation and homology of GSK3β across mammalian species. Protein interaction network, KEGG, and GO pathway analyses implied that GSK3β interacted with 50 proteins, predominantly involved in the Wnt signaling pathway, papillomavirus infection, the Hippo signaling pathway, hepatocellular carcinoma, gastric cancer, colorectal cancer, breast cancer, endometrial cancer, basal cell carcinoma, and Alzheimer's disease. Functional annotation showed that GSK3β was involved in thirteen GO terms, including six molecular functions and seven biological processes. ceRNA network analysis suggested that DSE GSK3β was regulated by 11 miRNA targets. Furthermore, qPCR expression analysis across 15 tissues showed that GSK3β was highly expressed in the testis. Subcellular localization analysis indicated that most of the GSK3β protein was located in the cytoplasm of ST (swine testis) cells, with a small amount detected in the nucleus. Overall, our findings shed new light on the role of GSK3β in DSE reproduction, providing a foundation for further functional studies of GSK3β.


Subject(s)
Glycogen Synthase Kinase 3 beta , Spermatogenesis , Animals , Glycogen Synthase Kinase 3 beta/genetics , Glycogen Synthase Kinase 3 beta/metabolism , Male , Swine/genetics , Spermatogenesis/genetics , Testis/metabolism , Phylogeny , Gene Expression Regulation
13.
Microbiol Resour Announc ; 13(7): e0037524, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-38860804

ABSTRACT

Paired-end short reads of Illumina HiSeq, MiSeq, and NovaSeq of simulated bacterial communities from fresh spinach and surface water were generated in silico at various sequencing depths. Multidrug-resistant Salmonella enterica serotype Indiana was included in the spinach community, while the water community contained multidrug-resistant Pseudomonas aeruginosa.

15.
Pathol Oncol Res ; 30: 1611676, 2024.
Article in English | MEDLINE | ID: mdl-38818014

ABSTRACT

The large-scale heterogeneity of genetic diseases has necessitated deeper examination of nucleotide sequence alterations, enhancing the discovery of new points of attack for targeted drugs. The emergence of new sequencing techniques was essential for obtaining more interpretable genomic data. Compared with earlier short reads, longer reads can provide better insight into potentially health-threatening genetic abnormalities. Long reads offer more accurate variant identification and genome assembly, advancing studies of nucleotide-level defects. In this review, we introduce the historical background of sequencing technologies and show their benefits and limitations. Furthermore, we highlight the differences between short- and long-read approaches, including their unique advances and difficulties in methodology and evaluation. Additionally, we provide a detailed description of the corresponding bioinformatics and current applications.


Subject(s)
High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genomics/methods , Sequence Analysis, DNA/methods
16.
J Gen Virol ; 105(5)2024 May.
Article in English | MEDLINE | ID: mdl-38767624

ABSTRACT

Naturally occurring isolates of baculoviruses, such as Bombyx mori nucleopolyhedrovirus (BmNPV), usually consist of numerous genetically different haplotypes. Deciphering the haplotypes of such isolates is hampered by the large size of the dsDNA genome and by the short read length of the next-generation sequencing (NGS) techniques widely applied to characterize baculovirus isolates. In this study, we addressed this challenge by combining the accuracy of NGS in determining single nucleotide variants (SNVs) as genetic markers with the long read length of Nanopore sequencing. This hybrid approach allowed comprehensive analysis of genetically homogeneous and heterogeneous BmNPV isolates. Specifically, it allowed the identification of two putative major haplotypes in the heterogeneous isolate BmNPV-Ja through SNV position linkage. SNV positions determined from NGS data were linked by the long Nanopore reads in a Position Weight Matrix. Using a modified Expectation-Maximization algorithm, a machine-learning approach, the Nanopore reads were assigned to groups according to their alleles at the variable SNV positions. Each group of reads was assembled de novo, which led to the identification of BmNPV haplotypes. The method demonstrates the strength of combining short- and long-read sequencing techniques to decipher the genetic diversity of baculovirus isolates.
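The read-assignment step described above can be illustrated with a plain two-component Bernoulli-mixture EM over SNV alleles. This is a minimal Python sketch under stated assumptions, not the authors' modified algorithm: it assumes exactly two haplotypes, biallelic SNVs encoded as 0 (reference) or 1 (alternative), and reads already reduced to the SNV positions they span. The per-haplotype allele frequencies play a role loosely analogous to the Position Weight Matrix mentioned above; all names are hypothetical.

```python
import math

EPS = 1e-3  # keeps probabilities away from 0 and 1 so log() stays finite


def em_two_haplotypes(reads, positions, n_iter=50):
    """Cluster reads into two haplotypes with a Bernoulli-mixture EM.

    reads: list of dicts {snv_position: allele}, with allele 1 = alternative
           and 0 = reference, restricted to the SNV positions each read spans.
    positions: all variable SNV positions.
    Returns (hard assignment per read, per-haplotype alt-allele frequencies).
    """
    # Deterministic start: haplotype 0 resembles the first read, haplotype 1 is
    # its complement; positions the first read does not cover start at 0.5.
    first = reads[0]
    freqs = [
        {p: (0.9 if first[p] else 0.1) if p in first else 0.5 for p in positions},
        {p: (0.1 if first[p] else 0.9) if p in first else 0.5 for p in positions},
    ]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(max(1, n_iter)):
        # E-step: posterior probability of each haplotype for every read (log space).
        resp = []
        for read in reads:
            logs = []
            for h in (0, 1):
                ll = math.log(weights[h])
                for p, allele in read.items():
                    q = min(max(freqs[h][p], EPS), 1 - EPS)
                    ll += math.log(q if allele else 1 - q)
                logs.append(ll)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([v / total for v in probs])
        # M-step: re-estimate mixture weights and alt-allele frequencies.
        weights = [min(max(sum(r[h] for r in resp) / len(reads), EPS), 1 - EPS)
                   for h in (0, 1)]
        for h in (0, 1):
            for p in positions:
                covering = [(r[h], read[p]) for r, read in zip(resp, reads) if p in read]
                den = sum(w for w, _ in covering)
                if den > 0:
                    freqs[h][p] = sum(w * a for w, a in covering) / den
    assignments = [0 if r[0] >= r[1] else 1 for r in resp]
    return assignments, freqs


reads = [{0: 0, 1: 0, 2: 0}, {1: 0, 2: 0, 3: 0}, {0: 0, 3: 0},
         {0: 1, 1: 1, 2: 1}, {1: 1, 2: 1, 3: 1}, {0: 1, 2: 1, 3: 1}]
assignments, _ = em_two_haplotypes(reads, [0, 1, 2, 3])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Each resulting group of reads could then be assembled separately, as the abstract describes; real data would also need handling for sequencing errors, more than two haplotypes, and reads that span few SNVs.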


Subject(s)
Bombyx , Haplotypes , High-Throughput Nucleotide Sequencing , Nanopore Sequencing , Nucleopolyhedroviruses , Polymorphism, Single Nucleotide , Nucleopolyhedroviruses/genetics , Nucleopolyhedroviruses/classification , Nucleopolyhedroviruses/isolation & purification , Animals , Nanopore Sequencing/methods , Bombyx/virology , High-Throughput Nucleotide Sequencing/methods , Genome, Viral
17.
bioRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798511

ABSTRACT

Introduction: Short-read amplicon sequencing studies have typically focused on 1-2 variable regions of the 16S rRNA gene. Species-level resolution is limited in these studies, as each variable region enables the characterisation of a different subsection of the microbiome. Although long-read sequencing techniques take advantage of all 9 variable regions by sequencing the entire 16S rRNA gene, they are substantially more expensive. This work assessed the feasibility of accurate, reproducible species-level resolution using a relatively new sequencing kit and bioinformatics pipeline developed for short-read sequencing of multiple variable regions of the 16S rRNA gene. In addition, we evaluated the potential impact of different sample collection methods on our outcomes. Methods: Using xGen™ 16S Amplicon Panel v2 kits, all 9 variable regions of the 16S rRNA gene were sequenced on an Illumina MiSeq platform. Mock cells and mock DNA for 8 bacterial species were included as extraction and sequencing controls, respectively. Within-run and between-run replicate samples, and pairs of stool and rectal swabs collected at 0-5 weeks from the same participants, were incorporated. Observed relative abundances of each species were compared with theoretical abundances provided by ZymoBIOMICS. Paired Wilcoxon rank sum tests and distance-based intraclass correlation coefficients were used to statistically compare alpha and beta diversity measures, respectively, for pairs of replicates and stool/rectal swab sample pairs. Results: Using multiple variable regions of the 16S ribosomal RNA (rRNA) gene, we found that we could accurately identify taxa to species level and obtain highly reproducible results at species level. However, the microbial profiles of stool and rectal swab sample pairs differed substantially despite being collected concurrently from the same infants.
Conclusion: This protocol provides an effective means for studying infant gut microbial samples at a species level. However, sample collection approaches need to be accounted for in any downstream analysis.
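Comparing a mock community's observed profile with its theoretical composition starts by converting read counts to relative abundances. The Python sketch below shows that step with hypothetical species names and expected proportions (not the ZymoBIOMICS standard's values); the statistical tests named in the abstract are not reproduced here.

```python
def relative_abundance(read_counts):
    """Convert per-species read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    return {species: n / total for species, n in read_counts.items()}


def abundance_deviation(observed_counts, theoretical):
    """Observed minus expected relative abundance for each species in the mock."""
    observed = relative_abundance(observed_counts)
    return {sp: observed.get(sp, 0.0) - expected for sp, expected in theoretical.items()}


# Hypothetical three-species mock; not the ZymoBIOMICS composition.
counts = {"species_A": 500, "species_B": 300, "species_C": 200}
expected = {"species_A": 0.4, "species_B": 0.4, "species_C": 0.2}
for sp, diff in abundance_deviation(counts, expected).items():
    print(f"{sp}: {diff:+.2f}")
```

A species expected in the mock but absent from the observed counts gets an observed abundance of 0, so dropouts show up as negative deviations rather than being silently skipped.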

18.
Curr Protoc ; 4(5): e1046, 2024 May.
Article in English | MEDLINE | ID: mdl-38717471

ABSTRACT

Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants, including single nucleotide polymorphisms (SNPs) and structural variations (SVs), from short-read sequencing data aligned to a reference genome. We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms lacking benchmarked variants, which are traditionally used to distinguish sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to trade off accuracy for sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK), predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and culminates in variant annotation using custom scripts. A key strength of SNP-SVant is its scalability. Variant calling is computationally expensive, so SNP-SVant uses a workflow management system with intermediate checkpoint steps to use resources efficiently, minimizing redundant computations and omitting steps whose dependent files are already available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA output formats to ensure compatibility with downstream tools that calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest.
Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Predicting single nucleotide polymorphisms and structural variations
Support Protocol 1: Downloading publicly available sequencing data
Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer
Support Protocol 3: Converting between VCF and aligned FASTA formats.
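The VCF-to-aligned-FASTA conversion named in Support Protocol 3 can be sketched for the simplest case: haploid SNP calls against a single reference sequence. This minimal Python example is an illustration under stated assumptions, not SNP-SVant's own script: it ignores the CHROM column and genotypes, takes the first ALT allele of each record, and skips indels because they would shift alignment columns. The function names are hypothetical.

```python
def parse_vcf_snps(vcf_lines):
    """Yield (pos, ref, alt) from the data lines of a VCF; header lines are skipped.

    Simplification (assumption): genotypes are ignored and only the first ALT
    allele is used, which suits a haploid, single-sample, single-contig VCF.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield int(fields[1]), fields[3], fields[4].split(",")[0]


def apply_snps(reference, snps):
    """Substitute SNP alleles into the reference to get one aligned sequence.

    Positions are 1-based, as in the VCF POS column. Indels, multi-base and
    symbolic alleles are skipped because they would shift alignment columns.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or alt.upper() not in ("A", "C", "G", "T"):
            continue
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}: {seq[pos - 1]} != {ref}")
        seq[pos - 1] = alt
    return "".join(seq)


def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA text with a fixed line width."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


reference = "ACGTACGT"
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t2\t.\tC\tT\t50\tPASS\t.",   # SNP: applied
    "chr1\t5\t.\tA\tAG\t50\tPASS\t.",  # insertion: skipped
]
sample = apply_snps(reference, parse_vcf_snps(vcf))
print(to_fasta({"reference": reference, "sample1": sample}))
```

Because every output sequence keeps the reference length, the records line up column by column, which is what tools computing selection statistics from aligned FASTA usually expect.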


Subject(s)
Polymorphism, Single Nucleotide , Software , Workflow , Polymorphism, Single Nucleotide/genetics , Computational Biology/methods , Genomics/methods , Molecular Sequence Annotation/methods , Whole Genome Sequencing/methods
19.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605641

ABSTRACT

Simulation of RNA-seq reads is critical for the assessment, comparison, benchmarking, and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need, we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) either from customizable input or from CAMPAREE-simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM, or BAM format, with the SAM and BAM formats containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors, and errors during PCR amplification. Together, these characteristics make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.


Subject(s)
Genome , RNA , RNA-Seq , Sequence Analysis, RNA , Computer Simulation , RNA/genetics , High-Throughput Nucleotide Sequencing
20.
Methods Mol Biol ; 2744: 247-265, 2024.
Article in English | MEDLINE | ID: mdl-38683324

ABSTRACT

In this protocol paper, we review a set of methods developed in recent years for analyzing nuclear reads obtained from genome skimming. As the cost of sequencing drops, genome skimming (low-coverage shotgun sequencing of a sample) is increasingly becoming a cost-effective method of measuring biodiversity at high resolution. While most practitioners use only the assembled, over-represented organelle reads from a genome skim, the vast majority of the reads are nuclear. Using the assembly-free and alignment-free methods described in this protocol, we can compare samples to each other and to reference genomes to compute distances, characterize underlying genomes, and infer evolutionary relationships.
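As one concrete example of an alignment-free comparison, the sketch below estimates a Mash-style distance between two sequences from the Jaccard index of their canonical k-mer sets. This is an illustrative assumption, not necessarily the estimator used in the protocol: estimators designed for low-coverage skims typically add corrections for coverage and sequencing error, which this sketch omits. The function names and the default k = 21 are hypothetical choices.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (the smaller of each k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two k-mer sets (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(seq1, seq2, k=21):
    """Mash-style distance D = -(1/k) * ln(2J / (1 + J)); infinite if no k-mer is shared."""
    j = jaccard(canonical_kmers(seq1, k), canonical_kmers(seq2, k))
    if j == 0:
        return math.inf
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


s1 = "ACGGTCATTGACCTAGGATC"
s2 = "ACGGTCATTGTCCTAGGATC"  # one substitution relative to s1
print(mash_distance(s1, s2, k=5))
```

Real genome skims contain millions of reads, so practical tools work on sketches (subsamples of hashed k-mers) rather than full k-mer sets; the distance formula is the same.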


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genomics/methods , Genome/genetics , Software , Cell Nucleus/genetics , Computational Biology/methods , Humans