Search | BVS Regional Portal

1.

A high-precision genome size estimator based on the k-mer histogram correction.

Liao, Xiangyu; Zhu, Wufei; Liu, Chaoyun.

Front Genet ; 15: 1451730, 2024.

Article in English | MEDLINE | ID: mdl-39238787

ABSTRACT

Introduction: In the realm of next-generation sequencing datasets, various characteristics can be extracted through k-mer based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge. Methods: In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction. Results: We have evaluated GSET on both simulated and real datasets. The experimental results demonstrate that this tool can estimate genome size with greater precision, even surpassing the accuracy of state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce useable results. Discussion: The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available for use and can be accessed at the following URL: https://github.com/Xingyu-Liao/GSET.
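As a rough illustration of what a k-mer histogram-based estimate involves, the sketch below applies the common peak-depth heuristic: divide the total number of trusted k-mer occurrences by the depth of the main histogram peak. This is not GSET's correction model, which the abstract does not describe; the function name, the `min_depth` cutoff, and the toy histogram are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size from a k-mer histogram (peak-depth heuristic).

    histogram maps k-mer multiplicity -> number of distinct k-mers seen
    with that multiplicity. min_depth is a hypothetical cutoff below which
    k-mers are assumed to come mostly from sequencing errors.
    """
    # Discard low-multiplicity k-mers, which are dominated by errors.
    trusted = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not trusted:
        raise ValueError("no k-mers at or above min_depth")
    # The multiplicity with the most distinct k-mers approximates the
    # k-mer coverage depth of single-copy genomic sequence.
    peak_depth = max(trusted, key=trusted.get)
    # Total k-mer occurrences divided by the per-copy depth gives the
    # approximate number of genomic k-mer positions.
    total_occurrences = sum(depth * n for depth, n in trusted.items())
    return total_occurrences / peak_depth, peak_depth


# Toy histogram: an error spike at depth 1-2 and a main peak at depth 10.
toy_histogram = {1: 1000, 2: 50, 9: 20, 10: 100, 11: 20, 20: 5}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth {peak}, estimated size {size:.0f} k-mer positions")
```

In heterozygous samples the histogram tends to show a second peak near half the main depth, which is one reason a single-peak heuristic like this can mislead and why tools in this area add corrections.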

2.

Giraffe: A tool for comprehensive processing and visualization of multiple long-read sequencing data.

Liu, Xudong; Shao, Yanwen; Guo, Zhihao; Ni, Ying; Sun, Xuan; Leung, Anskar Yu Hung; Li, Runsheng.

Comput Struct Biotechnol J ; 23: 3241-3246, 2024 Dec.

Article in English | MEDLINE | ID: mdl-39279873

ABSTRACT

Third-generation sequencing techniques have become increasingly popular due to their capacity to produce long, high-quality reads. Effective comparative analysis across various samples and sequencing platforms is essential for understanding biological mechanisms and establishing benchmark baselines. However, existing tools for long-read sequencing predominantly focus on quality control (QC) and processing for individual samples, complicating the comparison of multiple datasets. The lack of comprehensive tools for data comparison and visualization presents challenges for researchers with limited bioinformatics experience. To address this gap, we present Giraffe (https://github.com/lrslab/Giraffe_View), a Python3-based command-line tool designed for comparative analysis and visualization across diverse samples and platforms. Giraffe facilitates the assessment of read quality, sequencing bias, and genomic regional methylation proportions for both DNA and direct RNA sequencing reads. Its effectiveness has been demonstrated in various scenarios, including comparisons of sequencing methods (whole genome amplification vs. shotgun), sequencing platforms (Oxford Nanopore Technology, ONT vs. Pacific Biosciences, PacBio), tissues (kidney marrow with and without blood), and biological replicates (kidney marrows).

3.

A novel workflow to improve genotyping of multigene families in wildlife species: An experimental set-up with a known model system.

Gillingham, Mark A F; Montero, B Karina; Wihelm, Kerstin; Grudzus, Kara; Sommer, Simone; Santos, Pablo S C.

Mol Ecol Resour ; 21(3): 982-998, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33113273

ABSTRACT

Genotyping complex multigene families in novel systems is particularly challenging. Target primers frequently amplify multiple loci simultaneously, leading to high levels of PCR and sequencing artefacts such as chimeras and allele amplification bias. Most genotyping pipelines have been validated in nonmodel systems whereby the real genotype is unknown and the generation of artefacts may be highly repeatable. Further hindering accurate genotyping, the relationship between artefacts and genotype complexity (i.e. number of alleles per genotype) within a PCR remains poorly described. Here, we investigated the latter by experimentally combining multiple known major histocompatibility complex (MHC) haplotypes of a model organism (chicken, Gallus gallus, 43 artificial genotypes with 2-13 alleles per amplicon). In addition to well-defined 'optimal' primers, we simulated a nonmodel species situation by designing 'cross-species' primers based on sequence data from closely related Galliform species. We applied a novel open-source genotyping pipeline (ACACIA; https://gitlab.com/psc_santos/ACACIA), and compared its performance with another, previously published pipeline (AmpliSAS). Allele calling accuracy was higher when using ACACIA (98.5% versus 97% and 77.8% versus 75% for the 'optimal' and 'cross-species' data sets, respectively). Systematic allele dropout of three alleles owing to primer mismatch in the 'cross-species' data set explained high allele calling repeatability (100% when using ACACIA) despite low accuracy, demonstrating that repeatability can be misleading when evaluating genotyping workflows. Genotype complexity was positively associated with nonchimeric artefacts, chimeric artefacts (nonlinearly, levelling off when amplifying more than 4-6 alleles) and allele amplification bias. Our study exemplifies and demonstrates pitfalls researchers should avoid to reliably genotype complex multigene families.
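The point that repeatability can be high while accuracy is low can be made concrete with a toy calculation. The sketch below is an illustration only: the definitions of accuracy (share of samples whose called allele set equals the known genotype) and repeatability (share of samples whose two replicate calls are identical) are simplifying assumptions, not the metrics defined in the study, and the sample names and alleles are invented.

```python
def accuracy(called, truth):
    """Share of samples whose called allele set equals the known genotype."""
    return sum(called[s] == truth[s] for s in truth) / len(truth)


def repeatability(replicate_a, replicate_b):
    """Share of samples whose two replicate calls are identical."""
    return sum(replicate_a[s] == replicate_b[s] for s in replicate_a) / len(replicate_a)


# Hypothetical known genotypes for two artificial samples.
truth = {"s1": {"A", "B", "C"}, "s2": {"A", "D"}}

# Allele "C" drops out in both replicates, as a primer mismatch would cause.
replicate_1 = {"s1": {"A", "B"}, "s2": {"A", "D"}}
replicate_2 = {"s1": {"A", "B"}, "s2": {"A", "D"}}

print("accuracy:", accuracy(replicate_1, truth))
print("repeatability:", repeatability(replicate_1, replicate_2))
```

Because the dropout is systematic, both replicates agree with each other perfectly while still missing a true allele, which is why repeatability alone cannot validate a workflow.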

Subject(s)

Genotyping Techniques ; Multigene Family ; Software ; Workflow ; Alleles ; Animals ; Animals, Wild/genetics ; Genotype ; High-Throughput Nucleotide Sequencing ; Sequence Analysis, DNA

4.

GC Content-Associated Sequencing Bias Caused by Library Preparation Method May Infrequently Affect Salmonella Serotype Prediction Using SeqSero2.

Li, Shaoting; Zhang, Shaokang; Deng, Xiangyu.

Appl Environ Microbiol ; 86(18), 2020 09 01.

Article in English | MEDLINE | ID: mdl-32680856

Subject(s)

Salmonella ; Base Composition ; Gene Library ; Salmonella/genetics ; Serogroup ; Serotyping

5.

An extension to: Systematic assessment of commercially available low-input miRNA library preparation kits.

Heinicke, Fatima; Zhong, Xiangfu; Zucknick, Manuela; Breidenbach, Johannes; Sundaram, Arvind Y M; T Flåm, Siri; Leithaug, Magnus; Dalland, Marianne; Rayner, Simon; Lie, Benedicte A; Gilfillan, Gregor D.

RNA Biol ; 17(9): 1284-1292, 2020 09.

Article in English | MEDLINE | ID: mdl-32436772

ABSTRACT

High-throughput sequencing has emerged as the favoured method to study microRNA (miRNA) expression, but biases introduced during library preparation have been reported. We recently compared the performance (sensitivity, reliability, titration response and differential expression) of six commercially-available kits on synthetic miRNAs and human RNA, where library preparation was performed by the vendors. We hereby supplement this study with data from two further commonly used kits (NEBNext, NEXTflex) whose manufacturers initially declined to participate. NEXTflex demonstrated the highest sensitivity, which may reflect its use of partially-randomized adapter sequences, but overall performance was lower than the QIAseq and TailorMix kits. NEBNext showed intermediate performance. We reaffirm that biases are kit specific, complicating the comparison of miRNA datasets generated using different kits.

Subject(s)

Gene Library ; Genetic Engineering ; MicroRNAs/genetics ; Genetic Engineering/methods ; High-Throughput Nucleotide Sequencing/methods ; Laboratory Chemicals/standards ; Reproducibility of Results ; Sequence Analysis, RNA/methods

6.

Systematic assessment of commercially available low-input miRNA library preparation kits.

Heinicke, Fatima; Zhong, Xiangfu; Zucknick, Manuela; Breidenbach, Johannes; Sundaram, Arvind Y M; T Flåm, Siri; Leithaug, Magnus; Dalland, Marianne; Farmer, Andrew; Henderson, Jordana M; Hussong, Melanie A; Moll, Pamela; Nguyen, Loan; McNulty, Amanda; Shaffer, Jonathan M; Shore, Sabrina; Yip, Hoichong Karen; Vitkovska, Jana; Rayner, Simon; Lie, Benedicte A; Gilfillan, Gregor D.

RNA Biol ; 17(1): 75-86, 2020 01.

Article in English | MEDLINE | ID: mdl-31559901

ABSTRACT

High-throughput sequencing is increasingly favoured to assay the presence and abundance of microRNAs (miRNAs) in biological samples, even from low RNA amounts, and a number of commercial vendors now offer kits that allow miRNA sequencing from sub-nanogram (ng) inputs. Although biases introduced during library preparation have been documented, the relative performance of current reagent kits has not been investigated in detail. Here, six commercial kits capable of handling <100 ng total RNA input were used for library preparation, performed by the kit manufacturers, on synthetic miRNAs of known quantities and human total RNA samples. We compared the performance of miRNA detection sensitivity, reliability, titration response and the ability to detect differentially expressed miRNAs. In addition, we assessed the use of unique molecular identifiers (UMI) sequence tags in one kit. We observed differences in detection sensitivity and ability to identify differentially expressed miRNAs between the kits, but none were able to detect the full repertoire of synthetic miRNAs. The reliability within the replicates of all kits was good, while larger differences were observed between the kits, although none could accurately quantify the relative levels of the majority of miRNAs. UMI tags, at least within the input ranges tested, offered little advantage to improve data utility. In conclusion, biases in miRNA abundance are heavily influenced by the kit used for library preparation, suggesting that comparisons of datasets prepared by different procedures should be made with caution. This article is intended to assist researchers in selecting the most appropriate kit for their experimental conditions.

Subject(s)

Gene Library ; Genetic Engineering/methods ; MicroRNAs/genetics ; Genetic Engineering/standards ; High-Throughput Nucleotide Sequencing/methods ; Humans ; MicroRNAs/chemical synthesis ; Reproducibility of Results ; Sequence Analysis, RNA/methods

7.

Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes.

Sato, Mitsuhiko P; Ogura, Yoshitoshi; Nakamura, Keiji; Nishida, Ruriko; Gotoh, Yasuhiro; Hayashi, Masahiro; Hisatsune, Junzo; Sugai, Motoyuki; Takehiko, Itoh; Hayashi, Tetsuya.

DNA Res ; 26(5): 391-398, 2019 Oct 01.

Article in English | MEDLINE | ID: mdl-31364694

ABSTRACT

In bacterial genome and metagenome sequencing, Illumina sequencers are most frequently used due to their high throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that a strong sequencing bias is introduced in low-GC regions by the Nextera XT kit. The level of bias introduced is dependent on the level of GC content; stronger bias is generated as the GC content decreases. Other analysed kits did not introduce this strong sequencing bias. The GC content-associated sequencing bias introduced by Nextera XT was more remarkable in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. The results of our analyses highlight the importance of selecting proper library preparation kits according to the purposes and targets of sequencing, particularly in metagenome sequencing, where a wide range of microbial species with various degrees of GC content is present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.
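One simple way to look for the kind of GC content-associated bias described here is to compare mean read depth across reference windows grouped by GC fraction. The sketch below is an assumption-laden illustration, not the authors' analysis: the function names, the window size, and the number of GC bins are hypothetical, and the per-base depths are assumed to have been computed elsewhere.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G and C among called bases (A, C, G, T); None if no calls."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called


def mean_depth_by_gc_bin(reference, depths, window=100, n_bins=10):
    """Mean depth of non-overlapping windows, grouped by GC bin index.

    reference is the reference sequence and depths the per-base read depth
    over it. Bin i covers GC fractions in [i/n_bins, (i+1)/n_bins); a GC
    fraction of exactly 1.0 is placed in the last bin.
    """
    if len(reference) != len(depths):
        raise ValueError("reference and depths must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start in range(0, len(reference) - window + 1, window):
        gc = gc_fraction(reference[start:start + window])
        if gc is None:
            continue  # window contains only ambiguous bases
        bin_index = min(int(gc * n_bins), n_bins - 1)
        sums[bin_index] += sum(depths[start:start + window]) / window
        counts[bin_index] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy input: an AT-only window with low depth, then a GC-only window.
reference = "AT" * 5 + "GC" * 5
depths = [2] * 10 + [20] * 10
print(mean_depth_by_gc_bin(reference, depths, window=10))
```

A profile in which mean depth falls steadily toward the low-GC bins would be consistent with the pattern the abstract reports for Nextera XT, although depth also varies for reasons unrelated to library preparation.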

Subject(s)

Bacteria/genetics ; Genome, Bacterial ; High-Throughput Nucleotide Sequencing/standards ; Metagenome ; Sequence Analysis, DNA/standards ; Base Composition ; High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, DNA/methods

8.

Toward accurate species-level metabarcoding of arthropod communities from the tropical forest canopy.

Creedy, Thomas J; Ng, Wui Shen; Vogler, Alfried P.

Ecol Evol ; 9(6): 3105-3116, 2019 Mar.

Article in English | MEDLINE | ID: mdl-30962884

ABSTRACT

Metabarcoding of arthropod communities can be used for assessing species diversity in tropical forests but the methodology requires validation for accurate and repeatable species occurrences in complex mixtures. This study investigates how the composition of ecological samples affects the accuracy of species recovery. Starting with field-collected bulk samples from the tropical canopy, the recovery of specimens was tested for subsets of different body sizes and major taxa, by assembling these subsets into increasingly complex composite pools. After metabarcoding, we track whether richness, diversity, and most importantly composition of any size class or taxonomic subset are affected by the presence of other subsets in the mixture. Operational taxonomic units (OTUs) greatly exceeded the number of morphospecies in most taxa, even under very stringent sequencing read filtering. There was no significant effect on the recovered OTU richness of small and medium-sized arthropods when metabarcoded alongside larger arthropods, despite substantial biomass differences in the mixture. The recovery of taxonomic subsets was not generally influenced by the presence of other taxa, although with some exceptions likely due to primer mismatches. Considerable compositional variation within size and taxon-based subcommunities was evident resulting in high beta-diversity among samples from within a single tree canopy, but this beta-diversity was not affected by experimental manipulation. We conclude that OTU recovery in complex arthropod communities, with sufficient sequencing depth and within reasonable size ranges, is not skewed by variable biomass of the constituent species. This could remove the need for time-intensive manual sorting prior to metabarcoding. However, there remains a chance of taxonomic bias, which may be primer-dependent. There will never be a panacea primer; instead, metabarcoding studies should carefully consider whether the aim is broadscale turnover, in which case these biases may not be important, or species lists, in which case separate PCRs and sequencing might be necessary. OTU number inflation remains an issue in metabarcoding and requires bioinformatic development, particularly in read filtering and OTU clustering, and/or greater use of species-identifying sequences generated outside of bulk sequencing.

9.

Two reads to rule them all: Nanopore long read-guided assembly of the iconic Christmas Island red crab, Gecarcoidea natalis (Pocock, 1888), mitochondrial genome and the challenges of AT-rich mitogenomes.

Gan, Han Ming; Linton, Stuart M; Austin, Christopher M.

Mar Genomics ; 45: 64-71, 2019 Jun.

Article in English | MEDLINE | ID: mdl-30928201

ABSTRACT

Despite recent advances in sequencing technology, a complete mitogenome assembly is still unavailable for the gecarcinid land crabs that include the iconic Christmas Island red crab (Gecarcoidea natalis) which is known for its high population density, annual mass breeding migration and ecological significance in maintaining rainforest structure. Using sequences generated from Nanopore and Illumina platforms, we assembled the complete mitogenome for G. natalis, the first for the genus and only the second for the family Gecarcinidae. Nine Nanopore long reads representing 0.15% of the sequencing output from an overnight MinION Nanopore run were aligned to the mitogenome. Two of them were >10 kb and combined are sufficient to span the entire G. natalis mitogenome. The use of Illumina genome skimming data only resulted in a fragmented assembly that can be attributed to low to zero sequencing coverage in multiple high AT-regions including the mitochondrial protein-coding genes (NAD4 and NAD5), 16S ribosomal RNA and non-coding control region. Supplementing the mitogenome assembly with a previously acquired transcriptome dataset containing a high abundance of mitochondrial transcripts improved mitogenome sequence coverage and assembly reliability. We then inferred the phylogeny of the Eubrachyura using Maximum Likelihood and Bayesian approaches, confirming the phylogenetic placement of G. natalis within the family Gecarcinidae based on whole mitogenome alignment. Given the substantial impact of AT-content on mitogenome assembly and the value of complete mitogenomes in phylogenetic and comparative studies, we recommend that future mitogenome sequencing projects consider generating a modest amount of Nanopore long reads to facilitate the closing of problematic and fragmented mitogenome assemblies.

Subject(s)

Brachyura/genetics ; Genome, Mitochondrial/genetics ; Animals ; Australia ; Brachyura/classification ; Indian Ocean Islands ; Nanopores ; Phylogeny ; Reproducibility of Results ; Sequence Analysis, DNA

10.

Decreasing miRNA sequencing bias using a single adapter and circularization approach.

Barberán-Soler, Sergio; Vo, Jenny M; Hogans, Ryan E; Dallas, Anne; Johnston, Brian H; Kazakov, Sergei A.

Genome Biol ; 19(1): 105, 2018 09 03.

Article in English | MEDLINE | ID: mdl-30173660

ABSTRACT

The ability to accurately quantify all the microRNAs (miRNAs) in a sample is important for understanding miRNA biology and for development of new biomarkers and therapeutic targets. We develop a new method for preparing miRNA sequencing libraries, RealSeq®-AC, that involves ligating the miRNAs with a single adapter and circularizing the ligation products. When compared to other methods, RealSeq®-AC provides greatly reduced miRNA sequencing bias and allows the identification of the largest variety of miRNAs in biological samples. This reduced bias also allows robust quantification of miRNAs present in samples across a wide range of RNA input levels.

Subject(s)

MicroRNAs/chemistry ; Sequence Analysis, RNA/methods ; Bias ; Brain Chemistry ; Humans ; MicroRNAs/analysis

11.

Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads.

Choudhari, Sulbha; Grigoriev, Andrey.

Microorganisms ; 5(1), 2017 Jan 24.

Article in English | MEDLINE | ID: mdl-28125031

ABSTRACT

Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of resulting sequence reads. Here, we illustrate such biases using two methods. One is based on phylogenetic heatmaps (PGHMs), a novel approach for compact visualization of sequence composition differences between two groups of sequences containing the same phylogenetic groups. This method is well suited for finding noise and biases when comparing metagenomics samples. We apply PGHMs to detect noise and bias in the data produced with different DNA extraction protocols, different sequencing platforms and different experimental frameworks. In parallel, we use principal component analysis displaying different clustering of sequences from each sample to support our findings and illustrate the utility of PGHMs. We considered contributions of the read length and GC-content variation and observed that in most cases biases were generally due to the GC-content of the reads.

12.

Dual randomization of oligonucleotides to reduce the bias in ribosome-profiling libraries.

Lecanda, Aarón; Nilges, Benedikt S; Sharma, Puneet; Nedialkova, Danny D; Schwarz, Juliane; Vaquerizas, Juan M; Leidel, Sebastian A.

Methods ; 107: 89-97, 2016 09 01.

Article in English | MEDLINE | ID: mdl-27450428

ABSTRACT

Protein translation is at the heart of cellular metabolism and its in-depth characterization is key for many lines of research. Recently, ribosome profiling became the state-of-the-art method to quantitatively characterize translation dynamics at a transcriptome-wide level. However, the strategy of library generation affects its outcomes. Here, we present a modified ribosome-profiling protocol starting from yeast, human cells and vertebrate brain tissue. We use a DNA linker carrying four randomized positions at its 5' end and a reverse-transcription (RT) primer with three randomized positions to reduce artifacts during library preparation. The use of seven randomized nucleotides allows efficient detection of library-generation artifacts. We find that the effect of polymerase chain reaction (PCR) artifacts is relatively small for global analyses when sufficient input material is used. However, when input material is limiting, our strategy improves the sensitivity of gene-specific analyses. Furthermore, randomized nucleotides alleviate the skewed frequency of specific sequences at the 3' end of ribosome-protected fragments (RPFs) likely resulting from ligase specificity. Finally, strategies that rely on dual ligation show a high degree of gene-coverage variation. Taken together, our approach helps to remedy two of the main problems associated with ribosome-profiling data. This will facilitate the analysis of translational dynamics and increase our understanding of the influence of RNA modifications on translation.
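The way seven randomized nucleotides can expose PCR duplicates may be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the read layout (three random bases on the left of the insert and four on the right), the function names, and the example reads are all hypothetical. Reads sharing both the insert sequence and the combined seven-base tag are treated as copies of one original molecule.

```python
from collections import Counter


def split_read(read, left_random=3, right_random=4):
    """Separate a read into (insert, tag), assuming random bases flank the insert.

    The layout is an assumption for illustration: left_random bases at the
    start and right_random bases at the end form the molecular tag.
    """
    if len(read) <= left_random + right_random:
        raise ValueError("read too short to contain an insert")
    end = len(read) - right_random
    tag = read[:left_random] + read[end:]
    insert = read[left_random:end]
    return insert, tag


def collapse_duplicates(reads, left_random=3, right_random=4):
    """Return (raw, unique) insert counts before and after tag-based collapsing."""
    seen = set()
    raw = Counter()
    unique = Counter()
    for read in reads:
        insert, tag = split_read(read, left_random, right_random)
        raw[insert] += 1
        # An (insert, tag) pair seen before is counted as a PCR duplicate.
        if (insert, tag) not in seen:
            seen.add((insert, tag))
            unique[insert] += 1
    return raw, unique


reads = [
    "AAAGATTACACCCC",  # insert GATTACA, tag AAA+CCCC
    "AAAGATTACACCCC",  # same insert and tag: likely a PCR duplicate
    "AATGATTACACCCC",  # same insert, different tag: a separate molecule
    "GGGTTTTACGT",     # insert TTTT, tag GGG+ACGT
]
raw, unique = collapse_duplicates(reads)
print(dict(raw), dict(unique))
```

With seven random positions there are 4^7 = 16,384 possible tags, so identical tags on identical inserts are unlikely to arise by chance at modest depth; this sketch also ignores sequencing errors within the tag, which real tools usually tolerate.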

Subject(s)

Gene Expression Profiling/methods ; Genetic Engineering/methods ; High-Throughput Nucleotide Sequencing/methods ; Ribosomes/genetics ; Humans ; Oligonucleotides/genetics ; Protein Biosynthesis/genetics ; Ribosomes/chemistry ; Transcriptome/genetics
