Results 1 - 20 of 27,241
1.
Methods Mol Biol ; 2852: 273-288, 2025.
Article in English | MEDLINE | ID: mdl-39235750

ABSTRACT

The standardization of the microbiome sequencing of poultry rinsates is essential for generating comparable microbial composition data among poultry processing facilities if this technology is to be adopted by the industry. Samples must first be acquired, DNA must be extracted, and libraries must be constructed. In order to proceed to library sequencing, the samples should meet quality control standards. Finally, data must be analyzed using bioinformatics pipelines. These data can subsequently be incorporated into more advanced computer algorithms for risk assessment. Ultimately, a uniform sequencing pipeline will enable both the government regulatory agencies and the poultry industry to identify potential weaknesses in food safety. This chapter presents the different steps for monitoring the population dynamics of the microbiome in poultry processing using 16S rDNA sequencing.


Subject(s)
Gene Library , High-Throughput Nucleotide Sequencing , Microbiota , Poultry , RNA, Ribosomal, 16S , Animals , RNA, Ribosomal, 16S/genetics , Poultry/microbiology , Microbiota/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , DNA, Bacterial/genetics
2.
Methods Mol Biol ; 2854: 51-60, 2025.
Article in English | MEDLINE | ID: mdl-39192118

ABSTRACT

The application of CRISPR-mediated library screening has fundamentally transformed functional genomics by revealing the complexity of virus-host interactions. This protocol describes the use of CRISPR-mediated library screening to identify key functional genes regulating the innate immune response to PEDV infection. We detail a step-by-step process, starting from the design and construction of a customized CRISPR knockout library targeting genes involved in innate immunity to the effective delivery of these constructs into cells using lentiviral vectors. Subsequently, we outline the process of identifying functional genes postviral attack, including the use of next-generation sequencing (NGS), to analyze and identify knockout cells that exhibit altered responses to infection. This integrated approach provides researchers in immunology and virology with a resource and a robust framework for uncovering the genetic basis of host-pathogen interactions and the arsenal of the innate immune system against viral invasions.


Subject(s)
CRISPR-Cas Systems , Gene Knockout Techniques , Gene Library , Immunity, Innate , Immunity, Innate/genetics , CRISPR-Cas Systems/genetics , Humans , High-Throughput Nucleotide Sequencing/methods , Host-Pathogen Interactions/immunology , Host-Pathogen Interactions/genetics , Cell Line , Lentivirus/genetics
3.
Protein Sci ; 33(10): e5169, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39283039

ABSTRACT

Golden Gate assembly (GGA) can seamlessly generate full-length genes from DNA fragments. In principle, GGA could be used to design combinatorial mutation libraries for protein engineering, but creating accurate, complex, and cost-effective libraries has been challenging. We present GGAssembler, a graph-theoretical method for economical design of DNA fragments that assemble a combinatorial library that encodes any desired diversity. We used GGAssembler for one-pot in vitro assembly of camelid antibody libraries comprising >10^5 variants with DNA costs <$0.007 per variant and dropping significantly with increased library complexity. More than 93% of the desired variants were present in the assembly product and >99% were represented within the expected order of magnitude as verified by deep sequencing. The GGAssembler workflow is, therefore, an accurate approach for generating complex variant libraries that may drastically reduce costs and accelerate discovery and optimization of antibodies, enzymes and other proteins. The workflow is accessible through a Google Colab notebook at https://github.com/Fleishman-Lab/GGAssembler.


Subject(s)
Mutation , Protein Engineering/methods , Protein Engineering/economics , Gene Library , DNA/genetics , DNA/chemistry , Peptide Library
5.
mSphere ; 9(8): e0036724, 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39109886

ABSTRACT

Protein production strategies in bacteria are often limited due to the need for cell lysis and complicated purification schemes. To avoid these challenges, researchers have developed bacterial strains capable of secreting heterologous protein products outside the cell, but secretion titers often remain too low for commercial applicability. Improved understanding of the link between secretion system structure and its secretory abilities can help overcome the barrier to engineering higher secretion titers. Here, we investigated this link with the PrgI protein, the monomer of the secretory channel of the type 3 secretion system (T3SS) of Salmonella enterica. Despite detailed knowledge of the PrgI needle's assembly and structure, little is known about how its structure influences its secretory capabilities. To study this, we recently constructed a comprehensive codon mutagenesis library of the PrgI protein utilizing a novel one-pot recombineering approach. We then screened this library for functional T3SS assembly and secretion titer by measuring the secretion of alkaline phosphatase using a high-throughput activity assay. This allowed us to construct a first-of-its-kind secretion fitness landscape to characterize the PrgI needle's mutability at each position as well as the mutations that lead to enhanced T3SS secretion. We discovered new design rules for building a functional T3SS and identified hypersecreting mutants. This work can be used to increase understanding of the T3SS's assembly and identify further targets for engineering. This work also provides a blueprint for future efforts to engineer other complex protein assemblies through the construction of fitness landscapes. IMPORTANCE: Protein secretion offers a simplified alternative method for protein purification from bacterial hosts. However, the current state-of-the-art methods for protein secretion in bacteria are still hindered by low yields relative to traditional protein purification strategies. Engineers are now seeking strategies to enhance protein secretion titers from bacterial hosts, often through genetic manipulations. In this study, we demonstrate that protein engineering strategies focused on altering the secretion apparatus can be a fruitful avenue toward this goal. Specifically, this study focuses on how changes to the PrgI needle protein from the type 3 secretion system from Salmonella enterica can impact secretion titer. We demonstrate that this complex is amenable to comprehensive mutagenesis studies and that this can yield both PrgI variants with increased secretory capabilities and insight into the normal functioning of the type 3 secretion system.


Subject(s)
Bacterial Proteins , Mutagenesis , Salmonella enterica , Type III Secretion Systems , Type III Secretion Systems/genetics , Type III Secretion Systems/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Salmonella enterica/genetics , Salmonella enterica/metabolism , Gene Library , Salmonella typhimurium/genetics , Salmonella typhimurium/metabolism
6.
BMC Genomics ; 25(1): 778, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39127634

ABSTRACT

BACKGROUND: DNA sequencing is a critical tool in modern biology. Over the last two decades, it has been revolutionized by the advent of massively parallel sequencing, leading to significant advances in the genome and transcriptome sequencing of various organisms. Nevertheless, challenges with accuracy, lack of competitive options and prohibitive costs associated with high-throughput parallel short-read sequencing persist. RESULTS: Here, we conduct a comparative analysis using matched DNA and RNA short-read assays between Element Biosciences' AVITI and Illumina's NextSeq 550 chemistries. Similar comparisons were evaluated for synthetic long-read sequencing for RNA and targeted single-cell transcripts between the AVITI and Illumina's NovaSeq 6000. For both DNA and RNA short-read applications, the study found that the AVITI produced significantly higher per-sequence quality scores. For PCR-free DNA libraries, we observed an average 89.7% lower experimentally determined error rate when using the AVITI chemistry, compared to the NextSeq 550. For short-read RNA quantification, the AVITI platform had an average of 32.5% lower error rate than the NextSeq 550. With regard to synthetic long-read mRNA and targeted synthetic long-read single-cell mRNA sequencing, both platforms' respective chemistries performed comparably in quantification of genes and isoforms. The AVITI displayed a marginally lower error rate for long reads, with fewer chemistry-specific errors and a higher mutation detection rate. CONCLUSION: These results point to the potential of the AVITI platform as a competitive candidate in high-throughput short-read sequencing analyses when juxtaposed with the Illumina NextSeq 550.


Subject(s)
High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Humans , Single-Cell Analysis/methods , Gene Library
7.
Microb Genom ; 10(8), 2024 Aug.
Article in English | MEDLINE | ID: mdl-39137139

ABSTRACT

Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonization, disease, antimicrobial resistance and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species enhances the development of robust genome-wide typing schemes to define the overall bacterial population structure and lineages within it. Using the previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30 976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprised of 1222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds and defined these with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme and LIN barcodes represent a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.
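The hierarchical clustering step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the published LIN barcoding scheme: the function names (`allelic_distance`, `single_linkage`, `hierarchical_codes`), the toy four-locus profiles, and the mismatch thresholds are all hypothetical, and the labels produced are plain cluster indices rather than real LIN codes.

```python
from itertools import combinations


def allelic_distance(p, q):
    """Number of loci at which two allele profiles differ."""
    return sum(x != y for x, y in zip(p, q))


def single_linkage(profiles, threshold):
    """Label isolates so that any pair within `threshold` mismatches is linked."""
    parent = list(range(len(profiles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if allelic_distance(profiles[i], profiles[j]) <= threshold:
            parent[find(j)] = find(i)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(profiles))]


def hierarchical_codes(profiles, thresholds):
    """One cluster label per threshold, ordered from loosest to tightest."""
    levels = [single_linkage(profiles, t) for t in sorted(thresholds, reverse=True)]
    return [tuple(level[i] for level in levels) for i in range(len(profiles))]


# Hypothetical four-locus profiles (a real cgMLST scheme has far more loci).
profiles = [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 3, 3], [9, 9, 9, 9]]
codes = hierarchical_codes(profiles, thresholds=[2, 1, 0])
```

Isolates that share a longer prefix of their code are linked at tighter thresholds, which is the general idea behind a hierarchical barcode.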


Subject(s)
DNA Barcoding, Taxonomic , Genome, Bacterial , Multilocus Sequence Typing , Pneumococcal Infections , Streptococcus pneumoniae , Streptococcus pneumoniae/genetics , Streptococcus pneumoniae/classification , Streptococcus pneumoniae/isolation & purification , Multilocus Sequence Typing/methods , Humans , DNA Barcoding, Taxonomic/methods , Pneumococcal Infections/microbiology , Pneumococcal Infections/epidemiology , Phylogeny , Gene Library , Whole Genome Sequencing/methods
8.
Methods Mol Biol ; 2838: 211-219, 2024.
Article in English | MEDLINE | ID: mdl-39126635

ABSTRACT

Next-generation sequencing (NGS) technologies are continuously being developed and are becoming a more cost-effective tool for the characterization of viral genomes. Whole genome sequencing of segmented viruses, such as epizootic hemorrhagic disease virus (EHDV), provides insights into molecular epidemiology as well as viral evolutionary mechanisms such as genetic reassortment. Here, we present a detailed method for obtaining full genome sequence data for EHDV using Illumina technology. The protocol covers RNA extraction and purification, cDNA synthesis, sequencing library preparation, and genome assembly.


Subject(s)
Genome, Viral , Hemorrhagic Disease Virus, Epizootic , High-Throughput Nucleotide Sequencing , Whole Genome Sequencing , Hemorrhagic Disease Virus, Epizootic/genetics , Hemorrhagic Disease Virus, Epizootic/isolation & purification , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Animals , RNA, Viral/genetics , Gene Library , Reoviridae Infections/virology , Reoviridae Infections/veterinary
9.
Anal Biochem ; 695: 115636, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39111682

ABSTRACT

In recent years, more sophisticated DNA technologies for genotyping have enabled considerable progress in various fields such as clinical genetics, archaeogenetics and forensic genetics. DNA samples previously rejected as too challenging to analyze due to low amounts of degraded DNA can now provide useful information. To increase the chances of success with the new methodologies, it is crucial to know the fragment size of the template DNA molecules, and whether the DNA in a sample is mostly single or double stranded. With this knowledge, an appropriate library preparation method can be chosen, and the DNA shearing parameters of the protocol can be adjusted to the DNA fragment size in the sample. In this study, we first developed and evaluated a user-friendly fluorometry-based protocol for estimation of DNA strandedness. We also evaluated different capillary electrophoresis methods for estimation of DNA fragmentation levels. Next, we applied the developed methodologies to a broad variety of DNA samples processed with different DNA extraction protocols. Our findings show that both the applied DNA extraction method and the sample type affect the DNA strandedness and fragmentation. The established protocols and the gained knowledge will be applicable for future sequencing-based high-density SNP genotyping in various fields.


Subject(s)
DNA , DNA/genetics , DNA/analysis , Humans , Gene Library , DNA Fragmentation , Genomic Library , Polymorphism, Single Nucleotide , Electrophoresis, Capillary
10.
Bioconjug Chem ; 35(8): 1251-1257, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39116103

ABSTRACT

The DNA-encoded library (DEL) is a robust tool for chemical biology and drug discovery. In this study, we developed a DNA-compatible light-promoted reaction that is highly efficient and plate-compatible for DEL construction based on the formation of the indazolone scaffold. Employing this high-efficiency approach, we constructed a DEL featuring an indazolone core, which enabled the identification of a novel series of ligands specifically targeting E1A-binding protein (p300) after DEL selection. Taken together, our findings underscore the feasibility of light-promoted reactions in DEL synthesis and unveil promising avenues for developing p300-targeting inhibitors.


Subject(s)
DNA , Drug Discovery , E1A-Associated p300 Protein , Indazoles , Small Molecule Libraries , DNA/chemistry , Indazoles/chemistry , Indazoles/pharmacology , E1A-Associated p300 Protein/antagonists & inhibitors , E1A-Associated p300 Protein/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Drug Discovery/methods , Humans , Gene Library , Ligands
11.
Article in English | MEDLINE | ID: mdl-39209796

ABSTRACT

Increasing the accuracy of nucleotide sequence alignment is an essential issue in genomics research. Although classic dynamic programming (DP) algorithms (e.g., Smith-Waterman and Needleman-Wunsch) are guaranteed to produce the optimal result, their time complexity hinders their application to large-scale sequence alignment. Optimization efforts that aim to accelerate the alignment process generally come from three perspectives: redesigning data structures [e.g., diagonal or striped Single Instruction Multiple Data (SIMD) implementations], increasing the degree of parallelism in SIMD operations (e.g., difference recurrence relation), or reducing the search space (e.g., banded DP). However, no existing method combines all three aspects to build an ultra-fast algorithm. In this study, we developed the Banded Striped Aligner (BSAlign) library, which delivers accurate alignment results at ultra-fast speed by combining a series of novel methods that take advantage of all three perspectives, with highlights such as the active F-loop in striped vectorization and the striped move in banded DP. We applied our new acceleration design to both regular and edit distance pairwise alignment. BSAlign achieved a 2-fold speed-up over other SIMD-based implementations for regular pairwise alignment, and a 1.5-fold to 4-fold speed-up in edit distance-based implementations for long reads. BSAlign is implemented in the C programming language and is available at https://github.com/ruanjue/bsalign.
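Of the three perspectives named in the abstract, search-space reduction is the easiest to illustrate. The following is a minimal sketch of banded DP for edit distance, not BSAlign's implementation (which is written in C and adds striped SIMD vectorization); the function name `banded_edit_distance` and the `band` parameter are hypothetical.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance computed only over DP cells with |i - j| <= band.

    Returns None when the length difference alone exceeds the band, since
    the final cell then lies outside the band. The result equals the true
    edit distance whenever that distance is at most `band`.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: reaching column j costs j insertions, if j is inside the band.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        if lo == 0:
            curr[0] = i  # i deletions
        for j in range(max(1, lo), hi + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[m]
```

Each row touches at most `2 * band + 1` cells instead of `m + 1`, so the work drops from O(nm) to roughly O(n * band). For clarity this sketch still allocates full-width rows; a tuned implementation would store only the band.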


Subject(s)
Algorithms , Sequence Alignment , Software , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis, DNA/methods , Gene Library , Computational Biology/methods , Base Sequence/genetics
12.
Physiol Plant ; 176(4): e14449, 2024.
Article in English | MEDLINE | ID: mdl-39164923

ABSTRACT

Plant breeders leverage mutagenesis using chemical, biological, and physical mutagens to create novel trait variations. Many widely used sorghum genotypes have a narrow genetic base, which hinders improvements using classical breeding. Enhancing the diversity of the sorghum genome thus remains a key priority for sorghum breeders. To accelerate the genetic enhancement of sorghum, an extensive library comprised of seeds from 150,000 individual mutant plants of the Sorghum bicolor inbred line BTx623 was established using ethyl methanesulphonate (EMS) as a mutagen. The sorghum mutant library was bulked into 1498 pools (~100 seed heads per pool). In each pool, DNA was extracted from a subset of the seed and screened using the FIND-IT technology based on droplet digital PCR. All 43 nucleotide substitutions that were screened using FIND-IT were identified, demonstrating the potential to identify any EMS-derived mutation in an elite line of sorghum within days. This diverse library represents the largest collection of sorghum mutants ever conceived, estimated to cover 240% of all possible EMS-induced mutation points within the Sorghum genome. Using FIND-IT, the speed at which a specific desired EMS-derived mutation can be identified is a major upgrade to conventional reverse genetic techniques. Additionally, the ease at which valuable variants can be integrated into elite commercial lines is a far simpler and less expensive process compared to genome editing. Genomic variations in the library will have direct utility as a breeding resource for commercial sorghum applications, allowing enhanced adaptation to climate change and enhanced yield potential in marginal environments.


Subject(s)
Ethyl Methanesulfonate , Mutagenesis , Plant Breeding , Sorghum , Sorghum/genetics , Sorghum/drug effects , Mutagenesis/genetics , Plant Breeding/methods , Mutation/genetics , Genotype , Crops, Agricultural/genetics , Genome, Plant/genetics , Seeds/genetics , Seeds/drug effects , Mutagens , Gene Library
13.
Science ; 385(6711): 892-898, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39172826

ABSTRACT

Single-molecule techniques are ideally poised to characterize complex dynamics but are typically limited to investigating a small number of different samples. However, a large sequence or chemical space often needs to be explored to derive a comprehensive understanding of complex biological processes. Here we describe multiplexed single-molecule characterization at the library scale (MUSCLE), a method that combines single-molecule fluorescence microscopy with next-generation sequencing to enable highly multiplexed observations of complex dynamics. We comprehensively profiled the sequence dependence of DNA hairpin properties and Cas9-induced target DNA unwinding-rewinding dynamics. The ability to explore a large sequence space for Cas9 allowed us to identify a number of target sequences with unexpected behaviors. We envision that MUSCLE will enable the mechanistic exploration of many fundamental biological processes.


Subject(s)
DNA , High-Throughput Nucleotide Sequencing , Microscopy, Fluorescence , Single Molecule Imaging , High-Throughput Nucleotide Sequencing/methods , Single Molecule Imaging/methods , DNA/chemistry , DNA/genetics , Microscopy, Fluorescence/methods , CRISPR-Associated Protein 9 , Sequence Analysis, DNA/methods , Gene Library , CRISPR-Cas Systems
15.
Microb Cell Fact ; 23(1): 218, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098937

ABSTRACT

BACKGROUND: Microbial robustness is crucial for developing cell factories that maintain consistent performance in a challenging environment such as large-scale bioreactors. Although tools exist to assess and understand robustness at a phenotypic level, the underlying metabolic and genetic mechanisms are not well defined, which limits our ability to engineer more strains with robust functions. RESULTS: This study encompassed four steps. (I) Fitness and robustness were analyzed from a published dataset of yeast mutants grown in multiple environments. (II) Genes and metabolic processes affecting robustness or fitness were identified, and 14 of these genes were deleted in Saccharomyces cerevisiae CEN.PK113-7D. (III) The mutants bearing gene deletions were cultivated in three perturbation spaces mimicking typical industrial processes. (IV) Fitness and robustness were determined for each mutant in each perturbation space. We report that robustness varied according to the perturbation space. We identified genes associated with increased robustness such as MET28, linked to sulfur metabolism; as well as genes associated with decreased robustness, including TIR3 and WWM1, both involved in stress response and apoptosis. CONCLUSION: The present study demonstrates how phenomics datasets can be analyzed to reveal the relationship between phenotypic response and associated genes. Specifically, robustness analysis makes it possible to study the influence of single genes and metabolic processes on stable microbial performance in different perturbation spaces. Ultimately, this information can be used to enhance robustness in targeted strains.


Subject(s)
Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Genetic Markers , Mutation , Gene Library , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Phenotype , Gene Deletion
16.
Nat Cell Biol ; 26(8): 1359-1372, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39095657

ABSTRACT

Circular RNA (circRNA) is covalently closed, single-stranded RNA produced by back-splicing. A few circRNAs have been implicated as functional; however, we lack understanding of pathways that are regulated by circRNAs. Here we generated a pooled short-hairpin RNA library targeting the back-splice junction of 3,354 human circRNAs that are expressed at different levels (ranging from low to high) in humans. We used this library for loss-of-function proliferation screens in a panel of 18 cancer cell lines from four tissue types harbouring mutations leading to constitutive activity of defined pathways. Both context-specific and non-specific circRNAs were identified. Some circRNAs were found to directly regulate their precursor, whereas some have a function unrelated to their precursor. We validated these observations with a secondary screen and uncovered a role for circRERE(4-10) and circHUWE1(22,23), two cell-essential circRNAs, circSMAD2(2-6), a WNT pathway regulator, and circMTO1(2,RI,3), a regulator of MAPK signalling. Our work sheds light on pathways regulated by circRNAs and provides a catalogue of circRNAs with a measurable function.


Subject(s)
Cell Proliferation , RNA, Circular , RNA, Circular/genetics , RNA, Circular/metabolism , Humans , Cell Proliferation/genetics , Cell Line, Tumor , Wnt Signaling Pathway/genetics , Signal Transduction , RNA/genetics , RNA/metabolism , RNA Splicing , Gene Expression Regulation, Neoplastic , Gene Library
17.
Methods Mol Biol ; 2846: 181-189, 2024.
Article in English | MEDLINE | ID: mdl-39141237

ABSTRACT

Cleavage Under Targets and Tagmentation (CUT&Tag) provides high-resolution sequencing libraries for profiling diverse chromatin components. This protocol details the steps to generate CUT&Tag libraries from fresh or frozen tissues. This CUT&Tag workflow has nine main steps: isolation of nuclei from tissues, binding of nuclei to Concanavalin A-coated beads, binding of the primary antibody, binding of the secondary antibody, binding pA-Tn5 adapter complex, tagmentation, DNA extraction, PCR, and post-PCR cleanup and size selection. This protocol enabled us to generate and sequence CUT&Tag libraries across a broad range of fresh and frozen tissue types.


Subject(s)
Epigenomics , Epigenomics/methods , Humans , Gene Library , Chromatin/genetics , Chromatin/metabolism , Animals , High-Throughput Nucleotide Sequencing/methods , Cell Nucleus/genetics , Cell Nucleus/metabolism , Freezing , Polymerase Chain Reaction/methods
18.
Methods Mol Biol ; 2826: 31-44, 2024.
Article in English | MEDLINE | ID: mdl-39017883

ABSTRACT

Next-generation sequencing has the potential to uncover the complex nature of B cell immunity by revealing the full complexity of B cell receptor (BCR) repertoires in health and disease. However, there are drawbacks which can compromise the validity of the repertoire analysis caused by quantitative bias and accumulation of sequencing errors during the library preparation and sequencing. Here, we provide an optimized protocol designed to minimize bias for reproducible and accurate preparation of human BCR repertoire libraries for high-throughput sequencing.


Subject(s)
B-Lymphocytes , High-Throughput Nucleotide Sequencing , Receptors, Antigen, B-Cell , Humans , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, B-Cell/immunology , High-Throughput Nucleotide Sequencing/methods , B-Lymphocytes/immunology , B-Lymphocytes/metabolism , Gene Library
19.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38991828

ABSTRACT

MOTIVATION: Sanger sequencing of taxonomic marker genes (e.g. 16S/18S/ITS/rpoB/cpn60) represents the leading method for identifying a wide range of microorganisms including bacteria, archaea, and fungi. However, the manual processing of sequence data and limitations associated with conventional BLAST searches impede the efficient generation of strain libraries essential for cataloging microbial diversity and discovering novel species. RESULTS: isolateR addresses these challenges by implementing a standardized and scalable three-step pipeline that includes: (1) automated batch processing of Sanger sequence files, (2) taxonomic classification via global alignment to type strain databases in accordance with the latest international nomenclature standards, and (3) straightforward creation of strain libraries and handling of clonal isolates, with the ability to set customizable sequence dereplication thresholds and combine data from multiple sequencing runs into a single library. The tool's user-friendly design also features interactive HTML outputs that simplify data exploration and analysis. Additionally, in silico benchmarking on two comprehensive human gut genome catalogues (IMGG and Hadza hunter-gatherer populations) showcases the proficiency of isolateR in uncovering and cataloging the nuanced spectrum of microbial diversity, advocating for a more targeted and granular exploration within individual hosts to achieve the highest strain-level resolution possible when generating culture collections. AVAILABILITY AND IMPLEMENTATION: isolateR is available at: https://github.com/bdaisley/isolateR.


Subject(s)
Bacteria , Software , Bacteria/genetics , Bacteria/classification , Sequence Analysis, DNA/methods , Humans , Archaea/genetics , Fungi/genetics , Gene Library
20.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38950175

ABSTRACT

MOTIVATION: T cell receptors (TCRs) constitute a major component of our adaptive immune system, governing the recognition and response to internal and external antigens. Studying the TCR diversity via sequencing technology is critical for a deeper understanding of immune dynamics. However, library sizes differ substantially across samples, hindering the accurate estimation/comparisons of alpha diversities. To address this, researchers frequently use an overall rarefying approach in which all samples are sub-sampled to an even depth. Despite its pervasive application, its efficacy has never been rigorously assessed. RESULTS: In this paper, we develop an innovative "multi-bin" rarefying approach that partitions samples into multiple bins according to their library sizes, conducts rarefying within each bin for alpha diversity calculations, and performs meta-analysis across bins. Extensive simulations using real-world data highlight the inadequacy of the overall rarefying approach in controlling the confounding effect of library size. Our method proves robust in addressing library size confounding, outperforming competing normalization strategies by achieving better-controlled type-I error rates and enhanced statistical power in association tests. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/mli171/MultibinAlpha. The datasets are freely available at https://doi.org/10.21417/B7001Z and https://doi.org/10.21417/AR2019NC.
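The binning and within-bin rarefying steps described above can be sketched as follows. This is a rough illustration under stated assumptions, not the code from the MultibinAlpha repository: the function names, the equal-count binning rule, the choice to rarefy each bin to its smallest library, and the use of Shannon entropy as the alpha diversity measure are all assumptions made here, and the cross-bin meta-analysis step is omitted.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a clonotype count vector."""
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = Counter(rng.sample(reads, depth))
    return [picked.get(i, 0) for i in range(len(counts))]


def shannon(counts):
    """Shannon entropy (natural log) of a count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def multibin_alpha(samples, n_bins, seed=0):
    """Map sample name -> (bin index, rarefying depth, Shannon diversity).

    Samples are sorted by library size and split into `n_bins` groups of
    roughly equal count (an assumed binning rule); each group is rarefied
    to the size of its smallest library.
    """
    rng = random.Random(seed)
    ordered = sorted(samples, key=lambda name: sum(samples[name]))
    size = math.ceil(len(ordered) / n_bins)
    result = {}
    for b in range(n_bins):
        members = ordered[b * size:(b + 1) * size]
        if not members:
            continue
        depth = sum(samples[members[0]])  # smallest library in this bin
        for name in members:
            rarefied = rarefy(samples[name], depth, rng)
            result[name] = (b, depth, shannon(rarefied))
    return result


# Hypothetical clonotype counts for four samples with very different depths.
samples = {"s1": [5, 5], "s2": [10, 10, 10], "s3": [50, 30, 20], "s4": [100, 100]}
res = multibin_alpha(samples, n_bins=2)
```

With a single overall rarefying depth, `s4` would be cut from 200 reads to 10; binning lets the deeper libraries keep 100 reads each, and diversity is only compared directly among samples of similar depth.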


Subject(s)
Receptors, Antigen, T-Cell , Receptors, Antigen, T-Cell/genetics , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Gene Library , Genetic Variation