Results 1 - 20 of 21,856
1.
Methods Mol Biol ; 2852: 273-288, 2025.
Article in English | MEDLINE | ID: mdl-39235750

ABSTRACT

The standardization of the microbiome sequencing of poultry rinsates is essential for generating comparable microbial composition data among poultry processing facilities if this technology is to be adopted by the industry. Samples must first be acquired, DNA must be extracted, and libraries must be constructed. Before proceeding to library sequencing, the samples should meet quality control standards. Finally, data must be analyzed using bioinformatics pipelines. These data can subsequently be incorporated into more advanced computer algorithms for risk assessment. Ultimately, a uniform sequencing pipeline will enable both government regulatory agencies and the poultry industry to identify potential weaknesses in food safety. This chapter presents the different steps for monitoring the population dynamics of the microbiome in poultry processing using 16S rDNA sequencing.


Subject(s)
Gene Library , High-Throughput Nucleotide Sequencing , Microbiota , Poultry , RNA, Ribosomal, 16S , Animals , RNA, Ribosomal, 16S/genetics , Poultry/microbiology , Microbiota/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , DNA, Bacterial/genetics
5.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39222062

ABSTRACT

Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic approach. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. Compared with the other tools, CAIM maintained consistently good performance across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms, and the taxonomic profiles were highly similar between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and 44 primary liver cancer patients and 76 controls. Models using the genome-coverage cutoff performed better than those using relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls, with high-confidence species markers.
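The two ideas in this abstract, filtering by genome coverage and estimating abundance from aligned nucleotides rather than read counts, can be illustrated with a minimal Python sketch. This is not CAIM's code: the function names, the `(species, start, end)` alignment tuples, and the absence of any genome-length normalization are all assumptions made for illustration.

```python
from collections import defaultdict


def breadth_of_coverage(intervals, genome_length):
    """Fraction of genome positions covered by at least one alignment.

    intervals: iterable of half-open (start, end) alignment coordinates.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(intervals):
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered / genome_length


def relative_abundance(alignments, by="nucleotides"):
    """Relative abundance per species from (species, start, end) alignments.

    by="reads" counts one unit per alignment (read-count approach);
    by="nucleotides" counts aligned bases (hypothetical stand-in for a
    nucleotide-count approach).
    """
    totals = defaultdict(int)
    for species, start, end in alignments:
        totals[species] += (end - start) if by == "nucleotides" else 1
    grand_total = sum(totals.values())
    return {species: value / grand_total for species, value in totals.items()}


# Hypothetical toy data: two short reads on species A, one long read on B.
alignments = [("A", 0, 100), ("A", 50, 150), ("B", 0, 1000)]
by_reads = relative_abundance(alignments, by="reads")
by_bases = relative_abundance(alignments, by="nucleotides")
```

With mixed read lengths, as in long-read data, the two estimates can diverge sharply: here species A holds two of the three reads but only 200 of the 1,200 aligned bases.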


Subject(s)
Metagenomics , Microbiota , Humans , Microbiota/genetics , Metagenomics/methods , Computational Biology/methods , Metagenome , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Sequence Analysis, DNA/methods
9.
HLA ; 104(3): e15684, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39279448

ABSTRACT

One nucleotide deletion in codon 15 of HLA-B*40:01:02:01 results in a novel null allele, HLA-B*40:510N.


Subject(s)
Alleles , Exons , Histocompatibility Testing , Sequence Deletion , Humans , Base Sequence , Sequence Analysis, DNA/methods , HLA-B40 Antigen/genetics , Codon , HLA-B Antigens/genetics
12.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39226890

ABSTRACT

Nanopore selective sequencing allows the targeted sequencing of DNA of interest using computational approaches rather than experimental methods such as targeted multiplex polymerase chain reaction or hybridization capture. Compared to sequence-alignment strategies, deep learning (DL) models for classifying target and nontarget DNA provide large speed advantages. However, the relatively low accuracy of these DL-based tools hinders their application in nanopore selective sequencing. Here, we present a DL-based tool named ReadCurrent for nanopore selective sequencing, which takes electric currents as inputs. ReadCurrent employs a modified very deep convolutional neural network (VDCNN) architecture, enabling significantly lower computational costs for training and quicker inference compared to conventional VDCNN. We evaluated the performance of ReadCurrent across 10 nanopore sequencing datasets spanning human, yeasts, bacteria, and viruses. We observed that ReadCurrent achieved a mean accuracy of 98.57% for classification, outperforming four other DL-based selective sequencing methods. In experimental validation that selectively sequenced microbial DNA from human DNA, ReadCurrent achieved an enrichment ratio of 2.85, which was higher than the 2.7 ratio achieved by MinKNOW using the sequence-alignment strategy. In summary, ReadCurrent can rapidly classify target and nontarget DNA with high accuracy, providing an alternative in the toolbox for nanopore selective sequencing. ReadCurrent is available at https://github.com/Ming-Ni-Group/ReadCurrent.


Subject(s)
Nanopore Sequencing , Nanopore Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Neural Networks, Computer , Nanopores , Software , Deep Learning , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods
13.
BMC Genomics ; 25(1): 827, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39227777

ABSTRACT

BACKGROUND: Circulating tumour DNA (ctDNA) is a subset of cell free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient's cancer. ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts due to low abundance, particularly in the early stages of cancer. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants. RESULTS: In this study, we benchmarked six variant calling tools, including two UMI-aware callers, for their ability to call ctDNA variants. The standard variant callers tested included Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used both datasets with known variants spiked in at low frequencies and datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs - an indicator of false positive variant discovery. In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs. CONCLUSIONS: Our results indicate UMI-aware variant callers have the potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools. There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.
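As background to why UMIs help filter low-frequency artefacts, the following Python sketch groups reads into UMI families and builds a per-position consensus. It is a generic illustration, not the algorithm of UMI-VarCal or UMIErrorCorrect: the `(umi, position, sequence)` read tuples, the function name, and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2, min_agreement=0.9):
    """Collapse reads sharing a UMI and mapping position into one consensus.

    reads: iterable of (umi, position, sequence) tuples (assumed format).
    A base is kept only if at least `min_agreement` of the family supports
    it; otherwise it is masked as "N". Families smaller than
    `min_family_size` are dropped.
    """
    families = defaultdict(list)
    for umi, position, sequence in reads:
        families[(umi, position)].append(sequence)

    consensus = {}
    for key, sequences in families.items():
        if len(sequences) < min_family_size:
            continue
        length = min(len(s) for s in sequences)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in sequences).most_common(1)[0]
            bases.append(base if count / len(sequences) >= min_agreement else "N")
        consensus[key] = "".join(bases)
    return consensus


# Hypothetical toy reads: one family of three (one read carries an error)
# and one singleton.
reads = [
    ("AAT", 100, "ACGT"),
    ("AAT", 100, "ACGT"),
    ("AAT", 100, "ACTT"),
    ("GGC", 100, "ACGT"),
]
strict = umi_consensus(reads)
lenient = umi_consensus(reads, min_agreement=0.6)
```

An error introduced during amplification or sequencing appears in only part of a UMI family, so it is masked or outvoted, whereas a true variant present in the original molecule is carried by every read in the family.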


Subject(s)
Benchmarking , Circulating Tumor DNA , High-Throughput Nucleotide Sequencing , Circulating Tumor DNA/genetics , Circulating Tumor DNA/blood , Humans , High-Throughput Nucleotide Sequencing/methods , Biomarkers, Tumor/genetics , Biomarkers, Tumor/blood , Genetic Variation , Neoplasms/genetics , Neoplasms/blood , Sequence Analysis, DNA/methods , Software , Sensitivity and Specificity
14.
BMC Bioinformatics ; 25(1): 301, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39272021

ABSTRACT

Transformer-based large language models (LLMs) are well suited to biological sequence data because of analogies to natural language. Complex relationships can be learned because a concept of "words" can be generated through tokenization. When trained with masked token prediction, the models learn both token sequence identity and larger sequence context. We developed methodology to interrogate model learning, which is relevant both for the interpretability of the model and for evaluating its potential for specific tasks. We used DNABERT, a DNA language model trained on the human genome with overlapping k-mers as tokens. To gain insight into the model's learning, we interrogated how the model performs predictions, extracted token embeddings, and defined a fine-tuning benchmarking task to predict the next tokens of different sizes without overlaps. This task evaluates foundation models without interrogating specific genome biology; it does not depend on tokenization strategies, vocabulary size, the dictionary, or the number of training parameters. Lastly, there is no leakage of information from token identity into the prediction task, which makes it particularly useful to evaluate the learning of sequence context. We discovered that the model with overlapping k-mers struggles to learn larger sequence context. Instead, the learned embeddings largely represent token sequence. Still, good performance is achieved for genome-biology-inspired fine-tuning tasks. Models with overlapping tokens may be used for tasks where a larger sequence context is of less relevance, but the token sequence directly represents the desired learning features. This emphasizes the need to interrogate knowledge representation in biological LLMs.
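The difference between overlapping and non-overlapping k-mer tokens, and the information leakage the abstract refers to, can be made concrete with a short Python sketch. It does not use DNABERT or its tokenizer; `kmer_tokens` and `recover_masked_token` are hypothetical helpers written for illustration.

```python
def kmer_tokens(sequence, k, stride=1):
    """Split a DNA sequence into k-mers.

    stride=1 gives overlapping tokens; stride=k gives non-overlapping tokens.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def recover_masked_token(tokens, index, k):
    """Rebuild a masked overlapping k-mer purely from its two neighbours.

    With stride 1, the previous token supplies the first k-1 bases and the
    next token supplies the final base, so no sequence context is required.
    """
    previous_token = tokens[index - 1]
    next_token = tokens[index + 1]
    return previous_token[1:] + next_token[k - 2]


sequence = "ACGTAC"
overlapping = kmer_tokens(sequence, k=3)                 # stride 1
non_overlapping = kmer_tokens(sequence, k=3, stride=3)   # stride k

# Pretend the token at index 1 is masked and reconstruct it from neighbours.
recovered = recover_masked_token(overlapping, index=1, k=3)
```

Because a single masked overlapping token is fully determined by its neighbours, a model can solve that prediction from token identity alone, which is consistent with the abstract's motivation for a next-token task without overlaps.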


Subject(s)
DNA , Humans , DNA/chemistry , Genome, Human , Sequence Analysis, DNA/methods , Natural Language Processing , Computational Biology/methods
15.
PLoS One ; 19(9): e0306480, 2024.
Article in English | MEDLINE | ID: mdl-39264950

ABSTRACT

With the rapid development of biotechnology, gene sequencing methods have steadily improved, and the structure of gene sequences has become more complex. Traditional sequence alignment methods, however, struggle with complex gene sequence alignment tasks. To improve the efficiency of gene sequence analysis, the D2 series of k-mer statistics was selected to build a model for gene sequence alignment analysis. According to the structure of the foreground sequence, the sequence to be aligned can be cut at different lengths and divided into multiple subsequences. Then, based on the selected subsequences, the maximum dissimilarity in the alignment results is taken as the statistical result. The study also designed an application system for the model's sequence alignment analysis. The experimental results showed that the statistical power of the sequence alignment analysis model was directly proportional to the sequence coverage and cutting length, and inversely proportional to the K value and module length. The model was also applied to the system designed in this paper: the maximum storage capacity of the system was 71 GB, the maximum disk capacity was 135 GB, and the running time was less than 2.0 s. Therefore, the k-mer statistic sequence alignment model and system proposed in this study have considerable application value in gene alignment analysis.
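For readers unfamiliar with the D2 statistic, its basic form is the sum, over all k-mers, of the product of that k-mer's counts in the two sequences. The Python sketch below computes only this basic form; the subsequence-cutting scheme and the variants used in the paper are not reproduced, and the function names are assumptions.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def d2_statistic(seq_a, seq_b, k):
    """Basic D2: sum over shared k-mers of count_in_a * count_in_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[word] * counts_b[word] for word in shared)


# Hypothetical toy sequences: "ACG" occurs twice in the first, once in the second.
score = d2_statistic("ACGACG", "ACGT", k=3)
```

A higher D2 value indicates more shared k-mer content; because no alignment is computed, the cost grows with sequence length rather than with the product of the two lengths.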


Subject(s)
Sequence Alignment , Sequence Alignment/methods , Algorithms , Sequence Analysis, DNA/methods , Models, Genetic , Models, Statistical , Computational Biology/methods
16.
BMC Genomics ; 25(1): 855, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39266973

ABSTRACT

BACKGROUND: Studying the composition rules and evolution mechanisms of genome sequences is a core issue in the post-genomic era, and k-mer spectrum analysis of genome sequences is an effective means of addressing it. RESULT: We divided all 8-mers of genome sequences into 16 XY types according to the number of XY dinucleotides in each 8-mer. Previous work found that independent unimodal distributions are observed only in the three CG-type 8-mer spectra, whereas non-CG-type 8-mer spectra do not show this phenomenon universally from prokaryotes to eukaryotes. On this basis, we analyzed the distribution variation of non-CG-type 8-mer spectra across 889 animal genome sequences. Following the evolutionary order of animals from primitive to more complex, we found that the spectrum distributions gradually transition from unimodal to tri-modal. The relative distance from the average frequency of the 8-mers of each non-CG type to the center frequency differs within a species and among species. We further divided the 8-mers that contain CG dinucleotides into 16 subsets, in which each 8-mer contains both CG and XY dinucleotides, called XY1_CG1 subsets. We found that the separability values of XY1_CG1 spectra are closely related to the evolution and specificity of animals. Considering the constraint of Chargaff's second parity rule, we finally obtained 10 separability values as the feature set to characterize the evolution state of genome sequences. To verify the rationality of the feature set, we used 14 common classification algorithms to perform binary classification tests. The results showed that the accuracy (Acc) ranged from 83.88% to 98.70% among birds, other vertebrates and mammals. CONCLUSION: We proposed a credible feature set to characterize the evolution state of genomes and obtained satisfactory results with the feature set on large-scale classification of animals.
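A k-mer spectrum partitioned by CG content can be sketched in a few lines of Python. The grouping used here (0, 1, or 2 or more CG dinucleotides per 8-mer) is an assumption about what the "three CG-type" spectra refer to, and the function names are hypothetical; the paper's separability values are not computed.

```python
from collections import Counter, defaultdict


def cg_class(kmer):
    """Number of CG dinucleotides in a k-mer, capped at 2 (assumed grouping)."""
    return min(kmer.count("CG"), 2)


def kmer_spectra_by_cg(sequence, k=8):
    """Build one k-mer spectrum per CG class.

    Returns {cg_class: Counter({occurrences: number_of_distinct_kmers})},
    that is, for each class, how many distinct k-mers occur 1, 2, 3, ... times.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    spectra = defaultdict(Counter)
    for kmer, occurrences in counts.items():
        spectra[cg_class(kmer)][occurrences] += 1
    return dict(spectra)


# Hypothetical toy sequence containing three distinct 8-mers.
spectra = kmer_spectra_by_cg("ACGTACGTAA", k=8)
```

On a real genome, each class's spectrum would be plotted as the number of distinct 8-mers against occurrence frequency, which is where unimodal or tri-modal shapes would become visible.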


Subject(s)
Evolution, Molecular , Genome , Animals , Genomics/methods , Algorithms , Sequence Analysis, DNA/methods
17.
Curr Protoc ; 4(9): e70003, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39258384

ABSTRACT

DNA methylation is well-established as a major epigenetic mechanism that can control gene expression and is involved in both normal development and disease. Analysis of high-throughput-sequencing-based DNA methylation data is a step toward understanding the relationship between disease and phenotype. Analysis of CpG methylation at single-base resolution is routinely done by bisulfite sequencing, in which methylated Cs remain as C while unmethylated Cs are converted to U, subsequently seen as T nucleotides. Sequence reads are aligned to the reference genome using mapping tools that accept the C-T ambiguity. Then, various statistical packages are used to identify differences in methylation between (groups of) samples. We have previously developed the Differential Methylation Analysis Pipeline (DMAP) as an efficient, fast, and flexible tool for this work, both for whole-genome bisulfite sequencing (WGBS) and reduced-representation bisulfite sequencing (RRBS). The protocol described here includes a series of scripts that simplify the use of DMAP tools and that can accommodate the wider range of input formats now in use to perform analysis of whole-genome-scale DNA methylation sequencing data in various biological and clinical contexts. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC. Basic Protocol: DMAP2 workflow for whole-genome bisulfite sequencing (WGBS) and reduced-representation bisulfite sequencing (RRBS).
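The bisulfite conversion principle described above (methylated Cs stay C, unmethylated Cs are read as T) can be illustrated with a small Python sketch. It is independent of DMAP and its scripts; the function names and the simple C/(C+T) methylation estimate are assumptions for illustration.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment followed by sequencing of one strand.

    Cs at `methylated_positions` (0-based) remain C; every other C is
    converted and therefore read as T.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


def methylation_level(c_reads, t_reads):
    """Estimated methylation fraction at a site: C reads / (C + T reads)."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return c_reads / total


# Hypothetical toy example: only the C at position 1 is methylated.
converted = bisulfite_convert("ACGTCCG", methylated_positions={1})
level = methylation_level(c_reads=30, t_reads=10)
```

Because a T in a read may be either a genuine T or a converted C, aligners for such data must tolerate the C-T ambiguity, as the abstract notes.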


Subject(s)
DNA Methylation , Sulfites , Whole Genome Sequencing , Whole Genome Sequencing/methods , Humans , Sulfites/chemistry , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing/methods , CpG Islands/genetics