Results 1 - 20 of 12,648
1.
Sci Data ; 11(1): 1074, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358398

ABSTRACT

Adzuki bean (Vigna angularis) is a significant dietary legume crop that is prevalent in East Asia. It also holds traditional medicinal importance in China. In this study, we report a high-quality, chromosome-level genome assembly of adzuki bean obtained by employing Illumina short-read sequencing, PacBio long-read sequencing, and Hi-C technology. The assembly spans 447.8 Mb, encompassing 96.32% of the estimated genome, with contig and scaffold N50 values of 16.5 and 41.0 Mb, respectively. More than 98.2% of the 1,614 BUSCO genes were fully identified, and 25,939 genes were annotated, with 98.23% of them being functionally identifiable. Vigna angularis was estimated to diverge successively from Vigna unguiculata and Vigna radiata about 15.3 and 8.7 million years ago (Ma), respectively. This chromosome-level reference genome of Vigna angularis provides a robust foundation for exploring the functional genomics and genome evolution of adzuki bean, thereby facilitating advancements in molecular breeding of adzuki bean.


Subjects
Genome, Plant; Molecular Sequence Annotation; Vigna; Vigna/genetics; Chromosomes, Plant
2.
Sci Data ; 11(1): 1072, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358406

ABSTRACT

Although advances in long-read sequencing technology and genome assembly techniques have facilitated the study of genomes, little is known about the genomes of unique Chinese indigenous breeds, including the Huai pig. Huai pig is an ancient domestic pig breed and is well-documented for its redder meat color and high forage tolerance compared to European domestic pigs. In the present study, we sequenced and assembled the Huai pig genome using PacBio, Hi-C, and Illumina sequencing technologies. The final highly contiguous chromosome-level Huai pig genome spans 2.53 Gb with a scaffold N50 of 138.92 Mb. The Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness score for the assembled genome was 95.33%. Remarkably, 23,389 protein-coding genes were annotated in the Huai-pig genome, along with 45.87% repetitive sequences. Overall, this study provided new foundational resources for future genetic research on Chinese domestic pigs.


Subjects
Genome; Sus scrofa; Animals; Sus scrofa/genetics; Swine/genetics; Molecular Sequence Annotation
3.
Sci Data ; 11(1): 1071, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358417

ABSTRACT

Astragalus membranaceus (Fisch.) Bge (AM) is a medicinal herb belonging to the Leguminosae family. In this study, we present a chromosome-scale genome assembly of AM, aiming to enhance the molecular biology and functional studies of Astragali Radix. The genome size of AM is about 1.43 Gb, with a contig N50 value of 1.67 Mb. A total of 98.16% of the assembly was anchored to 9 pseudochromosomes using Hi-C technology. The assembly completeness was estimated to be 97.27% using BUSCO, with a long terminal repeat assembly index (LAI) of 16.22 and a quality value (QV) of 48.58. Additionally, the genome contained 67.98% repetitive sequences. Genome annotation predicted 29,914 protein-coding genes, including 73 genes involved in the flavonoid biosynthetic pathway and 2,048 transcription factors. The high-quality genome assembly and gene annotation resources will greatly facilitate future functional genomic studies in Leguminosae species.


Subjects
Astragalus propinquus; Genome, Plant; Astragalus propinquus/genetics; Molecular Sequence Annotation; Chromosomes, Plant; Plants, Medicinal/genetics
4.
F1000Res ; 13: 640, 2024.
Article in English | MEDLINE | ID: mdl-39360247

ABSTRACT

Background: Building Metagenome-Assembled Genomes (MAGs) from highly complex metagenomics datasets involves a series of steps, from cleaning the sequences and assembling them to finally grouping them into bins. Along the way, multiple tools are run to assess the quality and integrity of each MAG. Nonetheless, even when incorporated within end-to-end pipelines, the outputs of these tools must be visualized and analyzed manually, because they are not integrated into a single framework. Methods: We developed a Nextflow pipeline (MAGFlow) for estimating the quality of MAGs through a wide variety of approaches (BUSCO, CheckM2, GUNC and QUAST), as well as for taxonomically annotating the metagenomes using GTDB-Tk2. MAGFlow is coupled to a Python-Dash application (BIgMAG) that displays the concatenated outcomes from the tools included in MAGFlow, highlighting the most important metrics in a single interactive environment along with a comparison/clustering of the input data. Results: By using MAGFlow/BIgMAG, the user will be able to benchmark the MAGs obtained through different workflows or establish the quality of the MAGs belonging to different samples following the divide and rule methodology. Conclusions: MAGFlow/BIgMAG integrates state-of-the-art tools to study different quality metrics and extract visually as much information as possible from a wide range of genome features.


Subjects
Metagenome; Software; Metagenomics/methods; Molecular Sequence Annotation/methods
5.
World J Microbiol Biotechnol ; 40(11): 332, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39358614

ABSTRACT

Brevibacillus brevis FJAT-0809-GLX has a broad spectrum of antimicrobial activity. Understanding the molecular basis of the biocontrol ability of B. brevis will allow us to develop effective microbial agents for sustainable agriculture. In this study, we present the complete and annotated genome sequence of FJAT-0809-GLX. The complete genome size of B. brevis FJAT-0809-GLX was 6,137,019 bp, with 5688 predicted coding sequences (CDS). The average GC content was 47.38%, and there were 44 copies of the rRNA operon (16S, 23S and 5S rRNA) and 127 tRNA genes. A total of 11,162 genes were functionally annotated with the COG, GO, and KEGG databases, and 123 genes belonged to CAZymes. Genomic secondary metabolite analysis indicated 13 clusters encoding potential new antimicrobials. FJAT-0809-GLX was designated as B. brevis according to average nucleotide identity (ANI) and phylogenetic analysis. The pangenome consisted of 7141 homologous genes, of which 4469 were shared by B. brevis FJAT-0809-GLX, B. brevis NBRC100599, B. brevis DSM30, and B. brevis NCTC2611. The numbers of unique homologous genes in B. brevis FJAT-0809-GLX (419 genes) and B. brevis NBRC100599 (480 genes) were much higher than those in B. brevis DSM30 (13 genes) and B. brevis NCTC2611 (6 genes). Nine gene clusters encoding secondary metabolite biosynthesis in the genome of B. brevis FJAT-0809-GLX were compared with those of B. brevis NBRC100599, B. brevis DSM30 and B. brevis NCTC2611, and the gene clusters encoding lantipeptide and transatpks-otherks existed only in the genome of B. brevis FJAT-0809-GLX. The B. brevis FJAT-0809-GLX genome included 11 BbPks genes containing the conserved PS-DH domain. The relative expression of BbPksL, BbPksM2, BbPksM3, BbPksN3, BbPksN4 and BbPksN5 reached a maximum at 120 h and then decreased at 144 h. Our results provide detailed genomic and Pks gene information for the FJAT-0809-GLX strain and lay a foundation for studying its biocontrol mechanisms.


Subjects
Base Composition; Brevibacillus; Genome, Bacterial; Phylogeny; Brevibacillus/genetics; Whole Genome Sequencing; Polyketide Synthases/genetics; Multigene Family; Molecular Sequence Annotation; Plant Diseases/microbiology; Secondary Metabolism/genetics; Sequence Analysis, DNA; DNA, Bacterial/genetics
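
The pangenome, shared (core), and strain-unique gene counts quoted in the abstract above come down to set operations over gene families. The following is a minimal sketch only, assuming each strain is represented as a set of homologous gene family identifiers; the function name, strain labels, and identifiers are hypothetical toy data, not the B. brevis results or the authors' pipeline.

```python
def core_and_unique(genomes):
    """Split gene families into pangenome, core, and strain-unique sets.

    genomes: dict mapping a strain name to a set of gene family ids
    (hypothetical representation, assumed for this sketch).
    """
    families = list(genomes.values())
    pan = set().union(*families)          # every family seen in any strain
    core = set.intersection(*families)    # families present in all strains
    unique = {}
    for name, own in genomes.items():
        # Families carried by any other strain
        others = set().union(*(g for n, g in genomes.items() if n != name))
        unique[name] = own - others       # families found only in this strain
    return pan, core, unique


# Toy input with three made-up strains
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {1, 2, 3}}
pan, core, unique = core_and_unique(toy)
print(len(pan), len(core), {k: len(v) for k, v in unique.items()})
```

In this toy case the pangenome has five families, two of them core; strains A and B each carry one unique family and strain C carries none.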
6.
Bioinformatics ; 40(Suppl 2): ii53-ii61, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39230707

ABSTRACT

SUMMARY: The vast majority of proteins still lack experimentally validated functional annotations, which highlights the importance of developing high-performance automated protein function prediction/annotation (AFP) methods. While existing approaches focus on protein sequences, networks, and structural data, textual information related to proteins has been overlooked. However, roughly 82% of SwissProt proteins already possess literature information that experts have annotated. To efficiently and effectively use literature information, we present GORetriever, a two-stage deep information retrieval-based method for AFP. Given a target protein, in the first stage, candidate Gene Ontology (GO) terms are retrieved by using annotated proteins with similar descriptions. In the second stage, the GO terms are reranked based on semantic matching between the GO definitions and textual information (literature and protein description) of the target protein. Extensive experiments over benchmark datasets demonstrate the remarkable effectiveness of GORetriever in enhancing the AFP performance. Note that GORetriever is the key component of GOCurator, which has achieved first place in the latest critical assessment of protein function annotation (CAFA5: over 1600 teams participated), held in 2023-2024. AVAILABILITY AND IMPLEMENTATION: GORetriever is publicly available at https://github.com/ZhuLab-Fudan/GORetriever.


Subjects
Gene Ontology; Molecular Sequence Annotation; Proteins; Proteins/chemistry; Proteins/metabolism; Molecular Sequence Annotation/methods; Databases, Protein; Software; Computational Biology/methods
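
The two-stage "retrieve, then rerank" idea described in the abstract above can be illustrated with a toy sketch. This is not GORetriever's implementation: the published method uses deep retrieval models, whereas this sketch swaps in plain token-overlap (Jaccard) similarity, and every protein description, term label (T1, T2, T3), and definition below is a hypothetical placeholder.

```python
def tokens(text):
    """Lower-cased whitespace tokens as a set (a crude stand-in for a text encoder)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(target_desc, annotated, top_k=2):
    """Stage 1: pool the terms of the top_k proteins with the most similar descriptions."""
    target = tokens(target_desc)
    ranked = sorted(annotated, key=lambda p: jaccard(target, tokens(p["desc"])), reverse=True)
    candidates = set()
    for protein in ranked[:top_k]:
        candidates |= set(protein["go"])
    return candidates


def rerank(candidates, definitions, target_text):
    """Stage 2: order candidates by overlap between each term definition and the target text."""
    target = tokens(target_text)
    return sorted(candidates, key=lambda t: (-jaccard(tokens(definitions[t]), target), t))


# Hypothetical annotated proteins and term definitions
annotated = [
    {"desc": "serine threonine protein kinase", "go": ["T1", "T2"]},
    {"desc": "tyrosine protein kinase receptor", "go": ["T1"]},
    {"desc": "dna binding transcription factor", "go": ["T3"]},
]
definitions = {
    "T1": "catalysis of phosphate transfer by a kinase",
    "T2": "binding to atp",
    "T3": "binding to dna",
}

candidates = retrieve("putative protein kinase", annotated)
print(rerank(candidates, definitions, "putative protein kinase with reported kinase activity"))
```

Stage 1 keeps the candidate set small, so the more expensive matching in stage 2 only has to score a handful of terms per protein.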
7.
Nat Commun ; 15(1): 7748, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237506

ABSTRACT

Evolutionary annotation of genome maintenance (GM) proteins has conventionally been established by remote relationships within protein sequence databases. However, often no significant relationship can be established. Highly sensitive approaches to detect remote homologies based on iterative profile-to-profile methods have been developed. Still, these methods have not been systematically applied in the evolutionary annotation of GM proteins. Here, by applying profile-to-profile models, we systematically survey the repertoire of GM proteins from bacteria to man. We identify multiple GM protein candidates and annotate domains in numerous established GM proteins, among others PARP, OB-fold, Macro, TUDOR, SAP, BRCT, KU, MYB (SANT), and nuclease domains. We experimentally validate OB-fold and MIS18 (Yippee) domains in SPIDR and FAM72 protein families, respectively. Our results indicate that, surprisingly, despite the immense interest and long-term research efforts, the repertoire of genome stability caretakers is still not fully appreciated.


Subjects
Protein Domains; Humans; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/metabolism; DNA-Binding Proteins/genetics; Genomic Instability; Evolution, Molecular; DNA/chemistry; DNA/metabolism; Databases, Protein; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Proteins/genetics; Models, Molecular; Molecular Sequence Annotation; Bacteria/genetics; Bacteria/metabolism
8.
Database (Oxford) ; 2024, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39241109

ABSTRACT

Automated annotations of protein functions are error-prone because of our lack of knowledge of protein functions. For example, it is often impossible to predict the correct substrate for an enzyme or a transporter. Furthermore, much of the knowledge that we do have about the functions of proteins is missing from the underlying databases. We discuss how to use interactive tools to quickly find different kinds of information relevant to a protein's function. Many of these tools are available via PaperBLAST (http://papers.genomics.lbl.gov). Combining these tools often allows us to infer a protein's function. Ideally, accurate annotations would allow us to predict a bacterium's capabilities from its genome sequence, but in practice, this remains challenging. We describe interactive tools that infer potential capabilities from a genome sequence or that search a genome to find proteins that might perform a specific function of interest. Database URL: http://papers.genomics.lbl.gov.


Subjects
Genome, Bacterial; Molecular Sequence Annotation; Molecular Sequence Annotation/methods; Software; Databases, Genetic; Bacterial Proteins/genetics; User-Computer Interface; Databases, Protein
9.
Database (Oxford) ; 2024, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39308247

ABSTRACT

PeptiHub (https://bioinformaticscollege.ir/peptihub/) is a meticulously curated repository of cancer-related peptides (CRPs) that have been documented in scientific literature. PeptiHub includes a diverse collection of CRPs with a spectrum of effects and activities. While some peptides demonstrated significant anticancer efficacy, others exhibited no discernible impact, and some even possessed alternative non-drug functionalities, including drug carrier or carcinogenic attributes. Presently, PeptiHub houses 874 CRPs, evaluated across 10 distinct organism categories, 26 organs, and 438 cell lines. Each entry in the database is accompanied by easily accessible 3D conformations, obtained either experimentally or through predictive methodology. Users are provided with three search frameworks offering basic, advanced, and BLAST sequence search options. Furthermore, precise annotations of peptides enable users to explore CRPs based on their specific activities (anticancer, no effect, insignificant effect, carcinogen, and others) and their effectiveness (rate and IC50) under cancer conditions, specifically within individual organs. This unique property facilitates the construction of robust training and testing datasets. Additionally, PeptiHub offers 1141 features, from which users can select those most pertinent to their specific research questions. Features include aaindex1 (in six main subcategories: alpha propensities, beta propensity, composition indices, hydrophobicity, physicochemical properties, and other properties), amino acid composition (Amino acid Composition and Dipeptide Composition), and Grouped Amino Acid Composition (Grouped amino acid composition, Grouped dipeptide composition, and Conjoint triad) categories. These utilities not only speed up machine learning-based peptide design but also facilitate peptide classification. Database URL: https://bioinformaticscollege.ir/peptihub/.


Subjects
Databases, Protein; Neoplasms; Peptides; Humans; Peptides/chemistry; Neoplasms/metabolism; Molecular Sequence Annotation; Antineoplastic Agents/chemistry; Antineoplastic Agents/therapeutic use; Antineoplastic Agents/pharmacology
10.
F1000Res ; 13: 251, 2024.
Article in English | MEDLINE | ID: mdl-39301273

ABSTRACT

The swift parrot (Lathamus discolor) is a Critically Endangered migratory parrot that breeds in Tasmania and winters on the Australian mainland. Here we provide a reference genome assembly for the swift parrot. We sequence PacBio HiFi reads to create a high-quality reference assembly and identify a complete mitochondrial sequence. We also generate a reference transcriptome from five organs to inform genome annotation. The genome was 1.24 Gb in length and consisted of 847 contigs with a contig N50 of 18.97 Mb and L50 of 20 contigs. This study provides an annotated reference assembly and transcriptomic resources for the swift parrot to assist in future conservation genomic research.


Subjects
Endangered Species; Genome, Mitochondrial; Parrots; Transcriptome; Animals; Parrots/genetics; Genome/genetics; Molecular Sequence Annotation
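
The contig N50 and L50 statistics quoted in the abstract above (and in several other assemblies in this listing) follow a simple definition: sort contigs from longest to shortest and accumulate lengths until half of the total assembly is covered. N50 is the length of the contig that crosses that threshold and L50 is the number of contigs needed to reach it, so N50 can never exceed the assembly length. A minimal sketch, with a hypothetical function name and toy lengths that are not the swift parrot data:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half = sum(ordered) / 2                   # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:                   # this contig crosses the halfway point
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy assembly of seven contigs (arbitrary units)
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # prints (70, 2)
```

Here the total is 300, so the two longest contigs (80 + 70) already cover half of the assembly, giving an N50 of 70 and an L50 of 2.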
11.
Sci Data ; 11(1): 1010, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39294198

ABSTRACT

Maruca vitrata, a significant pest of legumes, impacts food security in Asia and Africa. This study presents a high-quality genome assembly of M. vitrata, utilizing advanced sequencing technologies including Nanopore long-read, MGI short-read, and Hi-C. The genome, totaling 482.3 Mb with a contig N50 of 2.91 Mb, features 41.58% repetitive sequences and encompasses 13,320 protein-coding genes. We performed comparative genomic analyses to affirm the accuracy and completeness of the protein sequences assembled, ensuring the assembly's integrity. Additionally, the annotation of 83 Cytochrome P450 (CYP) genes further confirms the comprehensive nature of the genome assembly and its annotations. This genome assembly not only deepens our understanding of M. vitrata biology but also supports the development of sustainable pest management strategies. This research highlights the importance of genomics in advancing sustainable agricultural solutions through innovative pest management approaches.


Subjects
Genome, Insect; Animals; Moths/genetics; Molecular Sequence Annotation; Fabaceae/genetics
12.
Database (Oxford) ; 2024, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39297389

ABSTRACT

Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLu ANnotation tool (FLAN) has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions and has been publicly available as a webserver but not as a standalone tool. Viral Annotation DefineR (VADR) is a general sequence validation and annotation software package used by GenBank for norovirus, dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree, VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use. Database URL: https://bitbucket.org/nawrockie/vadr-models-flu.


Subjects
Molecular Sequence Annotation; Software; Humans; Molecular Sequence Annotation/methods; Orthomyxoviridae/genetics; Influenza, Human/virology; Influenza, Human/genetics; Databases, Nucleic Acid
13.
Int J Mol Sci ; 25(18), 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39337484

ABSTRACT

This study describes the first genome sequence and analysis of Coniella granati, a fungal pathogen with a broad host range, which is responsible for postharvest crown rot, shoot blight, and canker diseases in pomegranates. C. granati is a geographically widespread pathogen which has been reported across Europe, Asia, the Americas, and Africa. Our analysis revealed a 46.8 Mb genome with features characteristic of hemibiotrophic fungi. Approximately one third of its genome was compartmentalised within 'AT-rich' regions exhibiting a low GC content (30 to 45%). These regions primarily comprised transposable elements that are repeated at a high frequency and interspersed throughout the genome. Transcriptome-supported gene annotation of the C. granati genome revealed a streamlined proteome, mirroring similar observations in other pathogens with a latent phase. The genome encoded a relatively compact set of 9568 protein-coding genes with a remarkable 95% having assigned functional annotations. Despite this streamlined nature, a set of 40 cysteine-rich candidate secreted effector-like proteins (CSEPs) was predicted as well as a gene cluster involved in the synthesis of a pomegranate-associated toxin. These potential virulence factors were predominantly located near repeat-rich and AT-rich regions, suggesting that the pathogen evades host defences through Repeat-Induced Point mutation (RIP)-mediated pseudogenisation. Furthermore, 23 of these CSEPs exhibited homology to known effector and pathogenicity genes found in other hemibiotrophic pathogens. The study establishes a foundational resource for the study of the genetic makeup of C. granati, paving the way for future research on its pathogenicity mechanisms and the development of targeted control strategies to safeguard pomegranate production.


Subjects
Fungal Proteins; Genome, Fungal; Plant Diseases; Punica granatum; Proteome; Plant Diseases/microbiology; Plant Diseases/genetics; Fungal Proteins/genetics; Fungal Proteins/metabolism; Punica granatum/genetics; Punica granatum/microbiology; Ascomycota/genetics; Ascomycota/pathogenicity; Molecular Sequence Annotation; Fruit/microbiology; Fruit/genetics; Repetitive Sequences, Nucleic Acid/genetics
14.
Plant Mol Biol ; 114(5): 102, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316221

ABSTRACT

Australian wild limes occur in a highly diverse range of environments and are a unique genetic resource within the genus Citrus. Here we compare the haplotype-resolved genome assemblies of six Australian native limes, including four new assemblies generated using PacBio HiFi and Hi-C sequencing data. The size of the genomes was between 315 and 391 Mb with contig N50s from 29.5 to 35 Mb. Gene completeness of the assemblies was estimated to be from 98.4 to 99.3% and that of the annotations from 97.7 to 98.9% based upon BUSCO, confirming the high contiguity and completeness of the assembled genomes. High collinearity was observed among the genomes and between the two haplotype assemblies for each species. Gene duplication and evolutionary analysis demonstrated that the Australian citrus have undergone only one ancient whole-genome triplication event during evolution. The highest number of species-specific and expanded gene families was found in C. glauca, and they were primarily enriched in purine, thiamine, amino acid and aromatic amino acid metabolism, which might help C. glauca to mitigate drought, salinity, and pathogen attacks in the drier environments in which this species is found. Unique genes related to terpene biosynthesis, glutathione metabolism, and toll-like receptors in C. australasica, and starch and sucrose metabolism genes in both C. australis and C. australasica, might be important candidate genes for HLB tolerance in these species. Expanded gene families were not lineage specific; however, a greater number of genes related to plant-pathogen interactions, predominantly disease resistance proteins, was found in C. australasica and C. australis.


Subjects
Citrus; Genome, Plant; Genome, Plant/genetics; Australia; Citrus/genetics; Phylogeny; Molecular Sequence Annotation; Haplotypes; Gene Duplication; Evolution, Molecular; Species Specificity
15.
Physiol Plant ; 176(5): e14537, 2024.
Article in English | MEDLINE | ID: mdl-39319989

ABSTRACT

Long non-coding RNAs (lncRNAs) have emerged as important regulators of many biological processes, although their regulatory roles remain poorly characterized in woody plants, especially in gymnosperms. A major challenge of working with lncRNAs is to assign functional annotations, since they have a low coding potential and low cross-species conservation. We utilised an existing RNA-Sequencing resource and performed short RNA sequencing of somatic embryogenesis developmental stages in Norway spruce (Picea abies L. Karst). We implemented a pipeline to identify lncRNAs located within the intergenic space (lincRNAs) and generated a co-expression network including protein coding, lincRNA and miRNA genes. To assign putative functional annotation, we employed a guilt-by-association approach using the co-expression network and integrated these results with annotation assigned using semantic similarity and co-expression. Moreover, we evaluated the relationship between lincRNAs and miRNAs, and identified which lincRNAs are conserved in other species. We identified lincRNAs with clear evidence of differential expression during somatic embryogenesis and used network connectivity to identify those with the greatest regulatory potential. This work provides the most comprehensive view of lincRNAs in Norway spruce and is the first study to perform global identification of lincRNAs during somatic embryogenesis in conifers. The data have been integrated into the expression visualisation tools at the PlantGenIE.org web resource to enable easy access to the community. This will facilitate the use of the data to address novel questions about the role of lincRNAs in the regulation of embryogenesis and facilitate future comparative genomics studies.


Subjects
Gene Expression Regulation, Plant; Picea; RNA, Long Noncoding; Picea/genetics; Picea/embryology; Picea/growth & development; Gene Expression Regulation, Plant/genetics; RNA, Long Noncoding/genetics; MicroRNAs/genetics; Plant Somatic Embryogenesis Techniques/methods; RNA, Plant/genetics; Molecular Sequence Annotation; Gene Regulatory Networks/genetics
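
The guilt-by-association approach named in the abstract above assigns a putative function to an unannotated gene from the annotations of its co-expression neighbours. A minimal sketch of that general idea follows, assuming the network is an adjacency dict and using a simple vote threshold; the threshold, gene names, and terms are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter


def guilt_by_association(gene, network, annotations, min_support=2):
    """Return terms carried by at least min_support co-expression neighbours of gene.

    network: dict mapping a gene to a list of co-expressed genes (assumed layout).
    annotations: dict mapping annotated genes to their functional terms.
    """
    votes = Counter()
    for neighbour in network.get(gene, ()):
        # Each neighbour votes once per term, even if the term is listed twice
        votes.update(set(annotations.get(neighbour, ())))
    return sorted(term for term, n in votes.items() if n >= min_support)


# Hypothetical toy network: one lincRNA co-expressed with three coding genes
network = {"lincRNA_1": ["geneA", "geneB", "geneC"]}
annotations = {
    "geneA": ["embryo development", "auxin response"],
    "geneB": ["embryo development"],
    "geneC": ["auxin response", "embryo development"],
}
print(guilt_by_association("lincRNA_1", network, annotations))
```

Raising `min_support` makes the transfer more conservative: with a threshold of 3, only the term shared by all three neighbours is kept.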
16.
Genes (Basel) ; 15(9), 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39336702

ABSTRACT

Vairimorpha (Nosema) ceranae is a unicellular fungus that obligately infects the midgut epithelial cells of adult honeybees, causing bee microsporidiosis and jeopardizing bee health and production. This work aims to construct the full-length transcriptome of V. ceranae and conduct a relevant investigation using PacBio single-molecule real-time (SMRT) sequencing technology. Following PacBio SMRT sequencing, 41,950 circular consensus sequences (CCS) were generated, and 25,068 full-length non-chimeric (FLNC) reads were then detected. After polishing, 4387 high-quality, full-length transcripts were obtained. There were 778, 2083, 1202, 1559, 1457, 1232, 1702, and 3896 full-length transcripts that could be annotated to the COG, GO, KEGG, KOG, Pfam, Swiss-Prot, eggNOG, and Nr databases, respectively. Additionally, 11 alternative splicing (AS) events occurring in 6 genes were identified, including 1 alternative 5' splice site and 10 intron retentions. The structures of 225 annotated genes in the V. ceranae reference genome were optimized, of which 29 genes were extended at both the 5' UTR and the 3' UTR, while 90 and 106 genes were extended at the 5' UTR and the 3' UTR, respectively. Furthermore, a total of 29 high-confidence lncRNAs were obtained, including 12 sense-lncRNAs, 10 lincRNAs, and 7 antisense-lncRNAs. Taken together, the high-quality, full-length transcriptome of V. ceranae was constructed and annotated, the structures of annotated genes in the V. ceranae reference genome were improved, and abundant new genes, transcripts, and lncRNAs were discovered. Findings from this work offer a valuable resource and a crucial foundation for molecular and omics research on V. ceranae.


Subjects
Alternative Splicing; Nosema; Transcriptome; Virulence Factors; Transcriptome/genetics; Virulence Factors/genetics; Nosema/genetics; Nosema/pathogenicity; Animals; Bees/microbiology; Bees/genetics; Protein Isoforms/genetics; Fungal Proteins/genetics; Fungal Proteins/metabolism; Molecular Sequence Annotation
17.
Sci Data ; 11(1): 1041, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39333133

ABSTRACT

Pygmy jerboas are one of the smallest taxa of rodents. They exhibit distinctly different morphological and biological characteristics from other subfamilies, such as more restricted distribution, species richness, reproductive ability, and population size. Agricultural expansion and the development of new energy projects in recent years have led to a sharp decline of their natural populations. Here, we assembled and annotated the first reference genome for the subfamily Cardiocraniinae using Illumina and Nanopore sequencing from the thick-tailed pygmy jerboa, Salpingotus crassicauda. The final genome is 2.44 Gb in size, with a contig N50 length of 13.71 Mb and a BUSCO completeness of 96.35%. A total of 23,344 protein-coding genes were annotated in the final genome. We also determined the mitochondrial genome of this species and annotated 13 protein-coding genes, 22 tRNAs, and 2 rRNAs. These genomic assemblies provide resources for studying the phylogeny and adaptive evolution of Dipodidae, as well as for implementing conservation management of jerboas.


Subjects
Genome, Mitochondrial; Genome; Animals; Rodentia/genetics; Rodentia/classification; Phylogeny; Molecular Sequence Annotation; RNA, Transfer/genetics
18.
Sci Rep ; 14(1): 22308, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39333739

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a key technology for investigating cell development and analysing cell diversity across various diseases. However, the high dimensionality and extreme sparsity of scRNA-seq data pose great challenges for accurate cell type annotation. To address this, we developed a new cell-type annotation model called scGAA (general gated axial-attention model for accurate cell-type annotation of scRNA-seq). Based on the transformer framework, the model decomposes the traditional self-attention mechanism into horizontal and vertical attention, considerably improving computational efficiency. This axial attention mechanism can process high-dimensional data more efficiently while maintaining reasonable model complexity. Additionally, the gated unit was integrated into the model to enhance the capture of relationships between genes, which is crucial for achieving an accurate cell type annotation. The results revealed that our improved transformer model is a promising tool for practical applications. This theoretical innovation increased the model performance and provided new insights into analytical tools for scRNA-seq data.


Subjects
RNA-Seq; Single-Cell Analysis; Single-Cell Analysis/methods; RNA-Seq/methods; Humans; Sequence Analysis, RNA/methods; Molecular Sequence Annotation; Computational Biology/methods; Algorithms; Single-Cell Gene Expression Analysis
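
The decomposition of self-attention into horizontal and vertical passes mentioned in the abstract above can be sketched in plain Python. This illustrates generic axial attention only, under strong simplifications: queries, keys, and values are the raw input vectors (no learned projections), there is no gating unit, and the grid layout is an assumption of this sketch. It is not the scGAA architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a list of equal-length vectors."""
    scale = math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        # Attention weights of this position over every position in the sequence
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in seq])
        # Weighted average of the value vectors, one feature dimension at a time
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(len(q))])
    return out


def axial_attention(grid):
    """grid[r][c] is a feature vector; attend along each row, then along each column."""
    rows = [self_attention(row) for row in grid]                      # horizontal pass
    cols = [self_attention([rows[r][c] for r in range(len(rows))])    # vertical pass
            for c in range(len(rows[0]))]
    # Transpose the column results back into grid[r][c] order
    return [[cols[c][r] for c in range(len(cols))] for r in range(len(rows))]


# Toy 2 x 3 grid of 2-dimensional feature vectors
grid = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]]
result = axial_attention(grid)
print(len(result), len(result[0]), len(result[0][0]))
```

For an H by W grid, full self-attention scores (H*W)^2 position pairs, while the two axial passes score H*W*(H+W) pairs, which is where the efficiency gain comes from.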
19.
BMC Genomics ; 25(1): 861, 2024 Sep 14.
Article in English | MEDLINE | ID: mdl-39277723

ABSTRACT

BACKGROUND: Black spot disease in tree peony, caused by the fungal necrotroph A. alternata, is a primary limiting factor in the production of the tree peony. The intricate molecular mechanisms underlying tree peony resistance to A. alternata have not been thoroughly investigated. RESULTS: The present study utilized high-throughput RNA sequencing (RNA-seq) technology to conduct global expression profiling, revealing an intricate network of genes implicated in the interaction between tree peony and A. alternata. RNA-Seq libraries were constructed from leaf samples and sequenced using the BGISEQ-500 sequencing platform. Six distinct libraries were characterized. M1, M2 and M3 were derived from leaves that had undergone mock inoculation, while I1, I2 and I3 originated from leaves that had been inoculated with the pathogen. A range of 10.22-11.80 gigabases (Gb) of clean bases were generated, comprising 68,131,232 - 78,633,602 clean reads and 56,677 - 68,996 Unigenes. A total of 99,721 Unigenes were acquired, with a mean length of 1,266 base pairs. All these 99,721 Unigenes were annotated against various databases, including NR (Non-Redundant, 61.99%), NT (Nucleotide, 45.50%), SwissProt (46.32%), KEGG (Kyoto Encyclopedia of Genes and Genomes, 49.33%), KOG (clusters of euKaryotic Orthologous Groups, 50.18%), Pfam (Protein family, 47.16%), and GO (Gene Ontology, 34.86%). In total, 66,641 (66.83%) Unigenes had matches in at least one database. By conducting a comparative transcriptome analysis of the mock- and A. alternata-infected sample libraries, we found differentially expressed genes (DEGs) that are related to phytohormone signalling, pathogen recognition, active oxygen generation, and circadian rhythm regulation. Furthermore, multiple different kinds of transcription factors were identified. The expression levels of 10 selected genes were validated by qRT-PCR (quantitative real-time PCR) to confirm the RNA-Seq data. CONCLUSIONS: A multitude of transcriptome sequences have been generated, offering a valuable genetic repository for further study of the immune mechanisms of tree peony infected by A. alternata. While the expression of most DEGs increased, a few DEGs showed decreased expression.


Subjects
Alternaria; Gene Expression Profiling; Paeonia; Plant Diseases; Paeonia/genetics; Paeonia/microbiology; Plant Diseases/microbiology; Plant Diseases/genetics; Alternaria/genetics; Transcriptome; High-Throughput Nucleotide Sequencing; Gene Expression Regulation, Plant; Molecular Sequence Annotation; Gene Ontology
20.
Biomed Khim ; 70(5): 315-328, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39324196

ABSTRACT

The elegance of pre-mRNA splicing mechanisms continues to interest scientists even after over a half century, since the discovery of the fact that coding regions in genes are interrupted by non-coding sequences. The vast majority of human genes have several mRNA variants, coding structurally and functionally different protein isoforms in a tissue-specific manner and with a linkage to specific developmental stages of the organism. Alteration of splicing patterns shifts the balance of functionally distinct proteins in living systems, distorts normal molecular pathways, and may trigger the onset and progression of various pathologies. Over the past two decades, numerous studies have been conducted in various life sciences disciplines to deepen our understanding of splicing mechanisms and the extent of their impact on the functioning of living systems. This review aims to summarize experimental and computational approaches used to elucidate the functions of splice variants of a single gene based on our experience accumulated in the laboratory of interactomics of proteoforms at the Institute of Biomedical Chemistry (IBMC) and best global practices.


Subjects
Alternative Splicing; Protein Isoforms; Humans; Protein Isoforms/genetics; Protein Isoforms/metabolism; Computer Simulation; Molecular Sequence Annotation; Computational Biology/methods; RNA Splicing