ABSTRACT
The analysis of curated genomic, metagenomic, and proteomic data is of paramount importance in biology, medicine, education, and bioinformatics. Although such data are usually hosted in raw format on free international repositories, full access demands considerable computing power and disk storage from the individual user. The purpose of this study is to offer the scientific community a comprehensive set of microbial genomic and proteomic reference databases in an accessible, easy-to-use form and to demonstrate its advantages and usefulness. We also present a case study on the applicability of the sketched data to determining the overall genomic coherence between two members of the Brucellaceae family; the results suggest that they belong to the same genomospecies and persist as discrete ecotypes. A representative set of genomes, proteomes (from type material), and metagenomes associated with the major groups of Bacteria, Archaea, Viruses, and Fungi was collected directly from the NCBI Assembly database and the Genome Taxonomy Database (GTDB). Sketched databases were then created and stored as compact reduced representations using the MinHash algorithm implemented in the Mash software. The resulting dataset reduces more than 133 GB of disk space to 883.25 MB and comprises 125,110 genomic/proteomic records from eight informative contexts, prefiltered to make them accessible, usable, and user-friendly with limited computational resources. Potential uses of these sketched databases are discussed, including but not limited to microbial species delimitation; estimation of genomic distances and genomic novelty; pairwise comparisons between proteomes, genomes, and metagenomes; and exploration and selection of phylogenetic neighbors.
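The sketching idea mentioned in the abstract can be illustrated with a toy example. The Python sketch below is not Mash's implementation: the function names, the BLAKE2b hash, and the parameter values are assumptions made for illustration, and reverse-complement (canonical) k-mers are ignored. It shows how a sequence can be reduced to its s smallest k-mer hashes, how two such sketches yield a Jaccard estimate j, and how that estimate maps to a Mash-style distance D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every k-mer of a sequence (no reverse-complement handling)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(seq, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes, sorted."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a, b, s):
    """Estimate the Jaccard index of two k-mer sets from their bottom-s sketches."""
    merged = sorted(set(a) | set(b))[:s]  # s smallest hashes of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Mash-style distance from a Jaccard estimate; 1.0 when nothing is shared."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

Because only the s smallest hashes per record are stored, the sketch size is independent of genome length, which is what makes a reduction of the kind reported (gigabytes of sequence to megabytes of sketches) possible.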
Subject(s)
Classification , Genome , Microbial Genes , Proteome , Proteomics , Phylogeny , Genomics , Computational Biology
ABSTRACT
The rainforest of French Guiana is still largely unaffected by human activity. Several pristine sites, such as the Paracou Research Station, are devoted to the study of this tropical ecosystem. We used culture-independent techniques, such as polymerase chain reaction-temperature gradient gel electrophoresis and the construction of clone libraries of partial 16S rRNA and nifH genes, to analyze the composition of the bacterial community in the rhizosphere of mature trees of Eperua falcata and Dicorynia guianensis, two species of the Caesalpiniaceae family. E. falcata is one of the more abundant pioneer tree species in this ecosystem, and no root nodules have been found on it so far; even so, its nitrogen-fixing status is regarded as "uncertain", whereas D. guianensis is clearly considered a non-nitrogen-fixing plant. The rhizospheres of these mature trees contain specific bacterial communities, including several as-yet-uncultured microorganisms. These communities include putative nitrogen-fixing bacteria specifically associated with each tree: D. guianensis harbors several Rhizobium spp., and E. falcata harbors members of the genera Burkholderia and Bradyrhizobium. In addition, nifH sequences in the rhizosphere of E. falcata were very diverse. Sequences retrieved from the E. falcata rhizoplane were related to bacteria belonging to the alpha-, beta-, and gamma-Proteobacteria, whereas only two sequences, both related to gamma-Proteobacteria, were found in D. guianensis. The differences between the bacterial communities, together with the abundance and diversity of nifH sequences in the E. falcata rhizosphere, suggest that this tree could obtain nitrogen through a nonnodulating bacterial interaction.
Subject(s)
Bacteria/isolation & purification , Nitrogen/metabolism , Trees/metabolism , Trees/microbiology , Bacteria/classification , Bacteria/genetics , Bacterial Proteins/genetics , Ecosystem , French Guiana , Oxidoreductases/genetics , Phylogeny , Plant Roots/microbiology , Bacterial RNA/genetics , 16S Ribosomal RNA/genetics , Species Specificity , Tropical Climate
ABSTRACT
Nitrite production by nodules and roots of pea plants (Pisum sativum L., cultivar Alaska) inoculated with Rhizobium leguminosarum strain 3855 was studied. Nitrate reductase (NR) and nitrite reductase (NiR) activities of the bacteroidal and cytosolic fractions of the nodules were also determined, as was the nitrite content of the nodule cytosol. Nitrite production by nodules and roots from plants treated with 5 mM KNO3 was higher than that of nodules and roots from untreated plants, and, regardless of nitrate treatment, nitrite production increased with the incubation period. The presence of nitrate, propanol, or both in the incubation mixtures significantly increased nitrite production by nodules and roots. Nitrite reductase activity was detected in freshly isolated bacteroids of R. leguminosarum strain 3855, although nitrate reductase activity could not be detected in bacteroids from nodules of plants either treated or not treated with 5 mM KNO3. When isolated bacteroids were incubated in a mixture containing nitrate, nitrate reductase activity developed after 12 h of incubation. Consequently, nitrite reductase activity increased, which resulted in the disappearance of the nitrite previously accumulated in the incubation medium. Nitrate utilization by bacteroids was not detected until 5 h after the beginning of the incubation period. Since the presence of chloramphenicol or rifampicin in the incubation medium prevented the development of nitrate reductase activity, this activity must have been induced in the bacteroids. The nitrite content and the nitrate reductase and nitrite reductase activities of the cytosol from nodules of pea plants, whether treated or not with 5 mM KNO3, varied with the buffer used for nodule homogenization. However, no nitrite was found when nodules were homogenized with ethanol, which indicates that nitrite accumulation in the cytosol occurs during homogenization of the nodules.