Search | BVS Regional Portal

1.

Explainable artificial intelligence as a reliable annotator of archaeal promoter regions.

Sganzerla Martinez, Gustavo; Perez-Rueda, Ernesto; Kumar, Aditya; Sarkar, Sharmilee; de Avila E Silva, Scheila.

Sci Rep ; 13(1): 1763, 2023 01 31.

Article in English | MEDLINE | ID: mdl-36720898

ABSTRACT

Archaea are a vast and unexplored cellular domain that thrives in a high diversity of environments, having central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, the appropriate regulation of their gene expression is essential. A key moment in regulating genes responsible for the life maintenance of archaea is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were coded into DNA Duplex Stability. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27 upstream (relative to the transcription start site, TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, which provides a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences of 135 unannotated organisms, delivering regulatory region annotation of archaea on a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful to understand how gene regulation is achieved in other organisms apart from the already established transcription factor binding sites.
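As a rough illustration of what coding a promoter into DNA duplex stability can look like, the sketch below maps each overlapping dinucleotide of a sequence to a nearest-neighbor free-energy value. The abstract does not say which parameter table was used; the values here are the commonly cited unified nearest-neighbor parameters (kcal/mol), included as an assumption that should be checked against the published table, and the function name `duplex_stability_profile` is hypothetical.

```python
# Nearest-neighbor free energies in kcal/mol (assumed values, quoted from the
# commonly cited unified parameter set; verify before any real use).
NN_FREE_ENERGY = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def duplex_stability_profile(sequence):
    """Map a DNA sequence of length n to n - 1 dinucleotide stability values."""
    seq = sequence.upper()
    # Slide a window of width 2 over the sequence, one step at a time.
    return [NN_FREE_ENERGY[seq[i:i + 2]] for i in range(len(seq) - 1)]


# Example: an AT-rich stretch yields less negative (less stable) values.
profile = duplex_stability_profile("TTTATATAGC")
```

A numeric vector of this kind could then be passed as the feature vector to a classifier such as a support vector machine.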

Subjects

Artificial Intelligence; Machine Learning; Archaea/genetics; Promoter Regions, Genetic; Transcription Factors/genetics

2.

Taxonomy, comparative genomics and evolutionary insights of Penicillium ucsense: a novel species in series Oxalica.

Lenz, Alexandre Rafael; Balbinot, Eduardo; de Abreu, Fernanda Pessi; de Oliveira, Nikael Souza; Fontana, Roselei Claudete; de Avila E Silva, Scheila; Park, Myung Soo; Lim, Young Woon; Houbraken, Jos; Camassola, Marli; Dillon, Aldo José Pinheiro.

Antonie Van Leeuwenhoek ; 115(8): 1009-1029, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35678932

ABSTRACT

The genomes of two Penicillium strains were sequenced and analyzed in this study: strain 2HH, isolated from the digestive tract of an Anobium punctatum beetle larva in 1979, and the cellulase hypersecretory strain S1M29, derived from strain 2HH by a long-term mutagenesis process. With these data, the strains were reclassified and insight was obtained into molecular features related to cellulase hyperproduction and the albino phenotype of the mutant. Both strains were previously identified as Penicillium echinulatum and this investigation indicated that these should be reclassified. Phylogenetic and phenotype data showed that these strains represent a new Penicillium species in series Oxalica, for which the name Penicillium ucsense is proposed here. Six additional strains (SFC101850, SFCP10873, SFCP10886, SFCP10931, SFCP10932 and SFCP10933) collected from the marine environment in the Republic of Korea were also classified as this species, indicating a worldwide distribution of this new taxon. Compared to the closely related strain Penicillium oxalicum 114-2, the composition of cell wall-associated proteins of P. ucsense 2HH shows five fewer chitinases and considerable differences in the number of proteins related to β-D-glucan metabolism. The genomic comparison of 2HH and S1M29 highlighted single amino-acid substitutions in two major proteins (BGL2 and FlbA) that can be associated with the hyperproduction of cellulases. The study of melanin pathways shows that the S1M29 albino phenotype resulted from a single amino-acid substitution in the enzyme ALB1, a precursor of the 1,8-dihydroxynaphthalene (DHN)-melanin biosynthesis. Our study provides important knowledge towards understanding species distribution, molecular mechanisms, melanin production and cell wall biosynthesis of this new Penicillium species.

Subjects

Cellulase; Penicillium; Cellulase/genetics; Genomics; Melanins/metabolism; Penicillium/genetics; Phylogeny

3.

Machine learning and statistics shape a novel path in archaeal promoter annotation.

Martinez, Gustavo Sganzerla; Pérez-Rueda, Ernesto; Sarkar, Sharmilee; Kumar, Aditya; de Ávila E Silva, Scheila.

BMC Bioinformatics ; 23(1): 171, 2022 May 10.

Article in English | MEDLINE | ID: mdl-35538405

ABSTRACT

BACKGROUND: Archaea are a vast and unexplored domain. Bioinformatic techniques might light the path to a higher quality genome annotation in varied organisms. Promoter sequences of archaea are acted upon by a plethora of proteins. The conservation found at a structural level in the binding sites of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter amenable to exploration by statistical and machine learning techniques. RESULTS AND DISCUSSIONS: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and were classified through Artificial Neural Networks and an in-house statistical method of classification, being tested with three forms of controls. The recognition of these promoters enabled their use to validate unannotated promoter sequences in other organisms. As a result, the binding site of basal transcription factors was located through a DNA duplex stability codification. Additionally, the classification presented satisfactory results (above 90%) among varied levels of control. CONCLUDING REMARKS: The classification models were employed to perform genomic annotation of the archaea Aciduliprofundum boonei and Thermofilum pendens, from which potential promoters have been identified and uploaded into public repositories.

Subjects

Archaea; Archaeal Proteins; Archaea/genetics; Archaeal Proteins/chemistry; Archaeal Proteins/genetics; Machine Learning; Promoter Regions, Genetic; Transcription, Genetic

4.

A Survey of Biological Data in a Big Data Perspective.

Dall'Alba, Gabriel; Casa, Pedro Lenz; Abreu, Fernanda Pessi de; Notari, Daniel Luis; de Avila E Silva, Scheila.

Big Data ; 10(4): 279-297, 2022 08.

Article in English | MEDLINE | ID: mdl-35394342

ABSTRACT

The amount of available data is continuously growing. This phenomenon promotes a new concept, named big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from an atomic level to interactions between organisms or their environment. Such characteristics make most bioinformatic-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.

Subjects

Big Data; Data Mining; Cloud Computing; Data Mining/methods; Machine Learning; Neural Networks, Computer

5.

Analysis of carbohydrate-active enzymes and sugar transporters in Penicillium echinulatum: A genome-wide comparative study of the fungal lignocellulolytic system.

Lenz, Alexandre Rafael; Balbinot, Eduardo; Souza de Oliveira, Nikael; Abreu, Fernanda Pessi de; Casa, Pedro Lenz; Camassola, Marli; Perez-Rueda, Ernesto; de Avila E Silva, Scheila; Dillon, Aldo José Pinheiro.

Gene ; 822: 146345, 2022 May 15.

Article in English | MEDLINE | ID: mdl-35189252

ABSTRACT

Penicillium echinulatum 2HH is an ascomycete well known for its production of cellulolytic enzymes. Understanding lignocellulolytic and sugar uptake systems is essential to obtain efficient fungal strains for the production of bioethanol. In this study we performed a genome-wide functional annotation of carbohydrate-active enzymes and sugar transporters involved in the lignocellulolytic system of P. echinulatum 2HH and S1M29 strains (wild type and mutant, respectively) and eleven related fungi. Additionally, signal peptide and orthology prediction were carried out. We encountered a diverse assortment of cellulolytic enzymes in P. echinulatum, especially in terms of β-glucosidases and endoglucanases. Other enzymes required for the breakdown of cellulosic biomass were also found, including cellobiohydrolases, lytic cellulose monooxygenases and cellobiose dehydrogenases. The S1M29 mutant, which is known to produce an increased cellulase activity, and the 2HH wild type strain of P. echinulatum did not show significant differences between their enzymatic repertoires. Nevertheless, we unveiled an amino acid substitution for a predicted intracellular β-glucosidase of the mutant, which might contribute to hyperexpression of cellulases through a cellodextrin induction pathway. Most of the P. echinulatum enzymes presented orthologs in P. oxalicum 114-2, supporting the presence of highly similar cellulolytic mechanisms and a close phylogenetic relationship between these fungi. A phylogenetic analysis of intracellular β-glucosidases and sugar transporters allowed us to identify several proteins potentially involved in the accumulation of intracellular cellodextrins. These may prove valuable targets in the genetic engineering of P. echinulatum focused on industrial cellulase production. Our study marks an important step in characterizing and understanding the molecular mechanisms employed by P. echinulatum in the enzymatic hydrolysis of lignocellulosic biomass.

Subjects

Fungal Proteins/genetics; Fungal Proteins/metabolism; Lignin/metabolism; Penicillium/metabolism; Amino Acid Substitution; Biological Transport; Carbohydrate Metabolism; Cellulose/analogs & derivatives; Dextrins; Gene Expression Regulation, Fungal; Molecular Sequence Annotation; Penicillium/genetics; Phylogeny; Sugars/metabolism

6.

Characterization of promoters in archaeal genomes based on DNA structural parameters.

Martinez, Gustavo Sganzerla; Sarkar, Sharmilee; Kumar, Aditya; Pérez-Rueda, Ernesto; de Avila E Silva, Scheila.

Microbiologyopen ; 10(5): e1230, 2021 10.

Article in English | MEDLINE | ID: mdl-34713600

ABSTRACT

The transcription machinery of archaea can be roughly classified as a simplified version of that of eukaryotic organisms. The basal transcription factor machinery binds to the TATA box found around 28 nucleotides upstream of the transcription start site; however, some transcription units lack a clear TATA box and still have TBP/TFB binding over them. This apparent absence of conserved sequences could be a consequence of sequence divergence associated with the upstream region, operon, and gene organization. Furthermore, earlier studies have found that a structural analysis gains more information compared with a simple sequence inspection. In this work, we evaluated and coded 3630 archaeal promoter sequences of three organisms, Haloferax volcanii, Thermococcus kodakarensis, and Sulfolobus solfataricus, into DNA duplex stability, enthalpy, curvature, and bendability parameters. We also split our dataset into conserved TATA and degenerated TATA promoters to identify differences between these two classes of promoters. The structural analysis reveals variations in archaeal promoter architecture, that is, a distinctive signal is observed in the TFB, TBP, and TFE binding sites independently of these being TATA-conserved or TATA-degenerated. In addition, the promoter-finding method was validated with upstream regions of 13 other archaea, suggesting that there might be promoter sequences among them. Therefore, we suggest a novel method for locating promoters within the genome of archaea based on DNA energetic/structural features.

Subjects

Archaea/genetics; DNA, Archaeal; Genome, Archaeal; Nucleic Acid Conformation; Promoter Regions, Genetic; TATA Box; Base Sequence; Computational Biology/methods; Protein Binding; Transcription Initiation Site; Transcription, Genetic

7.

Gene Regulatory Networks of Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 Inferred by a Computational Biology Approach.

Lenz, Alexandre Rafael; Galán-Vásquez, Edgardo; Balbinot, Eduardo; de Abreu, Fernanda Pessi; Souza de Oliveira, Nikael; da Rosa, Letícia Osório; de Avila E Silva, Scheila; Camassola, Marli; Dillon, Aldo José Pinheiro; Perez-Rueda, Ernesto.

Front Microbiol ; 11: 588263, 2020.

Article in English | MEDLINE | ID: mdl-33193246

ABSTRACT

Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 are well-known fungal cellulase producers. However, few studies addressing global mechanisms for gene regulation of these two important organisms are available so far. A recent finding that the 2HH wild-type is closely related to P. oxalicum led to a combined study of these two species. Firstly, we provide a global gene regulatory network for P. echinulatum 2HH and P. oxalicum 114-2, based on TF-TG orthology relationships, considering three related species with well-known regulatory interactions combined with TFBS prediction. The network was then analyzed in terms of topology, identifying TFs as hubs, and modules. Based on this approach, we explore numerous identified modules, such as the expression of cellulolytic and xylanolytic systems, where XlnR plays a key role in positive regulation of the xylanolytic system. It also positively regulates the cellulolytic system by acting indirectly through the cellodextrin induction system. This remarkable finding suggests that the XlnR-dependent cellulolytic and xylanolytic regulatory systems are probably conserved in both P. echinulatum and P. oxalicum. Finally, we explore the functional congruency of the genes clustered in terms of communities, where the genes related to cellular nitrogen compound metabolic process and macromolecule metabolic process were the most abundant. Therefore, our approach allows us to confer a degree of accuracy regarding the existence of each inferred interaction.

8.

Toward Algorithms for Automation of Postgenomic Data Analyses: Bacillus subtilis Promoter Prediction with Artificial Neural Network.

Coelho, Rafael Vieira; Dall'Alba, Gabriel; de Avila E Silva, Scheila; Echeverrigaray, Sergio; Delamare, Ana Paula Longaray.

OMICS ; 24(5): 300-309, 2020 05.

Article in English | MEDLINE | ID: mdl-31573385

ABSTRACT

In the present postgenomic era, the capacity to generate big data has far exceeded the capacity to analyze, contextualize, and make sense of the data in clinical, biological, and ecological applications. There is a great unmet need for automation and algorithms to aid in analyses of big data, in biology in particular. In this context, it is noteworthy that computational methods used to analyze the regulation of bacterial gene expression have in the past focused mainly on Escherichia coli promoters due to the large amount of data available. The challenge and prospects of automation in prediction and recognition of bacterial sequences as promoters have not been properly addressed due to the promoter size and degenerate pattern. We report here an original neural network approach for recognition and prediction of Bacillus subtilis promoters. The artificial neural network used 767 B. subtilis promoter sequences as input, and we also aimed to identify the architecture that provides the best prediction. Two multilayer perceptron neural network architectures offered the highest accuracy: one with five, and another with seven neurons in the hidden layer. These architectures achieved accuracies of 98.57% and 97.69%, respectively. The results collectively indicate the promise of the application of neural network approaches to the B. subtilis promoter recognition problem, while also suggesting the broader potential of algorithms for automation of data analyses in the postgenomic era.

Subjects

Automation/methods; Bacillus subtilis/genetics; Computational Biology/methods; Pattern Recognition, Automated/methods; Promoter Regions, Genetic/genetics; Sequence Analysis, DNA/methods; Algorithms; Escherichia coli/genetics; Gene Expression/genetics; Genes, Bacterial/genetics; Genome, Bacterial/genetics; Neural Networks, Computer

9.

Analysis of the nucleotide content of Escherichia coli promoter sequences related to the alternative sigma factors.

Dall'Alba, Gabriel; Casa, Pedro Lenz; Notari, Daniel Luis; Adami, Andre Gustavo; Echeverrigaray, Sergio; de Avila E Silva, Scheila.

J Mol Recognit ; 32(5): e2770, 2019 05.

Article in English | MEDLINE | ID: mdl-30458580

ABSTRACT

Promoters are DNA sequences located upstream of the transcription start site of genes. In bacteria, the RNA polymerase enzyme requires additional subunits, called sigma factors (σ), to begin specific gene transcription in distinct environmental conditions. Currently, promoter prediction still poses many challenges due to the characteristics of these sequences. In this paper, the nucleotide content of Escherichia coli promoter sequences, related to five alternative σ factors, was analyzed by a machine learning technique in order to provide profiles according to the σ factor which recognizes them. For this, the clustering technique was applied since it is a viable method for finding hidden patterns in a data set. As a result, 20 groups of sequences were formed, and, aided by the Weblogo tool, it was possible to determine sequence profiles. These patterns should be considered when implementing computational prediction tools. In addition, evidence was found of an overlap between the functions of the genes regulated by different σ factors, suggesting that DNA structural properties are also essential parameters for further studies.

Subjects

Escherichia coli/enzymology; Escherichia coli/genetics; Promoter Regions, Genetic; Sigma Factor/genetics; Algorithms; Base Sequence; DNA-Directed RNA Polymerases/genetics; DNA-Directed RNA Polymerases/metabolism; Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Nucleotides/analysis; Sigma Factor/metabolism; Transcription, Genetic

10.

Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria.

Coelho, Rafael Vieira; de Avila E Silva, Scheila; Echeverrigaray, Sergio; Delamare, Ana Paula Longaray.

Data Brief ; 19: 264-270, 2018 Aug.

Article in English | MEDLINE | ID: mdl-29892645

ABSTRACT

This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data we developed a script in Python to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by the sigma factor SigA. To validate the data we found, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of parameters in a LibSVM library to predict promoter regions in the bacteria used in the simulation. All data gathered as well as the BacSVM+ software are available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.

11.

Triplet entropy analysis of hemagglutinin and neuraminidase sequences measures influenza virus phylodynamics.

Gerhardt, Günther J L; Takeda, Agnes A S; Andrighetti, Tahila; Sartor, Ivaine T S; Echeverrigaray, Sergio L; de Avila E Silva, Scheila; Dos Santos, Laurita; Rybarczyk-Filho, José L.

Gene ; 528(2): 277-81, 2013 Oct 10.

Article in English | MEDLINE | ID: mdl-23850726

ABSTRACT

The influenza virus has been a challenge to science due to its ability to withstand new environmental conditions. Taking into account the development of virus sequence databases, computational approaches can be helpful to understand virus behavior over time. Furthermore, they can suggest new directions to deal with influenza. This work presents triplet entropy analysis as a potential phylodynamic tool to quantify nucleotide organization of viral sequences. The application of this measure to segments of hemagglutinin (HA) and neuraminidase (NA) of H1N1 and H3N2 virus subtypes has shown some variability effects over time, allowing inferences about virus evolution. Sequences were divided by year and compared for virus subtype (H1N1 and H3N2). The nonparametric Mann-Whitney test was used for comparison between groups. Results show that differentiation in entropy precedes differentiation in GC content for both groups. Considering the HA fragment, both triplet entropy and GC content show an intersection in 2009, the year of the recent pandemic. Some conclusions about possible flu evolutionary lines were drawn.
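The abstract does not define triplet entropy precisely. One plausible reading, shown here purely as a sketch, is the Shannon entropy of the frequencies of overlapping nucleotide triplets in a sequence, computed alongside GC content so the two measures can be compared per year. The function names and the choice of overlapping (rather than codon-aligned) triplets are assumptions, not details taken from the paper.

```python
from collections import Counter
from math import log2


def triplet_entropy(sequence):
    """Shannon entropy (in bits) of overlapping nucleotide triplets."""
    seq = sequence.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not triplets:
        raise ValueError("sequence must contain at least three nucleotides")
    counts = Counter(triplets)
    total = len(triplets)
    # H = -sum(p * log2(p)) over the observed triplet frequencies.
    return -sum((c / total) * log2(c / total) for c in counts.values())


def gc_content(sequence):
    """Fraction of G and C bases in the sequence."""
    seq = sequence.upper()
    return sum(base in "GC" for base in seq) / len(seq)
```

Under this reading, a homopolymer has zero entropy, while a sequence in which every triplet is distinct reaches the maximum for its length.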

Subjects

Hemagglutinin Glycoproteins, Influenza Virus/genetics; Influenza A Virus, H1N1 Subtype/genetics; Influenza A Virus, H3N2 Subtype/genetics; Neuraminidase/genetics; Base Composition; Evolution, Molecular; Humans; Models, Genetic; Phylogeny; Sequence Analysis, DNA; Statistics, Nonparametric; Thermodynamics

12.

BacPP: bacterial promoter prediction--a tool for accurate sigma-factor specific assignment in enterobacteria.

de Avila E Silva, Scheila; Echeverrigaray, Sergio; Gerhardt, Günther J L.

J Theor Biol ; 287: 92-9, 2011 Oct 21.

Article in English | MEDLINE | ID: mdl-21827769

ABSTRACT

Promoter sequences are well known to play a central role in gene expression. Their recognition and assignment in silico have not yet been consolidated into a general bioinformatics method. Most previously available algorithms employ and are limited to σ70-dependent promoter sequences. This paper presents a new tool named BacPP, designed to recognize and predict Escherichia coli promoter sequences from background with specific accuracy for each σ factor (respectively, σ24, 86.9%; σ28, 92.8%; σ32, 91.5%; σ38, 89.3%; σ54, 97.0%; and σ70, 83.6%). BacPP is hence outstanding in recognition and assignment of sequences according to σ factor and provides circumstantial information about upstream gene sequences. This bioinformatic tool was developed by weighing rules extracted from neural networks trained with promoter sequences known to respond to a specific σ factor. Furthermore, when challenged with promoter sequences belonging to other enterobacteria, BacPP maintained 76% accuracy overall.

Subjects

Computational Biology/methods; Enterobacteriaceae/genetics; Promoter Regions, Genetic/genetics; Sigma Factor/genetics; Escherichia coli/genetics; Gene Expression Regulation, Bacterial/genetics; Neural Networks, Computer

13.

Rules extraction from neural networks applied to the prediction and recognition of prokaryotic promoters.

de Avila E Silva, Scheila; Gerhardt, Günther J L; Echeverrigaray, Sergio.

Genet Mol Biol ; 34(2): 353-60, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21734842

ABSTRACT

Promoters are DNA sequences located upstream of the gene region and play a central role in gene expression. Computational techniques show good accuracy in gene prediction but are less successful in predicting promoters, primarily because of the high number of false positives that reflect characteristics of the promoter sequences. Many machine learning methods have been used to address this issue. Neural Networks (NNs) have been successfully used in this field because of their ability to recognize imprecise and incomplete patterns characteristic of promoter sequences. In this paper, NNs were used to predict and recognize promoter sequences in two data sets: (i) one based on nucleotide sequence information and (ii) another based on stability sequence information. The accuracy was approximately 80% for simulation (i) and 68% for simulation (ii). In the rules extracted, biological consensus motifs were important parts of the NN learning process in both simulations.
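The abstract does not specify how nucleotide sequence information was presented to the network. A common choice for this kind of input, shown here only as an assumed illustration, is one-hot encoding, in which each base becomes four binary inputs. The name `one_hot_encode` and the A, C, G, T ordering are hypothetical.

```python
ALPHABET = "ACGT"  # assumed column order for the four binary inputs per base


def one_hot_encode(sequence):
    """Flatten a DNA sequence into a binary vector with four entries per base."""
    encoded = []
    for base in sequence.upper():
        if base not in ALPHABET:
            raise ValueError(f"unexpected base: {base!r}")
        # Exactly one of the four positions is set to 1 for each base.
        encoded.extend(1 if base == symbol else 0 for symbol in ALPHABET)
    return encoded
```

For a promoter of length n this yields 4n network inputs, whereas a stability-based encoding of the same sequence would yield one numeric value per dinucleotide step.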
