Results 1 - 20 of 142
1.
Methods Mol Biol ; 2834: 115-130, 2025.
Article in English | MEDLINE | ID: mdl-39312162

ABSTRACT

Recent advances in machine learning and the new availability of large chemical datasets have made the development of tools and protocols for computational chemistry a topic of high interest. In this chapter, a standard procedure to develop Quantitative Structure-Activity Relationship (QSAR) models is presented and implemented in two freely available and easy-to-use workflows. The first workflow helps the user retrieve chemical data (SMILES) from the web, check their correctness, and curate them to produce consistent and ready-to-use datasets for cheminformatics. The second workflow implements six machine learning methods to develop classification QSAR models. Models can additionally be used to predict external chemicals. Calculation and selection of chemical descriptors, tuning of models' hyperparameters, and methods to handle data imbalance are also incorporated in the workflow. Both workflows are implemented in KNIME and represent a useful tool for computational scientists, as well as an intuitive and straightforward introduction to QSAR.


Subjects
Data Curation; Machine Learning; Quantitative Structure-Activity Relationship; Workflow; Data Curation/methods; Software; Cheminformatics/methods; Computational Biology/methods
2.
Methods Mol Biol ; 2834: 393-441, 2025.
Article in English | MEDLINE | ID: mdl-39312176

ABSTRACT

The Asclepios suite of KNIME nodes represents an innovative solution for conducting cheminformatics and computational chemistry tasks, specifically tailored for applications in drug discovery and computational toxicology. This suite has been developed using open-source and publicly accessible software. In this chapter, we introduce and explore the Asclepios suite through the lens of a case study. This case study revolves around investigating the interactions between per- and polyfluorinated alkyl substances (PFAS) and biomolecules, such as nuclear receptors. The objective is to characterize the potential toxicity of PFAS and gain insights into their chemical mode of action at the molecular level. The Asclepios KNIME nodes have been designed as versatile tools capable of addressing a wide range of computational toxicology challenges. Furthermore, they can be adapted and customized to accommodate the specific needs of individual users, spanning various domains such as nanoinformatics, biomedical research, and other related applications. This chapter provides an in-depth examination of the technical underpinnings and foundations of these tools. It is accompanied by a practical case study that demonstrates the utilization of Asclepios nodes in a computational toxicology investigation. This showcases the extendable functionalities that can be applied in diverse computational chemistry contexts. By the end of this chapter, we aim for readers to have a comprehensive understanding of the effectiveness of the Asclepios node functions. These functions hold significant potential for enhancing a wide spectrum of cheminformatics applications.


Subjects
Drug Discovery; Software; Workflow; Drug Discovery/methods; Humans; Toxicology/methods; Cheminformatics/methods; Computational Biology/methods; Fluorocarbons/chemistry; Fluorocarbons/toxicity
3.
Sci Rep ; 14(1): 20812, 2024 09 06.
Article in English | MEDLINE | ID: mdl-39242880

ABSTRACT

With the exponential progress in the field of cheminformatics, the conventional modeling approaches have so far been to employ supervised and unsupervised machine learning (ML) and deep learning models, utilizing the standard molecular descriptors, which represent the structural, physicochemical, and electronic properties of a particular compound. Deviating from the conventional approach, in this investigation, we have employed the classification Read-Across Structure-Activity Relationship (c-RASAR), which involves the amalgamation of the concepts of classification-based quantitative structure-activity relationship (QSAR) and Read-Across to incorporate Read-Across-derived similarity and error-based descriptors into a statistical and machine learning modeling framework. ML models developed from these RASAR descriptors use similarity-based information from the close source neighbors of a particular query compound. We have employed different classification modeling algorithms on the selected QSAR and RASAR descriptors to develop predictive models for efficient prediction of query compounds' hepatotoxicity. The predictivity of each of these models was evaluated on a large number of test set compounds. The best-performing model was also used to screen a true external data set. The concepts of explainable AI (XAI) coupled with Read-Across were used to interpret the contributions of the RASAR descriptors in the best c-RASAR model and to explain the chemical diversity in the dataset. The application of various unsupervised dimensionality reduction techniques like t-SNE and UMAP and the supervised ARKA framework showed the usefulness of the RASAR descriptors over the selected QSAR descriptors in their ability to group similar compounds, enhancing the modelability of the dataset and efficiently identifying activity cliffs. 
Furthermore, the activity cliffs were also identified from Read-Across by observing the nature of compounds constituting the nearest neighbors for a particular query compound. On comparing our simple linear c-RASAR model with the previously reported models developed using the same dataset derived from the US FDA Orange Book ( https://www.accessdata.fda.gov/scripts/cder/ob/index.cfm ), it was observed that our model is simple, reproducible, transferable, and highly predictive. The performance of the LDA c-RASAR model on the true external set supersedes that of the previously reported work. Therefore, the present simple LDA c-RASAR model can efficiently be used to predict the hepatotoxicity of query chemicals.
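
The idea of deriving similarity-based descriptors from a query compound's close source neighbors can be illustrated with a minimal sketch. It is not the authors' c-RASAR implementation: the fingerprint representation (sets of integer feature IDs), the function names, the choice of Tanimoto similarity, and the three descriptors computed are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" feature IDs


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def read_across_descriptors(
    query: Fingerprint,
    sources: List[Tuple[Fingerprint, int]],  # (fingerprint, label 0 or 1)
    k: int = 3,
) -> Dict[str, float]:
    """Summarise the k most similar source compounds as numeric descriptors."""
    if not sources:
        raise ValueError("at least one source compound is required")
    # Rank source compounds by similarity to the query and keep the top k.
    neighbours = sorted(
        ((tanimoto(query, fp), label) for fp, label in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    sims = [s for s, _ in neighbours]
    total = sum(sims)
    # Similarity-weighted fraction of positive neighbours (0.0 if all sims are 0).
    weighted = sum(s * label for s, label in neighbours) / total if total else 0.0
    return {
        "max_sim": max(sims),
        "mean_sim": sum(sims) / len(sims),
        "weighted_positive_fraction": weighted,
    }
```

Descriptors of this kind could then be passed to any classifier. A query whose nearest neighbors are highly similar but carry opposite labels is the sort of case the abstract refers to as an activity cliff.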


Subjects
Chemical and Drug Induced Liver Injury; Quantitative Structure-Activity Relationship; Chemical and Drug Induced Liver Injury/etiology; Algorithms; Machine Learning; Humans; Cheminformatics/methods
4.
J Vis Exp ; (211)2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39311615

ABSTRACT

Chemical space is a multidimensional descriptor space that encloses all possible molecules, and at least 1 × 10^60 organic substances with a molecular weight below 500 Da are thought to be potentially relevant for drug discovery. Natural products have been the primary source of the new pharmacological entities marketed during the past forty years and continue to be one of the most productive sources for the creation of innovative medications. Chemoinformatics-based computational tools accelerate the drug development process for natural products. Methods including estimating bioactivities, safety profiles, ADME, and natural product likeness measurement have been used. Here, we go over recent developments in chemoinformatic tools designed to visualize, characterize, and expand the chemical space of natural compound data sets using various molecular representations, create visual representations of such spaces, and investigate structure-property relationships within chemical spaces. With an emphasis on drug discovery applications, we evaluate the open-source databases BIOFACQUIM and PeruNPDB as proof of concept.


Subjects
Biological Products; Drug Discovery; Biological Products/chemistry; Drug Discovery/methods; Cheminformatics/methods; Databases, Chemical
5.
Crit Rev Toxicol ; 54(9): 659-684, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39225123

ABSTRACT

This article aims to provide a comprehensive critical, yet readable, review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill the data gaps for a wide range of properties including toxicity endpoints for regulatory purposes. This review will explore the background of RA starting from how structural information has been used through to how other similarity contexts such as physicochemical, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects are being characterized. More recent developments of RA's integration with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.


Subjects
Machine Learning; Quantitative Structure-Activity Relationship; Humans; Cheminformatics/methods; Structure-Activity Relationship; Animals
6.
J Nat Prod ; 87(9): 2216-2229, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39269718

ABSTRACT

Natural products (NPs) are secondary metabolites of natural origin with broad applications across various human activities, particularly the discovery of bioactive compounds. Structural elucidation of new NPs entails significant cost and effort. On the other hand, the dereplication of known compounds is crucial for the early exclusion of irrelevant compounds in contemporary pharmaceutical research. NAPROC-13 stands out as a publicly accessible database, providing structural and 13C NMR spectroscopic information for over 25 000 compounds, rendering it a pivotal resource in natural product (NP) research, favoring open science. This study seeks to quantitatively analyze the chemical content, structural diversity, and chemical space coverage of NPs within NAPROC-13, compared to FDA-approved drugs and a very diverse subset of NPs, UNPD-A. Findings indicated that NPs in NAPROC-13 exhibit properties comparable to those in UNPD-A, albeit showcasing a notably diverse array of structural content, scaffolds, ring systems of pharmaceutical interest, and molecular fragments. NAPROC-13 covers a specific region of the chemical multiverse (a generalization of the chemical space from different chemical representations) regarding physicochemical properties and a region as broad as UNPD-A in terms of the structural features represented by fingerprints.


Subjects
Biological Products; Biological Products/chemistry; Molecular Structure; Cheminformatics/methods; Carbon-13 Magnetic Resonance Spectroscopy
7.
Biomolecules ; 14(8)2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39199420

ABSTRACT

The development of new treatments for neglected tropical diseases (NTDs) remains a major challenge in the 21st century. In most cases, the available drugs are obsolete and have limitations in terms of efficacy and safety. The situation becomes even more complex when considering the low number of new chemical entities (NCEs) currently in use in advanced clinical trials for most of these diseases. Natural products (NPs) are valuable sources of hits and lead compounds with privileged scaffolds for the discovery of new bioactive molecules. Considering the relevance of biodiversity for drug discovery, a chemoinformatics analysis was conducted on a compound dataset of NPs with anti-trypanosomatid activity reported in 497 research articles from 2019 to 2024. Structures corresponding to different metabolic classes were identified, including terpenoids, benzoic acids, benzenoids, steroids, alkaloids, phenylpropanoids, peptides, flavonoids, polyketides, lignans, cytochalasins, and naphthoquinones. This unique collection of NPs occupies regions of the chemical space with drug-like properties that are relevant to anti-trypanosomatid drug discovery. The gathered information greatly enhanced our understanding of biologically relevant chemical classes, structural features, and physicochemical properties. These results can be useful in guiding future medicinal chemistry efforts for the development of NP-inspired NCEs to treat NTDs caused by trypanosomatid parasites.


Subjects
Biodiversity; Biological Products; Cheminformatics; Drug Discovery; Neglected Diseases; Animals; Humans; Biological Products/chemistry; Biological Products/pharmacology; Biological Products/therapeutic use; Cheminformatics/methods; Drug Discovery/methods; Neglected Diseases/drug therapy; Trypanocidal Agents/chemistry; Trypanocidal Agents/pharmacology; Trypanocidal Agents/therapeutic use; Trypanosoma/drug effects
8.
Microb Pathog ; 195: 106892, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39216611

ABSTRACT

The highly pathogenic Marburg virus (MARV), a non-segmented negative-strand RNA virus, is a member of the Filoviridae family. This article presents a computer-aided drug design (CADD) approach for identifying drug-like compounds that prevent MARV disease by inhibiting the nucleoprotein, which is responsible for viral replication. This study used a wide range of in silico drug design techniques to identify potential drugs. Out of 368 natural compounds, 202 passed ADMET, and molecular docking identified the top two molecules (CID: 1804018 and 5280520), with high binding affinities of -6.77 and -6.672 kcal/mol, respectively. Both compounds showed interactions with the common amino acid residues SER_216, ARG_215, TYR_135, CYS_195, and ILE_108, which indicates that the lead compounds and control ligands interact in the common active site/catalytic site of the protein. The binding free energies of CID: 1804018 and 5280520 were -66.01 and -31.29 kcal/mol, respectively. The two lead compounds were re-evaluated using MD modeling techniques, which confirmed CID: 1804018 as the most stable when complexed with the target protein. PC3 of (Z)-2-(2,5-dimethoxybenzylidene)-6-(2-(4-methoxyphenyl)-2-oxoethoxy) benzofuran-3(2H)-one (CID: 1804018) was 8.74 %, whereas PC3 of 2'-Hydroxydaidzein (CID: 5280520) was 11.25 %. In this study, (Z)-2-(2,5-dimethoxybenzylidene)-6-(2-(4-methoxyphenyl)-2-oxoethoxy) benzofuran-3(2H)-one (CID: 1804018), which is found in Angelica archangelica, showed significant stability in the protein's binding site across the ADMET, molecular docking, MM-GBSA, and MD simulation analyses, together with a highly negative binding free energy. These results identify it as the best drug candidate, one that may inhibit MARV replication by targeting the nucleoprotein.


Subjects
Antiviral Agents; Benzofurans; Marburgvirus; Molecular Docking Simulation; Virus Replication; Antiviral Agents/pharmacology; Antiviral Agents/chemistry; Antiviral Agents/metabolism; Marburgvirus/drug effects; Marburgvirus/metabolism; Benzofurans/pharmacology; Benzofurans/chemistry; Benzofurans/metabolism; Virus Replication/drug effects; Cheminformatics/methods; Drug Design; Protein Binding; RNA-Binding Proteins/metabolism; RNA-Binding Proteins/chemistry; Binding Sites; Ligands
9.
Comput Biol Med ; 180: 108954, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39094327

ABSTRACT

Indoleamine 2,3-dioxygenase (IDO) and tryptophan 2,3-dioxygenase (TDO) are attractive drug targets for cancer immunotherapy. After the disappointing results of epacadostat, a selective IDO inhibitor, in phase III clinical trials, there is much interest in the development of TDO-selective inhibitors. In the current study, several data analysis methods and machine learning approaches, including logistic regression, Random Forest, XGBoost, and Support Vector Machines, were used to model a data set of compounds retrieved from ChEMBL. Models based on Morgan fingerprints revealed notable fragments for the selective inhibition of IDO, TDO, or both. Multiple fragment docking was performed to find the best set of bound fragments and their orientation in space for efficient linking. Linking of the fragments and optimization of the final molecules were accomplished by means of an artificial intelligence generative framework. Finally, the selectivity of the optimized molecules was assessed and the top 4 lead molecules were filtered through PAINS, Brenk, and NIH filters. Results indicated that phenyloxalamide, fluoroquinoline, and 3-bromo-4-fluoroaniline confer selectivity towards IDO inhibition. Correspondingly, 1-benzyl-1H-naphtho[2,3-d][1,2,3]triazole-4,9-dione was found to be an integral fragment for the selective inhibition of TDO by forming a coordination bond with the Fe atom of heme. In addition, furo[2,3-c]pyridine-2,3-diamine was found to be a common fragment for inhibition of both targets and can be used in the design of dual-target inhibitors of IDO and TDO. The new fragments introduced here can be useful building blocks for incorporation into selective TDO or dual IDO/TDO inhibitors.


Subjects
Cheminformatics; Enzyme Inhibitors; Indoleamine-Pyrrole 2,3,-Dioxygenase; Machine Learning; Tryptophan Oxygenase; Indoleamine-Pyrrole 2,3,-Dioxygenase/antagonists & inhibitors; Indoleamine-Pyrrole 2,3,-Dioxygenase/chemistry; Indoleamine-Pyrrole 2,3,-Dioxygenase/metabolism; Tryptophan Oxygenase/antagonists & inhibitors; Tryptophan Oxygenase/metabolism; Tryptophan Oxygenase/chemistry; Humans; Cheminformatics/methods; Enzyme Inhibitors/chemistry; Molecular Docking Simulation
10.
Molecules ; 29(15)2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39125052

ABSTRACT

Marine natural products (MNPs) continue to be tested primarily in cellular toxicity assays, both mammalian and microbial, despite most being inactive at concentrations relevant to drug discovery. These MNPs become missed opportunities and represent a wasteful use of precious bioresources. The use of cheminformatics aligned with published bioactivity data can provide insights to direct the choice of bioassays for the evaluation of new MNPs. Cheminformatics analysis of MNPs found in MarinLit (n = 39,730) up to the end of 2023 highlighted indol-3-yl-glyoxylamides (IGAs, n = 24) as a group of MNPs with no reported bioactivities. However, a recent review of synthetic IGAs highlighted these scaffolds as privileged structures with several compounds under clinical evaluation. Herein, we report the synthesis of a library of 32 MNP-inspired brominated IGAs (25-56) using a simple one-pot, multistep method affording access to these diverse chemical scaffolds. Directed by a meta-analysis of the biological activities reported for marine indole alkaloids (MIAs) and synthetic IGAs, the brominated IGAs 25-56 were examined for their potential bioactivities against the Parkinson's Disease amyloid protein alpha synuclein (α-syn), antiplasmodial activities against chloroquine-resistant (3D7) and sensitive (Dd2) parasite strains of Plasmodium falciparum, and inhibition of mammalian (chymotrypsin and elastase) and viral (SARS-CoV-2 3CLpro) proteases. All of the synthetic IGAs tested exhibited binding affinity to the amyloid protein α-syn, while some showed inhibitory activities against P. falciparum, and the proteases, SARS-CoV-2 3CLpro, and chymotrypsin. The cellular safety of the IGAs was examined against cancerous and non-cancerous human cell lines, with all of the compounds tested inactive, thereby validating cheminformatics and meta-analyses results. 
The findings presented herein expand our knowledge of marine IGA bioactive chemical space and advocate expanding the scope of biological assays routinely used to investigate NP bioactivities, specifically those more suitable for non-toxic compounds. By integrating cheminformatics tools and functional assays into NP biological testing workflows, we can aim to enhance the potential of NPs and their scaffolds for future drug discovery and development.


Subjects
Biological Products; Cheminformatics; Drug Discovery; Biological Products/chemistry; Biological Products/pharmacology; Humans; Cheminformatics/methods; SARS-CoV-2/drug effects; Aquatic Organisms/chemistry; Indoles/chemistry; Indoles/pharmacology; Plasmodium falciparum/drug effects; Indole Alkaloids/pharmacology; Indole Alkaloids/chemistry; Animals
11.
Mol Inform ; 43(8): e202400050, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38979846

ABSTRACT

The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures with molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of using conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive. This classification enriches the color scheme applied to pharmacophore space, where the color representation of a pharmacophore hypothesis is driven by the associated compounds. Using the BCR-ABL tyrosine kinase as a case study, we identified intriguing regions corresponding to pharmacophore activity discontinuities, providing valuable insights for structure-activity relationships analysis.
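
The four-class scheme described above can be sketched in a few lines. The abstract does not state the activity cut-offs or how a pharmacophore hypothesis inherits a color from its compounds, so the pIC50 thresholds and the majority-vote rule below are hypothetical placeholders, not the authors' method.

```python
from collections import Counter
from typing import Iterable

# Hypothetical pIC50 cut-offs, chosen only for illustration.
CUTOFFS = [(9.0, "super active"), (8.0, "very active"), (6.0, "active")]


def activity_class(pic50: float) -> str:
    """Map a pIC50 value to one of the four activity classes."""
    for threshold, label in CUTOFFS:
        if pic50 >= threshold:
            return label
    return "inactive"


def dominant_class(pic50_values: Iterable[float]) -> str:
    """Assumed coloring rule: the most frequent class among associated compounds."""
    counts = Counter(activity_class(v) for v in pic50_values)
    if not counts:
        raise ValueError("a hypothesis needs at least one associated compound")
    return counts.most_common(1)[0][0]
```

Under these assumed cut-offs, a hypothesis matched mostly by compounds with pIC50 of 9 or above would be colored as "super active".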


Subjects
Fusion Proteins, bcr-abl; Protein Kinase Inhibitors; Fusion Proteins, bcr-abl/antagonists & inhibitors; Fusion Proteins, bcr-abl/chemistry; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Structure-Activity Relationship; Humans; Cheminformatics/methods; Pharmacophore
12.
J Chem Inf Model ; 64(15): 5888-5899, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39009039

ABSTRACT

Chemical information disseminated in scientific documents offers an untapped potential for deep learning-assisted insights and breakthroughs. Automated extraction efforts have shifted from resource-intensive manual extraction toward applying machine learning methods to streamline chemical data extraction. While current extraction models and pipelines have ushered in notable efficiency improvements, they often exhibit modest performance, compromising the accuracy of predictive models trained on extracted data. Further, current chemical pipelines lack both transferability (where a model trained on one task can be adapted to another relevant task with limited examples) and extensibility, which enables seamless adaptability for new extraction tasks. Addressing these gaps, we present ChemREL, a versatile chemical data extraction pipeline emphasizing performance, transferability, and extensibility. ChemREL utilizes a custom, diverse data set of chemical documents, labeled through an active learning strategy to extract two properties: normal melting point and lethal dose 50 (LD50). The normal melting point is selected for its prevalence in diverse contexts and wider literature, serving as the foundation for pipeline training. In contrast, LD50 evaluates the pipeline's transferability to an unrelated property, underscoring variance in its biological nature, toxicological context, and units, among other differences. With pretraining and fine-tuning, our pipeline outperforms existing methods and GPT-4, achieving F1-scores of 96.1% for entity identification and 97.0% for relation mapping, culminating in an overall F1-score of 95.4%. More importantly, ChemREL displays high transferability, effectively transitioning from melting point extraction to LD50 extraction with 10 randomly selected training documents.
Released as an open-source package, ChemREL aims to broaden access to chemical data extraction, enabling the construction of expansive relational data sets that propel discovery.
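
For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below shows the standard computation from raw counts. The function name and the example counts are illustrative assumptions and are not taken from ChemREL or its evaluation data.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall (0.0 when both are 0).
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts: 90 correct extractions, 10 spurious, 5 missed.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because it is a harmonic mean, F1 is pulled toward the lower of precision and recall.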


Subjects
Deep Learning; Data Mining/methods; Cheminformatics/methods
14.
Expert Opin Drug Discov ; 19(9): 1043-1069, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39004919

ABSTRACT

INTRODUCTION: Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery since unknown off-targets can modulate safety and efficacy - profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity present significant limitations and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predict polypharmacology. AREAS COVERED: This review aims to provide a comprehensive overview of the state of polypharmacology prediction and discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms grouped by the types of data that they exploit using selected examples. EXPERT OPINION: Polypharmacology prediction has made impressive progress over the last decades and has contributed to identifying many off-targets. However, data incompleteness currently limits the ability of most approaches to comprehensively predict selectivity. Moreover, our limited agreement on model assessment challenges the identification of the best algorithms - which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential increase of multidisciplinary Big Data and AI holds much potential to improve polypharmacology prediction and de-risk drug discovery.


Subjects
Algorithms; Computational Biology; Drug Discovery; Polypharmacology; Humans; Drug Discovery/methods; Computational Biology/methods; Cheminformatics/methods; Animals
15.
J Chem Inf Model ; 64(14): 5451-5469, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38949069

ABSTRACT

This study addresses the challenge of accurately identifying stereoisomers in cheminformatics, which originates from our objective to apply machine learning to predict the association constant between cyclodextrin and a guest. Identifying stereoisomers is indeed crucial for machine learning applications. Current tools offer various molecular descriptors, including their textual representation as Isomeric SMILES that can distinguish stereoisomers. However, such representation is text-based and does not have a fixed size, so a conversion is needed to make it usable to machine learning approaches. Word embedding techniques can be used to solve this problem. Mol2vec, a word embedding approach for molecules, offers such a conversion. Unfortunately, it cannot distinguish between stereoisomers due to its inability to capture the spatial configuration of molecular structures. This study proposes several approaches that use word embedding techniques to handle molecular discrimination using stereochemical information on molecules or considering Isomeric SMILES notation as a text in Natural Language Processing. Our aim is to generate a distinct vector for each unique molecule, correctly identifying stereoisomer information in cheminformatics. The proposed approaches are then compared to our original machine learning task: predicting the association constant between cyclodextrin and a guest molecule.
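
To see why a text-level treatment of Isomeric SMILES can separate stereoisomers, consider splitting a SMILES string into tokens, as one would before applying a word-embedding method. The sketch below is an assumption-laden illustration, not the study's approach: the simplified regular expression covers only common SMILES syntax, and `strip_stereo` is a hypothetical helper that mimics a stereo-blind representation.

```python
import re
from typing import List

# Simplified SMILES token pattern (assumption: covers common syntax only).
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%[0-9]{2}|[A-Za-z]|[0-9]|[()=#\-+/\\.:~*$@]")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; bracket atoms stay in one piece."""
    tokens = _TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def strip_stereo(tokens: List[str]) -> List[str]:
    """Drop stereo markers: '@' inside bracket atoms and '/' or '\\' bond tokens."""
    stripped = []
    for tok in tokens:
        if tok in ("/", "\\"):
            continue
        stripped.append(tok.replace("@", "") if tok.startswith("[") else tok)
    return stripped


if __name__ == "__main__":
    # The two enantiomers of alanine differ only in the chirality tag.
    a = tokenize("N[C@@H](C)C(=O)O")
    b = tokenize("N[C@H](C)C(=O)O")
    print(a != b)                              # token sequences differ
    print(strip_stereo(a) == strip_stereo(b))  # identical once stereo is dropped
```

The token sequences differ only in the bracket atom carrying the chirality tag, so an embedding that discards or never sees that tag cannot tell the two molecules apart.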


Subjects
Machine Learning; Stereoisomerism; Cheminformatics/methods; Cyclodextrins/chemistry; Natural Language Processing
16.
J Chem Inf Model ; 64(14): 5521-5534, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38950894

ABSTRACT

Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level. OpenChemIE approaches the problem in two steps: extracting relevant information from individual modalities and then integrating the results to obtain a final list of reactions. For the first step, we employ specialized neural models that each address a specific task for chemistry information extraction, such as parsing molecules or reactions from text or figures. We then integrate the information from these modules using chemistry-informed algorithms, allowing for the extraction of fine-grained reaction data from reaction condition and substrate scope investigations. Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%. Additionally, the reaction extraction results of OpenChemIE attain an accuracy score of 64.3% when directly compared against the Reaxys chemical database. OpenChemIE is most suited for information extraction on organic chemistry literature, where molecules are generally depicted as planar graphs or written in text and can be consolidated into a SMILES format. We provide OpenChemIE freely to the public as an open-source package, as well as through a web interface.


Subjects
Machine Learning; Data Mining/methods; Databases, Chemical; Algorithms; Cheminformatics/methods
17.
J Chem Inf Model ; 64(14): 5570-5579, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38958581

ABSTRACT

One of the most challenging tasks in modern medicine is to find novel efficient cancer therapeutic methods with minimal side effects. The recent discovery of several classes of organic molecules known as "molecular jackhammers" is a promising development in this direction. It is known that these molecules can directly target and eliminate cancer cells with no impact on healthy tissues. However, the underlying microscopic picture remains poorly understood. We present a study that utilizes theoretical analysis together with experimental measurements to clarify the microscopic aspects of jackhammers' anticancer activities. Our physical-chemical approach combines statistical analysis with chemoinformatics methods to design and optimize molecular jackhammers. By correlating specific physical-chemical properties of these molecules with their abilities to kill cancer cells, several important structural features are identified and discussed. Although our theoretical analysis enhances understanding of the molecular interactions of jackhammers, it also highlights the need for further research to comprehensively elucidate their mechanisms and to develop a robust physical-chemical framework for the rational design of targeted anticancer drugs.


Subjects
Antineoplastic Agents; Cheminformatics; Humans; Antineoplastic Agents/pharmacology; Antineoplastic Agents/chemistry; Cheminformatics/methods; Neoplasms/drug therapy; Neoplasms/pathology; Cell Line, Tumor; Models, Molecular
18.
PLoS One ; 19(7): e0306202, 2024.
Article in English | MEDLINE | ID: mdl-38968199

ABSTRACT

Chemical information has become increasingly ubiquitous and has outstripped the pace of analysis and interpretation. We have developed an R package, uafR, that automates the laborious retrieval process for gas chromatography coupled with mass spectrometry (GC-MS) data and allows anyone interested in chemical comparisons to quickly perform advanced structural similarity matches. Our streamlined cheminformatics workflows allow anyone with basic experience in R to pull out component areas for tentative compound identifications using the best published understanding of molecules across samples (pubchem.gov). Interpretations can now be done in a fraction of the time, cost, and effort required by a standard chemical ecology data analysis pipeline. The package was tested in two experimental contexts: (1) a dataset of purified internal standards, in which our algorithms correctly identified the known compounds with R² values ranging from 0.827 to 0.999 across concentrations ranging from 1 × 10⁻⁵ to 1 × 10³ ng/µL; and (2) a large, previously published dataset, in which the number and types of compounds identified were comparable (or identical) to those identified with the traditional manual peak annotation process, and NMDS analysis of the compounds produced the same pattern of significance as in the original study. Both the speed and accuracy of GC-MS data processing are drastically improved with uafR because it allows users to interact fluidly with their experiment following tentative library identifications (i.e., after the m/z spectra have been matched against an installed chemical fragmentation database such as NIST). Use of uafR will allow larger datasets to be collected and systematically interpreted quickly. Furthermore, the functions of uafR could allow backlogs of previously collected and annotated data to be processed by new personnel or students as they are being trained. This is critical as we enter the era of exposomics, metabolomics, volatilomes, and landscape-level, high-throughput chemotyping. This package was developed to advance collective understanding of chemical data and is applicable to any research that benefits from GC-MS analysis. It can be downloaded for free along with sample datasets from GitHub at github.org/castratton/uafR or installed directly from R or RStudio using the developer tools: 'devtools::install_github("castratton/uafR")'.
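
The R² values quoted above summarize how well a fitted line explains measured responses. As a minimal sketch only, and not the uafR implementation, the coefficient of determination for an ordinary least-squares line can be computed as follows. The function name `r_squared` and the choice of a simple straight-line fit are assumptions made for illustration; the abstract does not state how the fit was performed or whether concentrations were log-transformed.

```python
def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares line y = a + b*x.

    For a straight-line fit with an intercept, R^2 equals the squared
    Pearson correlation between x and y, which is what is computed here.
    Illustrative helper only; it is not part of uafR.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    return (sxy * sxy) / (sxx * syy)
```

In a calibration setting, `x` would hold the known concentrations of an internal standard (possibly on a log scale) and `y` the integrated component areas; a value near 1 indicates that the areas track concentration closely.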


Subjects
Algorithms, Gas Chromatography-Mass Spectrometry, Software, Gas Chromatography-Mass Spectrometry/methods, Cheminformatics/methods
19.
PLoS One ; 19(6): e0302105, 2024.
Article in English | MEDLINE | ID: mdl-38889115

ABSTRACT

The present study explored efficient inhibitors of the closed state (form) of the type III effector Xanthomonas outer protein Q (XopQ) (PDB: 4P5F) among the 44 phytochemicals of Picrasma quassioides using computational analysis. Among them, Kumudine B showed the best binding energy (-11.0 kcal/mol) with the targeted closed state of the XopQ protein, followed by Picrasamide A, Quassidine I, and Quassidine J, compared with the reference standard drug (Streptomycin). Molecular dynamics (MD) simulations run for 300 ns supported the stability of the XopQ protein complexes bound to the top lead ligands (Kumudine B, Picrasamide A, and Quassidine I), which showed slightly lower fluctuation than Streptomycin. The MM-PBSA calculation confirmed the strong interactions of the top lead ligands (Kumudine B and Quassidine I) with the XopQ protein, as they gave the lowest binding energies. Absorption, distribution, metabolism, excretion, and toxicity (ADMET) analysis indicated that Quassidine I, Kumudine B, and Picrasamide A satisfy most of the drug-likeness rules and have excellent bioavailability scores compared with Streptomycin. The computational results suggest that Kumudine B, Picrasamide A, and Quassidine I could be considered potential compounds for designing novel antibacterial drugs against X. oryzae infection. Further in vitro and in vivo antibacterial studies of Kumudine B, Picrasamide A, and Quassidine I are required to confirm their therapeutic potential in controlling X. oryzae infection.
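
The abstract does not say which drug-likeness rules were applied. As a hedged illustration of what such a check looks like, the sketch below implements Lipinski's rule of five, one widely used drug-likeness filter; its use here is an assumption, not a statement about the study's method. The `Descriptors` class, the function names, and any descriptor values passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # count of hydrogen-bond donors
    h_bond_acceptors: int   # count of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria that the compound violates."""
    checks = {
        "mol_weight > 500": d.mol_weight > 500,
        "log_p > 5": d.log_p > 5,
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """A compound is commonly accepted if it has at most one violation."""
    return len(lipinski_violations(d)) <= max_violations
```

For example, a hypothetical compound described by `Descriptors(350.0, 2.1, 2, 5)` has no violations, whereas `Descriptors(720.0, 6.3, 6, 12)` violates all four criteria. The descriptors themselves would come from a cheminformatics toolkit or an ADMET web service, which is outside the scope of this sketch.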


Subjects
Anti-Bacterial Agents, Molecular Dynamics Simulation, Xanthomonas, Anti-Bacterial Agents/pharmacology, Anti-Bacterial Agents/chemistry, Xanthomonas/drug effects, Cheminformatics/methods, Molecular Docking Simulation, Bacterial Proteins/antagonists & inhibitors, Bacterial Proteins/metabolism, Bacterial Proteins/chemistry
20.
BMC Bioinformatics ; 25(1): 225, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38926641

ABSTRACT

PURPOSE: Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding the Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs can also encode SMILES strings into vector representations. METHOD: We investigate the performance of GPT and LLaMA, compared with models pre-trained on SMILES, in embedding SMILES strings for downstream tasks, focusing on two key applications: molecular property prediction and drug-drug interaction (DDI) prediction. RESULTS: We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to those of models pre-trained on SMILES in molecular property prediction tasks and outperform the pre-trained models in the DDI prediction tasks. CONCLUSION: The performance of LLMs in generating SMILES embeddings shows great potential for further investigation of these models for molecular embedding. We hope our study bridges the gap between LLMs and molecular embedding, motivating additional research into the potential of LLMs in the molecular representation field. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-GPT.
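
To make the idea of using SMILES embeddings in downstream tasks concrete, the sketch below shows one common way to turn per-token vectors into classifier inputs. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the pooling strategy or how the two drugs of a pair are combined, so mean pooling and concatenation are assumed here, and the function names `mean_pool` and `pair_features` are hypothetical. The token vectors would come from whichever model is being evaluated.

```python
def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-length molecule embedding.

    `token_vectors` is a non-empty list of equal-length lists of floats,
    one per SMILES token, as produced by some embedding model.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same length")
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


def pair_features(embedding_a, embedding_b):
    """Concatenate two molecule embeddings into one feature vector.

    Assumed input format for a drug-drug interaction classifier; note that
    concatenation is order-sensitive, so (a, b) and (b, a) differ.
    """
    return list(embedding_a) + list(embedding_b)
```

A molecular property model would consume the output of `mean_pool` directly, while a DDI model would consume `pair_features` applied to the pooled embeddings of the two drugs.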


Subjects
Cheminformatics, Cheminformatics/methods, Drug Interactions, Molecular Structure