Search | BVS Regional Portal

1.

Chemprop: A Machine Learning Package for Chemical Property Prediction.

Heid, Esther; Greenman, Kevin P; Chung, Yunsie; Li, Shih-Cheng; Graff, David E; Vermeire, Florence H; Wu, Haoyang; Green, William H; McGill, Charles J.

J Chem Inf Model ; 64(1): 9-17, 2024 01 08.

Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.

Subject(s)

Machine Learning , Software , Neural Networks, Computer , Chemical Phenomena , Water

2.

Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back.

Koscher, Brent A; Canty, Richard B; McDonald, Matthew A; Greenman, Kevin P; McGill, Charles J; Bilodeau, Camille L; Jin, Wengong; Wu, Haoyang; Vermeire, Florence H; Jin, Brooke; Hart, Travis; Kulesza, Timothy; Li, Shih-Cheng; Jaakkola, Tommi S; Barzilay, Regina; Gómez-Bombarelli, Rafael; Green, William H; Jensen, Klavs F.

Science ; 382(6677): eadi1407, 2023 Dec 22.

Article in English | MEDLINE | ID: mdl-38127734

ABSTRACT

A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.

3.

ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents.

Pattanaik, Lagnajit; Menon, Angiras; Settels, Volker; Spiekermann, Kevin A; Tan, Zipei; Vermeire, Florence H; Sandfort, Frederik; Eiden, Philipp; Green, William H.

J Phys Chem B ; 127(47): 10151-10170, 2023 Nov 30.

Article in English | MEDLINE | ID: mdl-37966798

ABSTRACT

Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the solution free energy calculated using computational chemistry methods.
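As an illustration of how predicted relative conformer free energies could be used to pick out a thermodynamically relevant subset, the sketch below converts relative free energies into Boltzmann populations and keeps conformers above a population cutoff. This is not taken from the ConfSolv code: the function names, the 298.15 K default, and the 1% cutoff are assumptions chosen only for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_free_energies, temperature=298.15):
    """Return normalized Boltzmann populations for relative free energies in kcal/mol."""
    g_min = min(rel_free_energies)  # shift so the lowest conformer has weight 1
    weights = [
        math.exp(-(g - g_min) / (R_KCAL * temperature)) for g in rel_free_energies
    ]
    total = sum(weights)
    return [w / total for w in weights]


def relevant_conformers(rel_free_energies, temperature=298.15, min_population=0.01):
    """Return indices of conformers whose population is at least min_population.

    The cutoff is a hypothetical choice for this sketch, not a published value.
    """
    populations = boltzmann_populations(rel_free_energies, temperature)
    return [i for i, p in enumerate(populations) if p >= min_population]
```

With relative free energies of 0.0, 0.5, and 5.0 kcal/mol at room temperature, the third conformer contributes a negligible population and would be dropped by the 1% cutoff.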

4.

Characterizing Uncertainty in Machine Learning for Chemistry.

Heid, Esther; McGill, Charles J; Vermeire, Florence H; Green, William H.

J Chem Inf Model ; 63(13): 4012-4029, 2023 07 10.

Article in English | MEDLINE | ID: mdl-37338239

ABSTRACT

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.

Subject(s)

Machine Learning , Uncertainty , Reproducibility of Results
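The split between data noise and model variance described in the abstract above can be sketched for an ensemble whose members each return a predicted mean and a predicted noise variance. The sketch uses the standard law-of-total-variance decomposition, which is an assumption here and not necessarily the procedure used in the paper. It does not capture model bias, which ensembling alone cannot reveal. The function name and dictionary keys are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_ensemble_uncertainty(means, variances):
    """Split ensemble uncertainty for one molecule into two variance terms.

    means:     predicted value from each ensemble member
    variances: predicted noise variance from each ensemble member
    """
    aleatoric = fmean(variances)      # average predicted data noise
    model_variance = pvariance(means)  # spread of member predictions
    return {
        "prediction": fmean(means),
        "aleatoric": aleatoric,
        "model_variance": model_variance,
        "total": aleatoric + model_variance,
    }
```

Using the population variance of the member means is a design choice for this sketch; a sample variance would be an equally defensible estimate for small ensembles.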

5.

Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures.

Vermeire, Florence H; Chung, Yunsie; Green, William H.

J Am Chem Soc ; 144(24): 10785-10797, 2022 06 22.

Article in English | MEDLINE | ID: mdl-35687887

ABSTRACT

The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, as the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents for a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a huge range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.

Subject(s)

Water , Data Collection , Solubility , Solutions , Solvents/chemistry , Temperature , Thermodynamics , Water/chemistry
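One way to see how solvation free energies and an aqueous solubility can be combined, as the abstract above describes, is a simple thermodynamic cycle through the gas phase. The sketch below assumes dilute solutions, a common standard state for both solvation free energies, and a single temperature; it is an illustrative relation, not necessarily the exact set of equations used in the paper, and the function and parameter names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log_solubility_in_solvent(log_s_aq, dg_solv_aq, dg_solv_solvent, temperature=298.15):
    """Estimate log10 solubility in a solvent from aqueous data.

    log_s_aq:        log10 of aqueous solid solubility
    dg_solv_aq:      solvation free energy in water, kcal/mol
    dg_solv_solvent: solvation free energy in the target solvent, kcal/mol

    A more negative solvation free energy in the target solvent raises the
    estimated solubility relative to water.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10)
    return log_s_aq + (dg_solv_aq - dg_solv_solvent) / rt_ln10
```

Under these assumptions, a solvation free energy that is more favorable than in water by RT ln 10 (about 1.36 kcal/mol at 298 K) corresponds to one extra log unit of solubility.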

6.

Group Contribution and Machine Learning Approaches to Predict Abraham Solute Parameters, Solvation Free Energy, and Solvation Enthalpy.

Chung, Yunsie; Vermeire, Florence H; Wu, Haoyang; Walker, Pierre J; Abraham, Michael H; Green, William H.

J Chem Inf Model ; 62(3): 433-446, 2022 02 14.

Article in English | MEDLINE | ID: mdl-35044781

ABSTRACT

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain while the machine learning models adopt a directed message passing neural network. The solute parameters predicted from SoluteGC and SoluteML are used to calculate solvation energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets using both random and substructure-based solute splits for solvation energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Yet, even though the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the three models are combined, they can provide even more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software packages, and source code.

Subject(s)

Machine Learning , Neural Networks, Computer , Entropy , Solutions , Solvents , Thermodynamics
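The linear free energy relationship mentioned in the abstract above can be sketched with the general Abraham solvation equation, log K = c + eE + sS + aA + bB + lL, followed by a conversion of the gas-to-solvent partition coefficient to a solvation free energy. The class and function names are hypothetical, any coefficient values passed in are placeholders, and the conversion assumes a standard state in which ΔG_solv = -RT ln(10) log K; none of this is taken from the authors' software.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


@dataclass(frozen=True)
class SoluteParams:
    """Abraham solute parameters E, S, A, B, L."""
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass(frozen=True)
class SolventCoeffs:
    """Solvent-specific coefficients c, e, s, a, b, l of the Abraham equation."""
    c: float
    e: float
    s: float
    a: float
    b: float
    l: float


def log_k(solute, solvent):
    """Gas-to-solvent partition coefficient (log10) from the Abraham equation."""
    return (
        solvent.c
        + solvent.e * solute.E
        + solvent.s * solute.S
        + solvent.a * solute.A
        + solvent.b * solute.B
        + solvent.l * solute.L
    )


def solvation_free_energy(solute, solvent, temperature=298.15):
    """Solvation free energy in kcal/mol, assuming dG = -RT ln(10) log K."""
    return -R_KCAL * temperature * math.log(10) * log_k(solute, solvent)
```

A larger log K gives a more negative solvation free energy, so each solute parameter shifts the estimate in proportion to the matching solvent coefficient.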
