1.
J Chem Phys ; 160(5), 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38341696

ABSTRACT

We study alchemical atomic energy partitioning as a method to estimate atomization energies from atomic contributions, which are defined in physically rigorous and general ways through the use of the uniform electron gas as a joint reference. We analyze quantitatively the relation between atomic energies and their local environment using a dataset of 1325 organic molecules. The atomic energies are transferable across various molecules, enabling the prediction of atomization energies with a mean absolute error of 23 kcal/mol, comparable to simple statistical estimates but potentially more robust given their grounding in the physics-based decomposition scheme. A comparative analysis with other decomposition methods highlights its sensitivity to electrostatic variations, underlining its potential as a representation of the environment as well as in studying processes like diffusion in solids characterized by significant electrostatic shifts.

2.
Digit Discov ; 3(1): 23-33, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38239898

ABSTRACT

In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.

3.
Phys Chem Chem Phys ; 26(5): 4306-4319, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38234256

ABSTRACT

The efficiency of machine learning algorithms for electronically excited states is far behind ground-state applications. One of the underlying problems is the insufficient smoothness of the fitted potential energy surfaces and other properties in the vicinity of state crossings and conical intersections, which is a prerequisite for an efficient regression. Smooth surfaces can be obtained by switching to the diabatic basis. However, diabatization itself is still an outstanding problem. We overcome these limitations by solving both problems at once. We use a machine learning approach combining clustering and regression techniques to correct for the deficiencies of property-based diabatization which, in return, provides us with smooth surfaces that can be easily fitted. Our approach extends the applicability of property-based diabatization to multidimensional systems. We utilize the proposed diabatization scheme to achieve higher prediction accuracy for adiabatic states and we show its performance by reconstructing global potential energy surfaces of excited states of nitrosyl fluoride and formaldehyde. While the proposed methodology is independent of the specific property-based diabatization and regression algorithm, we show its performance for kernel ridge regression and a very simple diabatization based on transition multipoles. Compared to most other algorithms based on machine learning, our approach needs only a small amount of training data.
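The abstract names kernel ridge regression as the regression step. As a rough illustration of that generic technique only (not the authors' diabatization scheme), a minimal sketch with a Gaussian kernel could look as follows. The function names, the kernel width `sigma`, and the regularizer `reg` are assumptions chosen for this example.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def krr_fit(x_train, y_train, sigma=1.0, reg=1e-8):
    """Solve (K + reg * I) alpha = y for the regression weights alpha."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)

def krr_predict(x_query, x_train, alpha, sigma=1.0):
    """Predict at the query points as a kernel-weighted sum over training points."""
    return gaussian_kernel(x_query, x_train, sigma) @ alpha
```

The smoother the target surface, the fewer training points such a model needs, which is the motivation the abstract gives for fitting in the diabatic basis.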

4.
J Chem Theory Comput ; 19(23): 8861-8870, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38009856

ABSTRACT

Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science but also a very difficult one due to the vast number of possible molecular systems. We propose an evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favorable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely, minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was carried out over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) data sets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to 10^6 QM calculations and were thus feasible only thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
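The name MOSAiCS refers to Metropolis sampling. The textbook Metropolis acceptance rule for a minimization problem can be sketched as below; this is the generic criterion, not the MOSAiCS implementation, and the inverse-temperature parameter `beta` is an assumption introduced here to show how one knob can shift a search between exploration (small `beta`) and exploitation (large `beta`).

```python
import math
import random

def metropolis_accept(current_value, trial_value, beta, rng):
    """Decide whether to accept a trial move when minimizing a target function.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-beta * increase).
    """
    delta = trial_value - current_value
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-beta * delta)
```

For example, `metropolis_accept(1.0, 0.5, beta=1.0, rng=random.Random(0))` accepts because the trial value is lower than the current one.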

5.
J Chem Phys ; 159(3), 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37462285

ABSTRACT

The feature vector mapping used to represent chemical systems is a key factor governing the superior data efficiency of kernel based quantum machine learning (QML) models applicable throughout chemical compound space. Unfortunately, the most accurate representations require a high dimensional feature mapping, thereby imposing a considerable computational burden on model training and use. We introduce compact yet accurate, linear scaling QML representations based on atomic Gaussian many-body distribution functionals (MBDF) and their derivatives. Weighted density functions of MBDF values are used as global representations that are constant in size, i.e., invariant with respect to the number of atoms. We report predictive performance and training data efficiency that is competitive with state-of-the-art for two diverse datasets of organic molecules, QM9 and QMugs. Generalization capability has been investigated for atomization energies, highest occupied molecular orbital-lowest unoccupied molecular orbital eigenvalues and gap, internal energies at 0 K, zero point vibrational energies, dipole moment norm, static isotropic polarizability, and heat capacity as encoded in QM9. MBDF based QM9 performance lowers the optimal Pareto front spanned between sampling and training cost to compute node minutes, effectively sampling chemical compound space with chemical accuracy at a sampling rate of ∼48 molecules per core second.

6.
Science ; 381(6654): 170-175, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37440654

ABSTRACT

Density functional theory (DFT) plays a pivotal role in chemical and materials science because of its relatively high predictive power, applicability, versatility, and computational efficiency. We review recent progress in machine learning (ML) model developments, which have relied heavily on DFT for synthetic data generation and for the design of model architectures. The general relevance of these developments is placed in a broader context for chemical and materials sciences. DFT-based ML models have reached high efficiency, accuracy, scalability, and transferability and pave the way to the routine use of successful experimental planning software within self-driving laboratories.

7.
Phys Chem Chem Phys ; 25(20): 13933-13945, 2023 May 24.
Article in English | MEDLINE | ID: mdl-37190820

ABSTRACT

Recent advances in experimental methodology have enabled studies of the quantum-state- and conformational dependence of chemical reactions under precisely controlled conditions in the gas phase. Here, we generated samples of selected gauche and s-trans 2,3-dibromobutadiene (DBB) by electrostatic deflection in a molecular beam and studied their reaction with Coulomb crystals of laser-cooled Ca⁺ ions in an ion trap. The rate coefficients for the total reaction were found to strongly depend on both the conformation of DBB and the electronic state of Ca⁺. In the (4p) ²P₁/₂ and (3d) ²D₃/₂ excited states of Ca⁺, the reaction is capture-limited and faster for the gauche conformer due to long-range ion-dipole interactions. In the (4s) ²S₁/₂ ground state of Ca⁺, the reaction rate for s-trans DBB still conforms with the capture limit, while that for gauche DBB is strongly suppressed. The experimental observations were analysed with the help of adiabatic capture theory, ab initio calculations and reactive molecular dynamics simulations on a machine-learned full-dimensional potential energy surface of the system. The theory yields near-quantitative agreement for s-trans DBB, but overestimates the reactivity of the gauche conformer compared to the experiment. The present study points to the important role of molecular geometry even in strongly reactive exothermic systems and illustrates striking differences in the reactivity of individual conformers in gas-phase ion-molecule reactions.

8.
J Chem Theory Comput ; 19(6): 1711-1721, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36857531

ABSTRACT

In the past decade, quantum diffusion Monte Carlo (DMC) has been demonstrated to successfully predict the energetics and properties of a wide range of molecules and solids by numerically solving the electronic many-body Schrödinger equation. With O(N^3) scaling with the number of electrons N, DMC has the potential to be a reference method for larger systems that are not accessible to more traditional methods such as CCSD(T). Assessing the accuracy of DMC for smaller molecules becomes the stepping stone in making the method a reference for larger systems. We show that when coupled with quantum machine learning (QML)-based surrogate methods, the computational burden can be alleviated such that quantum Monte Carlo (QMC) shows clear potential to undergird the formation of high-quality descriptions across chemical space. We discuss three crucial approximations necessary to accomplish this: the fixed-node approximation, universal and accurate references for chemical bond dissociation energies, and scalable minimal amons-set-based QML (AQML) models. Numerical evidence presented includes converged DMC results for over 1000 small organic molecules with up to five heavy atoms used as amons and 50 medium-sized organic molecules with nine heavy atoms to validate the AQML predictions. Numerical evidence collected for Δ-AQML models suggests that already modestly sized QMC training data sets of amons suffice to predict total energies with near chemical accuracy throughout chemical space.

9.
J Am Chem Soc ; 145(10): 5899-5908, 2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36862462

ABSTRACT

We present an intuitive and general analytical approximation estimating the energy of covalent single and double bonds between participating atoms in terms of their respective nuclear charges with just three parameters, E_AB ≈ a - b Z_A Z_B + c (Z_A^(7/3) + Z_B^(7/3)). The functional form of our expression models an alchemical atomic energy decomposition between participating atoms A and B. After calibration, reasonably accurate bond dissociation energy estimates are obtained for hydrogen-saturated diatomics composed of p-block elements coming from the same row 2 ≤ n ≤ 4 in the periodic table. Corresponding changes in bond dissociation energies due to substitution of atom B by C can be obtained via simple formulas. While being of different functional form and origin, our model is as simple and accurate as Pauling's well-known electronegativity model. Analysis indicates that the model's response in covalent bonding to variation in nuclear charge is near-linear, which is consistent with Hammett's equation.
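The three-parameter expression quoted in the abstract is simple enough to write out directly. The sketch below evaluates it and also the change on replacing atom B by C, obtained here by plain subtraction under the assumption that the same a, b, and c apply to both bonds (the abstract does not state the authors' exact substitution formulas). No calibrated parameter values are given in the abstract, so a, b, and c are left as arguments.

```python
def bond_energy(z_a, z_b, a, b, c):
    """E_AB ~ a - b*Z_A*Z_B + c*(Z_A^(7/3) + Z_B^(7/3)) for nuclear charges Z_A, Z_B."""
    return a - b * z_a * z_b + c * (z_a ** (7.0 / 3.0) + z_b ** (7.0 / 3.0))

def substitution_shift(z_a, z_b, z_c, b, c):
    """E_AC - E_AB when atom B is replaced by atom C.

    Assumes identical parameters for both bonds, so the constant a cancels.
    """
    return -b * z_a * (z_c - z_b) + c * (z_c ** (7.0 / 3.0) - z_b ** (7.0 / 3.0))
```

Because the constant a cancels in the difference, the shift depends only on b, c, and the nuclear charges involved.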

10.
J Chem Phys ; 157(22): 221102, 2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36546806

ABSTRACT

We use energies and forces predicted within response operator based quantum machine learning (OQML) to perform geometry optimization and transition state search calculations with legacy optimizers but without the need for subsequent re-optimization with quantum chemistry methods. For randomly sampled initial coordinates of small organic query molecules, we report systematic improvement of equilibrium and transition state geometry output as training set sizes increase. Out-of-sample SN2 reactant complexes and transition state geometries have been predicted using the LBFGS and the QST2 algorithms with a root-mean-square deviation (RMSD) of 0.16 and 0.4 Å after training on up to 200 reactant complex relaxations and transition state search trajectories from the QMrxn20 dataset, respectively. For geometry optimizations, we have also considered relaxation paths of up to 5,595 constitutional isomers with sum formula C₇H₁₀O₂ from the QM9 database. Using the resulting OQML models with an LBFGS optimizer reproduces the minimum geometry with an RMSD of 0.14 Å, using only ∼6000 training points obtained from normal mode sampling along the optimization paths of the training compounds without the need for active learning. For converged equilibrium and transition state geometries, subsequent vibrational normal mode frequency analysis indicates deviation from MP2 reference results by on average 14 and 26 cm⁻¹, respectively. While the numerical cost for OQML predictions is negligible in comparison to density functional theory or MP2, the number of steps until convergence is typically larger in either case. The success rate for reaching convergence, however, improves systematically with training set size, underscoring OQML's potential for universal applicability.


Subject(s)
Algorithms, Machine Learning, Isomerism
11.
Mater Adv ; 3(22): 8306-8316, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36561279

ABSTRACT

Despite their relevance for organic electronics, quantum machine learning (QML) models of molecular electronic properties, such as HOMO-LUMO gaps, often struggle to achieve satisfying data efficiency as measured by decreasing prediction errors for increasing training set sizes. We demonstrate that partitioning training sets into different chemical classes prior to training results in independently trained QML models with overall reduced training data needs. For organic molecules drawn from the previously published QM7 and QM9 data sets, we have identified and exploited three relevant classes corresponding to compounds containing either aromatic rings and carbonyl groups, or single unsaturated bonds, or saturated bonds. The selected QML models of band gaps (considered at GW and hybrid DFT levels of theory) reach mean absolute prediction errors of ∼0.1 eV for up to an order of magnitude fewer training molecules than for QML models trained on randomly selected molecules. Comparison to Δ-QML models of band gaps indicates that selected QML exhibits superior data efficiency. Our findings suggest that selected QML, e.g. based on simple classifications prior to training, could help to successfully tackle challenging quantum property screening tasks of large libraries with high fidelity and low computational burden.

12.
J Chem Phys ; 157(16): 164109, 2022 Oct 28.
Article in English | MEDLINE | ID: mdl-36319406

ABSTRACT

We show that the energy of a perturbed system can be fully recovered from the unperturbed system's electron density. We derive an alchemical integral transform by parametrizing space in terms of transmutations, the chain rule, and integration by parts. Within the radius of convergence, the zeroth order yields the energy expansion at all orders, restricting the textbook statement by Wigner that the p-th order wave function derivative is necessary to describe the (2p + 1)-th energy derivative. Without the need for derivatives of the electron density, this allows us to cover entire chemical neighborhoods from just one quantum calculation instead of single systems one by one. Numerical evidence presented indicates that predictive accuracy is achieved in the range of mHa for the harmonic oscillator or the Morse potential and in the range of machine accuracy for hydrogen-like atoms. Considering isoelectronic nuclear charge variations by one proton in all multi-electron atoms from He to Ne, alchemical integral transform based estimates of the relative energy deviate by only a few mHa from corresponding Hartree-Fock reference numbers.

13.
J Chem Phys ; 157(2): 024303, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35840379

ABSTRACT

Equilibrium structures determine material properties and biochemical functions. We here propose to machine learn phase space averages, conventionally obtained by ab initio or force-field-based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio MD, our ab initio machine learning (AIML) model does not require bond topologies and, therefore, enables a general machine learning pathway to obtain ensemble properties throughout the chemical compound space. We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. The AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data and to reach competitive prediction errors (mean absolute error ∼ 0.8 kcal/mol) for out-of-sample molecules within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns of Boltzmann averages throughout the chemical compound space at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.


Subject(s)
Machine Learning, Molecular Dynamics Simulation, Monte Carlo Method
14.
J Chem Phys ; 156(18): 184801, 2022 May 14.
Article in English | MEDLINE | ID: mdl-35568550

ABSTRACT

We propose the relaxation of geometries throughout chemical compound space using alchemical perturbation density functional theory (APDFT). APDFT refers to perturbation theory involving changes in nuclear charges within approximate solutions to Schrödinger's equation. We give an analytical formula to calculate the mixed second order energy derivatives with respect to both nuclear charges and nuclear positions (named "alchemical force") within the restricted Hartree-Fock case. We have implemented and studied the formula for its use in geometry relaxation of various reference and target molecules. We have also analyzed the convergence of the alchemical force perturbation series as well as basis set effects. Interpolating alchemically predicted energies, forces, and Hessian to a Morse potential yields more accurate geometries and equilibrium energies than when performing a standard Newton-Raphson step. Our numerical predictions for small molecules including BF, CO, N2, CH4, NH3, H2O, and HF yield mean absolute errors of equilibrium energies and bond lengths smaller than 10 mHa and 0.01 bohr for fourth order APDFT predictions, respectively. Our alchemical geometry relaxation still preserves the combinatorial efficiency of APDFT: Based on a single coupled perturbed Hartree-Fock derivative for benzene, we provide numerical predictions of equilibrium energies and relaxed structures of all 17 iso-electronic charge-neutral BN-doped mutants with averaged absolute deviations of ∼27 mHa and ∼0.12 bohr, respectively.


Subject(s)
Physical Phenomena
15.
J Chem Phys ; 156(11): 114101, 2022 Mar 21.
Article in English | MEDLINE | ID: mdl-35317562

ABSTRACT

We introduce an electronic structure based representation for quantum machine learning (QML) of electronic properties throughout chemical compound space. The representation is constructed using computationally inexpensive ab initio calculations and explicitly accounts for changes in the electronic structure. We demonstrate the accuracy and flexibility of resulting QML models when applied to property labels, such as total potential energy, HOMO and LUMO energies, ionization potential, and electron affinity, using as datasets for training and testing entries from the QM7b, QM7b-T, QM9, and LIBE libraries. For the latter, we also demonstrate the ability of this approach to account for molecular species of different charge and spin multiplicity, resulting in QML models that infer total potential energies based on geometry, charge, and spin as input.

16.
Nat Commun ; 12(1): 6047, 2021 Oct 18.
Article in English | MEDLINE | ID: mdl-34663806

ABSTRACT

Diels-Alder cycloadditions are efficient routes for the synthesis of cyclic organic compounds. There has been a long-standing discussion whether these reactions proceed via stepwise or concerted mechanisms. Here, we adopt an experimental approach to explore the mechanism of the model polar cycloaddition of 2,3-dibromo-1,3-butadiene with propene ions by probing its conformational specificities in the entrance channel under single-collision conditions in the gas phase. Combining a conformationally controlled molecular beam with trapped ions, we find that both conformers of the diene, gauche and s-trans, are reactive with capture-limited reaction rates. Aided by quantum-chemical and quantum-capture calculations, this finding is rationalised by a simultaneous competition of concerted and stepwise reaction pathways, revealing an interesting mechanistic borderline case.

17.
J Chem Phys ; 155(6): 064105, 2021 Aug 14.
Article in English | MEDLINE | ID: mdl-34391351

ABSTRACT

The interplay of kinetics and thermodynamics governs reactive processes, and their control is key in synthesis efforts. While sophisticated numerical methods for studying equilibrium states have well advanced, quantitative predictions of kinetic behavior remain challenging. We introduce a reactant-to-barrier (R2B) machine learning model that rapidly and accurately infers activation energies and transition state geometries throughout the chemical compound space. R2B exhibits improving accuracy as training set sizes grow and requires as input solely the molecular graph of the reactant and the information of the reaction type. We provide numerical evidence for the applicability of R2B for two competing text-book reactions relevant to organic synthesis, E2 and SN2, trained and tested on chemically diverse quantum data from the literature. After training on 1-1.8k examples, R2B predicts activation energies on average within less than 2.5 kcal/mol with respect to the coupled-cluster singles doubles reference within milliseconds. Principal component analysis of kernel matrices reveals the hierarchy of the multiple scales underpinning reactivity in chemical space: Nucleophiles and leaving groups, substituents, and pairwise substituent combinations correspond to systematic lowering of eigenvalues. Analysis of R2B based predictions of ∼11.5k E2 and SN2 barriers in the gas-phase for previously undocumented reactants indicates that on average, E2 is favored in 75% of all cases and that SN2 becomes likely for chlorine as nucleophile/leaving group and for substituents consisting of hydrogen or electron-withdrawing groups. Experimental reaction design from first principles is enabled due to R2B, which is demonstrated by the construction of decision trees. Numerical R2B based results for interatomic distances and angles of reactant and transition state geometries suggest that Hammond's postulate is applicable to SN2, but not to E2.

18.
Chem Rev ; 121(16): 10001-10036, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-34387476

ABSTRACT

Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.


Subject(s)
Inorganic Chemicals/chemistry, Machine Learning, Organic Chemicals/chemistry, Quantum Theory
19.
J Chem Theory Comput ; 17(8): 4872-4890, 2021 Aug 10.
Article in English | MEDLINE | ID: mdl-34260240

ABSTRACT

Density functionals are often used in ab initio thermochemistry to provide optimized geometries for single-point evaluations at a high level and to supply estimates of anharmonic zero-point energies (ZPEs). Their use is motivated by relatively high accuracy at a modest computational expense, but a thorough assessment of geometry-related error seems to be lacking. We have benchmarked 53 density functionals, focusing on approximations of the first four rungs and on relatively small basis sets for computational efficiency. Optimized geometries of 279 neutral first-row molecules (H, C, N, O, F) are judged by energy penalties relative to the best available geometries, using the composite model ATOMIC/B5 as energy probe. Only hybrid functionals provide good accuracy with root-mean-square errors around 0.1 kcal/mol and maximum errors below 1.0 kcal/mol, but not all of them do. Conspicuously, first-generation hybrids with few or no empirical parameters tend to perform better than highly parameterized ones. A number of them show good accuracy already with small basis sets (6-31G(d), 6-311G(d)). As is standard practice, anharmonic ZPEs are estimated from scaled harmonic values. Statistics of the latter show less performance variation among functionals than observed for geometry-related error, but they also indicate that ZPE error will generally dominate. We have selected PBE0-D3/6-311G(d) for the next version of the ATOMIC protocol (ATOMIC-2) and studied it in more detail. Empirical expressions have been calibrated to estimate bias corrections and 95% uncertainty intervals for both geometry-related error and scaled ZPEs.
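The standard practice mentioned in the abstract, estimating an anharmonic ZPE by scaling the harmonic value, amounts to a one-line calculation. The sketch below is illustrative only: the scale factor is left as an argument because the abstract gives no value, and the function name is an assumption. The conversion 1 cm^-1 ≈ 2.8591e-3 kcal/mol is the usual physical constant.

```python
CM1_TO_KCAL_PER_MOL = 2.8591e-3  # energy of 1 cm^-1 expressed in kcal/mol

def scaled_zpe(harmonic_wavenumbers_cm1, scale_factor):
    """Estimate the anharmonic ZPE (kcal/mol) from harmonic wavenumbers (cm^-1).

    The harmonic ZPE is half the sum of the mode energies; the empirical
    scale factor (a placeholder here) corrects for anharmonicity on average.
    """
    harmonic_zpe = 0.5 * sum(harmonic_wavenumbers_cm1) * CM1_TO_KCAL_PER_MOL
    return scale_factor * harmonic_zpe
```

A scale factor of 1.0 returns the unscaled harmonic ZPE, which makes it easy to see how much of the estimate comes from the correction.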

20.
Nat Commun ; 12(1): 4468, 2021 Jul 22.
Article in English | MEDLINE | ID: mdl-34294693

ABSTRACT

The computational prediction of atomistic structure is a long-standing problem in physics, chemistry, materials, and biology. Conventionally, force-fields or ab initio methods determine structure through energy minimization, which is either approximate or computationally demanding. This accuracy/cost trade-off prohibits the generation of synthetic big data sets accounting for chemical space with atomistic detail. Exploiting implicit correlations among relaxed structures in training data sets, our machine learning model Graph-To-Structure (G2S) generalizes across compound space in order to infer interatomic distances for out-of-sample compounds, effectively enabling the direct reconstruction of coordinates, and thereby bypassing the conventional energy optimization task. The numerical evidence collected includes 3D coordinate predictions for organic molecules, transition states, and crystalline solids. G2S improves systematically with training set size, reaching mean absolute interatomic distance prediction errors of less than 0.2 Å for fewer than eight thousand training structures, on par with or better than conventional structure generators. Applicability tests of G2S include successful predictions for systems which typically require manual intervention, improved initial guesses for subsequent conventional ab initio based relaxation, and input generation for subsequent use of structure based quantum machine learning models.
