Results 1 - 20 of 50
1.
Acta Biomater ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39299620

ABSTRACT

We introduce a data-driven framework to automatically identify interpretable and physically meaningful hyperelastic constitutive models from sparse data. Leveraging symbolic regression, our approach generates elegant hyperelastic models that achieve accurate data fitting with parsimonious mathematical formulas, while strictly adhering to hyperelasticity constraints such as polyconvexity/ellipticity. Our investigation spans three distinct hyperelastic models (invariant-based, principal stretch-based, and normal strain-based) and highlights the versatility of symbolic regression. We validate our new approach using synthetic data from five classic hyperelastic models and experimental data from the human brain cortex to demonstrate algorithmic efficacy. Our results suggest that our symbolic regression algorithms robustly discover accurate models with succinct mathematical expressions in invariant-based, stretch-based, and strain-based scenarios. Strikingly, the strain-based model exhibits superior accuracy, while both stretch-based and strain-based models effectively capture the nonlinearity and tension-compression asymmetry inherent to human brain tissue. Polyconvexity/ellipticity assessment affirms the rigorous adherence to convexity requirements both within and beyond the training regime, although the stretch-based models raise concerns regarding potential convexity loss under large deformations. The evaluation of predictive performance demonstrates remarkable interpolation for all three models and acceptable extrapolation for the stretch-based and strain-based models. Finally, robustness tests on noise-embedded data underscore the reliability of our symbolic regression algorithms. Our study confirms the applicability and accuracy of symbolic regression in the automated discovery of isotropic hyperelastic models for the human brain and opens up a wide variety of applications in other soft matter systems. STATEMENT OF SIGNIFICANCE: Our research introduces a pioneering data-driven framework that revolutionizes the automated identification of hyperelastic constitutive models, particularly in the context of soft matter systems such as the human brain. By harnessing the power of symbolic regression, we have unlocked the ability to distill intricate physical phenomena into elegant and interpretable mathematical expressions. Our approach not only ensures accurate fitting to sparse data but also upholds crucial hyperelasticity constraints, including polyconvexity, essential for maintaining physical relevance.
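As a rough illustration of the invariant-based discovery idea (not the authors' algorithm or data), the sketch below fits a sparse combination of candidate invariant terms to synthetic uniaxial stress data generated from an assumed Mooney-Rivlin model; the constants and the term library are hypothetical.

```python
# Minimal, illustrative sketch: recover a sparse invariant-based stress expression
# from synthetic uniaxial data by least squares over a small library of candidate terms.
import numpy as np

lam = np.linspace(0.8, 1.5, 40)                       # uniaxial stretches
C1, C2 = 0.3, 0.05                                    # assumed Mooney-Rivlin constants (arbitrary units)
sigma = 2.0 * (lam**2 - 1.0 / lam) * (C1 + C2 / lam)  # incompressible uniaxial Cauchy stress

I1 = lam**2 + 2.0 / lam                               # first invariant for uniaxial stretch
base = 2.0 * (lam**2 - 1.0 / lam)
# Candidate terms correspond to dW/dI1 = 1, dW/dI2 = 1, dW/dI1 = (I1 - 3), dW/dI2 = (I1 - 3).
library = np.column_stack([base, base / lam, base * (I1 - 3.0), base * (I1 - 3.0) / lam])

coef, *_ = np.linalg.lstsq(library, sigma, rcond=None)
coef[np.abs(coef) < 1e-3] = 0.0                       # crude parsimony: drop negligible terms
print("recovered coefficients:", np.round(coef, 4))   # ~[0.3, 0.05, 0, 0]
```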

2.
Sci Rep ; 14(1): 19422, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169100

ABSTRACT

Steel construction increasingly uses thin-walled profiles to achieve lighter, more cost-effective structures. However, analyzing the behavior of these elements becomes very complex due to the combined effects of local buckling in the thin walls and overall global buckling of the entire column, which makes traditional analytical methods difficult to apply. Hence, in this research work, the strength of bi-axially loaded cold-formed track-and-channel composite columns has been estimated by applying three AI-based symbolic regression techniques: genetic programming (GP), evolutionary polynomial regression (EPR), and group method of data handling neural networks (GMDH-NN). These techniques were selected because their outputs are closed-form equations that can be used manually. The methodology began with collecting a database of 90 records from previous research and conducting statistical, correlation, and sensitivity analyses; the database was then used to train and validate the three models. All the models used the local and global slenderness ratios (λ, λc, λt) and relative eccentricities (ex/D, ey/B) as inputs and (F/Fy) as output. The performance of the developed models was compared with the capacities predicted by two design codes (AISI and EC3). The results showed that both design codes have a prediction error of 33%, while the three developed models performed better with an error of about 6%, the (EPR) model being the simplest. Both correlation and sensitivity analyses showed that the global slenderness ratio (λ) has the main influence on the strength, followed by the relative eccentricities (ex/D, ey/B) and finally the local slenderness ratios (λc, λt).
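For readers who want to experiment with genetic-programming symbolic regression of this general kind, the sketch below uses the third-party gplearn package on a synthetic stand-in dataset; the inputs mimic the abstract's slenderness and eccentricity features, but the response function and all settings are hypothetical.

```python
# Illustrative sketch only (not the paper's models or database): evolve a closed-form
# expression for a normalized capacity from slenderness and eccentricity inputs.
import numpy as np
from gplearn.genetic import SymbolicRegressor  # third-party package assumed installed

rng = np.random.default_rng(0)
n = 200
X = rng.uniform([0.2, 0.2, 0.2, 0.0, 0.0], [1.5, 1.5, 1.5, 0.5, 0.5], size=(n, 5))
lam, lam_c, lam_t, ex_D, ey_B = X.T
# Synthetic stand-in response: capacity drops with global slenderness and eccentricity.
y = 1.0 / (1.0 + lam**2) - 0.3 * (ex_D + ey_B) + 0.05 * rng.normal(size=n)

sr = SymbolicRegressor(population_size=500, generations=20,
                       function_set=('add', 'sub', 'mul', 'div'),
                       parsimony_coefficient=0.01, random_state=0)
sr.fit(X, y)
print(sr._program)   # prints the evolved closed-form expression
```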

3.
Polymers (Basel) ; 16(16)2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39204546

ABSTRACT

The extensive use of polypropylene (PP) in various industries has heightened interest in developing efficient methods for recycling and optimising its mixtures. This study focuses on formulating predictive models for the Melt Flow Rate (MFR) and shear viscosity of PP blends. The investigation involved characterising various grades, including virgin homopolymers, copolymers, and post-consumer recyclates, in accordance with ISO 1133 standards. The research examined both binary and ternary blends, utilising traditional mixing rules and symbolic regression to predict rheological properties. High accuracy was achieved with the Arrhenius and Cragoe models, attaining R2 values over 0.99. Symbolic regression further enhanced these models, offering significant improvements. To mitigate overfitting, empirical noise and variable swapping were introduced, increasing the models' robustness and generalisability. The results demonstrated that the developed models could reliably predict MFR and shear viscosity, providing a valuable tool for improving the quality and consistency of PP mixtures. These advancements support the development of recycling technologies and sustainable practices in the polymer industry by optimising processing and enhancing the use of recycled materials.
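The Arrhenius-type (log-additive) mixing rule mentioned above can be written as ln(MFR_blend) = Σ w_i · ln(MFR_i); a minimal sketch with hypothetical component grades is shown below (the paper's fitted Cragoe and symbolic-regression models are not reproduced).

```python
# Minimal sketch of the log-additive (Arrhenius-type) mixing rule for blend MFR;
# illustration only, with made-up component values.
import numpy as np

def arrhenius_blend(weights, mfr_components):
    """ln(MFR_blend) = sum_i w_i * ln(MFR_i), with weights summing to 1."""
    w = np.asarray(weights, dtype=float)
    mfr = np.asarray(mfr_components, dtype=float)
    return float(np.exp(np.sum(w * np.log(mfr))))

# Hypothetical binary PP blend: 60 % of a 12 g/10 min grade, 40 % of a 25 g/10 min grade.
print(round(arrhenius_blend([0.6, 0.4], [12.0, 25.0]), 2))  # ~16.1 g/10 min
```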

4.
Sci Rep ; 14(1): 15308, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961241

ABSTRACT

Studying and stabilizing cohesive soils for use in pavement subgrades and compacted landfill liners requires consideration of their unconfined compressive strength (UCS). When a natural cohesive soil falls below 200 kN/m2 in strength, its mechanical properties must be improved to suit the intended structural purpose. Subgrades and landfills are important environmental geotechnical structures that demand engineering attention because of their role in protecting the environment from associated hazards. In this research project, a comparative study and suitability assessment has been conducted on the UCS behavior of cohesive soil reconstituted with cement and lime and mechanically stabilized at optimal compaction, using multiple ensemble-based machine learning classification and symbolic regression techniques. The ensemble-based ML classification techniques are gradient boosting (GB), CN2, naïve Bayes (NB), support vector machine (SVM), stochastic gradient descent (SGD), k-nearest neighbor (K-NN), decision tree (Tree), and random forest (RF), applied together with an artificial neural network (ANN) and response surface methodology (RSM) to estimate the UCS (MPa) of cohesive soil stabilized with cement and lime. The considered inputs were cement (C), lime (Li), liquid limit (LL), plasticity index (PI), optimum moisture content (OMC), and maximum dry density (MDD). A total of 190 mix entries were collected from experimental exercises and partitioned into a 74%/26% train-test split. At the end of the modeling exercises, it was found that the GB and K-NN models showed the same excellent accuracy of 95%, while the CN2, SVM, and Tree models shared a similar accuracy of about 90%. The RF and SGD models showed fair accuracy of about 65-80%, and NB produced an unacceptably low accuracy of 13%. The ANN and RSM showed accuracy closely matching the SVM and Tree models. Both the correlation matrix and the sensitivity analysis indicated that UCS is most strongly affected by MDD, followed by the consistency limits and cement content, with lime content in third place, while the impact of OMC is almost negligible. This outcome can be applied in the field to obtain optimal compaction for a lime-reconstituted soil, given the almost negligible impact of compaction moisture.
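As a generic scaffold for this kind of classifier comparison (not the study's soil database or tuned models), the sketch below trains several scikit-learn classifiers on placeholder data with a 74 %/26 % train-test split and reports test accuracies.

```python
# Illustrative scaffold: compare several classifiers on a placeholder dataset,
# mirroring the style of accuracy comparison described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=190, n_features=6, n_informative=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.26, random_state=1)

models = {"GB": GradientBoostingClassifier(), "K-NN": KNeighborsClassifier(),
          "SVM": SVC(), "NB": GaussianNB(), "Tree": DecisionTreeClassifier(),
          "RF": RandomForestClassifier(), "SGD": SGDClassifier()}
for name, model in models.items():
    acc = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{name:5s} test accuracy: {acc:.2f}")
```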

5.
Sci Rep ; 14(1): 14590, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918511

ABSTRACT

This study explores machine learning (ML) capabilities for predicting the shear strength of reinforced concrete deep beams (RCDBs). For this purpose, eight typical machine-learning models, i.e., symbolic regression (SR), XGBoost (XGB), CatBoost (CATB), random forest (RF), LightGBM, support vector regression (SVR), artificial neural networks (ANN), and Gaussian process regression (GPR), are selected and compared based on a database of 840 samples with 14 input features. The hyperparameter tuning of the introduced ML models is performed using the Bayesian optimization (BO) technique. The comparison results show that the CatBoost model is the most reliable and accurate ML model (R2 = 0.997 and 0.947 in the training and testing sets, respectively). In addition, simple and practical design expressions for RCDBs have been proposed based on the SR model, with physical meaning and acceptable accuracy (an average prediction-to-test ratio of 0.935 and a standard deviation of 0.198). The shear strengths predicted by the ML models were then compared with classical mechanics-driven shear models, including two prominent design codes (i.e., ACI 318 and EC2) and two previous mechanical models; this comparison indicated that the ML approach is more reliable and accurate than the conventional methods. In addition, a reliability-based design was conducted on two ML models, and their reliability results were compared with those of the two code standards. The findings revealed that the ML models demonstrate higher reliability compared to the code standards.

6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38886006

ABSTRACT

Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological ones, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved first, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets, including both synthetic and real-world experimental data. The results elucidate that LogicGep adeptly infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
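A toy sketch of the underlying Boolean-network idea (not LogicGep itself) is shown below: candidate Boolean update rules for a single target gene are scored for consistency against a hypothetical binarized expression time series.

```python
# Toy sketch: score simple candidate Boolean update rules for one target gene
# against a hypothetical binarized time series (columns: g0, g1, target).
series = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 1), (0, 1, 0), (1, 1, 0)]

candidates = {
    "g0 AND g1": lambda a, b: a & b,
    "g0 OR g1":  lambda a, b: a | b,
    "NOT g0":    lambda a, b: 1 - a,
    "g1":        lambda a, b: b,
}

for name, rule in candidates.items():
    # A rule is consistent if it maps the regulators at time t to the target at time t+1.
    hits = sum(rule(a, b) == nxt for (a, b, _), (_, _, nxt) in zip(series, series[1:]))
    print(f"{name:10s} consistency: {hits}/{len(series) - 1}")   # "g0 AND g1" scores 5/5 here
```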


Subject(s)
Algorithms, Gene Expression Profiling, Gene Regulatory Networks, Gene Expression Profiling/methods, Humans, Transcriptome, Software, Computational Biology/methods, Neural Networks, Computer
7.
Sci Rep ; 14(1): 11169, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750117

ABSTRACT

We present a new method for approximating two-body interatomic potentials from existing ab initio data based on representing the unknown function as an analytic continued fraction. In this study, our method was first inspired by a representation of the unknown potential as a Dirichlet polynomial, i.e., the partial sum of some terms of a Dirichlet series. Our method allows for a close and computationally efficient approximation of the ab initio data for the noble gases Xenon (Xe), Krypton (Kr), Argon (Ar), and Neon (Ne), which are proportional to r^-6 and to a very simple depth = 1 truncated continued fraction with integer coefficients and depending on n^-r only, where n is a natural number (with n = 13 for Xe, n = 16 for Kr, n = 17 for Ar, and n = 27 for Ne). For Helium (He), the data is well approximated with a function having only one variable n^-r with n = 31 and a truncated continued fraction with depth = 2 (i.e., the third convergent of the expansion). Also, for He, we have found an interesting depth = 0 result, a Dirichlet polynomial of the form k1·6^-r + k2·48^-r + k3·72^-r (with k1, k2, k3 all integers), which provides a surprisingly good fit, not only in the attractive but also in the repulsive region. We also discuss lessons learned while facing the surprisingly challenging non-linear optimisation tasks in fitting these approximations and opportunities for parallelisation.
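The functional forms discussed above can be evaluated directly; the sketch below uses made-up coefficients for illustration (the paper's fitted integer coefficients are not reproduced), and writes the depth = 1 truncated continued fraction in one common generic form.

```python
# Sketch of the functional forms above, with hypothetical illustrative coefficients.
import numpy as np

def dirichlet_poly(r, k1, k2, k3):
    """depth = 0 form: k1*6**(-r) + k2*48**(-r) + k3*72**(-r)."""
    return k1 * 6.0**(-r) + k2 * 48.0**(-r) + k3 * 72.0**(-r)

def depth1_fraction(r, a0, a1, b1, n):
    """One generic depth = 1 truncated continued fraction in x = n**(-r): a0 + a1*x / (1 + b1*x)."""
    x = float(n) ** (-r)
    return a0 + a1 * x / (1.0 + b1 * x)

r = np.linspace(2.5, 6.0, 8)   # interatomic separations (arbitrary units)
print(dirichlet_poly(r, 1, -4, 3))
print(depth1_fraction(r, 0.0, 5.0, 2.0, 13))
```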

8.
J R Soc Interface ; 21(212): 20230710, 2024 03.
Article in English | MEDLINE | ID: mdl-38503338

ABSTRACT

In the human cardiovascular system (CVS), the interaction between the left and right ventricles of the heart is influenced by the septum and the pericardium. Computational models of the CVS can capture this interaction, but this often involves approximating solutions to complex nonlinear equations numerically. As a result, numerous models have been proposed, where these nonlinear equations are either simplified, or ventricular interaction is ignored. In this work, we propose an alternative approach to modelling ventricular interaction, using a hybrid neural ordinary differential equation (ODE) structure. First, a lumped parameter ODE model of the CVS (including a Newton-Raphson procedure as the numerical solver) is simulated to generate synthetic time-series data. Next, a hybrid neural ODE based on the same model is constructed, where ventricular interaction is instead set to be governed by a neural network. We use a short range of the synthetic data (with various amounts of added measurement noise) to train the hybrid neural ODE model. Symbolic regression is used to convert the neural network into analytic expressions, resulting in a partially learned mechanistic model. This approach was able to recover parsimonious functions with good predictive capabilities and was robust to measurement noise.


Subject(s)
Heart Ventricles, Neural Networks, Computer, Humans, Computer Simulation
9.
J Pharmacokinet Pharmacodyn ; 51(2): 155-167, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37864654

ABSTRACT

Efficiently finding covariate model structures that minimize the need for random effects to describe pharmacological data is challenging. The standard approach focuses on identification of relevant covariates, and present methodology lacks tools for automatic identification of covariate model structures. Although neural networks could potentially be used to approximate covariate-parameter relationships, such approximations are not human-readable and come at the risk of poor generalizability due to high model complexity. In the present study, a novel methodology for the simultaneous selection of covariate model structure and optimization of its parameters is proposed. It is based on symbolic regression, posed as an optimization problem with a smooth loss function. This enables training of the model through back-propagation using efficient gradient computations. Feasibility and effectiveness are demonstrated by application to a clinical pharmacokinetic data set for propofol, containing infusion and blood sample time series from 1031 individuals. The resulting model is compared to a published state-of-the-art model for the same data set. Our methodology finds a covariate model structure and corresponding parameter values with a slightly better fit, while relying on notably fewer covariates than the state-of-the-art model. Unlike contemporary practice, finding the covariate model structure is achieved without an iterative procedure involving manual interactions.
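As a simplified illustration of gradient-based fitting of a covariate-parameter relationship (not the paper's structure-search method), the sketch below uses PyTorch to estimate an assumed allometric weight effect on clearance from synthetic data with a smooth log-scale loss.

```python
# Generic gradient-based fit of a covariate-parameter relationship; illustration only.
import torch

torch.manual_seed(0)
weight = torch.rand(200) * 60 + 40                       # hypothetical body weights, 40-100 kg
true_cl = 1.8 * (weight / 70.0) ** 0.75                  # assumed "true" allometric clearance
observed = true_cl * torch.exp(0.1 * torch.randn(200))   # log-normal residual noise

theta = torch.tensor(1.0, requires_grad=True)            # population clearance
beta = torch.tensor(0.0, requires_grad=True)             # covariate exponent
opt = torch.optim.Adam([theta, beta], lr=0.05)

for _ in range(2000):
    opt.zero_grad()
    pred = theta * (weight / 70.0) ** beta
    loss = torch.mean((torch.log(pred) - torch.log(observed)) ** 2)  # smooth loss
    loss.backward()
    opt.step()
print(float(theta), float(beta))   # approaches ~1.8 and ~0.75
```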


Subject(s)
Neural Networks, Computer, Propofol, Humans, Time Factors
10.
Evol Comput ; 32(1): 49-68, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-36893327

ABSTRACT

Reproducibility is important for having confidence in evolutionary machine learning algorithms. Although the focus of reproducibility is usually to recreate an aggregate prediction error score using fixed random seeds, this is not sufficient. Firstly, multiple runs of an algorithm, without a fixed random seed, should ideally return statistically equivalent results. Secondly, it should be confirmed whether the expected behaviour of an algorithm matches its actual behaviour, in terms of how an algorithm targets a reduction in prediction error. Confirming the behaviour of an algorithm is not possible when using a total error aggregate score. Using an error decomposition framework as a methodology for improving the reproducibility of results in evolutionary computation addresses both of these factors. By estimating decomposed error using multiple runs of an algorithm and multiple training sets, the framework provides a greater degree of certainty about the prediction error. Also, decomposing error into bias, variance due to the algorithm (internal variance), and variance due to the training data (external variance) more fully characterises evolutionary algorithms. This allows the behaviour of an algorithm to be confirmed. Applying the framework to a number of evolutionary algorithms shows that their expected behaviour can be different to their actual behaviour. Identifying a behaviour mismatch is important in terms of understanding how to further refine an algorithm as well as how to effectively apply an algorithm to a problem.
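A toy numerical sketch of such a decomposition is given below: for a deliberately simple randomized regressor on synthetic data, bias squared, internal variance (over algorithm seeds), and external variance (over training sets) are estimated from repeated runs; the regressor and noise levels are hypothetical.

```python
# Toy sketch of decomposing prediction error into bias^2, internal variance
# (algorithm randomness) and external variance (training-set sampling).
import numpy as np

x_test = np.linspace(0, 1, 50)
f_true = np.sin(2 * np.pi * x_test)

def train_and_predict(seed_data, seed_algo):
    data_rng = np.random.default_rng(seed_data)
    algo_rng = np.random.default_rng(seed_algo)
    x = data_rng.uniform(0, 1, 30)
    y = np.sin(2 * np.pi * x) + 0.3 * data_rng.normal(size=30)
    # "Algorithm" = polynomial fit; the added jitter stands in for internal randomness.
    coeffs = np.polyfit(x, y, 5) + 0.01 * algo_rng.normal(size=6)
    return np.polyval(coeffs, x_test)

preds = np.array([[train_and_predict(d, a) for a in range(20)] for d in range(20)])
mean_over_algo = preds.mean(axis=1)                  # average out internal randomness per dataset
grand_mean = preds.mean(axis=(0, 1))

bias2 = np.mean((grand_mean - f_true) ** 2)
internal_var = np.mean(preds.var(axis=1))            # variance across algorithm seeds
external_var = np.mean(mean_over_algo.var(axis=0))   # variance across training sets
print(f"bias^2={bias2:.4f}  internal var={internal_var:.4f}  external var={external_var:.4f}")
```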


Subject(s)
Algorithms, Machine Learning, Reproducibility of Results
11.
Materials (Basel) ; 16(23)2023 Nov 26.
Article in English | MEDLINE | ID: mdl-38068098

ABSTRACT

Most failures in steel materials are due to fatigue damage, so it is of great significance to analyze the key features of fatigue strength (FS) in order to improve fatigue performance. This study collected data on the fatigue strength of steel materials and established a predictive model for FS based on machine learning (ML). Three feature-construction strategies were proposed based on the dataset and compared on four typical ML algorithms. The combination of Strategy III (composition, heat-treatment, and atomic features) and the GBT algorithm showed the best performance. Subsequently, input features were selected step by step using methods such as analysis of variance (ANOVA) and embedded, recursive, and exhaustive methods. The key features affecting FS were found to be TT, mE, APID, and Mo. Based on these key features and Bayesian optimization, an ML model was established, which showed good performance. Finally, Shapley additive explanations (SHAP) and symbolic regression (SR) were introduced to improve the interpretability of the prediction model. SHAP analysis revealed that TT and Mo had the most significant impact on FS; specifically, 160 < TT < 500 and Mo > 0.15 were observed to be beneficial for increasing FS. SR was used to establish a significant mathematical relationship between these key features and FS.
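For readers who want to reproduce this style of SHAP-based importance analysis on their own data, the sketch below uses the third-party shap package with a gradient-boosted tree on placeholder data; the feature names are taken from the abstract, but the data and relationships are synthetic.

```python
# Illustrative sketch: global feature importance via mean |SHAP| values
# for a gradient-boosted tree trained on synthetic placeholder data.
import numpy as np
import shap                                           # third-party package assumed installed
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
feature_names = ["TT", "mE", "APID", "Mo"]            # names taken from the abstract above
X = rng.uniform(0, 1, size=(300, 4))
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)           # global importance per feature
for name, value in sorted(zip(feature_names, mean_abs), key=lambda t: -t[1]):
    print(f"{name:5s} mean |SHAP| = {value:.3f}")
```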

12.
Proc Natl Acad Sci U S A ; 120(48): e2306275120, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-37983488

ABSTRACT

Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet, it is still not clear how we can use the superior pattern-matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of observations from wave buoys, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities, while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery and paves the way for more accurate rogue wave forecasting.

13.
Biomolecules ; 13(10)2023 10 12.
Article in English | MEDLINE | ID: mdl-37892198

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology has significantly advanced our understanding of the diversity of cells and how this diversity is implicated in diseases. Yet, translating these findings across various scRNA-seq datasets poses challenges due to technical variability and dataset-specific biases. To overcome this, we present a novel approach that employs both an LLM-based framework and explainable machine learning to facilitate generalization across single-cell datasets and identify gene signatures to capture disease-driven transcriptional changes. Our approach uses scBERT, which harnesses shared transcriptomic features among cell types to establish consistent cell-type annotations across multiple scRNA-seq datasets. Additionally, we employed a symbolic regression algorithm to pinpoint highly relevant, yet minimally redundant models and features for inferring a cell type's disease state based on its transcriptomic profile. We ascertained the versatility of these cell-specific gene signatures across datasets, showcasing their resilience as molecular markers to pinpoint and characterize disease-associated cell types. The validation was carried out using four publicly available scRNA-seq datasets from both healthy individuals and those suffering from ulcerative colitis (UC). This demonstrates our approach's efficacy in bridging disparities specific to different datasets, fostering comparative analyses. Notably, the simplicity and symbolic nature of the retrieved gene signatures facilitate their interpretability, allowing us to elucidate underlying molecular disease mechanisms using these models.


Subject(s)
Algorithms, Single-Cell Analysis, Humans, Sequence Analysis, RNA, Gene Expression Profiling, Biomarkers
14.
ACS Appl Mater Interfaces ; 15(34): 40419-40427, 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37594363

ABSTRACT

The band gap of hybrid organic-inorganic perovskites (HOIPs) is a key factor affecting the light absorption characteristics and thus the performance of perovskite solar cells (PSCs). However, band gap engineering using experimental trial and error and high-throughput density functional theory calculations is blind and costly. Therefore, it is critical to statistically identify the multiple factors influencing band gaps and to rationally design perovskites with targeted band gaps. This study combined feature engineering, the gradient-boosted regression tree (GBRT) algorithm, and the genetic algorithm-based symbolic regression (GASR) algorithm to develop an interpretable machine learning (ML) strategy for predicting the band gap of HOIPs accurately and quantitatively interpreting the factors affecting the band gap. The seven best physical features were selected to construct a GBRT model with a root-mean-square error of less than 0.060 eV, and the most important feature is the electronegativity difference between the B-site and the X-site (χB-X). Further, a mathematical formula (Eg = χB-X^2 + 0.881·χB-X) was constructed with GASR for a quantitative interpretation of the band gap influence patterns. According to the ML model, the HOIP MA0.23FA0.02Cs0.75Pb0.59Sn0.41Br0.24I2.76 was obtained, with a suitable band gap of 1.39 eV. Our proposed interpretable ML strategy provides an effective approach for developing HOIP structures with targeted band gaps, which can also be applied to other material fields.
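Taking the quoted formula at face value (reading it as Eg = χB-X^2 + 0.881·χB-X, in eV), it can be evaluated directly:

```python
# Convenience helper for the quoted GASR formula; the quadratic reading of the
# formula and the example input are assumptions for illustration.
def band_gap_ev(chi_bx: float) -> float:
    return chi_bx**2 + 0.881 * chi_bx

print(band_gap_ev(0.9))   # ~1.60 eV for a hypothetical electronegativity difference of 0.9
```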

15.
Neural Netw ; 165: 1021-1034, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37467584

ABSTRACT

Symbolic regression (SR) can be utilized to unveil the underlying mathematical expressions that describe a given set of observed data. At present, SR methods can be categorized into two types: learning-from-scratch and learning-with-experience. Compared to learning-from-scratch, learning-with-experience yields results that are comparable to those of several benchmarks and incurs significantly lower time costs for obtaining expressions. However, the learning-with-experience model performs poorly on unseen data distributions and lacks a rectification tool, apart from constant optimization, which exhibits limited performance. In this study, we propose a Symbolic Network-based Rectifiable Learning Framework (SNR) that possesses the ability to correct errors. SNR adopts a Symbolic Network (SymNet) to represent an expression, and the encoding of SymNet is designed to provide supervised information, with numerous self-generated expressions, to train a policy net (PolicyNet). The training of PolicyNet can offer prior knowledge to guide effective searches. Subsequently, incorrectly predicted expressions are revised via a rectification mechanism. This rectification mechanism endows SNR with broader applicability. Experimental results demonstrate that our proposed method achieves the highest averaged coefficient of determination on self-generated datasets when compared with other state-of-the-art methods and yields more accurate results on public datasets.


Subject(s)
Benchmarking, Learning, Knowledge, Policies
16.
Front Psychiatry ; 14: 1199113, 2023.
Article in English | MEDLINE | ID: mdl-37426104

ABSTRACT

Autism, a neurodevelopmental disorder, presents significant challenges for diagnosis and classification. Despite the widespread use of neural networks in autism classification, the interpretability of their models remains a crucial issue. This study aims to address this concern by investigating the interpretability of neural networks in autism classification using deep symbolic regression and brain-network interpretation methods. Specifically, we analyze publicly available autism fMRI data using our previously developed Deep Factor Learning on a Hilbert Basis tensor (HB-DFL) method and extend the interpretative Deep Symbolic Regression method to identify dynamic features from factor matrices, construct brain networks from generated reference tensors, and facilitate the accurate diagnosis of abnormal brain network activity in autism patients by clinicians. Our experimental results show that our interpretative method effectively enhances the interpretability of neural networks and identifies crucial features for autism classification.

17.
Environ Sci Pollut Res Int ; 30(37): 87071-87086, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37418189

ABSTRACT

Carbon emissions (CE) have led to increasingly severe climate problems. The key to reducing CE is to identify the dominant influencing factors and quantify their degree of influence. The CE data of 30 provinces in China from 1997 to 2020 were calculated using the IPCC method. On this basis, six factors, namely GDP, Industrial Structure (IS), Total Population (TP), Population Structure (PS), Energy Intensity (EI), and Energy Structure (ES), were ranked by their importance for provincial CE using symbolic regression, and the LMDI and Tapio models were then established to explore in depth the degree of influence of the different factors on CE. The results showed that the 30 provinces fall into five categories according to their primary factor: GDP was the most important factor, followed by ES and EI, then IS, with TP and PS the least important. The growth of per capita GDP promoted the increase of CE, while reduced EI inhibited it. The increase of ES promoted CE in some provinces but inhibited it in others. The increase of TP weakly promoted the increase of CE. These results can provide references for governments formulating CE reduction policies under the dual-carbon goal.
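A minimal single-region sketch of an additive LMDI decomposition is shown below; it splits a hypothetical emission change into Kaya-style drivers using the log-mean weight, which is the core of the LMDI model mentioned above (the provincial data and Tapio analysis are not reproduced).

```python
# Minimal additive LMDI (log-mean Divisia index) decomposition for one region;
# all numbers are hypothetical.
import math

def log_mean(a, b):
    return a if a == b else (a - b) / (math.log(a) - math.log(b))

# Start/end values of Kaya-style drivers: population, GDP per capita,
# energy intensity of GDP, and carbon intensity of energy.
factors0 = {"P": 50.0, "GDP/P": 4.0, "E/GDP": 0.50, "C/E": 1.00}
factors1 = {"P": 52.0, "GDP/P": 5.5, "E/GDP": 0.45, "C/E": 1.01}
C0 = math.prod(factors0.values())   # emissions at start
C1 = math.prod(factors1.values())   # emissions at end

L = log_mean(C1, C0)
contrib = {k: L * math.log(factors1[k] / factors0[k]) for k in factors0}
print(contrib)                          # additive contribution of each driver
print(sum(contrib.values()), C1 - C0)   # the contributions sum to the total change
```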


Subject(s)
Carbon Dioxide, Carbon, Carbon/analysis, Carbon Dioxide/analysis, China, Industries, Economic Development
18.
J Mol Graph Model ; 124: 108530, 2023 11.
Article in English | MEDLINE | ID: mdl-37321063

ABSTRACT

Data-driven methods have received significant attention in recent years for chemical and materials research; however, more work is needed to leverage this new paradigm for modeling and analyzing the adsorption of organic molecules on low-dimensional surfaces beyond the traditional simulation methods. In this manuscript, we employ machine learning and symbolic regression methods coupled with density functional theory (DFT) calculations to investigate the adsorption of atmospheric organic molecules on a low-dimensional metal oxide mineral system. The starting dataset, consisting of the atomic structures of the organic/metal oxide interfaces, is obtained via DFT calculations, and different machine learning algorithms are compared, with the random forest algorithm achieving high accuracy for the target output. The feature-ranking step identifies the polarizability and bond type of the organic adsorbates as the key descriptors for the adsorption energy output. In addition, symbolic regression coupled with genetic programming automatically identifies a series of hybrid new descriptors displaying improved relevance to the target output, suggesting the viability of symbolic regression to complement traditional machine learning techniques for descriptor design and fast modeling purposes. This manuscript provides a framework for effectively modeling and analyzing the adsorption of organic molecules on low-dimensional surfaces via comprehensive data-driven approaches.


Subject(s)
Algorithms, Metals, Adsorption, Metals/chemistry, Organic Compounds, Machine Learning, Oxides
19.
PeerJ Comput Sci ; 9: e1241, 2023.
Article in English | MEDLINE | ID: mdl-37346583

ABSTRACT

There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widely used deep neural networks do not provide interpretable solutions, whereas symbolic expressions give us a clear relation between observations and the target variable. However, at the moment there is no dominant solution for the symbolic regression task, and we aim to reduce this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via a variational autoencoder (VAE). We suggest using a VAE to generate mathematical expressions, and our training strategy forces generated formulas to fit a given dataset. Our framework allows encoding a priori knowledge of the formulas into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that it outperforms the competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported state of the art by 20%. We demonstrate that this value depends on the dataset and can be even higher.

20.
Environ Sci Technol ; 57(46): 18317-18328, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37186812

ABSTRACT

Machine learning (ML) models were developed for understanding the root uptake of per- and polyfluoroalkyl substances (PFASs) under complex PFAS-crop-soil interactions. Three hundred root concentration factor (RCF) data points and 26 features associated with PFAS structures, crop properties, soil properties, and cultivation conditions were used for the model development. The optimal ML model, obtained by stratified sampling, Bayesian optimization, and 5-fold cross-validation, was explained by permutation feature importance, individual conditional expectation plots, and 3D interaction plots. The results showed that soil organic carbon content, pH, chemical logP, soil PFAS concentration, root protein content, and exposure time greatly affected the root uptake of PFASs, with relative importances of 0.43, 0.25, 0.10, 0.05, 0.05, and 0.05, respectively. Furthermore, these factors exhibited key threshold ranges favoring PFAS uptake. Carbon-chain length was identified as the critical molecular structure affecting root uptake of PFASs, with a relative importance of 0.12, based on extended connectivity fingerprints. A user-friendly model was established with symbolic regression for accurately predicting RCF values of the PFASs (including branched PFAS isomers). The present study provides a novel approach for profound insight into the uptake of PFASs by crops under complex PFAS-crop-soil interactions, aiming to ensure food safety and human health.


Subject(s)
Fluorocarbons, Water Pollutants, Chemical, Humans, Soil/chemistry, Carbon, Bayes Theorem, Fluorocarbons/analysis, Machine Learning, Water Pollutants, Chemical/analysis