POPs identification using simple low-code machine learning.
Xin, Lei; Yu, Haiying; Liu, Sisi; Ying, Guang-Guo; Chen, Chang-Er.
Affiliation
  • Xin L; School of Environment, MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, South China Normal University, Guangzhou 51
  • Yu H; College of Geography and Environmental Sciences, Zhejiang Normal University, Jinhua 321004, China.
  • Liu S; School of Environment, MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, South China Normal University, Guangzhou 51
  • Ying GG; School of Environment, MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, South China Normal University, Guangzhou 51
  • Chen CE; School of Environment, MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, South China Normal University, Guangzhou 51
Sci Total Environ ; 921: 171143, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38387592
ABSTRACT
Identifying persistent organic pollutants (POPs) in extensive organic chemical datasets is a formidable challenge but is of utmost importance. Machine learning can enhance this process, but previous models often demanded advanced programming skills and high-end computing resources. In this study, we used PyCaret, a Python-based package, to construct machine-learning models for POP screening based on 2D molecular descriptors, and compared their performance against a deep convolutional neural network (DCNN) model. Using minimal Python code, we generated several models that performed comparably to or better than the DCNN. The best performer, the Light Gradient Boosting Machine (LGBM), achieved an accuracy of 96.20%, an AUC of 97.70%, and an F1 score of 82.58%, outperforming the DCNN model. It also excelled at identifying POPs within the REACH PBT and compiled industrial chemical lists. Our findings highlight the accessibility and simplicity of PyCaret, which requires only a few lines of code and is therefore suitable for environmental scientists without a computing background. The ability of low-code machine learning tools (e.g., PyCaret) to facilitate model comparison and interpretation holds promise for encouraging prompt assessment and management of chemical substances.
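The abstract reports accuracy, AUC, and F1 side by side, and the F1 score is noticeably lower than the accuracy. The sketch below shows how these three metrics are computed for a binary POP / non-POP screen and why they can diverge. It is illustrative only: the labels are hypothetical (10 POPs among 100 chemicals), not the study's data, and the function names are made up for this example. The abstract does not state the class proportions, so the imbalance used here is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 marks a POP."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all chemicals that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the POP class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random POP is scored above a random non-POP."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical screen: 10 POPs and 90 non-POPs (assumed, not from the paper).
y_true = [1] * 10 + [0] * 90
# The classifier finds 8 of the 10 POPs and raises 2 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 88

print(f"accuracy = {accuracy(y_true, y_pred):.2%}")  # accuracy = 96.00%
print(f"F1       = {f1_score(y_true, y_pred):.2%}")  # F1       = 80.00%
```

In this toy case the many correctly rejected non-POPs lift accuracy to 96%, while F1, which ignores true negatives, stays at 80%. That is one plausible reading of a gap between the two metrics; it does not by itself establish the class balance of the study's dataset.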

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Sci Total Environ Year: 2024 Document type: Article Country of publication: Netherlands
