INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction.

Jiang, Jing; Li, Yachao; Zhang, Ruisheng; Liu, Yunwu

Affiliation

Jiang J; Key Laboratory of Linguistic and Cultural Computing, Ministry of Education, Northwest Minzu University, Lanzhou 730030, China. Electronic address: jiangj@xbmu.edu.cn.
Li Y; Key Laboratory of Linguistic and Cultural Computing, Ministry of Education, Northwest Minzu University, Lanzhou 730030, China. Electronic address: harry_lyc@foxmail.com.
Zhang R; School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China. Electronic address: zhangrs@lzu.edu.cn.
Liu Y; School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China. Electronic address: liuyw19@lzu.edu.cn.

J Mol Graph Model; 128: 108703, 2024 May.

Article in English | MEDLINE | ID: mdl-38228013

ABSTRACT

Molecular property prediction plays an essential role in drug discovery by identifying candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train well, but labeled datasets for molecular property prediction are typically small, which poses a major challenge for deep learning-based methods. In addition, global information about a molecule is critical for predicting its properties. We therefore propose INTransformer, a contrastive learning-based data augmentation method for molecular property prediction that alleviates the shortage of labeled molecular data while improving the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders that extract molecular representations from the original SMILES and a noisy SMILES, respectively, which also serves as data augmentation. To reduce the influence of the noise, contrastive learning is used to keep the encoding of the noisy SMILES consistent with that of the original input, so that INTransformer extracts molecular representation information more effectively. Experiments on various benchmark datasets show that INTransformer achieves competitive performance on molecular property prediction tasks compared with baselines and state-of-the-art methods.
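The two-view idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say where the noise is injected or which contrastive objective is used, so the sketch assumes Gaussian noise added to token embeddings and an InfoNCE-style loss. The names `tokenize`, `token_embedding`, `encode`, and `info_nce` are hypothetical, and a mean-pooling toy encoder stands in for the Transformer sub-encoders (one shared function plays the role of the two identical sub-encoders).

```python
import math
import random


def tokenize(smiles):
    """Character-level split; real SMILES tokenizers keep 'Cl', 'Br', etc. whole."""
    return list(smiles)


def token_embedding(token, dim=8):
    """Hypothetical fixed embedding: a pseudo-random vector seeded by the token."""
    rng = random.Random(ord(token[0]) * 7919 + len(token))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(smiles, noise_std=0.0, rng=None, dim=8):
    """Toy stand-in for a Transformer sub-encoder: mean-pooled token embeddings.

    Assumption: with noise_std > 0, Gaussian noise is added to every embedding
    component, which produces the 'noisy' view of the same molecule.
    """
    rng = rng or random.Random(0)
    tokens = tokenize(smiles)
    pooled = [0.0] * dim
    for tok in tokens:
        emb = token_embedding(tok, dim)
        for i in range(dim):
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            pooled[i] += (emb[i] + noise) / len(tokens)
    return pooled


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(clean, noisy, temperature=0.1):
    """InfoNCE-style loss: each clean view should match its own noisy view.

    The other molecules in the batch act as negatives.
    """
    total = 0.0
    for i, z in enumerate(clean):
        logits = [cosine(z, w) / temperature for w in noisy]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(clean)


batch = ["CCO", "c1ccccc1", "CC(=O)O"]  # ethanol, benzene, acetic acid
noise_rng = random.Random(42)
clean_views = [encode(s) for s in batch]
noisy_views = [encode(s, noise_std=0.1, rng=noise_rng) for s in batch]
loss = info_nce(clean_views, noisy_views)
print(f"contrastive loss: {loss:.4f}")
```

Minimizing a loss of this kind pulls the two views of each molecule together while pushing apart views of different molecules, which is one common way to make an encoder robust to injected noise. In a trained model this term would typically be combined with the supervised property-prediction loss.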

Subject(s)

Drug Discovery; Electric Power Supplies; Databases, Factual

Keywords

Contrastive learning; Data augmentation; Molecular property prediction; SMILES

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Electric Power Supplies / Drug Discovery | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: J Mol Graph Model | Journal subject: MOLECULAR BIOLOGY | Year: 2024 | Document type: Article | Country of publication: United States
