INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction.
J Mol Graph Model; 128: 108703, 2024 05.
Article | En | MEDLINE | ID: mdl-38228013
ABSTRACT
Molecular property prediction plays an essential role in drug discovery for identifying the candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train good prediction models. However, the size of labeled data is usually small for molecular property prediction, which brings great challenges to deep learning-based molecular property prediction methods. Furthermore, the global information of molecules is critical for predicting molecular properties. Therefore, we propose INTransformer for molecular property prediction, which is a data augmentation method via contrastive learning to alleviate the limitations of the labeled molecular data while enhancing the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders to extract the molecular representation from the original SMILES and noisy SMILES respectively, while achieving the goal of data augmentation. To reduce the influence of noise, we use contrastive learning to ensure the molecular encoding of noisy SMILES is consistent with that of the original input so that the molecular representation information can be better extracted by INTransformer. Experiments on various benchmark datasets show that INTransformer achieved competitive performance for molecular property prediction tasks compared with the baselines and state-of-the-art methods.
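The abstract's two ingredients, a noisy copy of each SMILES string and a contrastive objective that pulls the noisy encoding toward the original, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state what kind of noise is injected or which contrastive loss is used, so the sketch uses random token masking and an InfoNCE-style loss, with character-level tokenization and plain Python lists standing in for the outputs of the two Transformer sub-encoders. The names `inject_noise`, `cosine`, and `info_nce` are hypothetical.

```python
import math
import random


def inject_noise(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Assumed noise model: replace each token by mask_token with probability mask_prob."""
    rng = rng or random.Random()
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(clean, noisy, temperature=0.1):
    """InfoNCE-style loss (an assumption, not necessarily the paper's loss).

    clean[i] and noisy[i] are embeddings of the same molecule; the other
    noisy[j] in the batch act as negatives for clean[i].
    """
    total = 0.0
    for i, z in enumerate(clean):
        logits = [cosine(z, w) / temperature for w in noisy]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(clean)


if __name__ == "__main__":
    # Character-level split is a simplification; multi-character atoms
    # such as "Cl" would need a proper SMILES tokenizer.
    tokens = list("CC(=O)O")
    print(inject_noise(tokens, mask_prob=0.3, rng=random.Random(0)))

    # Toy embeddings standing in for the two sub-encoder outputs.
    clean = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(clean, clean), info_nce(clean, clean[::-1]))
```

In this toy setup the loss is small when each noisy embedding matches its own original and large when the pairs are swapped, which is the consistency the abstract describes in words.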
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Electric Power Supplies / Drug Discovery
Study type: Prognostic_studies / Risk_factors_studies
Language: En
Journal: J Mol Graph Model
Journal subject: MOLECULAR BIOLOGY
Year: 2024
Document type: Article
Country of publication: United States