Robust and consistent biomarker candidates identification by a machine learning approach applied to pancreatic ductal adenocarcinoma metastasis.
Mahawan, Tanakamol; Luckett, Teifion; Mielgo Iza, Ainhoa; Pornputtapong, Natapol; Caamaño Gutiérrez, Eva.
Affiliation
  • Mahawan T; Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand.
  • Luckett T; Department of Biochemistry & System Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.
  • Mielgo Iza A; Akkhraratchakumari Veterinary College, Walailak University, Nakhon Si Thammarat, Thailand.
  • Pornputtapong N; Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.
  • Caamaño Gutiérrez E; Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.
BMC Med Inform Decis Mak ; 24(Suppl 4): 175, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902676
ABSTRACT

BACKGROUND:

Machine learning (ML) plays a crucial role in biomedical research. Nevertheless, it still faces limitations in data integration and reproducibility, and robust methods are needed to address these challenges. Pancreatic ductal adenocarcinoma (PDAC), a highly aggressive cancer with low early detection and survival rates, is used as a case study. PDAC lacks reliable diagnostic biomarkers, and biomarkers of metastasis in particular remain an unmet need. In this study, we propose an ML-based approach for discovering disease biomarkers, apply it to the identification of a PDAC metastatic composite biomarker candidate, and demonstrate the advantages of harnessing data resources.

METHODS:

We utilised primary tumour RNAseq data from five public repositories, pooling samples to maximise statistical power and integrating data by correcting for technical variance. Data were split into training and validation sets. The training dataset underwent variable selection via a 10-fold cross-validation process that combined three algorithms in 100 models per fold. Genes found in at least 80% of models and in at least five folds were considered robust and were used to build a consensus multivariate model. A random forest model was constructed using the selected genes from the training dataset and tested in the validation set. We also assessed the goodness of prediction by recalibrating a model using only the validation data. The biological context and relevance of the signals were explored through enrichment and pathway analyses using QIAGEN Ingenuity Pathway Analysis and GeneMANIA.
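The consensus filter described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `consensus_genes`, the helper `toy_fold`, the gene labels and the toy counts are all hypothetical, and it assumes one reading of the rule, namely that a gene must appear in at least 80% of the models within a fold, and must do so in at least five of the ten folds.

```python
from collections import Counter

def consensus_genes(selections, model_frac=0.8, min_folds=5):
    """Return genes that pass the per-fold and across-fold thresholds.

    selections: list of folds; each fold is a list of gene sets,
    one set per fitted model (hypothetical data layout).
    """
    fold_hits = Counter()
    for fold in selections:
        # Count in how many models of this fold each gene was selected.
        counts = Counter(g for model_genes in fold for g in set(model_genes))
        threshold = model_frac * len(fold)
        for gene, n in counts.items():
            if n >= threshold:
                fold_hits[gene] += 1
    # Keep genes that were robust in enough folds.
    return sorted(g for g, n in fold_hits.items() if n >= min_folds)

def toy_fold(per_gene_counts, n_models=100):
    """Build a toy fold in which model i selects gene g if i < per_gene_counts[g]."""
    return [{g for g, k in per_gene_counts.items() if i < k}
            for i in range(n_models)]

# Toy data: 10 folds with 100 models each (illustrative values only).
folds = [
    toy_fold({
        "GENE_A": 100,                   # selected by every model in every fold
        "GENE_B": 85 if f < 5 else 10,   # robust in exactly five folds
        "GENE_C": 85 if f < 4 else 10,   # robust in only four folds
        "GENE_D": 79,                    # just under 80% in every fold
    })
    for f in range(10)
]

robust = consensus_genes(folds)
print(robust)  # ['GENE_A', 'GENE_B']
```

In this toy example only `GENE_A` and `GENE_B` pass both thresholds. In a real pipeline the gene sets would come from the variable selection algorithms fitted in each fold, and the surviving genes would then be passed to the final multivariate model.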

RESULTS:

We developed a pipeline that can detect robust signatures to build composite biomarkers. We tested the pipeline in PDAC, exploiting transcriptomics data from different sources, and propose a composite biomarker candidate comprising fifteen consistently selected genes that showed promising predictive capability. Biological contextualisation revealed links between these genes and cancer progression and metastasis, underscoring their potential relevance. All code is available on GitHub.

CONCLUSION:

This study establishes a robust framework for identifying composite biomarkers across various disease contexts. We demonstrate its potential by proposing a plausible composite biomarker candidate for PDAC metastasis. By reusing data from public repositories, we highlight the sustainability of our research and the wider applications of our pipeline. The preliminary findings shed light on a promising validation and application path.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Pancreatic Neoplasms / Biomarkers, Tumor / Carcinoma, Pancreatic Ductal / Machine Learning | Limits: Humans | Language: En | Journal: BMC Med Inform Decis Mak | Journal subject: MEDICAL INFORMATICS | Year: 2024 | Document type: Article | Country of affiliation: Thailand | Country of publication: United Kingdom