Search | VHL Regional Portal

Medical-informed machine learning: integrating prior knowledge into medical decision systems.

Sirocchi, Christel; Bogliolo, Alessandro; Montagna, Sara.

BMC Med Inform Decis Mak ; 24(Suppl 4): 186, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38943085

ABSTRACT

BACKGROUND: Clinical medicine offers a promising arena for applying Machine Learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of utilising ML in medical data analysis, recognising that ML alone may not adequately capture the full complexity of clinical data, thereby advocating for the integration of medical domain knowledge in ML. METHODS: The study conducts a comprehensive review of prior efforts in integrating medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline, encompassing data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction. Here, clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models. RESULTS: The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential for domain knowledge to enhance ML models through improved generalisation. In other cases, the integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance under limited data scenarios. CONCLUSIONS: By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. 
Furthermore, the study identifies the need to refine domain knowledge representation and fine-tune its contribution to the ML model as the two main challenges to integration and aims to stimulate further research in this direction.
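
As a minimal sketch of one integration point named in the abstract (intervals applied at the feature-engineering stage), the snippet below encodes a clinical reference interval as a categorical feature. The function names, the record field `fasting_glucose`, and the fasting plasma glucose thresholds (100 and 126 mg/dL, the commonly cited cut-offs) are assumptions made for illustration; the abstract does not state which intervals or features the authors used.

```python
# Hypothetical feature-engineering step: map a raw lab value onto a
# clinically meaningful interval before handing it to an ML model.
# The thresholds are commonly cited fasting plasma glucose cut-offs (mg/dL);
# they are an assumption of this sketch, not values reported in the paper.
NORMAL_UPPER = 100.0
DIABETES_LOWER = 126.0


def glucose_category(fasting_glucose_mg_dl: float) -> str:
    """Return the interval label for a fasting plasma glucose value."""
    if fasting_glucose_mg_dl < NORMAL_UPPER:
        return "normal"
    if fasting_glucose_mg_dl < DIABETES_LOWER:
        return "prediabetes"
    return "diabetes_range"


def add_knowledge_features(record: dict) -> dict:
    """Return a copy of the record with the interval-based feature added."""
    enriched = dict(record)
    enriched["glucose_category"] = glucose_category(record["fasting_glucose"])
    return enriched
```

A model trained on the enriched records sees the guideline boundary directly instead of having to rediscover it from data, which is one plausible route to the data-efficiency benefit the abstract reports.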

Subject(s)

Decision Support Systems, Clinical , Machine Learning , Humans

Exploring machine learning for untargeted metabolomics using molecular fingerprints.

Sirocchi, Christel; Biancucci, Federica; Donati, Matteo; Bogliolo, Alessandro; Magnani, Mauro; Menotta, Michele; Montagna, Sara.

Comput Methods Programs Biomed ; 250: 108163, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38626559

ABSTRACT

BACKGROUND: Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways. METHODS: This study, inspired by well-established methods in drug discovery, employs machine learning on metabolite fingerprints to explore the relationship of their structure with responses in experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates fingerprinting effectiveness in representing metabolites, addressing challenges like class imbalance, data sparsity, high dimensionality, duplicate structural encoding, and interpretable features. Feature importance analysis is then applied to reveal key chemical configurations affecting classification, identifying related metabolite groups. RESULTS: The approach is tested on two datasets: one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways, unveiling new affected metabolite groups for further study. CONCLUSION: The presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models.
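
One of the challenges listed above is duplicate structural encoding, where distinct metabolites map to the same fingerprint. The sketch below shows one simple way such duplicates could be detected before training. It assumes fingerprints are already available as sequences of 0/1 bits; the function name and the toy metabolite names are hypothetical, and the abstract does not say how the authors handled duplicates.

```python
from collections import defaultdict


def group_duplicate_fingerprints(fingerprints: dict) -> list:
    """Group metabolite names that share an identical fingerprint.

    `fingerprints` maps a metabolite name to a sequence of 0/1 bits.
    Returns only the groups with more than one member, each sorted by name.
    """
    groups = defaultdict(list)
    for name, bits in fingerprints.items():
        groups[tuple(bits)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Toy data (made up): two metabolites collide on the same bit vector.
toy = {
    "metabolite_a": [1, 0, 1, 0],
    "metabolite_b": [1, 0, 1, 0],
    "metabolite_c": [0, 1, 1, 0],
}
duplicates = group_duplicate_fingerprints(toy)
```

Each reported group could then be merged, relabelled, or dropped, depending on whether its members share the same experimental response.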

Subject(s)

Machine Learning , Metabolomics , Metabolomics/methods , Humans , Cell Line , Ataxia Telangiectasia/metabolism , Cell Hypoxia/physiology

Topological network features determine convergence rate of distributed average algorithms.

Sirocchi, Christel; Bogliolo, Alessandro.

Sci Rep ; 12(1): 21831, 2022 Dec 17.

Article in English | MEDLINE | ID: mdl-36528734

ABSTRACT

Gossip algorithms are message-passing schemes designed to compute averages and other global functions over networks through asynchronous and randomised pairwise interactions. Gossip-based protocols have drawn much attention for achieving robust and fault-tolerant communication while maintaining simplicity and scalability. However, the frequent propagation of redundant information makes them inefficient and resource-intensive. Most previous works have been devoted to deriving performance bounds and developing faster algorithms tailored to specific structures. In contrast, this study focuses on characterising the effect of topological network features on performance so that faster convergence can be engineered by acting on the underlying network rather than the gossip algorithm. The numerical experiments identify the topological limiting factors, the most predictive graph metrics, and the most efficient algorithms for each graph family and for all graphs, providing guidelines for designing and maintaining resource-efficient networks. Regression analyses confirm the explanatory power of structural features and demonstrate the validity of the topological approach in performance estimation. Finally, the high predictive capabilities of local metrics and the possibility of computing them in a distributed manner and at a low computational cost inform the design and implementation of a novel distributed approach for predicting performance from the network topology.
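
The pairwise averaging scheme described above can be sketched as a short sequential simulation. This is a minimal illustration of standard randomised pairwise gossip, not the specific algorithms compared in the paper; the function name, the uniform edge selection, and the six-node ring topology are assumptions chosen for the example.

```python
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomised pairwise gossip.

    At each step one edge is picked uniformly at random and its two
    endpoints replace their values with the mean of the pair. The sum of
    all values is preserved, so every node drifts towards the global mean.
    """
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        mean = (state[i] + state[j]) / 2.0
        state[i] = state[j] = mean
    return state


# Ring of 6 nodes (a hypothetical toy topology).
ring_edges = [(k, (k + 1) % 6) for k in range(6)]
initial = [0.0, 6.0, 12.0, 18.0, 24.0, 30.0]
final = gossip_average(initial, ring_edges, rounds=2000)
```

Swapping `ring_edges` for a different edge list while keeping the update rule fixed is the kind of experiment the abstract refers to: the algorithm stays the same and only the topology changes.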

A Study on the Influence of Speed on Road Roughness Sensing: The SmartRoadSense Case.

Alessandroni, Giacomo; Carini, Alberto; Lattanzi, Emanuele; Freschi, Valerio; Bogliolo, Alessandro.

Sensors (Basel) ; 17(2), 2017 Feb 07.

Article in English | MEDLINE | ID: mdl-28178224

ABSTRACT

SmartRoadSense is a crowdsensing project aimed at monitoring the conditions of the road surface. Using the sensors of a smartphone, SmartRoadSense monitors the vertical accelerations inside a vehicle traveling along the road and extracts a roughness index conveying information about the road conditions. The roughness index and the smartphone GPS data are periodically sent to a central server where they are processed, associated with the specific road, and aggregated with data measured by other smartphones. This paper studies how the smartphone vertical accelerations and the roughness index are related to the vehicle speed. It is shown that the dependence can be locally approximated with a gamma (power) law. Extensive experimental results using data extracted from the SmartRoadSense database confirm the gamma law relationship between the roughness index and the vehicle speed. The gamma law is then used to improve SmartRoadSense data aggregation by accounting for the effect of vehicle speed.
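
A power law of the form r = k * v^gamma becomes a straight line in log-log space, so gamma can be estimated with an ordinary least-squares fit. The sketch below illustrates that idea and one possible way of rescaling a measurement to a common reference speed. The function names, the reference speed of 50, and the synthetic data are assumptions; the abstract does not give the authors' fitting procedure or their correction formula.

```python
import math


def fit_power_law(speeds, roughness):
    """Least-squares fit of log(r) = log(k) + gamma * log(v).

    Returns the pair (k, gamma).
    """
    xs = [math.log(v) for v in speeds]
    ys = [math.log(r) for r in roughness]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    k = math.exp(my - gamma * mx)
    return k, gamma


def normalise_to_reference(r, v, gamma, v_ref=50.0):
    """Rescale a roughness value measured at speed v to the reference speed."""
    return r * (v_ref / v) ** gamma


# Synthetic, noise-free data generated from r = 0.2 * v**0.5 (made-up numbers).
speeds = [20.0, 40.0, 60.0, 80.0]
roughness = [0.2 * v ** 0.5 for v in speeds]
k, gamma = fit_power_law(speeds, roughness)
```

After normalisation, readings of the same road segment taken at different speeds become comparable, which is the precondition for aggregating them across vehicles.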

Genome-wide computational approach for the prediction of duplications generating protein localization signals.

Freschi, Valerio; Bogliolo, Alessandro; Liso, Arcangelo.

Comput Biol Med ; 42(11): 1091-7, 2012 Nov.

Article in English | MEDLINE | ID: mdl-23017829

ABSTRACT

Investigating whether short tandem duplications can generate motifs responsible for aberrant protein localization is of interest, given the pathogenic potential of this mechanism, as demonstrated in diseases such as acute myeloid leukemia (AML). In this paper we introduce a new computational method for predicting genomic points which, after hypothetical mutation events such as micro-duplications, might encode molecular patterns such as localization or export signals. The proposed framework allows the study of motifs of unconstrained length, defined as regular expressions, at a genome-wide level, providing an in silico platform capable of analyzing the potential effect of duplications on abnormal cellular localization.
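
The core idea, scanning for positions where a hypothetical tandem duplication would create a motif expressed as a regular expression, can be sketched as a brute-force search. This is only an illustration under simplifying assumptions: the function name is hypothetical, the toy sequence and motif are made up (the motif is not a real localization signal), and the sketch reports a hit only when the motif is absent from the original sequence. The paper's actual genome-wide method is not described in the abstract.

```python
import re


def duplication_hits(sequence, motif, max_len=4):
    """Find duplications that make `motif` appear in the sequence.

    For every start position and unit length up to `max_len`, the unit
    sequence[pos:pos + length] is duplicated in tandem and the mutated
    sequence is searched for the motif. Returns a list of (pos, length).
    Simplification: if the motif already matches the original sequence,
    nothing is reported.
    """
    pattern = re.compile(motif)
    if pattern.search(sequence):
        return []
    hits = []
    for length in range(1, max_len + 1):
        for pos in range(len(sequence) - length + 1):
            unit = sequence[pos:pos + length]
            mutated = sequence[:pos + length] + unit + sequence[pos + length:]
            if pattern.search(mutated):
                hits.append((pos, length))
    return hits


# Toy example: duplicating "LK" at position 2 yields "ACLKLKVT".
hits = duplication_hits("ACLKVT", r"L[KR]L[KR]")
```

Because the motif is an arbitrary regular expression, its length does not need to be fixed in advance, which matches the "unconstrained length" property mentioned above.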

Subject(s)

Gene Duplication , Genomics/methods , Protein Sorting Signals/genetics , Tandem Repeat Sequences , Algorithms , Computer Simulation , Humans , Models, Genetic , Mutation , Pattern Recognition, Automated , Proteins/genetics , Proteins/metabolism

A lossy compression technique enabling duplication-aware sequence alignment.

Freschi, Valerio; Bogliolo, Alessandro.

Evol Bioinform Online ; 8: 171-80, 2012.

Article in English | MEDLINE | ID: mdl-22518086

ABSTRACT

In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at a time, since such events are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered to be an open problem and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences which iteratively collapses repeats of increasing length. The resulting approximate representations do not contain tandem duplications, while retaining enough information for making their comparison even more significant than the edit distance between the original sequences. This allows us to exploit traditional alignment algorithms directly on the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment.
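
A minimal sketch of collapsing tandem repeats of increasing unit length is given below. It makes a single left-to-right pass for each unit length from 1 up to `max_unit` and keeps one copy of every repeated unit. The function name, the `max_unit` parameter, and the single-pass strategy are assumptions; the abstract does not specify the authors' scheme, and this sketch does not claim the guarantees stated for it.

```python
def collapse_tandem_repeats(sequence, max_unit=3):
    """Collapse runs of a repeated unit into a single copy.

    Unit lengths 1, 2, ..., max_unit are processed in increasing order,
    each with one left-to-right pass over the current sequence.
    """
    for k in range(1, max_unit + 1):
        out = []
        i = 0
        while i < len(sequence):
            unit = sequence[i:i + k]
            j = i + k
            # Skip over every further tandem copy of the current unit.
            while len(unit) == k and sequence[j:j + k] == unit:
                j += k
            if j > i + k:
                out.append(unit)         # keep one copy of the repeated unit
                i = j
            else:
                out.append(sequence[i])  # no repeat here; advance one symbol
                i += 1
        sequence = "".join(out)
    return sequence


compressed = collapse_tandem_repeats("AAACGCGCGT")
```

The compression is lossy because the repeat counts are discarded: "AAACGCGCGT" and "ACGCGT" both reduce to the same string, so an ordinary alignment of the compressed forms ignores differences in copy number.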

A Monte Carlo method for assessing the quality of duplication-aware alignment algorithms.

Freschi, Valerio; Bogliolo, Alessandro.

Evol Bioinform Online ; 7: 31-40, 2011.

Article in English | MEDLINE | ID: mdl-21698090

ABSTRACT

The increasing availability of high throughput sequencing technologies poses several challenges concerning the analysis of genomic data. Within this context, duplication-aware sequence alignment taking into account complex mutation events is regarded as an important problem, particularly in light of recent evolutionary bioinformatics research that highlighted the role of tandem duplications as one of the most important mutation events. Traditional sequence comparison algorithms do not take into account these events, resulting in poor alignments in terms of biological significance, mainly because of their assumption of statistical independence among contiguous residues. Several duplication-aware algorithms have been proposed in recent years which differ either in the type of duplications they consider or in the methods adopted to identify and compare them. However, there is no solution which clearly outperforms the others and no methods exist for assessing the reliability of the resulting alignments. This paper proposes a Monte Carlo method for assessing the quality of duplication-aware alignment algorithms and for driving the choice of the most appropriate alignment technique to be used in a specific context. The applicability and usefulness of the proposed approach are demonstrated on a case study, namely, the comparison of alignments based on edit distance with or without repeat masking.

Using sequence compression to speedup probabilistic profile matching.

Freschi, Valerio; Bogliolo, Alessandro.

Bioinformatics ; 21(10): 2225-9, 2005 May 15.

Article in English | MEDLINE | ID: mdl-15713733

ABSTRACT

MOTIVATION: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) time to match a sequence of N characters against a profile of length P << N. RESULTS: In this work, we exploit string compression techniques to speed up brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.
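
For reference, the brute-force O(NP) baseline mentioned above can be sketched as follows, together with a plain run-length encoder that shows where a compression factor comes from. The function names, the representation of the profile as a list of per-column score dictionaries, and the toy data are assumptions. The accelerated run-length and LZ78 matching algorithms from the paper are not reproduced here.

```python
def brute_force_profile_match(sequence, profile):
    """Score every window of len(profile) characters against the profile.

    `profile` is a list of dicts, one per column, mapping a character to
    its score (missing characters score 0). Runs in O(N * P) time for N
    characters and P columns. Returns (best_score, best_start).
    """
    p = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - p + 1):
        score = sum(profile[k].get(sequence[start + k], 0) for k in range(p))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def run_length_encode(sequence):
    """Return the sequence as a list of (character, run_length) pairs."""
    runs = []
    for ch in sequence:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


# Toy scoring matrix with three columns (made-up scores).
toy_profile = [{"A": 2, "C": -1}, {"C": 2}, {"G": 2, "T": 1}]
best = brute_force_profile_match("TTACGAACT", toy_profile)
runs = run_length_encode("AAACCG")
```

The ratio between the sequence length and the number of runs (here 6 characters in 3 runs) is the kind of compression factor by which, according to the abstract, the compressed-domain algorithms reduce the matching cost.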

Subject(s)

Algorithms , Data Compression/methods , Gene Expression Profiling/methods , Models, Chemical , Models, Statistical , Sequence Alignment/methods , Sequence Analysis/methods , Sequence Homology
