1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39293804

ABSTRACT

Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.


Subject(s)
Deep Learning; Genomics; Genomics/methods; Humans; Neural Networks, Computer; Computational Biology/methods
2.
ACS Nano ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39288200

ABSTRACT

DNA-stabilized silver nanoclusters (AgN-DNAs) have sequence-tuned compositions and fluorescence colors. High-throughput experiments together with supervised machine learning models have recently enabled design of DNA templates that select for AgN-DNA properties, including near-infrared (NIR) emission that holds promise for deep tissue bioimaging. However, these existing models do not enable simultaneous selection of multiple AgN-DNA properties, and require significant expert input for feature engineering and class definitions. This work presents a model for multiobjective, continuous-property design of AgN-DNAs with automatic feature extraction, based on variational autoencoders (VAEs). This model is generative, i.e., it learns both the forward mapping from DNA sequence to AgN-DNA properties and the inverse mapping from properties to sequence, and is trained on an experimental data set of DNA sequences paired with AgN-DNA fluorescence properties. Experimental testing shows that the model enables effective design of AgN-DNA emission, including bright NIR AgN-DNAs with 4-fold greater abundance compared to training data. In addition, Shapley analysis is employed to discern learned nucleobase patterns that correspond to fluorescence color and brightness. This generative model can be adapted for a range of biomolecular systems with sequence-dependent properties, enabling precise design of emerging biomolecular nanomaterials.

3.
BMC Pulm Med ; 24(1): 447, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39272037

ABSTRACT

BACKGROUND: Pneumonia, a leading cause of morbidity and mortality worldwide, often necessitates Intensive Care Unit (ICU) admission. Accurate prediction of pneumonia mortality is crucial for tailored prevention and treatment plans. However, existing mortality prediction models face limited adoption in clinical practice due to their lack of interpretability. OBJECTIVE: This study aimed to develop an interpretable model for predicting pneumonia mortality in ICUs. Leveraging the Shapley Additive Explanation (SHAP) method, we sought to elucidate the Extreme Gradient Boosting (XGBoost) model and identify prognostic factors for pneumonia. METHODS: In this retrospective cohort study, we utilized electronic health records from the eICU-CRD (2014-2015) for all adult pneumonia patients. Records from the first 24 h of each ICU admission were considered, with 70% of the dataset allocated for model training and 30% for validation. The XGBoost model was employed, and performance was assessed using the area under the receiver operating characteristic curve (AUC). The SHAP method provided insights into the XGBoost model. RESULTS: Among 10,962 pneumonia patients, in-hospital mortality was 16.33%. The XGBoost model demonstrated superior predictive performance (AUC: 0.778 ± 0.016) compared to traditional scoring systems and other machine learning methods, an improvement of 10 percentage points. SHAP analysis identified Aspartate Aminotransferase (AST) as the most crucial predictor. CONCLUSIONS: Interpretable predictive models enhance mortality risk assessment for pneumonia patients in the ICU, fostering transparency. AST emerged as the foremost predictor, followed by patient age, albumin, BMI, and others. These insights, rooted in strong correlations with mortality, facilitate improved clinical decision-making and resource allocation.


Subject(s)
Hospital Mortality; Intensive Care Units; Pneumonia; Humans; Intensive Care Units/statistics & numerical data; Pneumonia/mortality; Retrospective Studies; Male; Female; Aged; Middle Aged; Prognosis; ROC Curve; Risk Assessment/methods; Machine Learning; Aged, 80 and over; Risk Factors; Adult
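The record above evaluates its classifier by the area under the ROC curve (AUC). As a minimal sketch of what that metric measures, and not the authors' code, the AUC can be computed as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The function name `auc_from_scores` and the toy labels and scores are illustrative assumptions.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; library implementations typically use a rank-based computation instead.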
4.
Front Neuroinform ; 18: 1451529, 2024.
Article in English | MEDLINE | ID: mdl-39247901

ABSTRACT

Introduction: Gait analysis, an expanding research area, employs non-invasive sensors and machine learning techniques for a range of applications. In this study, we investigate the impact of cognitive decline conditions on gait performance, drawing connections between gait deterioration in Parkinson's Disease (PD) and in healthy individuals performing dual tasks. Methods: We employ Explainable Artificial Intelligence (XAI), specifically Layer-Wise Relevance Propagation (LRP), in conjunction with Convolutional Neural Networks (CNN) to interpret the intricate patterns in gait dynamics influenced by cognitive loads. Results: We achieved F1 scores of 98% for the PD dataset and 95.5% for the combined PD dataset. Furthermore, we explore the significance of cognitive load in healthy gait analysis, obtaining F1 scores of 90% ± 10% for subject cognitive load verification. Our findings reveal significant alterations in gait parameters under cognitive decline conditions, highlighting the distinctive patterns associated with PD-related gait impairment and those induced by multitasking in healthy subjects. Through advanced XAI techniques (LRP), we decipher the underlying features contributing to gait changes, providing insights into specific aspects affected by cognitive decline. Discussion: Our study establishes a novel perspective on gait analysis, demonstrating the applicability of XAI in elucidating the shared characteristics of gait disturbances in PD and dual-task scenarios in healthy individuals. The interpretability offered by XAI enhances our ability to discern subtle variations in gait patterns, contributing to a more nuanced comprehension of the factors influencing gait dynamics in PD and dual-task conditions, emphasizing the role of XAI in unraveling the intricacies of gait control.

5.
Biol Methods Protoc ; 9(1): bpae063, 2024.
Article in English | MEDLINE | ID: mdl-39258158

ABSTRACT

Deep learning applications in taxonomic classification for animals and plants from images have become popular, while those for microorganisms are still lagging behind. Our study investigated the potential of deep learning for the taxonomic classification of hundreds of filamentous fungi from colony images, which is typically a task that requires specialized knowledge. We isolated soil fungi, annotated their taxonomy using standard molecular barcode techniques, and took images of the fungal colonies grown in petri dishes (n = 606). We applied a convolutional neural network with multiple training approaches and model architectures to deal with some common issues in ecological datasets: small amounts of data, class imbalance, and hierarchically structured grouping. Model performance was overall low, mainly due to the relatively small dataset, class imbalance, and the high morphological plasticity exhibited by fungal colonies. However, our approach indicates that morphological features like color, patchiness, and colony extension rate could be used for the recognition of fungal colonies at higher taxonomic ranks (i.e. phylum, class, and order). Model explanation implies that image recognition characters appear at different positions within the colony (e.g. outer or inner hyphae) depending on the taxonomic resolution. Our study suggests the potential of deep learning applications for a better understanding of the taxonomy and ecology of filamentous fungi amenable to axenic culturing. Meanwhile, our study also highlights some technical challenges in deep learning image analysis in ecology, highlighting that the domain of applicability of these methods needs to be carefully considered.

6.
Sci Total Environ ; 953: 176125, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39260489

ABSTRACT

With climate warming and accelerated urbanisation, severe urban flooding has become a common problem worldwide. Frequent extreme rainfall events and the siltation of drainage pipes further increase the burden on urban drainage networks. However, existing studies have not fully considered the effects of rainfall and pipeline siltation on the response characteristics of flooding when constructing numerical models of urban flooding simulations. To solve this problem, a surface-subsurface coupling model was constructed by combining the Saint-Venant equation, Manning equation, a one-dimensional pipeline model (SWMM), and a two-dimensional surface overflow model (LISFLOOD-FP). Then, the SWMM model considering pipeline siltation and the two-dimensional surface overflow model (LISFLOOD-FP) were coupled with the flow exchange governing equation, and the urban flooding response characteristics considering the coupling effect of "rainfall and drainage pipeline siltation" were analysed. To enhance the solvability of waterlogging prediction, an intelligent prediction model of urban flooding based on Bayes-CNN-BLSTM was established by combining a convolutional neural network (CNN), bidirectional long short-term memory neural network (BLSTM), Bayesian optimisation (Bayes), and an interpretable loss function error correction method. The actual rainfall events and flooding processes recorded by the monitoring equipment at Huizhou University were used to calibrate and verify the model. The results show that in the Rainfall 1 and Rainfall 2 scenarios, the overload rates of the pipelines in the current siltation scenario were 60.06 % and 68.37 %, respectively, and the proportions of overflow nodes were 24.87 % and 25.89 %, respectively. When the drainage network was initially put into operation, the overload rates of the pipeline were 36.67 % and 41.16 %, and the overflow nodes accounted for 3.05 % and 4.06 %, respectively. 
The inundated area and volume of urban flooding increased when the combined siltation coefficient (CSC) was 0.2; therefore, two desilting schemes were determined. Under Rainfall 1, Rainfall 2, and the four rainfall recurrence periods, the Bayes-CNN-BLSTM model had clear advantages in terms of accuracy, reliability, and robustness.

7.
Neuroradiology ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39225815

ABSTRACT

OBJECTIVE: Research into the effectiveness and applicability of deep learning, radiomics, and their integrated models based on Magnetic Resonance Imaging (MRI) for preoperative differentiation between Primary Central Nervous System Lymphoma (PCNSL) and Glioblastoma (GBM), along with an exploration of the interpretability of these models. MATERIALS AND METHODS: A retrospective analysis was performed on MRI images and clinical data from 261 patients across two medical centers. The data were split into a training set (n = 153, medical center 1) and an external test set (n = 108, medical center 2). Radiomic features were extracted using Pyradiomics to build the Radiomics Model. Deep learning networks, including the transformer-based MobileVIT Model and Convolutional Neural Networks (CNN) based ConvNeXt Model, were trained separately. By applying the "late fusion" theory, the radiomics model and deep learning model were fused to produce the optimal Max-Fusion Model. Additionally, Shapley Additive exPlanations (SHAP) and Grad-CAM were employed for interpretability analysis. RESULTS: In the external test set, the Radiomics Model achieved an Area under the receiver operating characteristic curve (AUC) of 0.86, the MobileVIT Model had an AUC of 0.91, the ConvNeXt Model demonstrated an AUC of 0.89, and the Max-Fusion Model showed an AUC of 0.92. The Delong test revealed a significant difference in AUC between the Max-Fusion Model and the Radiomics Model (P = 0.02). CONCLUSION: The Max-Fusion Model, combining different models, presents superior performance in distinguishing PCNSL and GBM, highlighting the effectiveness of model fusion for enhanced decision-making in medical applications. CLINICAL RELEVANCE STATEMENT: The preoperative non-invasive differentiation between PCNSL and GBM assists clinicians in selecting appropriate treatment regimens and clinical management strategies.
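The abstract does not define how its "Max-Fusion" late-fusion step combines the models. One plausible reading, offered here purely as an assumption, is that each patient's fused score is the larger of the probabilities produced by the separately trained models. The function name `max_fusion` and the toy probabilities below are hypothetical.

```python
def max_fusion(prob_radiomics, prob_deep):
    """Late fusion by element-wise maximum of two models' predicted probabilities.

    Assumption: both lists hold, per patient, the predicted probability of
    the same class (for example PCNSL) from independently trained models.
    """
    if len(prob_radiomics) != len(prob_deep):
        raise ValueError("both models must score the same patients")
    return [max(a, b) for a, b in zip(prob_radiomics, prob_deep)]


# Hypothetical probabilities for three patients.
fused = max_fusion([0.2, 0.7, 0.5], [0.6, 0.4, 0.5])
```

Because fusion happens on the outputs, the underlying models can be trained, tuned, and replaced independently, which is the usual motivation for late fusion.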

8.
Neural Netw ; 179: 106597, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39128275

ABSTRACT

Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in various domains, such as face recognition, object detection, and image segmentation. However, the lack of transparency and limited interpretability inherent in CNNs pose challenges in fields such as medical diagnosis, autonomous driving, finance, and military applications. Several studies have explored the interpretability of CNNs and proposed various post-hoc interpretable methods. The majority of these methods are feature-based, focusing on the influence of input variables on outputs. Few methods undertake the analysis of parameters in CNNs and their overall structure. To explore the structure of CNNs and intuitively comprehend the role of their internal parameters, we propose an Attribution Graph-based Interpretable method for CNNs (AGIC) which models the overall structure of CNNs as graphs and provides interpretability from global and local perspectives. The runtime parameters of CNNs and feature maps of each image sample are applied to construct attribution graphs (At-GCs), where the convolutional kernels are represented as nodes and the SHAP values between kernel outputs are assigned as edges. These At-GCs are then employed to pretrain a newly designed heterogeneous graph encoder based on Deep Graph Infomax (DGI). To comprehensively delve into the overall structure of CNNs, the pretrained encoder is used for two types of interpretable tasks: (1) a classifier is attached to the pretrained encoder for the classification of At-GCs, revealing the dependency of At-GC's topological characteristics on the image sample categories, and (2) a scoring aggregation (SA) network is constructed to assess the importance of each node in At-GCs, thus reflecting the relative importance of kernels in CNNs. 
The experimental results indicate that the topological characteristics of At-GC exhibit a dependency on the sample category used in its construction, which reveals that kernels in CNNs show distinct combined activation patterns for processing different image categories. Meanwhile, the kernels that receive high scores from the SA network are crucial for feature extraction, whereas low-scoring kernels can be pruned without affecting model performance, thereby enhancing the interpretability of CNNs.


Subject(s)
Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Algorithms; Deep Learning
9.
Diagnostics (Basel) ; 14(16)2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39202229

ABSTRACT

BACKGROUND: Acute myocardial infarctions are deadly to patients and burdensome to healthcare systems. Most recorded infarctions are patients' first, occur out of the hospital, and often are not accompanied by cardiac comorbidities. The clinical manifestations of the underlying pathophysiology leading to an infarction are not fully understood and little effort exists to use explainable machine learning to learn predictive clinical phenotypes before hospitalization is needed. METHODS: We extracted outpatient electronic health record data for 2641 case and 5287 matched-control patients, all without pre-existing cardiac diagnoses, from the Michigan Medicine Health System. We compare six different interpretable, feature extraction approaches, including temporal computational phenotyping, and train seven interpretable machine learning models to predict the onset of first acute myocardial infarction within six months. RESULTS: Using temporal computational phenotypes significantly improved the model performance compared to alternative approaches. The mean cross-validation test set performance exhibited area under the receiver operating characteristic curve values as high as 0.674. The most consistently predictive phenotypes of a future infarction include back pain, cardiometabolic syndrome, family history of cardiovascular diseases, and high blood pressure. CONCLUSIONS: Computational phenotyping of longitudinal health records can improve classifier performance and identify predictive clinical concepts. State-of-the-art interpretable machine learning approaches can augment acute myocardial infarction risk assessment and prioritize potential risk factors for further investigation and validation.

10.
Sci Total Environ ; 951: 175733, 2024 Nov 15.
Article in English | MEDLINE | ID: mdl-39181249

ABSTRACT

Relationships between toxic pollutant emissions during industrial processes and toxic pollutant dietary intakes and adverse health burdens have not yet been quantitatively clarified. Polychlorinated naphthalenes (PCNs) are typical industrial pollutants that are carcinogenic and of increasing concern. In this study, we established an interpretable machine learning model for quantifying the contributions of industrial emissions and dietary intakes of PCNs to health effects. We used the SHapley Additive exPlanations model to achieve individualized interpretability, enabling us to evaluate the specific contributions of individual feature values to PCN concentration levels. A strong relationship between PCN dietary intake and body burden was found using a robust large-scale PCN diet survey database for China containing the results of the analyses of 17,280 dietary samples and 4480 breast milk samples. Industrial emissions and dietary intake contributed 12 % and 52 %, respectively, of the PCN burden in breast milk. The model quantified the contributions of food consumption and industrial emissions to PCN exposure, which will be useful for performing accurate health risk assessments and developing PCN reduction strategies.


Subject(s)
Dietary Exposure; Naphthalenes; Humans; Dietary Exposure/statistics & numerical data; Dietary Exposure/analysis; China; Naphthalenes/analysis; Milk, Human/chemistry; Environmental Exposure/statistics & numerical data; Environmental Pollutants/analysis; Industrial Waste/analysis; Risk Assessment
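Several records in this listing, including the one above, attribute a prediction to individual feature values with Shapley-based explanations. The following is a minimal sketch of the underlying idea, not the SHAP library and not any author's pipeline: it computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. The names `shapley_values` and `toy_model`, the all-zero baseline, and the inputs are illustrative assumptions.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet "switched on" keep their baseline value. The cost grows
    factorially with the number of features, so this only suits toy cases.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]          # switch feature i to its actual value
            new = model(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    return [p / len(orders) for p in phi]


def toy_model(v):
    # Hypothetical model: one additive term plus one interaction term.
    return 2 * v[0] + v[1] * v[2]


phi = shapley_values(toy_model, [1, 2, 3], [0, 0, 0])
```

In this toy case the additive term is credited entirely to the first feature, the interaction term is split equally between the other two, and the values sum to the difference between the prediction and the baseline prediction.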
11.
Sci Total Environ ; 951: 175585, 2024 Nov 15.
Article in English | MEDLINE | ID: mdl-39155002

ABSTRACT

This study explores the integration of crop phenology models and machine learning approaches for predicting rice phenology across China, to gain a deeper understanding of rice phenology prediction. Multiple approaches were used to predict heading and maturity dates at 337 locations across the main rice growing regions of China from 1981 to 2020, including crop phenology models, machine learning, and hybrid models that integrate both approaches. Furthermore, interpretable machine learning (IML) using SHapley Additive exPlanation (SHAP) was employed to elucidate the influence of climatic and varietal factors on uncertainty in crop phenology model predictions. Overall, the hybrid model demonstrated the highest accuracy in predicting rice phenology, followed by machine learning and crop phenology models. The best hybrid model, based on a serial structure and the eXtreme Gradient Boosting (XGBoost) algorithm, achieved a root mean square error (RMSE) of 4.65 and 5.72 days and coefficient of determination (R²) values of 0.93 and 0.9 for heading and maturity predictions, respectively. SHAP analysis revealed temperature to be the most influential climate variable affecting phenology predictions, particularly under extreme temperature conditions, while rainfall and solar radiation were found to be less influential. The analysis also highlighted that the importance of climate variables differs across phenological stages, rice cultivation patterns, and geographic regions, underscoring the notable regionality. The study proposed that a hybrid model using an IML approach would not only improve the accuracy of prediction but also offer a robust framework for leveraging data-driven methods in crop modeling, providing a valuable tool for refining and advancing the modeling process in rice.


Subject(s)
Crops, Agricultural; Machine Learning; Oryza; China; Oryza/growth & development; Crops, Agricultural/growth & development; Climate; Seasons; Agriculture/methods
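The record above reports model quality as RMSE (in days) and R². As a hedged illustration of how those two figures are derived from observed and predicted dates, the sketch below uses the standard definitions; the function names and the day-of-year values are hypothetical and unrelated to the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the inputs (here: days)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical heading dates as day of year: observed vs. predicted.
observed = [200, 205, 210]
predicted = [201, 204, 212]
toy_rmse = rmse(observed, predicted)      # sqrt(6 / 3), about 1.41 days
toy_r2 = r_squared(observed, predicted)   # 1 - 6 / 50 = 0.88
```

RMSE keeps the unit of the target, which is why the abstract can state it in days, while R² is unitless and depends on how much the observed dates vary.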
12.
Article in English | MEDLINE | ID: mdl-39178361

ABSTRACT

OBJECTIVE: Conventional physical activity (PA) metrics derived from wearable sensors may not capture cumulative activity, transitions from sedentary to active states, and multidimensional patterns of PA, limiting the ability to predict physical function impairment (PFI) in older adults. This study aims to identify unique temporal patterns and develop novel digital biomarkers from wrist accelerometer data for predicting PFI and its subtypes using explainable artificial intelligence techniques. MATERIALS AND METHODS: Wrist accelerometer streaming data from 747 participants in the National Health and Aging Trends Study (NHATS) were used to calculate 231 PA features through time-series analysis techniques (Tsfresh). Predictive models for PFI and its subtypes (walking, balance, and extremity strength) were developed using 6 machine learning (ML) algorithms with hyperparameter optimization. The SHapley Additive exPlanations method was employed to interpret the ML models and rank the importance of input features. RESULTS: Temporal analysis revealed peak PA differences between PFI and healthy controls from 9:00 to 11:00 am. The best-performing model (Gradient Boosting Tree) achieved an area under the curve score of 85.93%, accuracy of 81.52%, sensitivity of 77.03%, and specificity of 87.50% when combining wrist accelerometer streaming data (WAPAS) features with demographic data. DISCUSSION: The novel digital biomarkers, including change quantiles, Fourier transform (FFT) coefficients, and Aggregated (AGG) Linear Trend, outperformed traditional PA metrics in predicting PFI. These findings highlight the importance of capturing the multidimensional nature of PA patterns for PFI. CONCLUSION: This study investigates the potential of wrist accelerometer digital biomarkers in predicting PFI and its subtypes in older adults. Integrated PFI monitoring systems with digital biomarkers would improve the current state of remote PFI surveillance.
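The results above are given as accuracy, sensitivity, and specificity. A minimal sketch of how these follow from a binary confusion matrix is shown below, using the standard definitions; the function name `confusion_metrics` and the toy labels are assumptions for illustration and do not come from the study.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of impaired cases detected
        "specificity": tn / (tn + fp),   # share of healthy cases cleared
    }


# Hypothetical labels: 1 = physical function impairment, 0 = healthy control.
metrics = confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Reporting sensitivity and specificity separately matters when classes are imbalanced, because accuracy alone can look high while one class is mostly misclassified.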

13.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39120644

ABSTRACT

Recent advancements in spatial imaging technologies have revolutionized the acquisition of high-resolution multichannel images, gene expressions, and spatial locations at the single-cell level. Our study introduces xSiGra, an interpretable graph-based AI model, designed to elucidate interpretable features of identified spatial cell types, by harnessing multimodal features from spatial imaging technologies. By constructing a spatial cellular graph with immunohistology images and gene expression as node attributes, xSiGra employs hybrid graph transformer models to delineate spatial cell types. Additionally, xSiGra integrates a novel variant of gradient-weighted class activation mapping component to uncover interpretable features, including pivotal genes and cells for various cell types, thereby facilitating deeper biological insights from spatial data. Through rigorous benchmarking against existing methods, xSiGra demonstrates superior performance across diverse spatial imaging datasets. Application of xSiGra on a lung tumor slice unveils the importance score of cells, illustrating that cellular activity is not solely determined by itself but also impacted by neighboring cells. Moreover, leveraging the identified interpretable genes, xSiGra reveals endothelial cell subset interacting with tumor cells, indicating its heterogeneous underlying mechanisms within complex cellular interactions.


Subject(s)
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Algorithms; Lung Neoplasms/genetics; Lung Neoplasms/pathology; Lung Neoplasms/metabolism; Computational Biology/methods
14.
Water Res ; 264: 122243, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39142046

ABSTRACT

Bound extracellular polymeric substances (EPS) are complex, high-molecular-weight polymer mixtures that play a critical role in pore clogging, foulants adhesion, and fouling layer formation during membrane filtration, owing to their adhesive properties and gelation tendencies. In this study, a novel electrochemical anaerobic membrane bioreactor (EC-AnMBR) was constructed to investigate the effect of sludge bound-EPS solubilization on methane bioconversion and membrane fouling mitigation. During the 150-days' operation, the EC-AnMBR demonstrated remarkable performance, characterized by an exceptionally low fouling rate (transmembrane pressure (TMP) < 4.0 kPa) and high-quality effluent (COD removal > 98.2 %, protein removal > 97.7 %, and polysaccharide removal > 98.5 %). The highest methane productivity was up to 38.0 ± 3.1 mL/Lreactor/d at the applied voltage of 0.8 V with bound-EPS solubilization, 107.6 % higher than that of the control stage (18.3 ± 2.4 mL/Lreactor/d). Morphological and multiplex fluorescence labeling analyses revealed higher fluorescence intensities of proteins, polysaccharides, total cells and lipids on the surface of the fouling layer. In contrast, the interior exhibited increased compression density and reduced activity, likely attributable to compression effect. Under the synergistic influence of the electric field and bound-EPS solubilization, biomass characteristics exhibited a reduced propensity for membrane fouling. Furthermore, the bio-electrochemical regulation enhanced the electroactivity of microbial aggregates and enriched functional microorganisms, thereby promoting biofilm growth and direct interspecies electron transfer. Additionally, the potential hydrogenotrophic and methylotrophic methanogenesis pathways were enhanced at the cathode and anode surfaces, thereby increasing CH4 productivity. 
The random forest-based machine learning model analyzed the nonlinear contributions of EPS characteristics on methane productivity and TMP values, achieving R² values of 0.879 and 0.848, respectively. Shapley additive explanations (SHAP) analysis indicated that S-EPSPS and S-EPSPN were the most critical factors affecting CH4 productivity and membrane fouling, respectively. Partial dependence plot analysis further verified the marginal and interaction effects of different EPS layers on these outcomes. By combining continuous operation with interpretable machine learning algorithms, this study unveils the intricate impacts of EPS characteristics on methane productivity and membrane fouling behaviors, and provides new insights into sludge bound-EPS solubilization in EC-AnMBR.


Subject(s)
Bioreactors; Machine Learning; Membranes, Artificial; Methane; Sewage; Sewage/microbiology; Anaerobiosis; Biofouling; Extracellular Polymeric Substance Matrix; Solubility; Waste Disposal, Fluid/methods
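The abstract above states that methane productivity of 38.0 mL/L reactor/d was 107.6 % higher than the control value of 18.3 mL/L reactor/d. As a quick arithmetic check, sketched under the assumption that the figure is a simple relative increase over the control mean, the calculation from the rounded published means gives roughly 107.65 %; the small gap to 107.6 % presumably reflects rounding of the published inputs. The function name is illustrative.

```python
def percent_increase(new, old):
    """Relative increase of `new` over `old`, expressed in percent."""
    return (new - old) / old * 100.0


# Mean methane productivity values quoted in the abstract (mL/L reactor/d).
gain = percent_increase(38.0, 18.3)
```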
15.
Accid Anal Prev ; 207: 107740, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39142041

ABSTRACT

The causes of traffic violations by elderly drivers are different from those of other age groups. To reduce serious traffic violations that are more likely to cause serious traffic crashes, this study divided the severity of traffic violations into three levels (i.e., slight, ordinary, severe) based on point deduction, and explored the patterns of serious traffic violations (i.e., ordinary, severe) using multi-source data. This paper designed an interpretable machine learning framework, in which four popular machine learning models were enhanced and compared. Specifically, an adaptive synthetic sampling method was applied to overcome the effects of imbalanced data and improve the prediction accuracy of minority classes (i.e., ordinary, severe); multi-objective feature selection based on NSGA-II was used to remove redundant factors, increasing computational efficiency and making the patterns discovered by the explainer more effective; and Bayesian hyperparameter optimization aimed to obtain more effective hyperparameter combinations with fewer iterations and boost model adaptability. Results show that the proposed interpretable machine learning framework can significantly improve and distinguish the performance of four popular machine learning models and two post-hoc interpretation methods. It is found that six of the top ten important factors belong to multi-scale built environment attributes. Comparing the results of feature contribution and interaction effects, ordinary and severe traffic violations (1) have some identical influencing factors and interaction effects; (2) share some influencing factors, or combinations of influencing factors, whose values differ; and (3) have some unique influencing factors and unique combinations of influencing factors.


Subject(s)
Accidents, Traffic; Automobile Driving; Bayes Theorem; Built Environment; Machine Learning; Humans; Aged; Accidents, Traffic/prevention & control; Accidents, Traffic/statistics & numerical data; Automobile Driving/legislation & jurisprudence; Aged, 80 and over
16.
Sci Rep ; 14(1): 17854, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39090141

ABSTRACT

Analyses of the complex behavior of cerebrospinal fluid (CSF) have become increasingly important in disease diagnosis. The changes of the phase-contrast magnetic resonance imaging (PC-MRI) signal formed by the velocity of flowing CSF are represented as a set of velocity-encoded images or maps, which can be thought of as signal data in the context of medical imaging, enabling the evaluation of pulsatile patterns throughout a cardiac cycle. However, automatic segmentation of the CSF region in a PC-MRI image is challenging, and implementing an explainable ML method using pulsatile data as a feature remains unexplored. This paper presents lightweight machine learning (ML) algorithms to perform CSF lumen segmentation in the spine, utilizing sets of velocity-encoded images or maps as a feature. The dataset contains 57 PC-MRI slabs acquired with a 3T MRI scanner from control and idiopathic scoliosis participants. The ML models are trained with 2176 time series images. PC-MRIs with different numbers of images (frames) per cardiac period are interpolated in the preprocessing step so that all features have equal size. A fivefold cross-validation procedure is used to estimate the success of the ML models. Additionally, the study focuses on enhancing the interpretability of the highest-accuracy eXtreme gradient boosting (XGB) model by applying the SHapley Additive exPlanations (SHAP) technique. The XGB algorithm achieved the highest accuracy, with fivefold averages of 0.99% precision, 0.95% recall, and 0.97% F1 score. Using SHAP, we evaluated the significance of each pulsatile feature's contribution to predictions, offering a more profound understanding of the model's behavior in distinguishing CSF lumen pixels. Introducing a novel approach in the field, the developed ML models offer insight into feature extraction and selection from PC-MRI pulsatile data. 
Moreover, the explained ML model offers novel and valuable insights to domain experts, contributing to an enhanced scholarly understanding of CSF dynamics.
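
The preprocessing step described in this abstract, in which cardiac cycles with different frame counts are brought to a common length, can be illustrated with a minimal sketch. The abstract does not say which interpolation scheme the authors used; linear interpolation is assumed here, and the function name `resample_linear` and the sample velocity values are hypothetical.

```python
def resample_linear(values, target_len):
    """Linearly resample a 1-D sequence to target_len points.

    Assumption: linear interpolation between neighboring frames; the
    abstract does not name the scheme actually used.
    """
    if target_len < 1 or not values:
        raise ValueError("need a non-empty sequence and target_len >= 1")
    n = len(values)
    if n == 1 or target_len == 1:
        return [float(values[0])] * target_len
    out = []
    for i in range(target_len):
        # Map output index i onto the fractional position in the input.
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


# Hypothetical per-pixel velocity series from two scans with different
# numbers of frames per cardiac cycle.
short_cycle = [0.0, 2.0, 4.0]
long_cycle = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Both series become feature vectors of length 5.
features = [resample_linear(c, 5) for c in (short_cycle, long_cycle)]
```

After resampling, every pixel has a feature vector of the same size, which is what a tabular learner such as gradient boosting expects.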


Subject(s)
Cerebrospinal Fluid , Machine Learning , Magnetic Resonance Imaging , Pulsatile Flow , Humans , Magnetic Resonance Imaging/methods , Algorithms , Scoliosis/diagnostic imaging , Computer-Assisted Image Processing/methods , Female , Male
17.
Fundam Res ; 4(4): 738-751, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39156565

ABSTRACT

Childhood asthma is one of the most common respiratory diseases, with rising mortality and morbidity. Multi-omics data offer a new opportunity to explore collaborative biomarkers and corresponding diagnostic models of childhood asthma. To capture the nonlinear associations in multi-omics data and improve the interpretability of the diagnostic model, we proposed a novel deep association model (DAM) and a corresponding efficient analysis framework. First, Deep Subspace Reconstruction was used to fuse the omics data and diagnostic information, thereby correcting the distribution of the original omics data and reducing the influence of unnecessary noise. Second, Joint Deep Semi-Negative Matrix Factorization was applied to identify different latent sample patterns and extract biomarkers from different omics data levels. Third, our newly proposed Deep Orthogonal Canonical Correlation Analysis ranks features in the collaborative module, which can then be used to construct a diagnostic model that accounts for nonlinear correlation between different omics data levels. Using DAM, we analyzed in depth the transcriptome and methylation data of childhood asthma. The effectiveness of DAM was verified in terms of algorithm performance and biological significance on an independent test dataset, through ablation experiments and comparison with many baseline methods from clinical and biological studies. The DAM-induced diagnostic model achieved a prediction AUC of 0.912, higher than that of many alternative methods. Meanwhile, relevant pathways and biomarkers of childhood asthma were found to be collectively altered at the gene expression and methylation levels. 
As an interpretable machine learning approach, DAM simultaneously considers the non-linear associations among samples and those among biological features, which should help explore interpretative biomarker candidates and efficient diagnostic models from multi-omics data analysis for human complex diseases.

18.
Front Oral Health ; 5: 1408867, 2024.
Article in English | MEDLINE | ID: mdl-39092200

ABSTRACT

Oral diseases pose a significant burden on global healthcare. While many oral conditions are preventable and manageable through regular dental office visits, a substantial portion of the population faces obstacles in accessing essential and affordable quality oral healthcare. In this mini review, we describe the issue of inequity and bias in oral healthcare and discuss various strategies to address these challenges, with an emphasis on the application of artificial intelligence (AI). Recent advances in AI technologies have led to significant performance improvements in oral healthcare. AI also holds tremendous potential for advancing equity in oral healthcare, yet its application must be approached with caution to prevent the exacerbation of inequities. The "black box" approaches of some advanced AI models raise uncertainty about their operations and decision-making processes. To this end, we discuss the use of interpretable and explainable AI techniques in enhancing transparency and trustworthiness. Those techniques, aimed at augmenting rather than replacing oral health practitioners' judgment and skills, have the potential to achieve personalized dental and oral care that is unbiased, equitable, and transparent. Overall, achieving equity in oral healthcare through the responsible use of AI requires collective efforts from all stakeholders involved in the design, implementation, regulation, and utilization of AI systems. We use the United States as an example due to its uniquely diverse population, making it an excellent model for our discussion. However, the general and responsible AI strategies suggested in this article can be applied to address equity in oral healthcare on a global level.

19.
Front Psychol ; 15: 1392240, 2024.
Article in English | MEDLINE | ID: mdl-39118849

ABSTRACT

Background: Depression is one of the most common mental illnesses among middle-aged and older adults in China. It is of great importance to find the crucial factors that lead to depression and to effectively control and reduce the risk of depression. Currently, there are limited methods available to accurately predict the risk of depression and identify the crucial factors that influence it. Methods: We collected data from 25,586 samples from the harmonized China Health and Retirement Longitudinal Study (CHARLS), and the latest records from 2018 were included in the current cross-sectional analysis. Ninety-three input variables in the survey were considered as potential influential features. Five machine learning (ML) models were used: CatBoost, eXtreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM). The models were compared with the traditional multivariable Linear Regression (LR) model. In addition, SHapley Additive exPlanations (SHAP) were used to identify key influencing factors at the global level and explain individual heterogeneity through instance-level analysis. To explore how different factors are non-linearly associated with the risk of depression, we employed the Accumulated Local Effects (ALE) approach to analyze the identified critical variables while controlling for other covariates. Results: CatBoost outperformed the other machine learning models in terms of the MAE, MSE, MedAE, and R2 metrics. The top three crucial factors identified by SHAP were r4satlife, r4slfmem, and r4shlta, representing life satisfaction, self-reported memory, and health status levels, respectively. Conclusion: This study demonstrates that the CatBoost model is an appropriate choice for predicting depression among middle-aged and older adults in Harmonized CHARLS. 
The SHAP and ALE interpretable methods have identified crucial factors and the nonlinear relationship with depression, which require the attention of domain experts.
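
The instance-level attribution idea behind SHAP can be illustrated with a small sketch that computes exact Shapley values by enumerating feature orderings. This is not the SHAP library and not the study's pipeline: the model `toy_model`, the instance, and the all-zero baseline are hypothetical, and replacing absent features with a single baseline value is a simplifying assumption.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values via enumeration of all feature orderings.

    Assumption: a feature that is "absent" takes its baseline value.
    Cost grows factorially, so this is only feasible for a few features.
    """
    n = len(instance)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for j in order:
            # Switch feature j from baseline to its actual value and
            # record the marginal change in the model output.
            current[j] = instance[j]
            new = model(current)
            totals[j] += new - prev
            prev = new
    return [t / len(orders) for t in totals]


def toy_model(x):
    # Hypothetical score: one additive term plus one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split equally between the two features involved. Averaging absolute attributions over many instances gives the kind of global ranking the abstract refers to.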

20.
Hellenic J Cardiol ; 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39128707

ABSTRACT

OBJECTIVE: This study aimed to leverage real-world electronic medical record (EMR) data to develop interpretable machine learning models for diagnosis of Kawasaki disease, while also exploring and prioritizing the significant risk factors. METHODS: A comprehensive study was conducted on 4,087 pediatric patients at the Children's Hospital of Chongqing, China. The study collected demographic data, physical examination results, and laboratory findings. Statistical analyses were performed using SPSS 26.0. The optimal feature subset was employed to develop intelligent diagnostic prediction models based on the Light Gradient Boosting Machine (LGBM), Explainable Boosting Machine (EBM), Gradient Boosting Classifier (GBC), Fast Interpretable Greedy-Tree Sums (FIGS), Decision Tree (DT), AdaBoost Classifier (AdaBoost), and Logistic Regression (LR). Model performance was evaluated in three dimensions: discriminative ability via Receiver Operating Characteristic curves, calibration accuracy using calibration curves, and interpretability through Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). RESULTS: In this study, Kawasaki disease was diagnosed in 2,971 participants. Analysis was conducted on 31 indicators, including red blood cell distribution width and erythrocyte sedimentation rate. The EBM model demonstrated superior performance compared to other models, with an Area Under the Curve (AUC) of 0.97, second only to the GBC model. Furthermore, the EBM model exhibited the highest calibration accuracy and maintained its interpretability without relying on external analytical tools like SHAP and LIME, thus reducing interpretation biases. Platelet distribution width, total protein, and erythrocyte sedimentation rate were identified by the model as significant predictors for the diagnosis of Kawasaki disease. CONCLUSIONS: This study employed diverse machine learning models for early diagnosis of Kawasaki disease. 
The findings demonstrated that interpretable models, like EBM, outperformed traditional machine learning models in terms of both interpretability and performance. Ensuring consistency between predictive models and clinical evidence is crucial for the successful integration of artificial intelligence into real-world clinical practice.
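
The discrimination metric reported in this abstract, the Area Under the ROC Curve, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustration data, not values from the study, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in sample size, which
    is acceptable for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical diagnostic scores: 1 = disease present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this toy data, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.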
