Results 1 - 20 of 33
1.
Anaesth Crit Care Pain Med ; : 101424, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39278548

ABSTRACT

BACKGROUND: Postoperative pulmonary complications (PPCs) contribute to high mortality rates and impose significant financial burdens. In this study, a machine learning-based prediction model was developed to identify patients at high risk of developing PPCs following laparoscopic hepatectomy. METHODS: Data were collected from 1022 adult patients who underwent laparoscopic hepatectomy at two centres between January 2015 and February 2022. The dataset was divided into a development set and a temporal external validation set based on the year of surgery. A total of 42 factors were extracted for pre-modelling, including the implementation status of Enhanced Recovery after Surgery (ERAS). Feature selection was performed using the least absolute shrinkage and selection operator (LASSO) method. Model performance was assessed using the area under the receiver operating characteristic curve (AUC). The model with the best performance was externally validated using temporal data. RESULTS: The incidence of PPCs was 8.7%. Lambda.1se was selected as the optimal lambda for LASSO feature selection. The selected factors were ERAS implementation, serum gamma-glutamyl transferase level, presence of a malignant tumour, total bilirubin level, and age-adjusted Charlson Comorbidity Index. Seven models were developed. Among them, logistic regression demonstrated the best performance, with an AUC of 0.745 in the internal validation set and 0.680 in the temporal external validation set. CONCLUSIONS: Based on the most recent definition, a machine learning model was employed to predict the risk of PPCs following laparoscopic hepatectomy. Logistic regression was identified as the best-performing model. ERAS implementation was associated with a reduction in the number of PPCs.
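
The abstract above compares models by AUC. As a minimal sketch of what that number measures, the following computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed a PPC, 0 = did not.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.45]
auc = roc_auc(y_true, y_score)  # 13 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.745 and 0.680 should be read.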

2.
Fundam Res ; 4(4): 752-760, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39156563

ABSTRACT

The potential to identify individuals at high disease risk solely from genotype data has garnered significant interest. Although widely applied, traditional polygenic risk scoring (PRS) methods fall short, as they are built on additive models that fail to capture the intricate associations among single nucleotide polymorphisms (SNPs). This is a limitation, as genetic diseases often arise from complex interactions between multiple SNPs. To address this challenge, we developed DeepRisk, a biological knowledge-driven deep learning method for modeling these complex, nonlinear associations among SNPs, to provide a more effective method for scoring the risk of common diseases with genome-wide genotype data. Evaluations demonstrated that DeepRisk outperforms existing PRS-based methods in identifying individuals at high risk for four common diseases: Alzheimer's disease, inflammatory bowel disease, type 2 diabetes, and breast cancer.
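
To make the "additive model" criticised above concrete, here is a minimal sketch of a conventional polygenic risk score: a weighted sum of risk-allele counts, with no interaction terms. The function name, the effect sizes, and the genotype are hypothetical values chosen for illustration; this is not DeepRisk or any code from the paper.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over SNPs of effect size times risk-allele count (0, 1, or 2)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is required per SNP")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Hypothetical per-SNP effect sizes (for example, log odds ratios from a GWAS).
betas = [0.5, -0.25, 1.0]
# Hypothetical genotype of one person, coded as risk-allele counts.
person = [2, 1, 0]
score = polygenic_risk_score(person, betas)  # 0.5*2 + (-0.25)*1 + 1.0*0
```

Because each SNP contributes independently of the others, a score of this form cannot represent an effect that appears only when two variants occur together, which is the gap the abstract says nonlinear models aim to close.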

3.
Front Genet ; 15: 1409755, 2024.
Article in English | MEDLINE | ID: mdl-38993480

ABSTRACT

This research aims to advance the detection of Chronic Kidney Disease (CKD) through a novel gene-based predictive model, leveraging recent breakthroughs in gene sequencing. We sourced and merged gene expression profiles of CKD-affected renal tissues from the Gene Expression Omnibus (GEO) database, dividing them into training and validation sets in a 7:3 ratio. The training set included 141 CKD and 33 non-CKD specimens, while the validation set had 60 and 14, respectively. The disease risk prediction model was constructed using the training dataset, while the validation dataset confirmed the model's identification capabilities. The development of our predictive model began with evaluating differentially expressed genes (DEGs) between the two groups. Using Lasso and random forest (RF) methods, we isolated six genes (DUSP1, GADD45B, IFI44L, IFI30, ATF3, and LYZ) that are critical in differentiating CKD from non-CKD tissues. We refined our RF model through 10-fold cross-validation, repeated five times, to optimize the mtry parameter. The performance of our model was robust, with an average AUC of 0.979 across the folds and an accuracy of 91.18%. Validation tests further confirmed its efficacy, with a 94.59% accuracy and an AUC of 0.990. External validation using dataset GSE180394 yielded an AUC of 0.913, 89.83% accuracy, and a sensitivity of 0.889, underscoring the model's reliability. In summary, the study identified critical genetic biomarkers and successfully developed a novel disease risk prediction model for CKD. This model can serve as a valuable tool for CKD risk assessment and contribute significantly to CKD identification.
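
The "10-fold cross-validation, repeated five times" described above can be sketched as an index generator. This is an illustrative sketch only: the function name `repeated_kfold`, the seed, and the unstratified shuffling are assumptions, and the abstract does not say which software or splitting scheme the authors used. The sample count of 174 is the training-set size given in the abstract (141 + 33).

```python
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently
    shuffled k-fold partitions of range(n_samples)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
            yield train, test


# 174 training specimens, 10 folds, 5 repeats -> 50 train/test splits.
splits = list(repeated_kfold(174, k=10, repeats=5))
```

A hyperparameter such as mtry would then be chosen by fitting the model on each training split for every candidate value and averaging the held-out metric over all 50 splits.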

4.
Comput Biol Med ; 178: 108763, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38889629

ABSTRACT

Current disease risk prediction models have many parameters, which makes them difficult to run smoothly on mobile terminals such as tablets and mobile phones in imaginative elderly care application scenarios. To further reduce the number of model parameters and enable the disease risk prediction model to run smoothly on mobile terminals, we designed a model called Motico (An Attention Mechanism Network Model for Image Data Classification). In implementing the Motico model, in order to protect image features, we designed an image data preprocessing method and an attention mechanism network model for image data classification. The Motico model parameter size is only 5.26 MB, and its memory footprint is only 135.69 MB. In the experiment, the accuracy of disease risk prediction was 96%, the precision was 97%, the recall was 93%, the specificity was 98%, the F1 score was 95%, and the AUC was 95%. This experimental result shows that our Motico model can implement classification prediction based on the image data classification attention mechanism network on mobile terminals.
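
The metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study. The last line applies the F1 formula to the precision and recall quoted in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only; they are not from the study.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)

# F1 implied by the precision (0.97) and recall (0.93) quoted in the abstract.
f1_from_abstract = 2 * 0.97 * 0.93 / (0.97 + 0.93)
```

The harmonic mean of 0.97 and 0.93 is roughly 0.95, which agrees with the F1 score the abstract reports.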


Subject(s)
Aging , Humans , Aged , Aging/physiology , Female , Image Processing, Computer-Assisted/methods , Male , Aged, 80 and over
5.
BMC Med Inform Decis Mak ; 24(1): 178, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38915008

ABSTRACT

OBJECTIVE: This study aimed to develop and validate a quantitative index system for evaluating the data quality of Electronic Medical Records (EMR) in disease risk prediction using Machine Learning (ML). MATERIALS AND METHODS: The index system was developed in four steps: (1) a preliminary index system was outlined based on a literature review; (2) we utilized the Delphi method to structure the indicators at all levels; (3) the weights of these indicators were determined using the Analytic Hierarchy Process (AHP) method; and (4) the developed index system was empirically validated using real-world EMR data in an ML-based disease risk prediction task. RESULTS: The synthesis of review findings and the expert consultations led to the formulation of a three-level index system with four first-level, 11 second-level, and 33 third-level indicators. The weights of these indicators were obtained through the AHP method. Results from the empirical analysis illustrated a positive relationship between the scores assigned by the proposed index system and the predictive performances of the datasets. DISCUSSION: The proposed index system for evaluating EMR data quality is grounded in extensive literature analysis and expert consultation. Moreover, the system's high reliability and suitability have been affirmed through empirical validation. CONCLUSION: The novel index system offers a robust framework for assessing the quality and suitability of EMR data in ML-based disease risk predictions. It can serve as a guide in building EMR databases, improving EMR data quality control, and generating reliable real-world evidence.
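
Step (3) above derives indicator weights with AHP. As a rough sketch of how AHP turns pairwise importance judgments into weights, the following uses the row geometric-mean method, one common approximation of the principal-eigenvector weights. The abstract does not state which variant the authors used, and the function name and the 3 x 3 comparison matrix are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    matrix[i][j] states how much more important indicator i is than
    indicator j, so matrix[j][i] should equal 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for three indicators: A is twice as important as B
# and four times as important as C; B is twice as important as C.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)  # approximately [4/7, 2/7, 1/7]
```

For a perfectly consistent matrix such as this one, the geometric-mean weights coincide with the eigenvector weights; with real expert judgments a consistency check would normally accompany the calculation.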


Subject(s)
Data Accuracy , Electronic Health Records , Machine Learning , Electronic Health Records/standards , Humans , Risk Assessment/standards , Delphi Technique
6.
BMC Bioinformatics ; 25(1): 56, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38308205

ABSTRACT

BACKGROUND: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). RESULTS: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression with adjustment for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's Kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with imbalance problems. CONCLUSIONS: Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. On the other hand, in the oversampling study, random forest and boosting methods overall showed better prediction performance than penalized methods.
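
The abstract mentions oversampling to handle class imbalance but does not name the three algorithms. As a generic illustration of the idea only, the sketch below implements plain random oversampling, which duplicates minority-class rows until the classes are the same size. It is not claimed to be one of the algorithms used in the study, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: four controls (0) and two cases (1).
X = [[0, 1], [1, 1], [2, 0], [0, 0], [1, 2], [2, 2]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
```

Resampling of this kind is normally applied to the training portion only, so that duplicated rows do not leak into the data used for evaluation.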


Subject(s)
Algorithms , Genome-Wide Association Study , Humans , Bayes Theorem , Machine Learning , Republic of Korea/epidemiology
7.
Stud Health Technol Inform ; 310: 1021-1025, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269969

ABSTRACT

Coronary artery disease (CAD) has the highest disease burden worldwide. To manage this burden, predictive models are required to screen patients for preventative treatment. A range of variables have been explored for their capacity to predict disease, including phenotypic (age, sex, BMI and smoking status), medical imaging (carotid artery thickness) and genotypic variables. We use machine learning models and the UK Biobank cohort to measure the prediction capacity of these three variable categories, both in combination and in isolation. We demonstrate that phenotypic variables from the Framingham risk score have the best prediction capacity, although a combination of phenotypic, medical imaging and genotypic variables delivers the most specific models. Furthermore, we demonstrate that Variant Spark, a random forest based GWAS platform, performs effective feature selection for SNP-based genotype variables, identifying 115 SNPs significantly associated with the CAD phenotype.


Subject(s)
Coronary Artery Disease , Humans , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/genetics , Carotid Intima-Media Thickness , Phenotype , Genotype , Machine Learning
8.
J Pers Med ; 13(7)2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37511683

ABSTRACT

Transformer is the latest deep neural network (DNN) architecture for sequence data learning, which has revolutionized the field of natural language processing. This success has motivated researchers to explore its application in the healthcare domain. Despite the similarities between longitudinal clinical data and natural language data, clinical data presents unique complexities that make adapting Transformer to this domain challenging. To address this issue, we have designed a new Transformer-based DNN architecture, referred to as Hybrid Value-Aware Transformer (HVAT), which can jointly learn from longitudinal and non-longitudinal clinical data. HVAT is unique in the ability to learn from the numerical values associated with clinical codes/concepts such as labs, and in the use of a flexible longitudinal data representation called clinical tokens. We have also trained a prototype HVAT model on a case-control dataset, achieving high performance in predicting Alzheimer's disease and related dementias as the patient outcome. The results demonstrate the potential of HVAT for broader clinical data-learning tasks.

9.
Neural Netw ; 165: 562-595, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37364469

ABSTRACT

Data visualization is critical to unraveling hidden information from complex and high-dimensional data. Interpretable visualization methods are critical, especially in the biology and medical fields; however, effective visualization methods for large genetic data are limited. Current visualization methods are limited to lower-dimensional data, and their performance suffers if there is missing data. In this study, we propose a literature-based visualization method to reduce high-dimensional data without compromising the dynamics of the single nucleotide polymorphisms (SNPs) and textual interpretability. Our method is innovative because it is shown to (1) preserve both global and local structures of SNPs while reducing the dimension of the data using literature text representations, and (2) enable interpretable visualizations using textual information. For performance evaluation, we applied the proposed approach to classify several categories, including race, myocardial infarction event age groups, and sex, using several machine learning models on the literature-derived SNP data. We used visualization approaches to examine clustering of the data, as well as quantitative performance metrics for the classification of the risk factors examined above. Our method outperformed all popular dimensionality reduction and visualization methods for both classification and visualization, and it is robust against missing and higher-dimensional data. Moreover, we found it feasible to incorporate both genetic and other risk information obtained from literature with our method.


Subject(s)
Data Visualization , Myocardial Infarction , Humans , Neural Networks, Computer , Machine Learning , Myocardial Infarction/diagnostic imaging , Myocardial Infarction/genetics
10.
Eur Heart J Qual Care Clin Outcomes ; 9(4): 310-322, 2023 06 21.
Article in English | MEDLINE | ID: mdl-36869800

ABSTRACT

BACKGROUND: Cardiovascular disease (CVD) risk prediction is important for guiding the intensity of therapy in CVD prevention. Whilst current risk prediction algorithms use traditional statistical approaches, machine learning (ML) presents an alternative method that may improve risk prediction accuracy. This systematic review and meta-analysis aimed to investigate whether ML algorithms demonstrate greater performance compared with traditional risk scores in CVD risk prognostication. METHODS AND RESULTS: MEDLINE, EMBASE, CENTRAL, and SCOPUS Web of Science Core collections were searched for studies comparing ML models to traditional risk scores for CVD risk prediction between the years 2000 and 2021. We included studies that assessed both ML and traditional risk scores in adult (≥18 years old) primary prevention populations. We assessed the risk of bias using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Only studies that provided a measure of discrimination [i.e. C-statistics with 95% confidence intervals (CIs)] were included in the meta-analysis. A total of 16 studies were included in the review and meta-analysis (3 302 515 individuals). All study designs were retrospective cohort studies. Out of 16 studies, 3 externally validated their models, and 11 reported calibration metrics. A total of 11 studies demonstrated a high risk of bias. The summary C-statistics (95% CI) of the top-performing ML models and traditional risk scores were 0.773 (95% CI: 0.740-0.806) and 0.759 (95% CI: 0.726-0.792), respectively. The difference in C-statistic was 0.0139 (95% CI: 0.0139-0.140), P < 0.0001. CONCLUSION: ML models outperformed traditional risk scores in the discrimination of CVD risk prognostication. Integration of ML algorithms into electronic healthcare systems in primary care could improve identification of patients at high risk of subsequent CVD events and hence increase opportunities for CVD prevention. It is uncertain whether they can be implemented in clinical settings. Future implementation research is needed to examine how ML models may be utilized for primary prevention. This review was registered with PROSPERO (CRD42020220811).


Subject(s)
Cardiovascular Diseases , Adult , Humans , Adolescent , Cardiovascular Diseases/prevention & control , Risk Factors , Retrospective Studies , Heart Disease Risk Factors , Machine Learning , Primary Prevention/methods
11.
Comput Methods Programs Biomed ; 230: 107340, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36640604

ABSTRACT

BACKGROUND AND OBJECTIVE: Because the early symptoms of chronic obstructive pulmonary disease (COPD) are not obvious, patients are not easily identified, which delays prevention and treatment. In the present study, machine learning (ML) methods were employed to construct a risk prediction model for COPD to improve its prediction efficiency. METHODS: We collected data from a sample of 5807 cases with a complete COPD diagnosis from the 2019 COPD Surveillance Program in Shanxi Province and extracted 34 potentially relevant variables from the dataset. First, we used feature selection methods (i.e., generalized elastic net, Lasso, and adaptive Lasso) to select ten variables. We then built supervised classifiers for class-imbalanced data by combining cost-sensitive learning and SMOTE resampling, respectively, with the ML methods (Logistic Regression, SVM, Random Forest, XGBoost, LightGBM, NGBoost, and Stacking). Finally, we assessed their performance. RESULTS: Frequent cough at or before age 14 and nine other variables were significant predictors of COPD. The Stacking heterogeneous ensemble model showed relatively good performance on the unbalanced datasets. Logistic Regression with class weighting achieved the best classification performance on the balanced data when the composite indicators (AUC, F1-Score, and G-mean) were used as criteria for model comparison. The F1-Score/G-mean values for the top three ML models were 0.290/0.660 for Logistic Regression with class weighting, 0.288/0.649 for Stacking with synthetic minority oversampling technique (SMOTE), and 0.285/0.648 for LightGBM with SMOTE. CONCLUSIONS: By combining feature selection methods, unbalanced data processing methods, and machine learning methods with data from disease surveillance questionnaires and physical measurements to identify people at risk of COPD, this study concluded that machine learning models based on survey questionnaires could provide automated identification of patients at risk of COPD and offer a simple and scientific aid for its early identification.
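
Two ideas from the abstract above lend themselves to a short sketch: the G-mean metric and class weighting. G-mean is the geometric mean of sensitivity and specificity. For class weights, the abstract does not say how they were set, so the inverse-frequency heuristic shown here is an assumption, as are the function names and the example counts.

```python
import math
from collections import Counter


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical imbalanced sample: 90 non-cases (0) and 10 cases (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)  # the rare class gets the larger weight

# Hypothetical confusion-matrix counts for the same sample.
score = g_mean(tp=6, fp=18, tn=72, fn=4)  # sqrt(0.6 * 0.8)
```

Because G-mean collapses to zero whenever either class is entirely misclassified, it penalises models that reach high accuracy simply by predicting the majority class.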


Subject(s)
Pulmonary Disease, Chronic Obstructive , Humans , Adolescent , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/epidemiology , Machine Learning , Logistic Models , Support Vector Machine
12.
Front Immunol ; 13: 1025688, 2022.
Article in English | MEDLINE | ID: mdl-36405750

ABSTRACT

Systemic lupus erythematosus (SLE) is a latent, insidious autoimmune disease. Building on the development of gene sequencing in recent years, our study aims to develop a gene-based predictive model to explore the identification of SLE at the genetic level. First, gene expression datasets of SLE whole blood samples were collected from the Gene Expression Omnibus (GEO) database. After the datasets were merged, they were divided into training and validation datasets in a ratio of 7:3; the training dataset contained 334 SLE samples and 71 healthy samples, and the validation dataset contained 143 SLE samples and 30 healthy samples. The training dataset was used to build the disease risk prediction model, and the validation dataset was used to verify the model's identification ability. We first analyzed differentially expressed genes (DEGs) and then used Lasso and random forest (RF) to screen out six key genes (OAS3, USP18, RTP4, SPATS2L, IFI27 and OAS1), which are essential to distinguish SLE from healthy samples. We incorporated the six key genes into the RF model and performed five iterations of 10-fold cross-validation to determine the RF model with the optimal mtry. The mean values of area under the curve (AUC) and accuracy of the models were over 0.95. The validation dataset was then used to evaluate the AUC performance, and our model had an AUC of 0.948. On an external validation dataset (GSE99967), the model achieved an AUC of 0.810, an accuracy of 0.836, and a sensitivity of 0.921. A second external validation dataset (GSE185047), consisting entirely of SLE patients, yielded an SLE sensitivity of up to 0.954. The final high-throughput RF model had a mean AUC over 0.9, again showing good results. In conclusion, we identified key genetic biomarkers and successfully developed a novel disease risk prediction model for SLE that can be used as a new SLE risk prediction aid and contribute to the identification of SLE.


Subject(s)
Autoimmune Diseases , Lupus Erythematosus, Systemic , Humans , Area Under Curve , Ubiquitin Thiolesterase
13.
Front Bioinform ; 2: 927312, 2022.
Article in English | MEDLINE | ID: mdl-36304293

ABSTRACT

Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most "informative" features and remove noisy, "non-informative," irrelevant, and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.

14.
Front Endocrinol (Lausanne) ; 13: 935796, 2022.
Article in English | MEDLINE | ID: mdl-35937821

ABSTRACT

Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease. Clinical features are traditionally used to predict DKD, yet with low diagnostic efficacy. Most of the recent biomarkers used to predict DKD are based on transcriptomics and metabolomics; however, they also should be used in combination with many other predictive indicators. The purpose of this study was thus to identify a simplified class of blood biomarkers capable of predicting the risk of developing DKD. The Gene Expression Omnibus database was screened for DKD biomarkers, and differentially expressed genes (DEGs) in human blood and kidney were identified via gene expression analysis and the Least Absolute Shrinkage and Selection Operator regression. A comparison of the area under the curve (AUC) profiles on multiple receiver operating characteristic curves of the DEGs in DKD and other renal diseases revealed that REG1A and RUNX3 had the highest specificity for DKD diagnosis. The AUCs of the combined expression of REG1A and RUNX3 in kidney (AUC = 0.929) and blood samples (AUC = 0.917) of DKD patients were similar to each other. The AUC of blood samples from DKD patients and healthy individuals obtained for external validation further demonstrated that REG1A combined with RUNX3 had significant diagnostic efficacy (AUC=0.948). REG1A and RUNX3 expression levels were found to be positively and negatively correlated with urinary albumin creatinine ratio and estimated glomerular filtration rate, respectively. Kaplan-Meier curves also revealed the potential of REG1A and RUNX3 for predicting the risk of DKD. In conclusion, REG1A and RUNX3 may serve as biomarkers for predicting the risk of developing DKD.


Subject(s)
Diabetes Mellitus, Type 2 , Diabetic Nephropathies , Biomarkers/metabolism , Core Binding Factor Alpha 3 Subunit , Diabetic Nephropathies/etiology , Diabetic Nephropathies/genetics , Glomerular Filtration Rate , Humans , Lithostathine , Risk Factors
15.
Digit Health ; 8: 20552076221089092, 2022.
Article in English | MEDLINE | ID: mdl-35371534

ABSTRACT

Objective: Ubiquitous internet access is reshaping the way we live, but it is accompanied by unprecedented challenges in preventing chronic diseases, which usually stem from long exposure to unhealthy lifestyles. This paper proposes leveraging online shopping behaviors as a proxy for personal lifestyle choices to improve chronic disease prevention literacy, at a time when the e-commerce user experience has been assimilated into most people's everyday lives. Methods: Longitudinal query logs and purchase records from 15 million online shoppers were accessed, constructing a broad spectrum of lifestyle features covering various product categories and buyer personas. Using the lifestyle-related information preceding online shoppers' first purchases of specific prescription drugs, we could determine associations between their past lifestyle choices and whether they suffered from a particular chronic disease. Results: Novel lifestyle risk factors were discovered in two exemplars, depression and type 2 diabetes, most of which showed reasonable consistency with existing healthcare knowledge. Further, such empirical findings could be adopted to locate online shoppers at higher risk of these chronic diseases with decent accuracy [i.e. area under the receiver operating characteristic curve (AUC) = 0.68 for depression and AUC = 0.70 for type 2 diabetes], closely matching the performance of screening surveys benchmarked against medical diagnosis. Conclusions: Mining online shopping behaviors can point medical experts to a series of lifestyle issues associated with chronic diseases that are less explored to date. Hopefully, unobtrusive chronic disease surveillance via e-commerce sites can grant consenting individuals a privilege to be connected more readily with the medical profession and sophistication.

16.
Curr Med Res Opin ; 38(7): 1219-1228, 2022 07.
Article in English | MEDLINE | ID: mdl-35410562

ABSTRACT

BACKGROUND: Personalized treatment approaches, including those based on genetic testing, are increasingly enabling informed decision-making to improve health outcomes. Research involving Indigenous Australians has been lagging behind, although this population experiences a higher prevalence of chronic disease and mental health disorders. METHODS: Using community-based participatory research principles, this study purposefully interviewed participants with a diagnosed common mental disorder and a comorbid chronic disease condition. An inductive thematic analysis was conducted on semi-structured interviews with consenting participants (n = 48). Common themes and analytical domains were identified that provided a semantic understanding shared by participants. RESULTS: Five emerging themes were identified, primarily focusing on: (1) the perceptions and understanding of genetics research; (2) culturally appropriate conduct of genetics research; (3) the role of Indigenous-led genetics research; (4) future prospects of genetics research; and (5) the importance of genetics research for patients with mental and physical health comorbidities. CONCLUSION: Indigenous Australians are under-represented in pharmacogenomics research despite well-documented epidemiological research demonstrating that Indigenous people globally experience greater risk of developing certain chronic diseases and more severe disease progression. Positive outcomes from this study highlight the importance of not only involving Indigenous participants, but also providing leadership and governance opportunities for future genetics research.


Subject(s)
Mental Disorders , Native Hawaiian or Other Pacific Islander , Australia/epidemiology , Chronic Disease , Comorbidity , Humans , Mental Disorders/epidemiology , Mental Disorders/genetics , Native Hawaiian or Other Pacific Islander/genetics
17.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35348583

ABSTRACT

Predicting disease progression at the initial stage, so that early intervention and treatment can be implemented, can effectively prevent further deterioration of the condition. Traditional methods for medical data analysis usually fail to perform well because of their inability to mine the correlation patterns of pathogenies. Therefore, many computational methods have been drawn from the field of deep learning. In this study, we propose a novel method, the influence hypergraph convolutional generative adversarial network (IHGC-GAN), for disease risk prediction. First, a hypergraph is constructed with genes and brain regions as nodes. Second, an influence transmission model is built to portray the associations between nodes and the transmission rule of disease information. Third, an IHGC-GAN method is constructed based on this model. This method innovatively combines the graph convolutional network (GCN) and GAN. The GCN is used as the generator in the GAN to spread and update the lesion information of nodes in the brain region-gene hypergraph. Finally, the prediction accuracy of the method is improved by the mutual competition and repeated iteration between generator and discriminator. This method can not only capture the evolutionary pattern from early mild cognitive impairment (EMCI) to late MCI (LMCI) but also extract the pathogenic factors and predict the deterioration risk from EMCI to LMCI. The results on the two datasets indicate that the IHGC-GAN method has better prediction performance than the advanced methods on a variety of indicators.


Subject(s)
Cognitive Dysfunction , Brain , Cognitive Dysfunction/genetics , Diagnostic Imaging , Disease Progression , Humans
18.
Front Genet ; 13: 831866, 2022.
Article in English | MEDLINE | ID: mdl-35211161

ABSTRACT

Epidemiological and associative research from humans and animals identifies correlations between the environment and health impacts. The environment-health inter-relationship is effected through an individual's underlying genetic variation and mediated by mechanisms that include the changes to gene regulation that are associated with the diversity of phenotypes we exhibit. However, the causal relationships have yet to be established, in part because the associations are reduced to individual interactions and the combinatorial effects are rarely studied. This problem is exacerbated by the fact that our genomes are highly dynamic; they integrate information across multiple levels (from linear sequence, to structural organisation, to temporal variation) each of which is open to and responds to environmental influence. To unravel the complexities of the genomic basis of human disease, and in particular non-communicable diseases that are also influenced by the environment (e.g., obesity, type II diabetes, cancer, multiple sclerosis, some neurodegenerative diseases, inflammatory bowel disease, rheumatoid arthritis) it is imperative that we fully integrate multiple layers of genomic data. Here we review current progress in integrated genomic data analysis, and discuss cases where data integration would lead to significant advances in our ability to predict how the environment may impact on our health. We also outline limitations which should form the basis of future research questions. In so doing, this review will lay the foundations for future research into the impact of the environment on our health.

19.
BMC Med Inform Decis Mak ; 21(Suppl 9): 375, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35016654

ABSTRACT

BACKGROUND: Using more than 15 million follow-up records of 404,426 patients from Guangdong Mental Health Center collected over the past 10 years, this study proposes a disease risk analysis and prediction model to support chronic disease management and clinical research for schizophrenia patients. METHODS: Based on a mental health information and intelligent data processing platform, we design an automatic AHP framework called AutoAHP to analyze and predict the disease risks of schizophrenia patients. Through automatic extraction, transformation and integration of real-world follow-up data such as demography, treatment, and disease course, a chronic database of patient status is established. In combination with age-period-cohort, logistic regression and Cox models, we apply AutoAHP to assess disease risk and implement risk prediction in practice. RESULTS: A list of essential factors for risk prediction is identified, including annual changes in mental health policy, public support, regional difference, patient gender, compliance, and social function. Verified on 1,222,038 complete disease course and treatment records of 256,050 patients, the AutoAHP framework achieves a precision of 0.923, a recall of 0.924, and an F1 score of 0.923. The model is shown to outperform general models in risk prediction. CONCLUSIONS: For the risk assessment of patients with schizophrenia, which is influenced by factors such as time, region and complication, the AutoAHP framework can be applied in combination with logistic regression and Cox models to support clinical analysis of disease risk-related factors and assist decision-making in chronic disease management.
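The three reported metrics can be checked against each other, since the F1 score is the harmonic mean of precision and recall. The short sketch below does that arithmetic with the values from the abstract; the helper name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # conventional value when both metrics are zero
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.923, 0.924)
print(round(f1, 3))
```

The harmonic mean of 0.923 and 0.924 lies just below 0.9235, so it rounds to 0.923, in agreement with the reported F1.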


Subject(s)
Schizophrenia , Humans , Logistic Models , Mental Health , Risk Assessment , Schizophrenia/diagnosis , Schizophrenia/epidemiology , Schizophrenia/therapy
20.
Inform Health Soc Care ; 47(3): 243-257, 2022 Jul 03.
Article in English | MEDLINE | ID: mdl-34672859

ABSTRACT

Type 2 diabetes is a chronic, costly disease and a serious global population health problem. Yet the disease is manageable and preventable if there is an early warning. This study applies supervised machine learning algorithms to develop predictive models for type 2 diabetes using administrative claim data. Following guidelines from the Elixhauser Comorbidity Index, 31 variables were considered. Five supervised machine learning algorithms were used to develop type 2 diabetes prediction models. Principal component analysis was applied to rank the variables' importance in the predictive models. Random forest (RF) showed the highest accuracy (85.06%) among the algorithms, closely followed by k-nearest neighbor (84.48%). The analysis further revealed RF to be a high-performing algorithm irrespective of data imbalance. According to the principal component analysis, patient age is the most important predictor of type 2 diabetes, followed by a comorbid condition (i.e., solid tumor without metastasis). This study's finding that RF is the best-performing classifier is consistent with the promise of tree-based algorithms for public data reported in other works. The outcome can therefore guide the design of automated surveillance of patients at risk of developing diabetes from administrative claim information and will be useful to health regulators and insurers.


Subject(s)
Diabetes Mellitus, Type 2 , Machine Learning , Algorithms , Cluster Analysis , Diabetes Mellitus, Type 2/epidemiology , Humans