ABSTRACT
Type 2 Diabetes (T2D) is a prevalent lifelong health condition. It is predicted that over 500 million adults will be diagnosed with T2D by 2040. T2D can develop at any age, and if it progresses, it may cause serious comorbidities. One of the most critical T2D-related comorbidities is Myocardial Infarction (MI), commonly known as a heart attack. MI is a life-threatening medical emergency, so it is important to predict it and intervene in a timely manner. The use of Machine Learning (ML) for clinical prediction is gaining pace, but class imbalance in predictive models is a key challenge for the trustworthy deployment of the technology. Class imbalance may lead to bias and overfitting in ML models, and it may cause misleading interpretations of their outputs. In this study, we show how the systematic use of Class Imbalance Handling (CIH) techniques may improve the performance of ML models. We used the Connected Bradford dataset, consisting of over one million real-world health records. Three commonly used CIH techniques, Oversampling, Undersampling, and Class Weighting (CW), were applied to Naive Bayes (NB), Neural Network (NN), Random Forest (RF), Support Vector Machine (SVM), and Ensemble models. CW outperformed the other techniques, achieving the highest accuracy and F1 values of 0.9948 and 0.9556, respectively. Applying the most appropriate CIH techniques to ML models built on real-world healthcare data provides promising results for helping to reduce the risk of MI in patients with T2D.
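The three CIH techniques named above can be illustrated with a minimal sketch. The abstract does not state how each technique was implemented, so everything below is an assumption made for illustration: random duplication for oversampling, random removal for undersampling, and the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)) for class weighting. The function names and the toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Class weighting: weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Oversampling: duplicate random minority rows up to the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Undersampling: keep a random subset of each class, sized to the minority."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for row in rng.sample(pool, target):
            out_rows.append(row)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (illustrative only): 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

weights = balanced_class_weights(labels)
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

Class weighting leaves the data untouched and instead changes how much each misclassification costs during training, whereas the two resampling approaches change the training set itself. Resampling would normally be applied to the training split only, so that evaluation still reflects the real class distribution.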
Subject(s)
Diabetes Mellitus, Type 2 , Machine Learning , Myocardial Infarction , Humans , Bayes Theorem , Support Vector Machine
ABSTRACT
Type 2 diabetes is a lifelong health condition, and as it progresses, a range of comorbidities can develop. The prevalence of diabetes has increased gradually, and it is expected that 642 million adults will be living with diabetes by 2040. Early and proper interventions for managing diabetes-related comorbidities are important. In this study, we propose a Machine Learning (ML) model for predicting the risk of developing hypertension in patients who already have Type 2 diabetes. We used the Connected Bradford dataset, consisting of 1.4 million patients, as our main dataset for data analysis and model building. Our data analysis showed that hypertension is the most frequent observation among patients with Type 2 diabetes. Since hypertension is an important predictor of poor clinical outcomes, such as heart, brain, kidney, and other diseases, it is crucial to make early and accurate predictions of the risk of hypertension in patients with Type 2 diabetes. We used Naïve Bayes (NB), Neural Network (NN), Random Forest (RF), and Support Vector Machine (SVM) to train our models. We then combined these models into an ensemble to assess the potential performance improvement. The ensemble method gave the best classification performance, with accuracy and kappa values of 0.9525 and 0.2183, respectively. We conclude that predicting the risk of developing hypertension in patients with Type 2 diabetes using ML provides a promising stepping stone for preventing the progression of Type 2 diabetes.
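The reported pairing of high accuracy with a much lower kappa is easier to read with the definition of Cohen's kappa at hand: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance. The sketch below computes both metrics from a binary confusion matrix. The counts are invented for illustration and are not the study's results, and the function name is hypothetical.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # p_o: observed agreement, i.e. accuracy
    # p_e: chance agreement from predicted and actual class marginals
    chance_pos = ((tp + fp) / n) * ((tp + fn) / n)
    chance_neg = ((fn + tn) / n) * ((fp + tn) / n)
    expected = chance_pos + chance_neg
    # Undefined when expected == 1 (all predictions and labels in one class)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative counts only: 1000 cases, 50 of them actual positives.
acc, kappa = accuracy_and_kappa(tp=10, fp=10, fn=40, tn=940)
```

With these toy counts, most of the agreement comes from the majority class, so accuracy is high while kappa stays low. This is one way the two metrics can diverge on imbalanced data.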