Results 1 - 20 of 29
1.
Front Med (Lausanne) ; 11: 1431578, 2024.
Article in English | MEDLINE | ID: mdl-39086944

ABSTRACT

Although methods in diagnosis and therapy of hepatocellular carcinoma (HCC) have made significant progress in the past decades, the overall survival (OS) of liver cancer is still disappointing. Machine learning models have several advantages over traditional Cox models in prognostic prediction. This study aimed at designing an optimal panel and constructing an optimal machine learning model for predicting prognosis in HCC. A total of 941 HCC patients with complete survival data and preoperative clinical chemistry and immunology indicators from two medical centers were included. The OCC panel was designed by univariate and multivariate Cox regression analysis. Subsequently, a Cox model and machine-learning models were established and assessed for predicting OS and PFS in the discovery cohort and the internal validation cohort. The best OCC model was validated in the external validation cohort and analyzed in different subgroups. In the discovery, internal validation and external validation cohorts, C-indexes of our optimal OCC model were 0.871 (95% CI, 0.863-0.878), 0.692 (95% CI, 0.667-0.717) and 0.648 (95% CI, 0.630-0.667), respectively; the 2-year AUCs of the OCC model were 0.939 (95% CI, 0.920-0.959), 0.738 (95% CI, 0.667-0.809) and 0.725 (95% CI, 0.643-0.808), respectively. For subgroup analysis of HCC patients with HBV, aged less than 65, with cirrhosis or with resection as first therapy, C-indexes of our optimal OCC model were 0.772 (95% CI, 0.752-0.792), 0.769 (95% CI, 0.750-0.789), 0.855 (95% CI, 0.846-0.864) and 0.760 (95% CI, 0.741-0.778), respectively. In general, the optimal OCC model based on the RSF algorithm shows prognostic guidance value in HCC patients undergoing individualized treatment.
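
The C-index reported in this and several later abstracts can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the authors' code; the function name `harrell_c_index` and the simplified handling of tied times are assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified).

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ended in an event.
        # Pairs with tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the earlier event had the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which figures such as 0.871 and 0.648 above are read.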

2.
Cancers (Basel) ; 16(16)2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39199651

ABSTRACT

Since the mid-1980s, there has been little progress in improving survival of patients diagnosed with osteosarcoma. Survival prediction models play a key role in clinical decision-making, guiding healthcare professionals in tailoring treatment strategies based on individual patient risks. The increasing interest of the medical community in using machine learning (ML) for predicting survival has sparked an ongoing debate on the value of ML techniques versus more traditional statistical modelling (SM) approaches. This study investigates the use of SM versus ML methods in predicting overall survival (OS) using osteosarcoma data from the EURAMOS-1 clinical trial (NCT00134030). The well-established Cox proportional hazard model is compared to the extended Cox model that includes time-varying effects, and to the ML methods random survival forests and survival neural networks. The impact of eight variables on OS predictions is explored. Results are compared on different model performance metrics, variable importance, and patient-specific predictions. The article provides comprehensive insights to aid healthcare researchers in evaluating diverse survival prediction models for low-dimensional clinical data.

3.
Sci Rep ; 14(1): 15566, 2024 07 06.
Article in English | MEDLINE | ID: mdl-38971926

ABSTRACT

Understanding the combined effects of risk factors on all-cause mortality is crucial for implementing effective risk stratification and designing targeted interventions, but such combined effects are understudied. We aim to use survival-tree based machine learning models as more flexible nonparametric techniques to examine the combined effects of multiple physiological risk factors on mortality. More specifically, we (1) study the combined effects between multiple physiological factors and all-cause mortality, (2) identify the five most influential factors and visualize their combined influence on all-cause mortality, and (3) compare the mortality cut-offs with the current clinical thresholds. Data from the 1999-2014 NHANES Survey were linked to National Death Index data with follow-up through 2015 for 17,790 adults. We observed that the five most influential factors affecting mortality are the tobacco smoking biomarker cotinine, glomerular filtration rate (GFR), plasma glucose, sex, and white blood cell count. Specifically, high mortality risk is associated with being male, active smoking, low GFR, elevated plasma glucose levels, and high white blood cell count. The identified mortality-based cutoffs for these factors are mostly consistent with relevant studies and current clinical thresholds. This approach enabled us to identify important cutoffs and provide enhanced risk prediction as an important basis to inform clinical practice and develop new strategies for precision medicine.


Subject(s)
Glomerular Filtration Rate , Machine Learning , Humans , Male , Female , Risk Factors , Middle Aged , Adult , Aged , Blood Glucose/analysis , Blood Glucose/metabolism , Cotinine/blood , Leukocyte Count , Mortality , Risk Assessment/methods , Biomarkers/blood , Nutrition Surveys , Cause of Death
4.
BMC Geriatr ; 24(1): 432, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755603

ABSTRACT

BACKGROUND: It has been proposed that inflammation plays a role in the development of sarcopenia. This study aimed to investigate the links of complete blood cell count (CBC) parameters and CBC-derived inflammatory indicators with sarcopenia and mortality. METHODS: Data pertaining to sarcopenia were extracted from the 1999-2006 National Health and Nutrition Examination Survey (NHANES), and mortality events were ascertained through the National Death Index up to December 31, 2019. The CBC-derived inflammatory indicators assessed in this study included the neutrophil-to-lymphocyte ratio (NLR), derived neutrophil-to-lymphocyte ratio (dNLR), monocyte-to-lymphocyte ratio (MLR), neutrophil-monocyte to lymphocyte ratio (NMLR), systemic inflammatory response index (SIRI), and systemic immune-inflammation index (SII). The prognostic significance of these CBC-derived inflammatory indicators was evaluated using the random survival forests (RSF) analysis. RESULTS: The study encompassed a cohort of 12,689 individuals, among whom 1,725 were diagnosed with sarcopenia. Among individuals with sarcopenia, 782 experienced all-cause mortality, and 195 succumbed to cardiovascular causes. Following adjustment for confounding variables, it was observed that elevated levels of NLR, dNLR, NMLR, SIRI, and SII were associated with an increased prevalence of sarcopenia. Among participants with sarcopenia, those in the highest quartile of NLR (HR = 1.336 [1.095-1.631]), dNLR (HR = 1.274 [1.046-1.550]), MLR (HR = 1.619 [1.290-2.032]), NMLR (HR = 1.390 [1.132-1.707]), and SIRI (HR = 1.501 [1.210-1.862]) exhibited an elevated risk of all-cause mortality compared to those in the lowest quartile of these inflammation-derived indicators. These associations were similarly observed in cardiovascular mortality (HR = 1.874 [1.169-3.003] for MLR, HR = 1.838 [1.175-2.878] for SIRI). 
The RSF analysis indicated that MLR exhibited the highest predictive power for both all-cause and cardiovascular mortality among individuals with sarcopenia. CONCLUSIONS: Our findings underscore the association between CBC-derived inflammatory indicators and mortality in adults with sarcopenia. Of note, MLR emerged as the most robust predictor of all-cause and cardiovascular mortality in this population.
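
The CBC-derived indicators named in this abstract are simple ratios of blood cell counts. The sketch below uses the definitions commonly found in the literature; the abstract itself does not state the formulas, so these definitions, the function name and the assumption that all counts share one unit (for example 10^9 cells/L) are illustrative only.

```python
def cbc_inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets, wbc):
    """Compute CBC-derived inflammatory indicators from absolute counts.

    All counts are assumed to be in the same unit (e.g. 10^9 cells/L).
    The formulas are the commonly used definitions, not taken from the abstract.
    """
    return {
        "NLR": neutrophils / lymphocytes,                 # neutrophil-to-lymphocyte ratio
        "dNLR": neutrophils / (wbc - neutrophils),        # derived NLR
        "MLR": monocytes / lymphocytes,                   # monocyte-to-lymphocyte ratio
        "NMLR": (neutrophils + monocytes) / lymphocytes,  # neutrophil-monocyte to lymphocyte ratio
        "SIRI": neutrophils * monocytes / lymphocytes,    # systemic inflammatory response index
        "SII": platelets * neutrophils / lymphocytes,     # systemic immune-inflammation index
    }
```

Quartiles of each indicator, as used for the hazard ratios above, would then be computed across the cohort rather than per patient.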


Subject(s)
Inflammation , Nutrition Surveys , Sarcopenia , Humans , Sarcopenia/mortality , Sarcopenia/epidemiology , Sarcopenia/diagnosis , Sarcopenia/blood , Male , Female , Nutrition Surveys/methods , Nutrition Surveys/trends , Aged , Inflammation/blood , Middle Aged , Blood Cell Count/trends , Blood Cell Count/methods , Aged, 80 and over , Neutrophils , Prognosis , Adult , United States/epidemiology
5.
Brain Sci ; 14(3)2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38539590

ABSTRACT

Alzheimer's disease (AD) exhibits sex-linked variations, with women having a higher prevalence, and little is known about the sexual dimorphism in progressing from Mild Cognitive Impairment (MCI) to AD. The main aim of our study was to shed light on the sex-specific conversion-to-AD risk factors using Random Survival Forests (RSF), a Machine Learning survival approach, and Shapley Additive Explanations (SHAP) on dementia biomarkers in stable (sMCI) and progressive (pMCI) patients. With this purpose, we built two separate models for male (M-RSF) and female (F-RSF) cohorts to assess whether global explanations differ between the sexes. Similarly, SHAP local explanations were obtained to investigate changes across sexes in feature contributions to individual risk predictions. The M-RSF achieved higher performance on the test set (0.87) than the F-RSF (0.79), and global explanations of male and female models had limited similarity (<71.1%). Common influential variables across the sexes included brain glucose metabolism and CSF biomarkers. Conversely, the M-RSF had a notable contribution from hippocampus, which had a lower impact on the F-RSF, while verbal memory and executive function were key contributors only in F-RSF. Our findings confirmed that females had a higher risk of progressing to dementia; moreover, we highlighted distinct sex-driven patterns of variable importance, uncovering different feature contribution risks across sexes that decrease/increase the conversion-to-AD risk.

6.
Front Med (Lausanne) ; 11: 1368899, 2024.
Article in English | MEDLINE | ID: mdl-38545509

ABSTRACT

Background and objectives: The prognosis of liver failure treated with non-bioartificial liver support systems is poor. Detecting its risk factors and developing relevant prognostic models still represent the top priority to lower its death risk. Methods: All 215 patients with liver failure treated with non-bioartificial liver support system were retrospectively analyzed. Potential prognostic factors were investigated, and the Nomogram and the Random Survival Forests (RSF) models were constructed, respectively. Notably, we evaluated the performance of models and calculated the risk scores to divide patients into low-risk and high-risk groups. Results: In the training set, multifactorial Cox regression analysis showed that etiology, hepatic encephalopathy, total bilirubin, serum alkaline phosphatase, platelets, and MELD score were independent factors of short-term prognosis. The RSF model (AUC: 0.863, 0.792) performed better in prediction than the Nomogram model (AUC: 0.816, 0.756) and MELD (AUC: 0.658, 0.700) in the training and validation groups. On top of that, patients in the low-risk group had a significantly better prognosis than those in the high-risk group. Conclusion: We constructed the RSF model with etiology, hepatic encephalopathy, total bilirubin, serum alkaline phosphatase, platelets, and MELD score, which showed better prognostic power than the Nomogram model and MELD score and could help physicians make optimal treatment decisions.

7.
Digit Health ; 10: 20552076231224225, 2024.
Article in English | MEDLINE | ID: mdl-38235416

ABSTRACT

Objective: Chronic kidney disease (CKD) poses a major global health burden. Early CKD risk prediction enables timely interventions, but conventional models have limited accuracy. Machine learning (ML) enhances prediction, but interpretability is needed to support clinical usage in both diagnosis and decision-making. Methods: A cohort of 491 patients with clinical data was collected for this study. The dataset was randomly split into an 80% training set and a 20% testing set. To achieve the first objective, we developed four ML algorithms (logistic regression, random forests, neural networks, and eXtreme Gradient Boosting (XGBoost)) to classify patients into two classes: those who progressed to CKD stages 3-5 during follow-up (positive class) and those who did not (negative class). For the classification task, the area under the receiver operating characteristic curve (AUC-ROC) was used to evaluate model performance in discriminating between the two classes. For survival analysis, Cox proportional hazards regression (COX) and random survival forests (RSFs) were employed to predict CKD progression, and the concordance index (C-index) and integrated Brier score were used for model evaluation. Furthermore, variable importance, partial dependence plots, and restricted cubic splines were used to interpret the models' results. Results: XGBoost demonstrated the best predictive performance for CKD progression in the classification task, with an AUC-ROC of 0.867 (95% confidence interval (CI): 0.728-0.100), outperforming the other ML algorithms. In survival analysis, RSF showed slightly better discrimination and calibration on the test set compared to COX, indicating better generalization to new data. Variable importance analysis identified estimated glomerular filtration rate, age, and creatinine as the most important predictors for CKD survival analysis.
Further analysis revealed non-linear associations between age and CKD progression, suggesting higher risks in patients aged 52-55 and 65-66 years. The association between cholesterol levels and CKD progression was also non-linear, with lower risks observed when cholesterol levels were in the range of 5.8-6.4 mmol/L. Conclusions: Our study demonstrated the effectiveness of interpretable ML models for predicting CKD progression. The comparison between COX and RSF highlighted the advantages of ML in survival analysis, particularly in handling non-linearity and high-dimensional data. By leveraging interpretable ML for unraveling risk factor relationships, contrasting predictive techniques, and exposing non-linear associations, this study significantly advances CKD risk prediction to enable enhanced clinical decision-making.

8.
Brain Inform ; 10(1): 31, 2023 Nov 18.
Article in English | MEDLINE | ID: mdl-37979033

ABSTRACT

Random Survival Forests (RSF) has recently shown better performance than statistical survival methods such as Cox proportional hazard (CPH) in predicting conversion risk from mild cognitive impairment (MCI) to Alzheimer's disease (AD). However, RSF application in real-world clinical settings is still limited due to its black-box nature. For this reason, we aimed at providing a comprehensive study of RSF explainability with SHapley Additive exPlanations (SHAP) on biomarkers of stable and progressive patients (sMCI and pMCI) from the Alzheimer's Disease Neuroimaging Initiative. We evaluated three global explanations (RSF feature importance, permutation importance and SHAP importance) and we quantitatively compared them with Rank-Biased Overlap (RBO). Moreover, we assessed whether multicollinearity among variables may perturb the SHAP outcome. Lastly, we stratified pMCI test patients into high, medium and low risk grades, to investigate the individual SHAP explanation of one pMCI patient per risk group. We confirmed that RSF had higher accuracy (0.890) than CPH (0.819), and its stability and robustness were demonstrated by high overlap (RBO > 90%) between feature rankings within the first eight features. SHAP local explanations with and without correlated variables had no substantial difference, showing that multicollinearity did not alter the model. FDG, ABETA42 and HCI were the first important features in global explanations, with the highest contribution also in local explanation. FAQ, mPACCdigit, mPACCtrailsB and RAVLT immediate had the highest influence among all clinical and neuropsychological assessments in increasing progression risk, as particularly evident in pMCI patients' individual explanation. In conclusion, our findings suggest that RSF represents a useful tool to support clinicians in estimating conversion-to-AD risk and that the SHAP explainer boosts its clinical utility with intelligible and interpretable individual outcomes that highlight key features associated with AD prognosis.
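
Rank-Biased Overlap compares two rankings by averaging their top-d agreement over increasing depths d, with geometrically decaying weights. The sketch below computes the truncated RBO sum over a fixed depth; which RBO variant and which persistence parameter `p` the authors used is not stated in the abstract, so both the function and its default `p=0.9` are assumptions for illustration.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated Rank-Biased Overlap of two rankings without duplicates.

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * A_d, where A_d is the fraction
    of items shared by the two top-d prefixes. Because the sum is truncated
    at depth k, identical rankings score 1 - p**k rather than exactly 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = depth if depth is not None else min(len(ranking_a), len(ranking_b))
    total = 0.0
    for d in range(1, k + 1):
        # Agreement at depth d: size of the overlap of the two prefixes over d.
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total
```

For feature rankings such as those from RSF importance, permutation importance and SHAP importance, each pair of rankings would be passed in as lists of feature names ordered from most to least important.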

9.
Int J Chron Obstruct Pulmon Dis ; 18: 1457-1473, 2023.
Article in English | MEDLINE | ID: mdl-37485052

ABSTRACT

Introduction: In this article, we explore to what extent it is possible to leverage very small data to build machine learning (ML) models that predict acute exacerbations of chronic obstructive pulmonary disease (AECOPD). Methods: We build ML models using the small data collected during the eHealth Diary telemonitoring study between 2013 and 2017 in Sweden. This data refers to a group of multimorbid patients, namely 18 patients with chronic obstructive pulmonary disease (COPD) as the major reason behind previous hospitalisations. The telemonitoring was supervised by a specialised hospital-based home care (HBHC) unit, which was also responsible for the medical actions needed. Results: We implement two different ML approaches, one based on time-dependent covariates and the other one based on time-independent covariates. We compare the first approach with standard Cox Proportional Hazards (CPH). For the second one, we use different proportions of synthetic data to build models and then evaluate the best model against authentic data. Discussion: To the best of our knowledge, the present ML study shows for the first time that the most important variable for an increased risk of future AECOPDs is "maintenance medication changes by HBHC". This finding is clinically relevant since a sub-optimal maintenance treatment, requiring medication changes, puts the patient at risk of future AECOPDs. Conclusion: The experiments return useful insights about the use of small data for ML.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Humans , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/drug therapy , Sweden , Disease Progression
10.
Biom J ; 65(5): e2200153, 2023 06.
Article in English | MEDLINE | ID: mdl-37068191

ABSTRACT

The Buckley-James (BJ) model is a typical semiparametric accelerated failure time model, which is closely related to the ordinary least squares method and easy to construct. However, the traditional BJ model, built on a linearity assumption, only captures simple linear relationships and has difficulty in processing nonlinear problems. To overcome this difficulty, in this paper, we develop a novel regression model for right-censored survival data within the learning framework of the BJ model, based on random survival forests (RSF), extreme learning machine (ELM), and the L2 boosting algorithm. The proposed method, referred to as the ELM-based BJ boosting model, first employs RSF for covariate imputation, then develops a new ensemble of ELMs (an ELM-based boosting algorithm for regression) using the ensemble scheme of L2 boosting, and finally uses the output function of the proposed ELM-based boosting model to replace the linear combination of covariates in the BJ model. Because it fits the logarithm of survival time on covariates with the nonparametric ELM-based boosting method instead of the least squares method, the ELM-based BJ boosting model can capture both linear and nonlinear covariate effects. In both simulation studies and real data applications, in terms of concordance index and integrated Brier score, the proposed ELM-based BJ boosting model can outperform the traditional BJ model, two kinds of BJ boosting models proposed by Wang et al., RSF, and the Cox proportional hazards model.
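
For orientation, the standard Buckley-James construction that this abstract builds on can be written as follows. The notation ($T_i$ survival time, $C_i$ censoring time, $X_i$ covariates, $\delta_i$ event indicator, $f$ regression function) is introduced here for illustration and is not taken from the abstract.

```latex
\begin{aligned}
\log T_i &= f(X_i) + \varepsilon_i, \\
Y_i &= \min(\log T_i, \log C_i), \qquad \delta_i = \mathbf{1}\{T_i \le C_i\}, \\
Y_i^{*} &= \delta_i\, Y_i + (1 - \delta_i)\, \mathbb{E}\!\left[\log T_i \mid \log T_i > \log C_i,\; X_i\right].
\end{aligned}
```

In the traditional BJ model $f(X_i) = X_i^{\top}\beta$ is fitted by least squares on the imputed responses $Y_i^{*}$, alternating with re-estimation of the conditional expectation. As the abstract describes it, the proposed method replaces this linear $f$ with the output function of the ELM-based boosting model.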


Subject(s)
Algorithms , Random Forest , Proportional Hazards Models , Computer Simulation , Least-Squares Analysis
11.
BMC Med Res Methodol ; 23(1): 51, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36829145

ABSTRACT

BACKGROUND: In health research, several chronic diseases are susceptible to competing risks (CRs). Initially, statistical models (SM) were developed to estimate the cumulative incidence of an event in the presence of CRs. As recently there is a growing interest in applying machine learning (ML) for clinical prediction, these techniques have also been extended to model CRs but literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low dimensional setting). METHODS: A dataset with 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model-predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel specifications in terms of architecture (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years. RESULTS: Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (left out samples) by employing as measures the Brier score and the Area Under the Curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models are able to reach a comparable performance versus the SM at 2, 5, and 10 years regarding both Brier score and AUC (95% confidence intervals overlapped). However, the SM are frequently better calibrated. 
CONCLUSIONS: Overall, ML techniques are less practical as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real life survival data, these techniques should only be applied complementary to SM as exploratory tools of model's performance. More attention to model calibration is urgently needed.
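
The resampling scheme described in this abstract (bootstrapped training datasets, with the left-out samples used for validation) can be sketched as below. The function name `bootstrap_splits` and its parameters are hypothetical; the abstract does not describe the authors' implementation.

```python
import random


def bootstrap_splits(n_samples, n_repeats=100, seed=0):
    """Yield (train_indices, left_out_indices) pairs.

    Each training set is drawn with replacement and has n_samples entries;
    the validation set contains the indices that were never drawn.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train)
        left_out = [i for i in range(n_samples) if i not in drawn]
        yield train, left_out
```

On average roughly 37% of the samples are left out of each bootstrap draw, so every repeat provides a validation set that the corresponding model has not seen.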


Subject(s)
Machine Learning , Models, Statistical , Humans , Prognosis , Retrospective Studies , Neural Networks, Computer
12.
J Biomed Inform ; 137: 104266, 2023 01.
Article in English | MEDLINE | ID: mdl-36494059

ABSTRACT

Liver cancer is a common malignant tumor, and its clinical stage is closely related to the clinical treatment and prognosis of patients. Currently, the BCLC staging system revised by the BCLC group of University of Barcelona is the globally recognized staging system for liver cancer. However, with the deepening of related research, the current staging system can no longer fully meet the clinical needs. In this work, we propose a novel machine learning method for constructing an automatic hepatocellular carcinoma staging model that incorporates far more clinical variables than any existing staging system. Our model is based on random survival forests, which generates a unique hazard function for each patient. B-splines are used to embed hazard functions into vectors in low-dimensional space and hierarchical clustering method groups similar patients to form staging cohorts. The resulting staging system significantly outperforms the BCLC system in terms of distinctiveness between patients in different stages.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Neoplasm Staging , Retrospective Studies , Liver Neoplasms/diagnosis , Liver Neoplasms/pathology , Carcinoma, Hepatocellular/diagnosis , Carcinoma, Hepatocellular/pathology , Prognosis
13.
Circ Rep ; 4(12): 595-603, 2022 Dec 09.
Article in English | MEDLINE | ID: mdl-36530840

ABSTRACT

Background: Cardiovascular disease (CVD) screening entails precise event prediction to orient risk stratification, resource allocation, and insurance policy. We used random survival forests (RSF) to identify markers of incident CVD among Japanese adults enrolled in an employer-mandated screening program. Methods and Results: We examined biomarker, health history, medication use, and lifestyle data from 155,108 adults aged ≥40 years. The occurrence of coronary artery disease (CAD) or atherosclerotic CVD (ASCVD) events was examined over 6 years of follow-up. The analysis used RSF to identify predictors, then investigated simplified RSF models with fewer predictors for individual-level risk prediction. Data were split into training (70%) and test (30%) datasets. At baseline, the median patient age was 47 years (interquartile range 41-56 years), with 65% males. In all, 1,642 CAD and 2,164 ASCVD events were observed. RSF identified history of heart disease, age, self-reported blood pressure medication, HbA1c, fasting blood sugar, and high-density lipoprotein as important markers of both endpoints. RSF analyses with only the top 20 predictors demonstrated good performance, with areas under the curve of >84% for CAD and >82% for ASCVD in test data across 6 years. Conclusions: We present a machine learning technique for accurate assessment of cardiovascular risk using employer-mandated annual health checkup information. The algorithm produces individual-level risk curves over time, empowering clinicians to efficiently implement prevention strategies in a low-risk population.

14.
J Clin Med ; 11(20)2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36294336

ABSTRACT

(1) Background: Biomarker and model development can help physicians adjust the management of patients with community-acquired pneumonia (CAP) by screening for inpatients with a low probability of cure early in their admission; (2) Methods: We conducted a 30-day cohort study of newly admitted adult CAP patients over 20 years of age. Prognosis models to predict the short-term prognosis were developed using the random survival forest (RSF) method; (3) Results: A total of 247 adult CAP patients were studied and 208 (84.21%) of them reached clinical stability within 30 days. The soluble form of suppression of tumorigenicity-2 (sST2) was an independent predictor of clinical stability and the addition of sST2 to the prognosis model could improve its performance. The C-index of the RSF model for predicting clinical stability was 0.8342 (95% CI, 0.8086-0.8598), which is higher than 0.7181 (95% CI, 0.6933-0.7429) of the CURB 65 score, 0.8025 (95% CI, 0.7776-0.8274) of the PSI score, and 0.8214 (95% CI, 0.8080-0.8348) of Cox regression. In addition, the RSF model was associated with adverse clinical events during hospitalization, ICU admissions, and short-term mortality; (4) Conclusions: The RSF model incorporating sST2 was more accurate than traditional methods in assessing the short-term prognosis of CAP patients.

15.
Genomics Inform ; 20(2): e23, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35794703

ABSTRACT

A survival prediction model has recently been developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma based on a Cox model using two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods-random survival forests (RSF) and support vector machines (SVM)-for survival analysis and compared their prediction performance using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we utilized data from SEER for model development and used data from KOTUS-BP for external evaluation. Second, these two datasets were swapped by taking data from KOTUS-BP for model development and data from SEER for external evaluation. Finally, we mixed these two datasets half and half and utilized the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Comparing the three schemes, the performance of the Cox model, RSF, and SVM was better when using the mixed datasets than when using the unmixed datasets. When using the mixed datasets, the C-index, 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.

16.
Updates Surg ; 74(1): 355-365, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34003477

ABSTRACT

Many studies have applied machine learning methods to find associations between radiomic features and clinical outcomes. Random survival forests (RSF), as an accurate classifier, rank all candidate variables by their importance values. No previous study has used RSF to find radiomic predictors in patients with extremity and trunk wall soft-tissue sarcomas. This study aimed to determine associations between radiomic features and overall survival (OS) by RSF analysis. We also sought to identify radiomic features with high importance values by RSF analysis, construct predictive models for OS incorporating clinical characteristics, and evaluate the models' performance with different methods. We collected clinical characteristics and radiomic features extracted from plain and contrast-enhanced computed tomography (CT) from 353 patients with extremity and trunk wall soft-tissue sarcomas treated with surgical resection. All radiomic features were analyzed by Cox proportional hazard (CPH) analysis followed by RSF analysis. The association between radiomics-predicted risks and OS was assessed by Kaplan-Meier analysis. All clinical features were screened by CPH analysis. Prognostic clinical and radiomic parameters were fitted into RSF and CPH integrative models for OS in the training cohort, respectively. The concordance indexes (C-index) and Brier scores of both models were evaluated in both training and testing cohorts. The model with better predictive performance was interpreted with nomogram and calibration plots. Among all 86 radiomic features, three variables were selected with high importance values. The RSF on these three features distinguished patients with high predicted risks from patients with low predicted risks for OS in the training set (P < 0.001) using Kaplan-Meier analysis. Age, lymph node involvement and grade were incorporated into the combined models for OS (P < 0.05).
The C-indexes of both integrative models remained above 0.80, and their Brier scores stayed below 15.0, in the training and testing datasets. The RSF model showed a slight advantage over the CPH model, in that its calibration curve showed favorable agreement between predicted and actual survival probabilities for the 3-year and 5-year survival prediction. The multimodality RSF model including clinical and radiomic characteristics showed high capacity for predicting OS, which might assist individualized therapeutic regimens. Level III, prognostic study.
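
The Kaplan-Meier analysis mentioned in this abstract estimates a survival curve from right-censored follow-up data. A minimal product-limit sketch is shown below; the function name and the return format are assumptions for illustration, not the study's code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observed follow-up times
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve
```

Comparing high-risk and low-risk groups, as in the abstract, would mean computing one such curve per group and testing their difference, for example with a log-rank test.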


Subject(s)
Sarcoma , Extremities/diagnostic imaging , Extremities/surgery , Humans , Machine Learning , Prognosis , Sarcoma/diagnostic imaging , Sarcoma/surgery , Tomography, X-Ray Computed
17.
Demography ; 59(1): 161-186, 2022 02 01.
Article in English | MEDLINE | ID: mdl-34918743

ABSTRACT

This study contributes to the literature on union dissolution by adopting a machine learning (ML) approach, specifically Random Survival Forests (RSF). We used RSF to analyze data on 2,038 married or cohabiting couples who participated in the German Socio-Economic Panel Survey, and found that RSF had considerably better predictive accuracy than conventional regression models. The man's and the woman's life satisfaction and the woman's percentage of housework were the most important predictors of union dissolution; several other variables (e.g., woman's working hours, being married) also showed substantial predictive power. RSF was able to detect complex patterns of association, and some predictors examined in previous studies showed marginal or null predictive power. Finally, while we found that some personality traits were strongly predictive of union dissolution, no interactions between those traits were evident, possibly reflecting assortative mating by personality traits. From a methodological point of view, the study demonstrates the potential benefits of ML techniques for the analysis of union dissolution and for demographic research in general. Key features of ML include the ability to handle a large number of predictors, the automatic detection of nonlinearities and nonadditivities between predictors and the outcome, generally superior predictive accuracy, and robustness against multicollinearity.


Subject(s)
Machine Learning , Marriage , Humans , Family Characteristics , Germany
18.
Comput Biol Med ; 141: 105001, 2022 02.
Article in English | MEDLINE | ID: mdl-34782112

ABSTRACT

Many clinical studies follow patients over time and record the time until the occurrence of an event of interest (e.g., recovery, death, …). When patients drop out of the study or when their event did not happen before the study ended, the collected dataset is said to contain censored observations. Given the rise of personalized medicine, clinicians are often interested in accurate risk prediction models that predict, for unseen patients, a survival profile, including the expected time until the event. Survival analysis methods are used to detect associations or compare subpopulations of patients in this context. In this article, we propose to cast the time-to-event prediction task as a multi-target regression task, with censored observations modeled as partially labeled examples. We then apply semi-supervised learning to the resulting data representation. More specifically, we use semi-supervised predictive clustering trees and ensembles thereof. Empirical results over eleven real-life datasets demonstrate superior or equivalent predictive performance of the proposed approach as compared to three competitor methods. Moreover, smaller models are obtained compared to random survival forests, another tree ensemble method. Finally, we illustrate the informative feature selection mechanism of our method, by interpreting the splits induced by a single tree model when predicting survival for amyotrophic lateral sclerosis patients.
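
The idea of treating censored observations as partially labeled multi-target examples can be illustrated with a short Python sketch. The encoding below is one plausible choice and is an assumption, not necessarily the representation used in the article: each subject gets one target per point on a time grid, with 1 for "still event-free", 0 for "event has occurred", and `None` for targets that censoring leaves unknown. The function name `encode_multi_target` and the grid values are hypothetical.

```python
def encode_multi_target(time, event, grid):
    """Encode one subject as a vector of survival-status targets.

    time  : observed follow-up time (event time or censoring time)
    event : True if the event was observed, False if censored
    grid  : increasing time points, one target per point
    """
    targets = []
    for t in grid:
        if t < time or (t == time and not event):
            targets.append(1)     # known to be event-free at t
        elif event:
            targets.append(0)     # event observed at or before t
        else:
            targets.append(None)  # censored before t: label missing
    return targets


grid = [3, 6, 9, 12]
fully_labeled = encode_multi_target(7, True, grid)       # event at time 7
partially_labeled = encode_multi_target(7, False, grid)  # censored at time 7
```

A semi-supervised multi-target learner can then train on both kinds of rows, using the known entries of the partially labeled rows rather than discarding censored subjects.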


Subject(s)
Supervised Machine Learning , Cluster Analysis , Humans , Multivariate Analysis , Survival Analysis
19.
Cancers (Basel) ; 13(10)2021 May 18.
Article in English | MEDLINE | ID: mdl-34069979

ABSTRACT

Current lifestyle recommendations for cancer survivors are the same as those for the general public to decrease their risk of cancer. However, it is unclear which lifestyle behaviors are most important for prognosis. We aimed to identify which lifestyle behaviors were most important regarding colorectal cancer (CRC) recurrence and all-cause mortality with a data-driven method. The study consisted of 1180 newly diagnosed stage I-III CRC patients from a prospective cohort study. Lifestyle behaviors included in the current recommendations, as well as additional lifestyle behaviors related to diet, physical activity, adiposity, alcohol use, and smoking were assessed six months after diagnosis. These behaviors were simultaneously analyzed as potential predictors of recurrence or all-cause mortality with Random Survival Forests (RSFs). We observed 148 recurrences during 2.6-year median follow-up and 152 deaths during 4.8-year median follow-up. Higher intakes of sugary drinks were associated with increased recurrence risk. For all-cause mortality, fruit and vegetable, liquid fat and oil, and animal protein intake were identified as the most important lifestyle behaviors. These behaviors showed non-linear associations with all-cause mortality. Our exploratory RSF findings give new ideas on potential associations between certain lifestyle behaviors and CRC prognosis that still need to be confirmed in other cohorts of CRC survivors.

20.
Int J Med Inform ; 145: 104305, 2021 01.
Article in English | MEDLINE | ID: mdl-33188949

ABSTRACT

PURPOSE: To develop and internally validate an illness burden index among Medicare beneficiaries before or after a cancer diagnosis. METHODS: Data source: SEER-CAHPS, linking Surveillance, Epidemiology, and End Results (SEER) cancer registry, Medicare enrollment and claims, and Medicare Consumer Assessment of Healthcare Providers and Systems (Medicare CAHPS) survey data providing self-reported sociodemographic, health, and functional status information. To generate a score for everyone in the dataset, we tabulated 4 groups within each annual subsample (2007-2013): 1) Medicare Advantage (MA) beneficiaries or 2) Medicare fee-for-service (FFS) beneficiaries, surveyed before cancer diagnosis; 3) MA beneficiaries or 4) Medicare FFS beneficiaries surveyed after diagnosis. Random survival forests (RSFs) predicted 12-month all-cause mortality and drew predictor variables (mean per subsample = 44) from 8 domains: sociodemographic, cancer-specific, health status, chronic conditions, healthcare utilization, activity limitations, proxy, and location-based factors. Roughly two-thirds of the sample was held out for algorithm training. Error rates based on the validation ("out-of-bag," OOB) samples reflected the correctly classified percentage. Illness burden scores represented predicted cumulative mortality hazard. RESULTS: The sample included 116,735 Medicare beneficiaries with cancer, of whom 73% were surveyed after their cancer diagnosis; overall mean mortality rate in the 12 months after survey response was 6%. SEER-CAHPS Illness Burden Index (SCIBI) scores were positively skewed (median range: 0.29 [MA, pre-diagnosis] to 2.85 [FFS, post-diagnosis]; mean range: 2.08 [MA, pre-diagnosis] to 4.88 [MA, post-diagnosis]). The highest decile of the distribution had a 51% mortality rate (range: 29-71%); the bottom decile had a 1% mortality rate (range: 0-2%). The error rate was 20% overall (range: 9% [among FFS enrollees surveyed after diagnosis] to 36% [MA enrollees surveyed before diagnosis]). CONCLUSIONS: This new morbidity measure for Medicare beneficiaries with cancer may be useful to future SEER-CAHPS users who wish to adjust for comorbidity.
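
The "roughly two-thirds" training share and the "out-of-bag" validation samples mentioned in this abstract are consistent with bootstrap sampling, in which each tree is grown on a sample drawn with replacement and evaluated on the records it never saw. The Python sketch below only illustrates that general mechanism under this assumption; it is not the study's code, and the function name `bootstrap_split` is hypothetical.

```python
import random


def bootstrap_split(n, seed=0):
    """Draw n indices with replacement; return (in-bag, out-of-bag) index lists."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n) for _ in range(n)}  # distinct records drawn
    out_of_bag = [i for i in range(n) if i not in in_bag]
    return sorted(in_bag), out_of_bag


n = 10_000
in_bag, oob = bootstrap_split(n)
in_bag_fraction = len(in_bag) / n  # expected to be near 1 - 1/e, about 0.632
print(f"in-bag: {in_bag_fraction:.3f}, out-of-bag: {len(oob) / n:.3f}")
```

For large samples the in-bag share approaches 1 - 1/e (about 63%), which leaves roughly 37% of records out-of-bag for each tree.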


Subject(s)
Medicare Part C , Neoplasms , Aged , Cost of Illness , Fee-for-Service Plans , Humans , Neoplasms/diagnosis , Semantic Web , United States