ABSTRACT
PURPOSE: Machine learning (ML) algorithms that incorporate routinely collected patient-reported outcomes (PROs) alongside electronic health record (EHR) variables may improve prediction of short-term mortality and facilitate earlier supportive and palliative care for patients with cancer. METHODS: We trained and validated two-phase ML algorithms that incorporated standard PRO assessments alongside approximately 200 routinely collected EHR variables, among patients with medical oncology encounters at a tertiary academic oncology practice and a community oncology practice. RESULTS: Among 12,350 patients, 5,870 (47.5%) completed PRO assessments. Compared with EHR-only and PRO-only algorithms, the EHR + PRO model improved predictive performance in both the tertiary oncology practice (EHR + PRO v EHR v PRO: area under the curve [AUC] 0.86 [0.85-0.87] v 0.82 [0.81-0.83] v 0.74 [0.74-0.74]) and the community oncology practice (AUC 0.89 [0.88-0.90] v 0.86 [0.85-0.88] v 0.77 [0.76-0.79]). CONCLUSION: Routinely collected PROs contain additional prognostic information not captured by an EHR-based ML mortality risk algorithm. Augmenting an EHR-based algorithm with PROs resulted in a more accurate and clinically relevant model, which can facilitate earlier and targeted supportive care for patients with cancer.
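The AUC values above measure discrimination. AUC is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen patient who survived.

Below is a minimal sketch of that rank-based definition. The function name and the toy scores are hypothetical illustrations, not the study's data or code. The reported confidence intervals are not reproduced.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based AUC: chance a random positive outscores a random negative.

    Ties count as half a concordant pair. Quadratic in sample size, so this
    is only meant for illustration on small inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data (not study data): 1 = died within the prediction window.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
ehr_only = [0.70, 0.40, 0.50, 0.20, 0.10, 0.30, 0.35, 0.05]
ehr_plus_pro = [0.75, 0.55, 0.45, 0.20, 0.10, 0.40, 0.30, 0.05]

print(round(auc(ehr_only, labels), 3))      # 0.8
print(round(auc(ehr_plus_pro, labels), 3))  # 0.933
```

In this toy example, the added PRO signal separates one more survivor from the deaths, which raises the AUC.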
Subject(s)
Electronic Health Records, Neoplasms, Humans, Patient Reported Outcome Measures, Palliative Care, Machine Learning, Neoplasms/diagnosis, Neoplasms/therapy
ABSTRACT
Research Objective: Health systems use clinical predictive algorithms to allocate resources to high-risk patients. Such algorithms are trained using historical data and are later implemented in clinical settings. During this implementation period, predictive algorithms are prone to performance changes ("drift") due to exogenous shocks in utilization or shifts in patient characteristics. Our objective was to examine the impact of sudden utilization shifts during the SARS-CoV-2 pandemic on the performance of an electronic health record (EHR)-based prognostic algorithm. Study Design: We studied changes in the performance of Conversation Connect, a validated machine learning algorithm that predicts 180-day mortality among outpatients with cancer receiving care at medical oncology practices within a large academic cancer center. Conversation Connect generates mortality risk predictions before each encounter using data from 159 EHR variables collected in the six months before the encounter. Since January 2019, Conversation Connect has been used as part of a behavioral intervention to prompt clinicians to consider early advance care planning conversations among patients with ≥10% mortality risk. First, we descriptively compared encounter-level characteristics in the following periods: January 2019-February 2020 ("pre-pandemic"), March-May 2020 ("early-pandemic"), and June-December 2020 ("later-pandemic"). Second, we quantified changes in high-risk patient encounters using interrupted time series analyses that controlled for pre-pandemic trends and demographic, clinical, and practice covariates. Our primary metric of performance drift was the false negative rate (FNR). Third, we assessed contributors to performance drift by comparing distributions of key EHR inputs across periods and by predicting later-pandemic utilization using pre-pandemic inputs. Population Studied: 237,336 in-person and telemedicine medical oncology encounters.
Principal Findings: Age, race, average patient encounters per month, insurance type, comorbidity counts, laboratory values, and overall mortality were similar among encounters in the pre-, early-, and later-pandemic periods. Relative to the pre-pandemic period, the later-pandemic period was characterized by a 6.5-percentage-point decrease in high-risk encounters (28.2% vs. 34.7%; p<0.001). FNR increased from 41.0% (95% CI 38.0-44.1%) in the pre-pandemic period to 57.5% (95% CI 51.9-63.0%) in the later-pandemic period. Compared with the pre-pandemic period, the early- and later-pandemic periods had higher proportions of telemedicine encounters (0.01% pre-pandemic vs. 20.0% early-pandemic vs. 26.4% later-pandemic) and of encounters with no preceding laboratory draws (17.7% pre-pandemic vs. 19.8% early-pandemic vs. 24.1% later-pandemic). In the later-pandemic period, observed laboratory utilization was lower than predicted (76.0% vs. 81.2%, p<0.001). In the later-pandemic period, mean 180-day mortality risk scores were lower for telemedicine vs. in-person encounters (10.3% vs. 11.2%, p<0.001) and for encounters with no vs. any preceding laboratory draws (1.5% vs. 14.0%, p<0.001). Conclusions: During the SARS-CoV-2 pandemic, the performance of a machine learning prognostic algorithm used to prompt advance care planning declined substantially. Increases in telemedicine and declines in laboratory utilization contributed to this lower performance. Implications for Policy or Practice: This is the first study to show algorithm performance drift due to SARS-CoV-2 pandemic-related shifts in telemedicine and laboratory utilization. These mechanisms of performance drift could apply to other EHR-based clinical predictive algorithms. Pandemic-related decreases in care utilization may degrade the performance of clinical predictive algorithms and warrant assessment and possible retraining of such algorithms.
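The false negative rate (FNR) used above as the primary drift metric can be sketched as follows. It is the share of encounters followed by death within 180 days that the algorithm did not flag as high risk at the ≥10% threshold.

This is a minimal sketch under stated assumptions:
- The function name and the toy risk scores are hypothetical.
- The unit of analysis is assumed to be the encounter.
- The lower scores in the second toy period only mimic the mechanism described above (fewer preceding laboratory draws). They are not study data.

```python
HIGH_RISK_THRESHOLD = 0.10  # ">=10% mortality risk" trigger from the abstract


def false_negative_rate(risks, died, threshold=HIGH_RISK_THRESHOLD):
    """Share of encounters followed by death that were NOT flagged high risk."""
    flags_among_deaths = [r >= threshold for r, d in zip(risks, died) if d]
    if not flags_among_deaths:
        raise ValueError("no observed deaths: FNR is undefined")
    misses = sum(1 for flagged in flags_among_deaths if not flagged)
    return misses / len(flags_among_deaths)


# Hypothetical encounters (not study data); died = death within 180 days.
pre_risks = [0.25, 0.08, 0.40, 0.05, 0.12]
pre_died = [1, 1, 1, 0, 1]
# Hypothetical later period: deaths get lower scores, as if lab inputs were missing.
later_risks = [0.09, 0.06, 0.30, 0.02, 0.11]
later_died = [1, 1, 1, 0, 0]

print(false_negative_rate(pre_risks, pre_died))                # 0.25
print(round(false_negative_rate(later_risks, later_died), 3))  # 0.667
```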
ABSTRACT
PURPOSE: Machine learning models developed from electronic health record data are increasingly used to predict mortality risk for general oncology patients. However, these models may perform suboptimally because of patient heterogeneity. The objective of this work was to develop a new modeling approach for predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors. METHODS: We proposed a two-stage approach to address heterogeneity among oncology patients with different cancer types when predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients with 11 cancer types, of whom 1,340 (6.5%) were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination, using the area under the precision-recall curve (AUPRC), and calibration, using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models. RESULTS: The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies, including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (∆ = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 (∆ = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks.
A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types. CONCLUSION: Our two-stage modeling approach, which accounts for cancer type-specific risk heterogeneity, achieved better calibration and discrimination than a model agnostic to cancer type.
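The calibration slope, used above to compare the overall and type-specific models, is commonly estimated by logistic regression of the observed outcome on the logit of the predicted risk. A slope near 1 indicates a well-calibrated spread of risks. A slope below 1 indicates predictions that are too extreme.

The sketch below assumes that common definition; the abstract does not state the exact estimation procedure. It uses hypothetical grouped toy data and fits the two-parameter model by Newton-Raphson using only the standard library.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_slope(pred, y, iters=50):
    """Fit logit P(y=1) = a + b * logit(pred) by Newton-Raphson; return b."""
    x = [logit(p) for p in pred]
    a, b = 0.0, 1.0  # start at "perfect calibration"
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += information^-1 @ score (2x2 inverse written out).
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Hypothetical grouped toy data: within each risk level the observed event
# fraction equals the predicted risk, so these predictions are calibrated.
pred, y = [], []
for p, n, events in [(0.2, 5, 1), (0.5, 2, 1), (0.75, 4, 3)]:
    pred += [p] * n
    y += [1] * events + [0] * (n - events)

print(round(calibration_slope(pred, y), 3))  # 1.0
# Doubling every logit makes the same predictions overconfident.
too_extreme = [1.0 / (1.0 + math.exp(-2.0 * logit(p))) for p in pred]
print(round(calibration_slope(too_extreme, y), 3))  # 0.5
```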
Subject(s)
Machine Learning, Neoplasms, Area Under Curve, Electronic Health Records, Humans, Neoplasms/diagnosis, Neoplasms/epidemiology, Risk Factors
ABSTRACT
OBJECTIVES: Develop and implement a machine learning algorithm to predict severe sepsis and septic shock, and evaluate its impact on clinical practice and patient outcomes. DESIGN: Retrospective cohort for algorithm derivation and validation; pre-post impact evaluation. SETTING: Tertiary teaching hospital system in Philadelphia, PA. PATIENTS: All non-ICU admissions; algorithm derivation July 2011 to June 2014 (n = 162,212); algorithm validation October to December 2015 (n = 10,448); silent versus alert comparison January 2016 to February 2017 (silent n = 22,280; alert n = 32,184). INTERVENTIONS: A random-forest classifier, derived and validated using electronic health record data, was deployed first silently and later with an alert to notify clinical teams of sepsis prediction. MEASUREMENTS AND MAIN RESULTS: Patients identified for training the algorithm were required to have International Classification of Diseases, 9th Edition codes for severe sepsis or septic shock and a positive blood culture during their hospital encounter, with either a lactate greater than 2.2 mmol/L or a systolic blood pressure less than 90 mm Hg. The algorithm demonstrated a sensitivity of 26% and specificity of 98%, with a positive predictive value of 29% and positive likelihood ratio of 13. The alert resulted in a small but statistically significant increase in lactate testing and IV fluid administration. There was no significant difference in mortality, discharge disposition, or transfer to ICU, although there was a reduction in time-to-ICU transfer. CONCLUSIONS: Our machine learning algorithm can predict the impending occurrence of severe sepsis and septic shock with low sensitivity but high specificity. Algorithm-generated predictive alerts modestly impacted clinical measures. Next steps include describing clinicians' perceptions of this tool and optimizing algorithm design and delivery.
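The reported accuracy figures are internally consistent. The positive likelihood ratio is sensitivity divided by (1 − specificity), so 0.26 / 0.02 = 13.

The sketch below computes all four metrics from a confusion matrix. The function name is hypothetical. The counts are also hypothetical, chosen only so that the rates approximate those reported; they are not the study's confusion matrix.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and positive likelihood ratio from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        # LR+ = P(alert | sepsis) / P(alert | no sepsis)
        "lr_pos": sensitivity / (1.0 - specificity),
    }


# Hypothetical counts chosen to approximate the reported rates; NOT study data.
m = screening_metrics(tp=26, fp=64, fn=74, tn=3136)
print({k: round(v, 2) for k, v in m.items()})
# {'sensitivity': 0.26, 'specificity': 0.98, 'ppv': 0.29, 'lr_pos': 13.0}
```

Unlike LR+, the PPV also depends on how common sepsis is among screened patients, so it would change at a different prevalence even with the same sensitivity and specificity.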
Subject(s)
Algorithms, Decision Support Systems, Clinical, Diagnosis, Computer-Assisted, Machine Learning, Sepsis/diagnosis, Shock, Septic/diagnosis, Cohort Studies, Electronic Health Records, Hospitals, Teaching, Humans, Retrospective Studies, Sensitivity and Specificity, Text Messaging
ABSTRACT
OBJECTIVE: To assess clinician perceptions of a machine learning-based early warning system to predict severe sepsis and septic shock (Early Warning System 2.0). DESIGN: Prospective observational study. SETTING: Tertiary teaching hospital in Philadelphia, PA. PATIENTS: Non-ICU admissions, November-December 2016. INTERVENTIONS: During a 6-week study period conducted 5 months after Early Warning System 2.0 alert implementation, nurses and providers were surveyed twice about their perceptions of the alert's helpfulness and impact on care: first within 6 hours of the alert, and again 48 hours after the alert. MEASUREMENTS AND MAIN RESULTS: For the 362 alerts triggered, 180 nurses (50% response rate) and 107 providers (30% response rate) completed the first survey. Of these, 43 nurses (24% response rate) and 44 providers (41% response rate) completed the second survey. Few clinicians (24% of nurses, 13% of providers) identified new clinical findings after responding to the alert. Nurses and providers differed in whether they perceived sepsis to be present at the time of the alert (13% vs. 40%). The majority of clinicians reported no change in their perception of the patient's risk for sepsis (55% of nurses, 62% of providers). A third of nurses (30%) but few providers (9%) reported that the alert changed management. Almost half of nurses (42%) but fewer than a fifth of providers (16%) found the alert helpful at 6 hours. CONCLUSIONS: In general, clinician perceptions of Early Warning System 2.0 were poor. Nurses and providers differed in their perceptions of sepsis and of the alert's benefits. These findings highlight the challenges of achieving acceptance of predictive and machine learning-based sepsis alerts.
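The response rates above use two different denominators:
- First-survey rates are consistent with being computed over the 362 alerts (180 / 362 ≈ 50% for nurses).
- Second-survey rates are consistent with being computed over first-survey respondents (43 / 180 ≈ 24% for nurses).

This reading is inferred from the numbers rather than stated in the abstract. The helper name below is hypothetical. A short check:

```python
def response_rates(alerts, first, second):
    """Per role: (first-survey rate over alerts, second-survey rate over first-survey respondents)."""
    return {role: (first[role] / alerts, second[role] / first[role]) for role in first}


rates = response_rates(
    alerts=362,
    first={"nurses": 180, "providers": 107},
    second={"nurses": 43, "providers": 44},
)
for role, (r1, r2) in rates.items():
    print(f"{role}: first survey {r1:.0%}, second survey {r2:.0%}")
# nurses: first survey 50%, second survey 24%
# providers: first survey 30%, second survey 41%
```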
Subject(s)
Algorithms, Attitude of Health Personnel, Decision Support Systems, Clinical, Machine Learning, Sepsis/diagnosis, Shock, Septic/diagnosis, Diagnosis, Computer-Assisted, Electronic Health Records, Hospitals, Teaching, Humans, Medical Staff, Hospital, Nursing Staff, Hospital, Practice Patterns, Nurses'/statistics & numerical data, Practice Patterns, Physicians'/statistics & numerical data, Prospective Studies, Text Messaging
ABSTRACT
Sedation minimization and ventilator liberation protocols improve outcomes but are challenging to implement. We sought to demonstrate the proof of concept and impact of an electronic application promoting sedation minimization and ventilator liberation. DESIGN: Multi-ICU proof-of-concept study and a single-ICU before-after study. SETTING: University hospital ICUs. PATIENTS: Adult patients receiving mechanical ventilation. INTERVENTIONS: An automated application consisting of 1) a web-based dashboard with real-time data on spontaneous breathing trial readiness, sedation depth, and sedative infusions, along with nudges to wean sedation and ventilatory support, and 2) text-message alerts once patients met criteria for a spontaneous breathing trial and a spontaneous awakening trial. Pre-intervention, sedation minimization and ventilator liberation were reviewed daily during a multidisciplinary huddle. Post-intervention, the dashboard was used during the multidisciplinary huddle and by respiratory therapists throughout the day, and text alerts were sent to bedside providers. MEASUREMENTS AND MAIN RESULTS: We enrolled 115 subjects in the proof-of-concept study. Spontaneous breathing trial alerts were accurate (98.3%) and were usually sent while patients were receiving mandatory ventilation (88.5%), and 61.9% of patients received concurrent spontaneous awakening trial alerts. We enrolled 457 subjects in the before-after study: 221 pre-intervention and 236 post-intervention. After implementation, patients were 28% more likely to be extubated (hazard ratio, 1.28; 95% CI, 1.01-1.63; p = 0.042) and 31% more likely to be discharged from the ICU (hazard ratio, 1.31; 95% CI, 1.03-1.67; p = 0.027) at any time point. After implementation, the median duration of mechanical ventilation was 2.20 days (95% CI, 0.09-4.31 d; p = 0.042) shorter and the median ICU length of stay was 2.65 days (95% CI, 0.13-5.16 d; p = 0.040) shorter than the expected durations without the application.
CONCLUSIONS: Implementation of an electronic dashboard and alert system promoting sedation minimization and ventilator liberation was associated with reductions in the duration of mechanical ventilation and ICU length of stay.
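A hazard ratio compares instantaneous event rates between periods. So "28% more likely to be extubated at any time point" corresponds to a 28% higher extubation hazard, and the 95% CI of 1.01 to 1.63 spans a 1% to 63% increase. The conversion is (HR − 1) × 100%.

The helper name below is a hypothetical illustration:

```python
def hazard_ratio_to_percent(*values):
    """Convert hazard ratios (point estimate and CI bounds) to % change in hazard."""
    return tuple(round((hr - 1.0) * 100.0, 1) for hr in values)


print(hazard_ratio_to_percent(1.28, 1.01, 1.63))  # extubation: (28.0, 1.0, 63.0)
print(hazard_ratio_to_percent(1.31, 1.03, 1.67))  # ICU discharge: (31.0, 3.0, 67.0)
```

This is a statement about rates, not cumulative probabilities. The absolute difference in the share of patients extubated by a given day also depends on the baseline hazard.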