Results 1 - 20 of 57
1.
J Med Internet Res ; 26: e46904, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38820579

ABSTRACT

BACKGROUND: Health care organizations worldwide are faced with an increasing number of cyberattacks and threats to their critical infrastructure. These cyberattacks cause significant data breaches in digital health information systems, which threaten patient safety and privacy. OBJECTIVE: From a sociotechnical perspective, this paper explores why digital health care systems are vulnerable to cyberattacks and provides sociotechnical solutions through a systematic literature review (SLR). METHODS: An SLR using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines was conducted by searching 6 databases (PubMed, Web of Science, ScienceDirect, Scopus, Institute of Electrical and Electronics Engineers, and Springer) and a journal (Management Information Systems Quarterly) for articles published between 2012 and 2022 and indexed using the following keywords: "(cybersecurity OR cybercrime OR ransomware) AND (healthcare) OR (cybersecurity in healthcare)." Reports, review articles, and industry white papers that focused on cybersecurity and health care challenges and solutions were included. Only articles published in English were selected for the review. RESULTS: In total, 5 themes were identified: human error, lack of investment, complex network-connected end-point devices, old legacy systems, and technology advancement (digitalization). We also found that knowledge applications for solving vulnerabilities in health care systems between 2012 and 2022 were inconsistent. CONCLUSIONS: This SLR provides a clear understanding of why health care systems are vulnerable to cyberattacks and proposes interventions from a new sociotechnical perspective. These solutions can serve as a guide for health care organizations in their efforts to prevent breaches and address vulnerabilities. To bridge the gap, we recommend that health care organizations, in partnership with educational institutions, develop and implement a cybersecurity curriculum for health care and intelligence information sharing through collaborations; training; awareness campaigns; and knowledge application areas such as secure design processes, phase-out of legacy systems, and improved investment. Additional studies are needed to create a sociotechnical framework that will support cybersecurity in health care systems and connect technology, people, and processes in an integrated manner.


Subject(s)
Computer Security , Humans , Delivery of Health Care , Patient Safety
2.
Fertil Steril ; 121(3): 428-433, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38048902

ABSTRACT

Clinicians should encourage disclosure between intimate partners but should maintain confidentiality in cases where there is no prospect of harm to the partner and/or offspring. In cases where a member of a couple refuses to disclose relevant health information to the other partner and there exists a risk of harm to the unaware partner and/or offspring, clinicians may refuse to offer care and should decline to treat if full informed consent is not possible because of the lack of disclosure. This document replaces the previously published document of the same name, last published in 2018.


Subject(s)
Sexual Behavior , Sexual Partners , Humans , Fertility , Informed Consent , Disclosure , Ethics Committees
3.
J Clin Monit Comput ; 37(5): 1123-1132, 2023 10.
Article in English | MEDLINE | ID: mdl-37088852

ABSTRACT

Cybersecurity has seen an increasing frequency and impact of cyberattacks and exposure of Protected Health Information (PHI). The uptake of the Electronic Medical Record (EMR), the exponential adoption of Internet of Things (IoT) devices, and the impact of the COVID-19 pandemic have increased the threat surface that the healthcare sector presents for cyberattack. Within healthcare generally and, more specifically, within anaesthesia and Intensive Care, there has been an explosion in wired and wireless devices used daily in the care of almost every patient (the Internet of Medical Things, IoMT): ventilators, anaesthetic machines, infusion pumps, pacing devices, organ support and a plethora of monitoring modalities. All of these devices, once connected to a hospital network, present another opportunity for a malevolent party to access the hospital systems, either to gain PHI for financial, political or other gain or to attack the systems directly to cause erroneous monitoring, altered settings of any device and even to access the EMR via this IoMT window. This exponential increase in the IoMT and the increasing wireless connectivity of anaesthesia and ICU devices as well as implantable devices present a real and present danger to patient safety. There has, at the same time, been a chronic underfunding of cybersecurity in healthcare. This lack of cybersecurity investment has left the sector exposed, and with the monetisation of PHI and the introduction of technically unsecure IoT devices for monitoring and direct patient care, the healthcare sector is presenting itself for further devastating cyberattacks or breaches of PHI. The immense strain that the COVID-19 pandemic has placed on healthcare, together with the changes in working patterns of many caregivers, has further amplified the exposure of the sector to cyberattacks.


Subject(s)
COVID-19 , Humans , Pandemics , Delivery of Health Care , Hospitals , Computer Security
4.
Inf Technol Manag ; 24(2): 177-193, 2023.
Article in English | MEDLINE | ID: mdl-36285184

ABSTRACT

This paper aims to identify and understand factors affecting insiders' intention to disclose patients' medical information and to investigate how these factors affect the intention to disclose. Based on the literature review on deterrence theory and health information security awareness (HISA), we identify relevant factors and develop a research model explaining insiders' intention to disclose patients' health information. We collect data (N = 105) through scenario-based experiments. Results show that two personal factors, collectivism and IT proficiency, play a significant role in the model. While collectivism affects two components of HISA (health information security regulation awareness and punishment severity awareness), which in turn influence intention to disclose, IT proficiency moderates the relationship between HISA and intention to disclose. In addition, HISA negatively affects reporting assessment and intention to disclose. This paper aims to fill a research gap in understanding factors affecting insiders' intentions to disclose protected health information. We identify and investigate factors (e.g., collectivism, HISA, reporting assessment, and IT proficiency) that may affect insiders' disclosing intentions. We find that collectivism affects two components of HISA which influence reporting assessment and disclosing intention. We also discover that IT proficiency moderates the relationship between HISA and intention to disclose. Our findings suggest that we need to carefully consider personal factors such as collectivistic nature and IT proficiency in managing insiders' security breaches.

5.
AORN J ; 117(1): 52-60, 2023 01.
Article in English | MEDLINE | ID: mdl-36573752

ABSTRACT

Patient information management can involve paper and electronic documentation. Because the patient's health care record (HCR) is a legal document, it must provide an accurate representation of care. The record contains protected health information and must be secure. In addition, the documentation must adhere to local, state, and federal regulations and facility policies; it also may incorporate recommendations from national professional guidelines. The AORN "Guideline for patient information management" was recently updated and provides evidence-based best practices for comprehensive perioperative documentation that aligns with the nursing workflow. This article includes an overview of patient information management and discusses recommendations for health information technology, the patient HCR, perioperative record design, documentation and nursing workflow, informed consent documentation, order documentation, modifying patient HCRs, education, policies and procedures, and quality. Perioperative nurses should review the guideline in its entirety and apply the recommendations for patient information management as applicable to their individual roles.


Subject(s)
Information Management , Nursing Care , Humans , Documentation , Electronic Health Records , Practice Guidelines as Topic
6.
Clin Transl Radiat Oncol ; 38: 161-168, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36466748

ABSTRACT

Purpose/Objective: Magnetic resonance-guided radiation therapy (MRgRT) utilization is rapidly expanding worldwide, driven by advanced capabilities including continuous intrafraction visualization, automatic triggered beam delivery, and on-table adaptive replanning (oART). Our objective was to describe patterns of 0.35 Tesla (T)-MRgRT (MRIdian) utilization in the United States (US) among early adopters of this novel technology. Materials/Methods: Anonymized administrative data from all US MRIdian treatment systems were extracted for patients completing treatment from 2014 to 2020. Detailed treatment information was available for all MRIdian linear accelerator (linac) systems and some cobalt systems. Results: Seventeen systems at 16 centers delivered 5736 courses and 36,389 fractions (fraction details unavailable for 1223 cobalt courses), of which 21.1% were adapted. Ultra-hypofractionation (UHfx) (1-5 fractions) was used in 70.3% of all courses. At least one adaptive fraction was used for 38.5% of courses (average 1.7 adapted fractions/course), with higher oART use in UHfx dose schedules (47.7% of courses, average 1.9 adapted fractions per course). The most commonly treated organ sites were pancreas (20.7%), liver (16.5%), prostate (12.5%), breast (11.5%), and lung (9.4%). Temporal trends show a compounded annual growth rate (CAGR) of 59.6% in treatment courses delivered, with a dramatic increase in use of UHfx to 84.9% of courses in 2020 and a similar increase in use of oART to 51.0% of courses. Conclusions: This is the first comprehensive study reporting patterns of utilization among early adopters of MRIdian in the US. Intrafraction MR image-guidance, advanced motion management, and increasing adoption of adaptive radiation therapy have led to a substantial transition to ultra-hypofractionated regimens. 0.35 T-MRgRT has been predominantly used to treat abdominal and pelvic tumors with increasing use of on-table adaptive replanning, which represents a paradigm shift in radiation therapy.

8.
AMIA Annu Symp Proc ; 2023: 814-823, 2023.
Article in English | MEDLINE | ID: mdl-38222389

ABSTRACT

In the era of big data, there is an increasing need for healthcare providers, communities, and researchers to share data and collaborate to improve health outcomes, generate valuable insights, and advance research. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law designed to protect sensitive health information by defining regulations for protected health information (PHI). However, it does not provide efficient tools for detecting or removing PHI before data sharing. One of the challenges in this area of research is the heterogeneous nature of PHI fields in data across different parties. This variability makes rule-based sensitive variable identification systems that work on one database fail on another. To address this issue, our paper explores the use of machine learning algorithms to identify sensitive variables in structured data, thus facilitating the de-identification process. We made a key observation that the distributions of metadata of PHI fields and non-PHI fields are very different. Based on this novel finding, we engineered over 30 features from the metadata of the original features and used machine learning to build classification models to automatically identify PHI fields in structured Electronic Health Record (EHR) data. We trained the model on a variety of large EHR databases from different data sources and found that our algorithm achieves 99% accuracy when detecting PHI-related fields for unseen datasets. The implications of our study are significant and can benefit industries that handle sensitive data.
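The idea of classifying a column as PHI or non-PHI from its metadata alone can be illustrated with a small sketch. The abstract does not list the 30+ engineered features, so the four features below (uniqueness ratio, mean length, digit fraction, letter fraction) and the function name `column_metadata_features` are illustrative assumptions, not the authors' feature set.

```python
def column_metadata_features(values):
    """Summarise one column of string values with a few metadata features.

    Hypothetical feature set for illustration only; the paper's actual
    features are not given in the abstract.
    """
    non_empty = [v for v in values if v]
    n = len(non_empty)
    if n == 0:
        return {"unique_ratio": 0.0, "mean_length": 0.0,
                "digit_fraction": 0.0, "alpha_fraction": 0.0}
    total_chars = sum(len(v) for v in non_empty)
    digits = sum(c.isdigit() for v in non_empty for c in v)
    letters = sum(c.isalpha() for v in non_empty for c in v)
    return {
        # Identifiers tend to be nearly unique per row; codes such as sex are not.
        "unique_ratio": len(set(non_empty)) / n,
        "mean_length": total_chars / n,
        "digit_fraction": digits / total_chars,
        "alpha_fraction": letters / total_chars,
    }


id_like = column_metadata_features(["123-45-6789", "987-65-4321"])
code_like = column_metadata_features(["M", "F", "M", "F"])
```

Feature vectors of this kind, one per column, could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.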


Subject(s)
Confidentiality , Medical Records Systems, Computerized , United States , Humans , Health Insurance Portability and Accountability Act , Algorithms , Machine Learning , Electronic Health Records
9.
Cureus ; 14(10): e30168, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36397924

ABSTRACT

The use of electronic health records (EHRs) has grown significantly in the past decade. Health information databases contain sensitive patient information, including their names and addresses, tests, diagnoses, treatment, and medical history. This information should be secured and protected from manipulation and fraudulent use by third parties. EHRs are expected to increase efficiency in healthcare delivery, improve healthcare quality, and relieve increased financial pressure. Despite these expected benefits, EHRs are potentially vulnerable to security concerns that may affect the confidentiality and privacy of patients' personal information. This paper presents a literature review of EHRs, factors that support the security and safety of health records, potential security breaches, and solutions to inherent security concerns. The study collects data through a systematic review of past studies that have addressed the topic of EHRs and security issues, and other relevant publications on EHR systems, and procedures that help safeguard health records databases. A total of 30 sources are analyzed for all pertinent information regarding security concerns of health records databases. These sources were obtained through an internet search on credible databases, including Google Scholar, PubMed, and CINAHL databases. The results of the current study reveal the perceived vulnerability of EHRs to security concerns, common security issues, the nature of these common security concerns, Health Insurance Portability and Accountability Act rules, provider responsibilities, and recommendations for reducing EHR security risks. This paper also reveals effective strategies such as privacy-protection awareness and staff training to enhance the security of health records databases.

10.
J Med Syst ; 46(12): 85, 2022 Oct 20.
Article in English | MEDLINE | ID: mdl-36261623

ABSTRACT

Patient Electronic Health Records (EHRs) contain valuable clinical data that are useful for medical research and public health inquiries. However, patient privacy regulation and the risks of improper resource sharing limit access to EHR medical data for research and public health purposes. In this paper, we introduce an end-to-end security solution that addresses both concerns and facilitates the sharing of patient EHR data over an unsecured third-party server using a leveled homomorphic encryption (LHE) scheme. Time testing for aggregating queries and linear computations was carried out using an HPE ProLiant DL580 Gen 10 server with an Intel Xeon Platinum 8280 Processor.


Subject(s)
Computer Security , Electronic Health Records , Humans , Privacy , Platinum , Confidentiality
11.
Adv Chronic Kidney Dis ; 29(5): 427-430, 2022 09.
Article in English | MEDLINE | ID: mdl-36253025

ABSTRACT

Detecting protected health information in electronic health record systems is often an early step in health care analytics, and it is a nontrivial problem. Specific challenges include finding clinician names and diseases, which lack a fixed format and are often context-dependent. The general problem of finding entities, termed named-entity recognition, has received a substantial amount of attention in the natural language processing and deep learning communities. This paper begins by outlining recent methods for finding protected health information, and it then introduces a hybrid system which combines regular expressions with a natural language processing framework called FLAIR. FLAIR is open-source, it includes state-of-the-art deep learning models, and it supports straightforward development of new models for language tasks including named-entity recognition. Finally, there is a discussion of how to apply the system to structured text in a database table as well as unstructured text in clinical notes.
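A hybrid of regular expressions and a learned tagger, as described above, might be organised as in the following sketch. The two patterns, the function names `find_phi` and `redact`, and the `ner_tagger` callable are assumptions made for illustration; the callable stands in for a model such as one built with FLAIR and is not FLAIR's actual interface.

```python
import re

# Fixed-format PHI is a natural fit for regular expressions.
# These two patterns are illustrative, not an exhaustive PHI rule set.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_phi(text, ner_tagger=None):
    """Return sorted (start, end, label) spans from regexes plus an optional tagger.

    ner_tagger is a hypothetical callable taking text and returning a list of
    (start, end, label) spans, e.g. for names, which lack a fixed format.
    """
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    if ner_tagger is not None:
        spans.extend(ner_tagger(text))
    return sorted(spans)


def redact(text, spans):
    """Replace each span with its label; assumes spans do not overlap."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


note = "Seen on 03/14/2021, call 555-123-4567."
redacted = redact(note, find_phi(note))
```

Replacing spans from right to left keeps the earlier character offsets valid while the text is being edited.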


Subject(s)
Artificial Intelligence , Electronic Health Records , Humans , Language , Natural Language Processing
12.
J Digit Imaging ; 35(6): 1694-1698, 2022 12.
Article in English | MEDLINE | ID: mdl-35715655

ABSTRACT

Natural language processing (NLP) techniques for electronic health records have shown great potential to improve the quality of medical care. The text of radiology reports frequently constitutes a large fraction of EHR data, and can provide valuable information about patients' diagnoses, medical history, and imaging findings. The lack of a major public repository for radiological reports severely limits the development, testing, and application of new NLP tools. De-identification of protected health information (PHI) presents a major challenge to building such repositories, as many automated tools for de-identification were trained or designed for clinical notes and do not perform sufficiently well to build a public database of radiology reports. We developed and evaluated six ensemble models based on three publicly available de-identification tools: MIT de-id, NeuroNER, and Philter. A set of 1023 reports was set aside as the testing partition. Two individuals with medical training annotated the test set for PHI; differences were resolved by consensus. Ensemble methods included simple voting schemes (1-Vote, 2-Votes, and 3-Votes), a decision tree, a naïve Bayesian classifier, and AdaBoost boosting. The 1-Vote ensemble achieved recall of 998 / 1043 (95.7%); the 3-Votes ensemble had precision of 1035 / 1043 (99.2%). F1 scores were: 93.4% for the decision tree, 71.2% for the naïve Bayesian classifier, and 87.5% for the boosting method. Basic voting algorithms and machine learning classifiers incorporating the predictions of multiple tools can outperform each tool acting alone in de-identifying radiology reports. Ensemble methods hold substantial potential to improve automated de-identification tools for radiology reports to make such reports more available for research use to improve patient care and outcomes.
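The voting schemes can be sketched in a few lines: a token is flagged as PHI when at least k of the tools flag it. The toy predictions below are invented for illustration and are not the study's data; the function names are likewise assumptions.

```python
def k_vote(predictions, k):
    """Flag a token as PHI when at least k tools flag it.

    predictions: one list of booleans per tool, aligned token by token.
    """
    return [sum(votes) >= k for votes in zip(*predictions)]


def precision_recall(predicted, gold):
    """Token-level precision and recall against gold annotations."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: three tools, four tokens (made-up values).
tool_a = [True, True, False, False]
tool_b = [True, False, True, False]
tool_c = [True, False, False, False]
gold = [True, True, False, False]

one_vote = k_vote([tool_a, tool_b, tool_c], k=1)
three_votes = k_vote([tool_a, tool_b, tool_c], k=3)
```

In this toy case the 1-Vote rule catches every gold token at the cost of one false positive, while the 3-Votes rule makes no false positives but misses a token, which mirrors the recall and precision trade-off the abstract reports.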


Subject(s)
Natural Language Processing , Radiology , Humans , Bayes Theorem , Electronic Health Records , Machine Learning
13.
J Bone Oncol ; 34: 100423, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35378840

ABSTRACT

Background: Clinical practice guidelines recommend the use of bone-targeting agents for preventing skeletal-related events (SREs) among patients with bone metastases from solid tumors. The anti-RANKL monoclonal antibody denosumab is approved for the prevention of SREs in patients with bone metastases from solid tumors. However, real-world data are lacking on the impact of individual risk factors for SREs, specifically in the context of denosumab discontinuation. Purpose: We aim to identify risk factors associated with SRE incidence following denosumab discontinuation using a machine learning approach to help profile patients at a higher risk of developing SREs following discontinuation of denosumab treatment. Methods: Using the Optum PanTher Electronic Health Record repository, patients diagnosed with incident bone metastases from primary solid tumors between January 1, 2007, and September 1, 2019, were evaluated for inclusion in the study. Eligible patients received ≥ 2 consecutive 120 mg denosumab doses on a 4-week (± 14 days) schedule with a minimum follow-up of ≥ 1 year after the last denosumab dose, or an SRE occurring between days 84 and 365 after denosumab discontinuation. Extreme gradient boosting was used to develop an SRE risk prediction model evaluated on a test dataset. Multiple variables associated with patient demographics, comorbidities, laboratory values, treatments, and denosumab exposures were examined as potential factors for SRE risk using Shapley Additive Explanations (SHAP). Univariate analyses on risk factors with the highest importance from pooled and tumor-specific models were also conducted. Results: A total of 1,414 adult cancer patients (breast: 40%, prostate: 30%, lung: 13%, other: 17%) were eligible, of whom 1,133 (80%) were assigned to model training and 281 (20%) to model evaluation. 
The median age at inclusion was 67 (range, 19-89) years with a median duration of denosumab treatment of 253 (range, 88-2,726) days; 490 (35%) patients experienced ≥ 1 SRE 83 days after denosumab discontinuation. Meaningful model performance was evaluated by an area under the receiver operating curve score of 77% and an F1 score of 62%; model precision was 60%, with 63% sensitivity and 78% specificity. SHAP identified several significant factors for the tumor-agnostic and tumor-specific models that predicted an increased SRE risk following denosumab discontinuation, including prior SREs, shorter denosumab treatment duration, ≥ 4 clinic visits per month with at least one hospitalization (all-cause) event from the baseline period up to discontinuation of denosumab, younger age at bone metastasis, shorter time to denosumab initiation from bone metastasis, and prostate cancer. Conclusion: This analysis showed a higher cumulative number of SREs, prior SREs relative to denosumab initiation, a higher number of hospital visits, and a shorter denosumab treatment duration as significant factors that are associated with an increased SRE risk after discontinuation of denosumab, in both the tumor-agnostic and tumor-specific models. Our machine learning approach to SRE risk factor identification reinforces treatment guidance on the persistent use of denosumab and has the potential to help clinicians better assess a patient's need to continue denosumab treatment and improve patient outcomes.

14.
Am J Obstet Gynecol ; 227(1): 87.e1-87.e13, 2022 07.
Article in English | MEDLINE | ID: mdl-35351406

ABSTRACT

BACKGROUND: Laboratories offering cell-free DNA often reserve the right to share prenatal genetic data for research or even commercial purposes, and obtain this permission on the patient consent form. Although it is known that nonpregnant patients are often reluctant to share their genetic data for research, pregnant patients' knowledge of, and opinions about, genetic data privacy are unknown. OBJECTIVE: We investigated whether pregnant patients who had already undergone cell-free DNA screening were aware that genetic data derived from cell-free DNA may be shared for research. Furthermore, we examined whether pregnant patients exposed to video education about the Genetic Information Nondiscrimination Act-a federal law that mandates workplace and health insurance protections against genetic discrimination-were more willing to share cell-free DNA-related genetic data for research than pregnant patients who were unexposed. STUDY DESIGN: In this randomized controlled trial (ClinicalTrials.gov Identifier: NCT04420858), English-speaking patients with singleton pregnancies who underwent cell-free DNA and subsequently presented at 17 0/7 to 23 6/7 weeks of gestation for a detailed anatomy scan were randomized 1:1 to a control or intervention group. Both groups viewed an infographic about cell-free DNA. In addition, the intervention group viewed an educational video about the Genetic Information Nondiscrimination Act. The primary outcomes were knowledge about, and willingness to share, prenatal genetic data from cell-free DNA by commercial laboratories for nonclinical purposes, such as research. The secondary outcomes included knowledge about existing genetic privacy laws, knowledge about the potential for reidentification of anonymized genetic data, and acceptability of various use and sharing scenarios for prenatal genetic data. Eighty-one participants per group were required for 80% power to detect an increase in willingness to share data from 60% to 80% (α=0.05). 
RESULTS: A total of 747 pregnant patients were screened, and 213 patients were deemed eligible and approached for potential study participation. Of these patients, 163 (76.5%) consented and were randomized; one participant discontinued the intervention, and two participants were excluded from analysis after the intervention when it was discovered that they did not fulfill all eligibility criteria. Overall, 160 (75.1%) of those approached were included in the final analysis. Most patients in the control group (72 [90.0%]) and intervention (76 [97.4%]) group were either unsure about or incorrectly thought that cell-free DNA companies could not share prenatal genetic data for research. Participants in the intervention group were more likely to incorrectly believe that their prenatal genetic data would not be shared for nonclinical purposes than participants in the control group (28.8% in the control group vs 46.2% in the intervention; P=.03). However, video education did not increase participant willingness to share genetic data in multiple scenarios. Non-White participants were less willing than White participants to allow sharing of genetic data specifically for academic research (P<.001). CONCLUSION: Most participants were unaware that their prenatal genetic data may be used for nonclinical purposes. Pregnant patients who were educated about the Genetic Information Nondiscrimination Act were not more willing to share genetic data than those who did not receive this education. Surprisingly, video education about the Genetic Information Nondiscrimination Act led patients to falsely believe that their data would not be shared for research, and participants who identified as racial minorities were less willing to share genetic data. New strategies are needed to improve pregnant patients' understanding of genetic privacy.
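The power statement in the study design (80% power to detect an increase from 60% to 80% at α=0.05) can be checked approximately with a textbook two-proportion sample-size formula. The abstract does not say which formula or software the authors used, so the pooled-variance normal approximation below is an assumption, and the function name `n_per_group` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions.

    Two-sided test, equal group sizes, normal approximation with a pooled
    variance under the null; no continuity correction (assumed variant).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p2 - p1) ** 2


n = n_per_group(0.60, 0.80)
```

Under these assumptions the result lands close to the 81 participants per group quoted in the abstract; other variants of the formula (for example with a continuity correction) give somewhat different numbers.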


Subject(s)
Audiovisual Aids , Cell-Free Nucleic Acids , Genetic Privacy , Patient Education as Topic , Female , Humans , Pregnancy
15.
Front Digit Health ; 4: 728922, 2022.
Article in English | MEDLINE | ID: mdl-35252956

ABSTRACT

BACKGROUND: Electronic health record (EHR) systems contain a large volume of texts, including visit notes, discharge summaries, and various reports. To protect the confidentiality of patients, these records often need to be fully de-identified before circulating for secondary use. Machine learning (ML) based named entity recognition (NER) models have emerged as a popular technique for automatic de-identification. OBJECTIVE: The performance of a machine learning model highly depends on the selection of appropriate features. The objective of this study was to investigate the usability of multiple features in building a conditional random field (CRF) based clinical de-identification NER model. METHODS: Using open-source natural language processing (NLP) toolkits, we annotated protected health information (PHI) in 1,500 pathology reports and built supervised NER models using multiple features and their combinations. We further investigated the dependency of a model's performance on the size of training data. RESULTS: Among the 10 feature extractors explored in this study, n-gram, prefix-suffix, word embedding, and word shape performed the best. A model using a combination of these four feature sets yielded precision, recall, and F1-score for each PHI type as follows: NAME (0.80; 0.79; 0.80), LOCATION (0.85; 0.83; 0.84), DATE (0.86; 0.79; 0.82), HOSPITAL (0.96; 0.93; 0.95), ID (0.99; 0.82; 0.90), and INITIALS (0.97; 0.49; 0.65). We also found that the model's performance becomes saturated when the training data size is beyond 200. CONCLUSION: Manual de-identification of large-scale data is an impractical procedure since it is time-consuming and subject to human errors. Analysis of the NER model's performance in this study sheds light on a semi-automatic clinical de-identification pipeline for enterprise-wide data warehousing.
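Three of the feature families named above (word shape, prefix-suffix, and neighbouring-word context) are commonly computed per token before being passed to a CRF. The sketch below shows one plausible form; the exact definitions used in the study are not given in the abstract, so the shape mapping, the affix length of 3, and the function names are assumptions.

```python
import re


def word_shape(token):
    """Map uppercase letters to X, lowercase to x, and digits to d."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"\d", "d", shape)


def token_features(tokens, i, affix_len=3):
    """Build a feature dict for tokens[i], the usual input format for CRF toolkits."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "prefix": tok[:affix_len],
        "suffix": tok[-affix_len:],
        # Neighbouring words act as a simple bigram-style context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


features = token_features(["Dr.", "Smith", "saw", "MRN-12345"], 1)
```

A shape such as `XXX-ddddd` lets the model generalise over identifiers it has never seen, which may help explain why ID-like entities are comparatively easy to detect with high precision.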

16.
J World Fed Orthod ; 11(3): 90-92, 2022 06.
Article in English | MEDLINE | ID: mdl-35193830

ABSTRACT

Cell phones are used by almost everyone and have become an integral part of our daily life. They are an almost universal instrument for gathering and transmitting information. The amount of bacteria crawling on a typical cell phone has been a point of contention, although studies show that they contain at least 10 times the amount of bacteria found on most toilet seats. As medical workers in hospital and clinic settings, we use our cell phones extensively, for paging, texting, calling, and recreational activities; thus, the risk of contamination by pathogens is a legitimate issue. Cell phones in the operating room may give patients the impression that they are not the main priority or the center of care, in addition to being a potential source of infection. Cell phones also have been found to be the number-one productivity killer in workplaces, amongst a long list of other distractions. All workers must be fully aware of their Health Insurance Portability and Accountability Act (HIPAA) compliance responsibilities and obligations, including the protection of health information, while using their cell phones at workplaces. On clinic or hospital grounds, reasonable diligence and strict adherence to cell phone policies may help us maintain greater safety, productivity, and professionalism, resulting in better service for our patients.


Subject(s)
Cell Phone , Text Messaging , Health Insurance Portability and Accountability Act , Health Personnel , Humans , Operating Rooms , United States
17.
J Biomed Inform ; 125: 103971, 2022 01.
Article in English | MEDLINE | ID: mdl-34920127

ABSTRACT

OBJECTIVE: Quantify tradeoffs in performance, reproducibility, and resource demands across several strategies for developing clinically relevant word embeddings. MATERIALS AND METHODS: We trained separate embeddings on all full-text manuscripts in the PubMed Central (PMC) Open Access subset, case reports therein, the English Wikipedia corpus, the Medical Information Mart for Intensive Care (MIMIC) III dataset, and all notes in the University of Pennsylvania Health System (UPHS) electronic health record. We tested embeddings in six clinically relevant tasks including mortality prediction and de-identification, and assessed performance using the scaled Brier score (SBS) and the proportion of notes successfully de-identified, respectively. RESULTS: Embeddings from UPHS notes best predicted mortality (SBS 0.30, 95% CI 0.15 to 0.45) while Wikipedia embeddings performed worst (SBS 0.12, 95% CI -0.05 to 0.28). Wikipedia embeddings most consistently (78% of notes) and the full PMC corpus embeddings least consistently (48%) de-identified notes. Across all six tasks, the full PMC corpus demonstrated the most consistent performance, and the Wikipedia corpus the least. Corpus size ranged from 49 million tokens (PMC case reports) to 10 billion (UPHS). DISCUSSION: Embeddings trained on published case reports performed at least as well as embeddings trained on other corpora in most tasks, and clinical corpora consistently outperformed non-clinical corpora. No single corpus produced a strictly dominant set of embeddings across all tasks and so the optimal training corpus depends on intended use. CONCLUSION: Embeddings trained on published case reports performed comparably on most clinical tasks to embeddings trained on larger corpora. Open access corpora allow training of clinically relevant, effective, and reproducible embeddings.
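The abstract does not define the scaled Brier score. A common definition, assumed here, rescales the Brier score against a reference model that always predicts the observed event rate, so that 1 is a perfect model, 0 matches the reference, and negative values are worse than the reference.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def scaled_brier_score(probs, outcomes):
    """1 - BS / BS_ref, where BS_ref always predicts the observed event rate.

    Assumed definition; the paper's exact formula is not in the abstract.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("outcomes must contain both classes")
    return 1 - brier_score(probs, outcomes) / reference


# Made-up predictions for four patients, two of whom died.
outcomes = [1, 0, 1, 0]
sbs = scaled_brier_score([0.8, 0.2, 0.6, 0.4], outcomes)
```

On this reading, the negative lower confidence bound reported for the Wikipedia embeddings (-0.05) would mean that a model no better than predicting the base rate cannot be ruled out.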


Subject(s)
Electronic Health Records , Publications , Humans , Natural Language Processing , PubMed , Reproducibility of Results
18.
J Med Internet Res ; 23(10): e30697, 2021 10 04.
Article in English | MEDLINE | ID: mdl-34559671

ABSTRACT

BACKGROUND: Computationally derived ("synthetic") data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic. OBJECTIVE: We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes. METHODS: We used the National COVID Cohort Collaborative's instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19-positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19-related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data. RESULTS: For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts. 
CONCLUSIONS: This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.


Subject(s)
COVID-19 , Electronic Health Records , Data Analysis , Humans , Pandemics , SARS-CoV-2
19.
J Digit Imaging ; 34(4): 986-1004, 2021 08.
Article in English | MEDLINE | ID: mdl-34241789

ABSTRACT

There are various efforts to de-identify patients' radiation oncology data for use in advancing medical research. Though the task of de-identification needs to be defined in the context of research goals and objectives, existing systems lack the flexibility in data modeling and in normalizing attribute names needed to accomplish them. In this work, we describe a de-identification process for radiation and clinical oncology data, which is guided by a data model and a schema for dynamically capturing domain ontology and normalizing terminologies, defined in line with the research goals in this area. The radiological images are obtained in DICOM format. They consist of diagnostic, radiation therapy (RT) treatment planning, RT verification, and RT response images. During the DICOM de-identification, a few crucial pieces of information about the dataset are retained. The proposed model is generic in organizing information modeling in sync with the de-identification of a patient's clinical information. The treatment and clinical data are provided in the comma-separated values (CSV) format, which follows a predefined data structure. The de-identified data is harmonized throughout the entire process. We present four specific case studies on four different types of cancer, namely glioblastoma multiforme, head-neck, breast, and lung. We also present experimental validation on a few patients' data in these four areas. Several aspects are taken care of during de-identification, such as preservation of longitudinal date changes (LDC), incremental de-identification, referential data integrity between the clinical and image data, de-identified data harmonization, and transformation of the data to an underlying database schema.
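
The abstract above lists preservation of longitudinal date changes (LDC) among its de-identification requirements without describing the mechanism. One widely used way to meet such a requirement is per-patient date shifting, sketched below. This is an illustrative assumption, not the paper's method: the function names, the keyed-hash derivation of the offset, and the 365-day bound are all hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset_days(patient_id: str, secret_key: bytes, max_shift: int = 365) -> int:
    """Derive a stable per-patient shift (1..max_shift days) from a keyed hash.

    The same patient always gets the same offset, so records de-identified
    in separate batches stay consistent with each other.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % max_shift + 1


def shift_dates(patient_id: str, dates: list, secret_key: bytes) -> list:
    """Shift every date of one patient back by the same number of days.

    Absolute dates change, but the intervals between events are preserved.
    """
    offset = timedelta(days=patient_offset_days(patient_id, secret_key))
    return [d - offset for d in dates]


if __name__ == "__main__":
    key = b"hypothetical-project-secret"  # assumption: kept outside the released dataset
    visits = [date(2020, 1, 1), date(2020, 3, 15)]
    shifted = shift_dates("patient-001", visits, key)
    print(shifted[1] - shifted[0] == visits[1] - visits[0])
```

Because the offset depends only on the patient identifier and the key, applying the same function to dates in the CSV data and in DICOM headers would keep the two sources aligned for a given patient.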


Subject(s)
Goals , Radiology , Factual Databases , Humans , Theoretical Models
20.
Stud Health Technol Inform ; 281: 273-277, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042748

ABSTRACT

We describe the adaptation of a non-clinical pseudonymization system, originally developed for a German email corpus, for clinical use. This tool replaces previously identified Protected Health Information (PHI) items that carry privacy-sensitive information (original names of people, organizations, places, etc.) with semantically type-conformant yet fictitious surrogates. We evaluate the generated substitutes for grammatical correctness and for semantic and medical plausibility, and find very low error rates (less than 1%) on all of these dimensions.
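
The surrogate replacement described above can be illustrated with a minimal sketch. It is not the system from the abstract: the surrogate pools, the PHI type labels, and the `pseudonymize` function are hypothetical, and the PHI spans are assumed to be already identified and non-overlapping.

```python
# Hypothetical surrogate pools, keyed by PHI type (illustrative values only).
SURROGATES = {
    "PERSON": ["Alex Muster", "Kim Beispiel"],
    "CITY": ["Musterstadt", "Beispielburg"],
}


def pseudonymize(text, spans):
    """Replace annotated PHI spans with fictitious surrogates of the same type.

    spans is a list of (start, end, phi_type) tuples that do not overlap.
    Repeated mentions of the same original string receive the same
    surrogate, so the text stays internally coherent.
    """
    mapping = {}   # (phi_type, original string) -> chosen surrogate
    counters = {}  # phi_type -> number of distinct originals seen so far
    pieces = []
    cursor = 0
    for start, end, phi_type in sorted(spans):
        key = (phi_type, text[start:end])
        if key not in mapping:
            pool = SURROGATES[phi_type]
            index = counters.get(phi_type, 0)
            mapping[key] = pool[index % len(pool)]
            counters[phi_type] = index + 1
        pieces.append(text[cursor:start])
        pieces.append(mapping[key])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "Anna Schmidt was seen in Jena. Anna Schmidt returns Monday."
    first = note.find("Anna Schmidt")
    second = note.find("Anna Schmidt", first + 1)
    city = note.find("Jena")
    spans = [(first, first + 12, "PERSON"), (city, city + 4, "CITY"), (second, second + 12, "PERSON")]
    print(pseudonymize(note, spans))
```

A production system would also need to handle inflection and grammatical agreement, which is what the abstract's evaluation of grammatical correctness addresses; this sketch ignores that.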


Subject(s)
Confidentiality , Privacy , Humans