Results 1 - 20 of 82
1.
Stud Health Technol Inform ; 316: 530-531, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176795

ABSTRACT

Thirteen standardized reasons for e-visits were implemented in March 2024 on a French telemedicine platform to improve the analysis of telemedicine needs, educate patients about what an e-visit can address, and adapt the service offering. Patients could select 1 to 3 reasons for consultation from the list of 13. Our aim was to evaluate the impact of these reasons on the use of e-visits. The main reasons selected were linked to acute care, with a large majority involving upper respiratory tract infections, back pain, and urinary tract infections. They were mostly concordant with the physician's conclusion and may have simplified the preparation of the e-visits.


Subject(s)
Remote Consultation, Telemedicine, France, Humans, Female, Male
2.
Stud Health Technol Inform ; 316: 1861-1865, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176854

ABSTRACT

Using clinical decision support systems (CDSSs) for breast cancer management requires extracting relevant patient data from textual reports, a complex task that machine learning achieves efficiently but with black-box methods. We proposed a rule-based natural language processing (NLP) method to automate the translation of breast cancer patient summaries into structured patient profiles suitable for input into the guideline-based CDSS of the DESIREE project. Our method encompasses named entity recognition (NER), relation extraction, and structured data extraction to systematically organize patient data. The method demonstrated strong alignment with the treatment recommendations generated for manually created patient profiles (the gold standard), with only 2% of cases differing. Moreover, the NER pipeline achieved an average F1-score of 0.9 across the main entities (patient, side, and tumor), 0.87 for relation extraction, and 0.75 for contextual information, showing promising results for rule-based NLP.
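The rule-based NER step described above can be sketched with simple regular expressions. The patterns and entity names below are hypothetical illustrations, not the DESIREE project's actual rules:

```python
import re

# Hypothetical patterns mapping free-text spans to structured entities
# (illustrative only; the published pipeline's rules are not reproduced here).
PATTERNS = {
    "tumor_size_mm": re.compile(r"(\d+(?:\.\d+)?)\s*mm\b"),
    "laterality": re.compile(r"\b(left|right)\s+breast\b", re.I),
    "er_status": re.compile(r"\bER[\s-]*(positive|negative)\b", re.I),
}

def extract_profile(summary: str) -> dict:
    """Turn a free-text patient summary into a flat structured profile."""
    profile = {}
    for entity, pattern in PATTERNS.items():
        match = pattern.search(summary)
        if match:
            profile[entity] = match.group(1).lower()
    return profile

text = "Invasive carcinoma of the left breast, 14 mm, ER-positive."
print(extract_profile(text))
# -> {'tumor_size_mm': '14', 'laterality': 'left', 'er_status': 'positive'}
```

A real pipeline would add relation extraction to attach each entity to the correct tumor or patient, which is where most of the complexity lies.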


Subject(s)
Breast Neoplasms, Decision Support Systems, Clinical, Electronic Health Records, Natural Language Processing, Humans, Breast Neoplasms/therapy, Female, Data Mining/methods, Machine Learning
3.
Eur J Radiol Open ; 13: 100582, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39041057

ABSTRACT

Objective: Routinely collected electronic health records, processed with artificial intelligence (AI)-based systems, bring enormous benefits to patients, healthcare centers, and the healthcare industry. AI models can be used to structure a wide variety of unstructured data. Methods: We present a semi-automatic workflow for medical dataset management, covering data structuring, research extraction, AI ground-truth creation, and updates. The algorithm creates directories based on keywords in new file names. Results: Our work focuses on organizing computed tomography (CT) and magnetic resonance (MR) images, patient clinical data, and segmented annotations. In addition, an AI model is used to generate initial labels that can be edited manually to create ground-truth labels. The manually verified ground-truth labels are later included in the structured dataset by an automated algorithm for future research. Conclusion: This is a workflow with an AI model trained on local hospital medical data, with output adapted to the users and their preferences. The automated algorithms and AI model could be implemented inside a secure secondary environment in the hospital to produce inferences.
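The keyword-to-directory routing described above can be sketched in a few lines. The keyword map is hypothetical, not the authors' actual configuration:

```python
# Minimal sketch of keyword-based file routing (illustrative keyword map;
# a real deployment would use the hospital's own naming conventions).
KEYWORD_DIRS = {
    "ct": "CT_images",
    "mr": "MR_images",
    "seg": "segmentations",
    "clinical": "clinical_data",
}

def route(filename: str) -> str:
    """Pick a target directory from the first matching keyword, else 'unsorted'."""
    name = filename.lower()
    for keyword, directory in KEYWORD_DIRS.items():
        if keyword in name:
            return directory
    return "unsorted"

print(route("patient042_CT_abdomen.nii.gz"))  # -> CT_images
```

In the workflow above, a small script like this would run whenever new files arrive, creating the target directory if it does not yet exist before moving the file.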

4.
Molecules ; 29(13)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38999091

ABSTRACT

In the organic chemistry laboratory, recording the 13C nuclear magnetic resonance (NMR) spectrum of a newly synthesized compound remains an essential step in elucidating its structure. For the chemist, interpreting such a spectrum, which is a set of chemical-shift values, is easier with a tool capable of predicting, with sufficient accuracy, the carbon-shift values from the structure he/she intends to prepare. As there are few open-source methods for accurately estimating this property, we applied our graph-machine approach to build models capable of predicting the chemical shifts of carbons. For this study, we focused on benzene compounds, building an optimized model trained on a database of 10,577 chemical shifts originating from 2026 structures that contain up to ten types of non-carbon atoms, namely H, O, N, S, P, Si, and halogens. It provides a training root-mean-squared relative error (RMSRE) of 0.5%, i.e., a root-mean-squared error (RMSE) of 0.6 ppm, and a mean absolute error (MAE) of 0.4 ppm for estimating the chemical shifts of the 10,577 carbons. The predictive capability of the graph-machine model is also compared with that of three commercial packages on a dataset of 171 original benzenic structures (1012 chemical shifts). The graph-machine model proves very efficient at predicting chemical shifts, with an RMSE of 0.9 ppm, and compares favorably with the RMSEs of 3.4, 1.8, and 1.9 ppm computed with the ChemDraw v. 23.1.1.3, ACD v. 11.01, and MestReNova v. 15.0.1-35756 packages, respectively. Finally, a Docker-based tool is proposed to predict the carbon chemical shifts of benzenic compounds solely from their SMILES codes.
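The error metrics reported above (RMSE, MAE, RMSRE) follow standard definitions; a minimal sketch with illustrative shift values (not the paper's dataset), assuming RMSRE is the root mean square of per-point relative errors expressed as a percentage:

```python
import math

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmsre(pred, true):
    # Root mean square of relative errors, as a percentage (definition assumed).
    return 100 * math.sqrt(sum(((p - t) / t) ** 2 for p, t in zip(pred, true)) / len(true))

# Toy chemical shifts in ppm (illustrative values only)
true_shifts = [128.5, 136.2, 114.9, 149.8]
pred_shifts = [128.1, 136.9, 115.2, 149.1]
print(round(rmse(pred_shifts, true_shifts), 3), round(mae(pred_shifts, true_shifts), 3))
# -> 0.555 0.525
```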

5.
Med Ref Serv Q ; 43(2): 196-202, 2024.
Article in English | MEDLINE | ID: mdl-38722609

ABSTRACT

Named entity recognition (NER) is a powerful computational technique, in use since the early 1990s, that applies various computing strategies to extract information from raw text input. With rapid advances in AI and computing, NER models have gained significant attention and serve as foundational tools across numerous professional domains for organizing unstructured data for research and practical applications. This is particularly evident in the medical and healthcare fields, where NER models are essential for efficiently extracting critical information from complex documents that are challenging to review manually. Despite these successes, NER presents limitations in fully comprehending natural language nuances. However, the development of more advanced and user-friendly models promises to significantly improve the work experiences of professional users.


Subject(s)
Information Storage and Retrieval, Natural Language Processing, Information Storage and Retrieval/methods, Humans, Artificial Intelligence
6.
Cancer ; 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38662502

ABSTRACT

INTRODUCTION: Structured data capture requires defined languages such as minimal Common Oncology Data Elements (mCODE). This pilot assessed the feasibility of capturing 5 mCODE categories (stage, disease status, performance status (PS), intent of therapy, and intent to change therapy). METHODS: A tool (SmartPhrase) using existing and custom structured data elements was built to capture 4 data categories (disease status, PS, intent of therapy, and intent to change therapy) typically documented as free text within notes. Existing functionality for stage was supported by the build. Participant survey data, presence of data (per encounter), and time in chart were collected prior to go-live and at repeated timepoints. The anticipated outcome was capture of >50%, sustained over time, without undue burden. RESULTS: Pre-intervention (5 weeks before go-live), participants had 1390 encounters (1207 patients). The median percent capture across all participants was 32% for stage; no structured data were available for the other categories pre-intervention. During a 6-month pilot with 14 participants across three sites, 4995 encounters (3071 patients) occurred. The median percent capture across all participants and all post-intervention months increased to 64% for stage and 81%-82% for the other data categories. No increase in participant time in chart was noted. Participants reported that the data were meaningful to capture. CONCLUSIONS: Structured data can be captured (1) in real time and (2) sustained over time (3) without undue provider burden using note-based tools. Our system is expanding the pilot, with integration of these data into clinical decision support and practice dashboards, and potential for clinical trial matching.

7.
Int J Pharm X ; 7: 100239, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38545329

ABSTRACT

A network of regulatory innovations brings a holistic approach to improving the submission, assessment, and lifecycle management of pharmaceutical quality information in the U.S. This dedicated effort in the FDA's Center for Drug Evaluation and Research (CDER) aims to enhance the quality assessment of submissions for new drugs, generic drugs, and biological products including biosimilars. These regulatory innovations include developing or contributing to: (i) the Knowledge-Aided Assessment and Structured Application (KASA), (ii) a new common technical document for quality (ICH M4Q(R2)), (iii) structured data on Pharmaceutical Quality/Chemistry, Manufacturing and Controls (PQ/CMC), (iv) Integrated Quality Assessment (IQA), (v) the Quality Surveillance Dashboard (QSD), and (vi) the Established Conditions tool from the ICH Q12 guideline. The innovations collectively drive CDER toward a more coordinated, effective, and efficient quality assessment. Improvements are made possible by structured regulatory submissions, a systems approach to quality risk management, and data-driven decisions based on science, risk, and effective knowledge management. The intended result is better availability of quality medicines for U.S. patients.

8.
Int J Med Inform ; 184: 105344, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38310755

ABSTRACT

INTRODUCTION: Theoretically, the added value of electronic health records (EHRs) is extensive. Reusable data capture in EHRs could lead to major improvements in quality measurement, scientific research, and decision support. To achieve these goals, structured and standardized recording of healthcare data is a prerequisite. However, time spent on EHRs by physicians is already high. This study evaluated the effect of implementing an EHR-embedded care pathway with structured data recording on the EHR burden of physicians. MATERIALS AND METHODS: Before and six months after implementation, consultations were recorded and analyzed with video-analytic software. Main outcome measures were time spent on specific tasks within the EHR, total consultation duration, and usability indicators such as required mouse clicks and keystrokes. Additionally, a validated questionnaire was completed twice to evaluate changes in physician perception of EHR system factors and documentation process factors. RESULTS: Total EHR time in initial oncology consultations was significantly reduced by 3.7 min, a 27% decrease. In contrast, although a decrease of 13% in consultation duration was observed, no significant effect on EHR time was found in follow-up consultations. Additionally, perceptions of physicians regarding the EHR and documentation improved significantly. DISCUSSION: Our results have shown that it is possible to achieve structured data capture while simultaneously reducing the EHR burden, which is a decisive factor in end-user acceptance of documentation systems. Proper alignment of structured documentation with workflows is critical for success. CONCLUSION: Implementing an EHR-embedded care pathway with structured documentation led to decreased EHR burden.


Subject(s)
Electronic Health Records, Physicians, Humans, Critical Pathways, Referral and Consultation, Software, Documentation/methods
9.
Stud Health Technol Inform ; 310: 1464-1465, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269698

ABSTRACT

The era of the electronic health record (EHR) demands a high degree of semantic interoperability for data sharing and reusability. We selected HL7 v2 messages, the most common structured data type in hospital information systems, to investigate the plausibility of using Elasticsearch (ES) as a healthcare search engine and data analytics tool. Our findings indicate that Elasticsearch can be integrated as a powerful searchable database for practical healthcare applications and used to analyze structured healthcare data from various locations. It allows easy and efficient searching for complex query tasks.
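HL7 v2 messages are pipe-delimited text, so the first step toward indexing them in Elasticsearch is flattening each message into a JSON-like document. A minimal sketch (the message and field naming are illustrative; note that this naive numbering does not follow HL7's special MSH convention, where the field separator itself counts as MSH-1):

```python
def hl7_to_doc(message: str) -> dict:
    """Flatten an HL7 v2 message into a dict keyed by SEGMENT-field-index."""
    doc = {}
    for segment in message.strip().split("\r"):
        fields = segment.split("|")
        seg_id = fields[0]
        for i, value in enumerate(fields[1:], start=1):
            if value:  # skip empty fields
                doc[f"{seg_id}-{i}"] = value
    return doc

msg = "MSH|^~\\&|HIS|HOSP|LAB|HOSP|202401011200||ADT^A01|123|P|2.5\rPID|1||9876||DOE^JOHN"
doc = hl7_to_doc(msg)
print(doc["PID-3"], doc["PID-5"])  # -> 9876 DOE^JOHN
# With an Elasticsearch client one would then index the document, e.g.:
# es.index(index="hl7-messages", document=doc)
```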


Subject(s)
Data Science, Hospital Information Systems, Databases, Factual, Electronic Health Records, Health Facilities
10.
Neuro Oncol ; 26(6): 1163-1170, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38141226

ABSTRACT

BACKGROUND: Glioblastoma is the most common malignant brain tumor, and thus it is important to be able to identify patients with this diagnosis for population studies. However, this can be challenging as diagnostic codes are nonspecific. The aim of this study was to create a computable phenotype (CP) for glioblastoma multiforme (GBM) from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR). METHODS: We used the University of Florida (UF) Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. We performed multiple iterations to refine the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords through manual chart review of patient data. We then evaluated the performance of various proposed CPs constructed from the relevant codes and keywords. RESULTS: We performed six rounds of manual chart review to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Overall, the CP rule "if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword" demonstrated the highest F1-score using both structured and unstructured data, and was therefore selected as the best-performing CP rule. CONCLUSIONS: We developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which minimizes possible biases from misclassification errors.
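The selected CP rule and its F1-based evaluation are simple to express in code. The rule below is quoted from the abstract; the toy cohort and field names are hypothetical, not the UF Health data:

```python
def gbm_phenotype(patient: dict) -> bool:
    """CP rule from the abstract: >=1 relevant diagnosis code AND >=1 relevant keyword."""
    return len(patient["dx_codes"]) >= 1 and len(patient["keywords"]) >= 1

def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy cohort with chart-review labels (illustrative only)
cohort = [
    {"dx_codes": ["C71.9"], "keywords": ["glioblastoma"], "is_gbm": True},   # true positive
    {"dx_codes": ["C71.9"], "keywords": ["glioma"], "is_gbm": False},        # false positive
    {"dx_codes": [], "keywords": ["gbm"], "is_gbm": True},                   # false negative
    {"dx_codes": ["C71.9"], "keywords": [], "is_gbm": False},                # true negative
]
tp = sum(gbm_phenotype(p) and p["is_gbm"] for p in cohort)
fp = sum(gbm_phenotype(p) and not p["is_gbm"] for p in cohort)
fn = sum((not gbm_phenotype(p)) and p["is_gbm"] for p in cohort)
print(tp, fp, fn, round(f1_score(tp, fp, fn), 3))  # -> 1 1 1 0.5
```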


Subject(s)
Brain Neoplasms, Electronic Health Records, Glioblastoma, Phenotype, Humans, Glioblastoma/pathology, Glioblastoma/diagnosis, Brain Neoplasms/pathology, Brain Neoplasms/diagnosis, Algorithms, Female
11.
Rev. esp. quimioter ; 36(6): 592-596, Dec. 2023. ilus, tab
Article in English | IBECS | ID: ibc-228245

ABSTRACT

Objectives. Clinical data on which artificial intelligence (AI) algorithms are trained and tested provide the basis to improve diagnosis or treatment of infectious diseases (ID). We aimed to identify important data for ID research to prioritise efforts being undertaken in AI programmes. Material and methods. We searched for 1,000 articles from high-impact ID journals on PubMed, selecting 288 of the latest articles from 10 top journals. We classified them into structured or unstructured data. Variables were homogenised and grouped into the following categories: epidemiology, admission, demographics, comorbidities, clinical manifestations, laboratory, microbiology, other diagnoses, treatment, outcomes and other non-categorizable variables. Results. 4,488 individual variables were collected from the 288 articles. 3,670 (81.8%) variables were classified as structured data whilst 818 (18.2%) as unstructured data. From the structured data, 2,319 (63.2%) variables were classified as direct (retrievable from electronic health records) whilst 1,351 (36.8%) were indirect. The most frequent unstructured data were related to clinical manifestations and were repeated across articles. Data on demographics, comorbidities and microbiology constituted the most frequent group of variables. Conclusions. This article identified that structured variables have comprised the most important data in research to generate knowledge in the field of ID. Extracting these data should be a priority when a medical centre intends to start an AI programme for ID. We also documented that the most important unstructured data in this field are those related to clinical manifestations. Such data could easily undergo some structuring with the use of semi-structured medical records focusing on a few symptoms.



Subject(s)
Humans, Artificial Intelligence, Communicable Diseases/diagnosis, Communicable Diseases/therapy, Natural Language Processing
12.
Ann Appl Stat ; 17(4): 2944-2969, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38149262

ABSTRACT

Motivated by emerging applications in ecology, microbiology, and neuroscience, this paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose the generalized matrix decomposition regression (GMDR) to efficiently leverage auxiliary information on row and column structures. GMDR extends the principal component regression (PCR) to two-way structured data, but unlike PCR, GMDR selects the components that are most predictive of the outcome, leading to more accurate prediction. For inference on regression coefficients of individual variables, we propose the generalized matrix decomposition inference (GMDI), a general high-dimensional inferential framework for a large family of estimators that include the proposed GMDR estimator. GMDI provides more flexibility for incorporating relevant auxiliary row and column structures. As a result, GMDI does not require the true regression coefficients to be sparse, but constrains the coordinate system representing the regression coefficients according to the column structure. GMDI also allows dependent and heteroscedastic observations. We study the theoretical properties of GMDI in terms of both the type-I error rate and power and demonstrate the effectiveness of GMDR and GMDI in simulation studies and an application to human microbiome data.

13.
Molecules ; 28(19)2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37836648

ABSTRACT

The refractive index (RI) of liquids is a key physical property of molecular compounds and materials. In addition to its ubiquitous role in physics, it is also exploited to impart specific optical properties (transparency, opacity, and gloss) to materials and various end-use products. Since few methods exist to accurately estimate this property, we have designed a graph machine model (GMM) capable of predicting the RI of liquid organic compounds containing up to 16 different types of atoms and effective in discriminating between stereoisomers. Using 8267 carefully checked RI values from the literature and the corresponding 2D organic structures, the GMM provides a training root mean square relative error of less than 0.5%, i.e., an RMSE of 0.004 for the estimation of the refractive index of the 8267 compounds. The GMM predictive ability is also compared to that obtained by several fragment-based approaches. Finally, a Docker-based tool is proposed to predict the RI of organic compounds solely from their SMILES code. The GMM developed is easy to apply, as shown by the video tutorials provided on YouTube.

14.
Health Informatics J ; 29(3): 14604582231200300, 2023.
Article in English | MEDLINE | ID: mdl-37677012

ABSTRACT

Objective: To evaluate how and from where social risk data are extracted from EHRs for research purposes, and how observed differences may impact study generalizability. Methods: Systematic scoping review of peer-reviewed literature that used patient-level EHR data to assess at least 1 of 6 social risk domains: housing, transportation, food, utilities, safety, and social support/isolation. Results: 111 of 9022 identified articles met inclusion criteria. By domain, social support/isolation was most often included (N = 68/111), predominantly defined by marital/partner status (N = 48/68) and extracted from structured sociodemographic data (N = 45/48). Housing risk was defined primarily by homelessness (N = 39/49). Structured housing data were extracted most often from billing codes and screening tools (N = 15/30 and 13/30, respectively). Across domains, data were predominantly sourced from structured fields (N = 89/111) versus unstructured free text (N = 32/111). Conclusion: We identified wide variability in how social domains are defined and extracted from EHRs for research. More consistency, particularly in how domains are operationalized, would enable greater insights across studies.


Subject(s)
Electronic Health Records, Social Support, Humans
15.
Rev Esp Quimioter ; 36(6): 592-596, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37575020

ABSTRACT

OBJECTIVE: Clinical data on which artificial intelligence (AI) algorithms are trained and tested provide the basis to improve diagnosis or treatment of infectious diseases (ID). We aimed to identify important data for ID research to prioritise efforts being undertaken in AI programmes. METHODS: We searched for 1,000 articles from high-impact ID journals on PubMed, selecting 288 of the latest articles from 10 top journals. We classified them into structured or unstructured data. Variables were homogenised and grouped into the following categories: epidemiology, admission, demographics, comorbidities, clinical manifestations, laboratory, microbiology, other diagnoses, treatment, outcomes and other non-categorizable variables. RESULTS: 4,488 individual variables were collected from the 288 articles. 3,670 (81.8%) variables were classified as structured data whilst 818 (18.2%) as unstructured data. Of the structured data, 2,319 (63.2%) variables were classified as direct (retrievable from electronic health records) whilst 1,351 (36.8%) were indirect. The most frequent unstructured data were related to clinical manifestations and were repeated across articles. Data on demographics, comorbidities and microbiology constituted the most frequent group of variables. CONCLUSIONS: This article identified that structured variables have comprised the most important data in research to generate knowledge in the field of ID. Extracting these data should be a priority when a medical centre intends to start an AI programme for ID. We also documented that the most important unstructured data in this field are those related to clinical manifestations. Such data could easily undergo some structuring with the use of semi-structured medical records focusing on a few symptoms.


Subject(s)
Algorithms, Artificial Intelligence, Humans, Electronic Health Records
16.
JMIR Med Inform ; 11: e46267, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37621195

ABSTRACT

Background: Throughout the COVID-19 pandemic, many hospitals conducted routine testing of hospitalized patients for SARS-CoV-2 infection upon admission. Some of these patients are admitted for reasons unrelated to COVID-19 and incidentally test positive for the virus. Because COVID-19-related hospitalizations have become a critical public health indicator, it is important to identify patients who are hospitalized because of COVID-19 as opposed to those who are admitted for other indications. Objective: We compared the performance of different computable phenotype definitions for COVID-19 hospitalizations that use different types of data from electronic health records (EHRs), including structured EHR data elements, clinical notes, or a combination of both data types. Methods: We conducted a retrospective data analysis, using clinician chart review-based validation at a large academic medical center. We reviewed and analyzed the charts of 586 hospitalized individuals who tested positive for SARS-CoV-2 in January 2022. We used LASSO (least absolute shrinkage and selection operator) regression and random forests to fit classification algorithms that incorporated structured EHR data elements, clinical notes, or a combination of structured data and clinical notes. We used natural language processing to incorporate data from clinical notes. The performance of each model was evaluated based on the area under the receiver operator characteristic curve (AUROC) and an associated decision rule based on sensitivity and positive predictive value. We also identified top words and clinical indicators of COVID-19-specific hospitalization and assessed the impact of different phenotyping strategies on estimated hospital outcome metrics. Results: Based on a chart review, 38.2% (224/586) of patients were determined to have been hospitalized for reasons other than COVID-19, despite having tested positive for SARS-CoV-2. 
A computable phenotype that used clinical notes had significantly better discrimination than one that used structured EHR data elements (AUROC: 0.894 vs 0.841; P<.001) and performed similarly to a model that combined clinical notes with structured data elements (AUROC: 0.894 vs 0.893; P=.91). Assessments of hospital outcome metrics significantly differed based on whether the population included all hospitalized patients who tested positive for SARS-CoV-2 or those who were determined to have been hospitalized due to COVID-19. Conclusions: These findings highlight the importance of cause-specific phenotyping for COVID-19 hospitalizations. More generally, this work demonstrates the utility of natural language processing approaches for deriving information related to patient hospitalizations in cases where there may be multiple conditions that could serve as the primary indication for hospitalization.
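The model comparison above rests on the AUROC, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal rank-based computation with toy scores (illustrative values, not the study's models):

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the probability a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy predicted probabilities (hypothetical)
pos = [0.9, 0.8, 0.4]   # hospitalized for COVID-19
neg = [0.7, 0.3, 0.2]   # incidental SARS-CoV-2 positives
print(round(auroc(pos, neg), 3))  # -> 0.889
```

This pairwise form is O(n*m); production code would use the equivalent rank-sum (Mann-Whitney) formula or a library routine.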

17.
Digit Health ; 9: 20552076231191007, 2023.
Article in English | MEDLINE | ID: mdl-37529541

ABSTRACT

Objective: To describe the development and validation of automated electronic health record data reuse for a multidisciplinary quality dashboard. Materials and methods: Comparative study analyzing a manually extracted and an automatically extracted dataset for 262 patients treated for head and neck cancer (HNC) in a tertiary oncology center in the Netherlands in 2020. The primary outcome measures were the percentage of agreement on data elements required for calculating quality indicators and the difference between indicator results calculated from manually collected data and those calculated from automatically extracted data. Results: The results demonstrate high agreement between manually and automatically collected variables, reaching up to 99.0% agreement. However, some variables show lower levels of agreement, with one variable showing only a 20.0% agreement rate. The indicator results obtained through manual collection and automatic extraction agree closely in most cases, with discrepancy rates ranging from 0.3% to 3.5%. One indicator is a negative outlier, with a discrepancy rate of nearly 25%. Conclusions: This study shows that it is possible to use routinely collected structured data to reliably measure the quality of care in real time, which could render manual data collection for quality measurement obsolete. To achieve reliable data reuse, it is important that relevant data are recorded as structured data during the care process. Furthermore, the results imply that data validation is a precondition for the development of a reliable dashboard.

18.
Proc Natl Acad Sci U S A ; 120(30): e2302028120, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37463204

ABSTRACT

How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide a characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message-passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical physics. We thus propose an AMP, inspired by the theory of adaptive Thouless-Anderson-Palmer equations, which is empirically observed to saturate the conjectured theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at strong universality properties.
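The rank-one spiked matrix model referenced above can be written in a standard form (the normalization and symbols here are a common convention, not necessarily the paper's exact notation):

```latex
% Rank-one spiked matrix model: a signal spike corrupted by additive noise.
% \lambda is the signal-to-noise ratio; conventions for scaling vary by paper.
\[
  Y \;=\; \frac{\lambda}{n}\, x x^{\top} \;+\; Z, \qquad x \in \mathbb{R}^{n},
\]
% The departure from the classical setting is that the noise matrix Z is drawn
% from a low-order polynomial orthogonal matrix ensemble rather than having
% i.i.d. entries; the inference task is to recover x from the observation Y.
```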

19.
Regul Toxicol Pharmacol ; 142: 105426, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37277057

ABSTRACT

In the European Union, the Chemicals Strategy for Sustainability (CSS) highlights the need to enhance the identification and assessment of substances of concern while reducing animal testing, thus fostering the development and use of New Approach Methodologies (NAMs) such as in silico, in vitro and in chemico. In the United States, the Tox21 strategy aims at shifting toxicological assessments away from traditional animal studies towards target-specific, mechanism-based and biological observations mainly obtained by using NAMs. Many other jurisdictions around the world are also increasing the use of NAMs. Hence, the provision of dedicated non-animal toxicological data and reporting formats as a basis for chemical risk assessment is necessary. Harmonising data reporting is crucial when aiming at re-using and sharing data for chemical risk assessment across jurisdictions. The OECD has developed a series of OECD Harmonised Templates (OHT), which are standard data formats designed for reporting information used for the risk assessment of chemicals relevant to their intrinsic properties, including effects on human health (e.g., toxicokinetics, skin sensitisation, repeated dose toxicity) and the environment (e.g., toxicity to test species and wildlife, biodegradation in soil, metabolism of residues in crops). The objective of this paper is to demonstrate the applicability of the OHT standard format for reporting information under various chemical risk assessment regimes, and to provide users with practical guidance on the use of OHT 201, in particular to report test results on intermediate effects and mechanistic information.


Subject(s)
Organisation for Economic Co-Operation and Development, Skin, Humans, Risk Assessment/methods
20.
Stat Modelling ; 23(3): 203-227, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37334164

ABSTRACT

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) which imposes an ℓ2 penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.
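The ℓ2-penalized RCCA mentioned above admits a standard formulation (notation assumed here, not taken verbatim from the article); GRCCA modifies the penalty so that features in the same group are shrunk toward each other rather than uniformly:

```latex
% l2-regularized CCA: for data matrices X and Y with (cross-)covariance
% matrices \Sigma_{XX}, \Sigma_{YY}, \Sigma_{XY}, find directions u, v solving
\[
  \max_{u,\,v}\; u^{\top} \Sigma_{XY}\, v
  \quad \text{subject to} \quad
  u^{\top}\!\left(\Sigma_{XX} + \lambda_1 I\right) u = 1, \qquad
  v^{\top}\!\left(\Sigma_{YY} + \lambda_2 I\right) v = 1,
\]
% where \lambda_1, \lambda_2 > 0 stabilize the problem when the covariance
% matrices are singular or ill-conditioned in high dimensions.
```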
