Results 1 - 20 of 514
1.
Comput Med Imaging Graph ; 117: 102434, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39284244

ABSTRACT

Accurate segmentation of the pancreas in computed tomography (CT) holds paramount importance in diagnostics, surgical planning, and interventions. Recent studies have proposed supervised deep-learning models for segmentation, but their efficacy relies on the quality and quantity of the training data. Most of such works employed small-scale public datasets, without proving the efficacy of generalization to external datasets. This study explored the optimization of pancreas segmentation accuracy by pinpointing the ideal dataset size, understanding resource implications, examining manual refinement impact, and assessing the influence of anatomical subregions. We present the AIMS-1300 dataset encompassing 1,300 CT scans. Its manual annotation by medical experts required 938 h. A 2.5D UNet was implemented to assess the impact of training sample size on segmentation accuracy by partitioning the original AIMS-1300 dataset into 11 smaller subsets of progressively increasing numerosity. The findings revealed that training sets exceeding 440 CTs did not lead to better segmentation performance. In contrast, nnU-Net and UNet with Attention Gate reached a plateau for 585 CTs. Tests on generalization on the publicly available AMOS-CT dataset confirmed this outcome. As the size of the partition of the AIMS-1300 training set increases, the number of error slices decreases, reaching a minimum with 730 and 440 CTs, for AIMS-1300 and AMOS-CT datasets, respectively. Segmentation metrics on the AIMS-1300 and AMOS-CT datasets improved more on the head than the body and tail of the pancreas as the dataset size increased. By carefully considering the task and the characteristics of the available data, researchers can develop deep learning models without sacrificing performance even with limited data. This could accelerate developing and deploying artificial intelligence tools for pancreas surgery and other surgical data science applications.

3.
Am J Med Genet A ; : e63882, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39279436

ABSTRACT

Rare germline variation in regulator of telomere elongation helicase 1 (RTEL1) is associated with telomere biology disorders (TBDs). Biallelic RTEL1 variants result in childhood onset dyskeratosis congenita and Hoyeraal-Hreidarsson syndrome whereas heterozygous individuals usually present later in life with pulmonary fibrosis or bone marrow failure. We compiled all TBD-associated RTEL1 variants in the literature and assessed phenotypes and outcomes of 44 individuals from 14 families with mono- or biallelic RTEL1 variants enrolled in clinical trial NCT00027274. Variants were classified by adapting ACMG-AMP guidelines using clinical information, telomere length, and variant allele frequency data. Compared with heterozygotes, individuals with biallelic RTEL1 variants had an earlier age at diagnosis (median age 35.5 vs. 5.1 years, p < 0.01) and worse overall survival (median age 66.5 vs. 22.9 years, p < 0.001). There were 257 unique RTEL1 variants reported in 47 publications, and 209 had a gnomAD minor allele frequency <1%. Only 38.3% (80/209) met pathogenic/likely pathogenic criteria. Notably, 8 of 209 reported disease-associated variants were benign or likely benign and the rest were variants of uncertain significance. Given the considerable differences in outcomes of TBDs associated with RTEL1 germline variants and the extent of variation in the gene, systematic functional studies and standardization of variant curation are urgently needed to inform clinical management.

4.
Proteins ; 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39258438

ABSTRACT

Predicting the precise locations of metal binding sites within metalloproteins is a crucial challenge in biophysics. A fast, accurate, and interpretable computational prediction method can complement the experimental studies. In the current work, we have developed a method to predict the location of Ca2+ ions in calcium-binding proteins using a physics-based method with an all-atom description of the proteins, which is substantially faster than the molecular dynamics simulation-based methods with accuracy as good as data-driven approaches. Our methodology uses the three-dimensional reference interaction site model (3D-RISM), a statistical mechanical theory, to calculate Ca2+ ion density around protein structures, and the locations of the Ca2+ ions are obtained from the density. We have taken previously used datasets to assess the efficacy of our method as compared to previous works. Our accuracy is 88%, comparable with the FEATURE program, one of the well-known data-driven methods. Moreover, our method is physical, and the reasons for failures can be ascertained in most cases. We have thoroughly examined the failed cases using different structural and crystallographic measures, such as B-factor, R-factor, electron density map, and geometry at the binding site. It has been found that x-ray structures have issues in many of the failed cases, such as geometric irregularities and dubious assignment of ion positions. Our algorithm, along with the checks for structural accuracy, is a major step in predicting calcium ion positions in metalloproteins.

5.
Stud Health Technol Inform ; 317: 160-170, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234719

ABSTRACT

INTRODUCTION: 16 million German-language free-text laboratory test results are the basis of the daily diagnostic routine of 17 laboratories within the University Hospital Erlangen. As part of the Medical Informatics Initiative, the local data integration centre is responsible for the accessibility of routine care data for medical research. Following the core data set, international interoperability standards such as FHIR and the English-language medical terminology SNOMED CT are used to create harmonised data. To represent each non-numeric laboratory test result within the base module profile ObservationLab, the need for a map and supporting tooling arose. STATE OF THE ART: Due to the requirement of an n:n map and a data safety-compliant local instance, publicly available tools (e.g., SNAP2SNOMED) were insufficient. CONCEPT AND IMPLEMENTATION: Therefore, we developed (1) an incremental mapping-validation process with different iteration cycles and (2) a customised mapping tool via Microsoft Access. Time, labour, and cost efficiency played a decisive role. First iterations were used to define requirements (e.g., multiple user access). LESSONS LEARNED: The successful process and tool implementation and the described lessons learned (e.g., cheat sheet) will assist other German hospitals in creating local maps for inter-consortia data exchange and research. In the future, qualitative and quantitative analysis results will be published.


Subject(s)
Systematized Nomenclature of Medicine , Germany , Humans , Electronic Health Records , Systems Integration
6.
Chemosphere ; 364: 143078, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39181462

ABSTRACT

The US EPA ECOTOX database provides key ecotoxicological data that are crucial in environmental risk assessment. It can be used for computational predictions of toxicity or indications of hazard in a wide range of situations. There is no standardised or formalised method for extracting and subsetting data from the database for these purposes. Consequently, results in such meta-analyses are difficult to reproduce. The present study introduces the software package ECOTOXr, which provides the means to formalise data retrieval from the ECOTOX database in the R scripting language. Three cases are presented to evaluate the performance of the package in relation to earlier data extractions and searches on the website. These cases demonstrate that the package can reproduce data sets relatively well. Furthermore, they illustrate how future studies can further improve traceability and reproducibility by applying the package and adhering to some simple guidelines. This contributes to the FAIR principles, credibility and acceptance of research that uses data from the ECOTOX database.

7.
Am J Hum Genet ; 111(9): 2044-2058, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39142283

ABSTRACT

The ENIGMA research consortium develops and applies methods to determine clinical significance of variants in hereditary breast and ovarian cancer genes. An ENIGMA BRCA1/2 classification sub-group, formed in 2015 as a ClinGen external expert panel, evolved into a ClinGen internal Variant Curation Expert Panel (VCEP) to align with Food and Drug Administration recognized processes for ClinVar contributions. The VCEP reviewed American College of Medical Genetics and Genomics/Association of Molecular Pathology (ACMG/AMP) classification criteria for relevance to interpreting BRCA1 and BRCA2 variants. Statistical methods were used to calibrate evidence strength for different data types. Pilot specifications were tested on 40 variants and documentation revised for clarity and ease of use. The original criterion descriptions for 13 evidence codes were considered non-applicable or overlapping with other criteria. Scenario of use was extended or re-purposed for eight codes. Extensive analysis and/or data review informed specification descriptions and weights for all codes. Specifications were applied to pilot variants with pre-existing ClinVar classification as follows: 13 uncertain significance or conflicting, 14 pathogenic and/or likely pathogenic, and 13 benign and/or likely benign. Review resolved classification for 11/13 uncertain significance or conflicting variants and retained or improved confidence in classification for the remaining variants. Alignment of pre-existing ENIGMA research classification processes with ACMG/AMP classification guidelines highlighted several gaps in the research processes and the baseline ACMG/AMP criteria. Calibration of evidence strength was key to justify utility and strength of different data types for gene-specific application. The gene-specific criteria demonstrated value for improving ACMG/AMP-aligned classification of BRCA1 and BRCA2 variants.


Subject(s)
BRCA1 Protein , BRCA2 Protein , Genetic Variation , Humans , BRCA2 Protein/genetics , BRCA1 Protein/genetics , Female , Breast Neoplasms/genetics , Genomics/methods , Databases, Genetic , Ovarian Neoplasms/genetics , Genetic Predisposition to Disease , Genetic Testing/methods
8.
Front Pharmacol ; 15: 1444733, 2024.
Article in English | MEDLINE | ID: mdl-39170704

ABSTRACT

Background and Objective: Chronic atrophic gastritis (CAG) is a complex chronic disease caused by multiple factors that frequently occurs in the clinic. The worldwide prevalence of CAG is high. Interestingly, clinical CAG patients often present with a variety of symptom phenotypes, which makes the disease more difficult for clinicians to treat. Therefore, there is an urgent need to improve our understanding of the complexity of the clinical CAG population, obtain more accurate disease subtypes, and explore the relationship between clinical symptoms and medication. Accordingly, based on the integrated platform of complex networks and clinical research, we classified the collected patients with CAG according to their different clinical characteristics and conducted correlation analysis on the classification results to identify more accurate disease subtypes to aid in personalized clinical treatment. Method: Traditional Chinese medicine (TCM) offers an empirical understanding of the clinical subtypes of complicated disorders since TCM therapy is tailored to the patient's symptom profile. We gathered 6,253 TCM clinical electronic medical records (EMRs) from CAG patients and manually annotated, extracted, and preprocessed the data. A shared symptom-patient similarity network (PSN) was created. CAG patient subgroups were established, and their clinical features were determined through enrichment analysis employing community identification methods. Different clinical features of relevant subgroups were correlated based on effectiveness to identify symptom-botanical drug correspondence. Moreover, network pharmacology was employed to identify possible biological relationships between screened symptoms and medications and to identify various clinical and molecular aspects of the key subtypes using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.
Results: 5,132 patients were included in the study: 2,699 males (52.60%) and 2,433 females (47.41%). The population was divided into 176 modules. We selected the first 3 modules (M29, M3, and M0) to illustrate the characteristic phenotypes and genotypes of CAG disease subtypes. The M29 subgroup was characterized by gastric fullness disease and internal syndrome of turbidity and poison. The M3 subgroup was characterized by epigastric pain and disharmony between the liver and stomach. The M0 subgroup was characterized by epigastric pain and dampness-heat syndrome. In symptom analysis, the top symptoms for symptom improvement in all three subgroups were stomach pain, bloating, insomnia, poor appetite, and heartburn. However, the three groups were different. The M29 subgroup was more likely to have stomach distention, anorexia, and palpitations. Citrus medica, Solanum nigrum, Jiangcan, Shan ci mushrooms, and Dillon were the most popular botanical drugs. The M3 subgroup had a higher incidence of yellow urine, a bitter tongue, and stomachaches. Smilax glabra, Cyperus rotundus, Angelica sinensis, Conioselinum anthriscoides, and Paeonia lactiflora were the botanical drugs used. Vomiting, nausea, stomach pain, and appetite loss were common in the M0 subgroup. The primary medications were Scutellaria baicalensis, Smilax glabra, Picrorhiza kurroa, Lilium lancifolium, and Artemisia scoparia. Through GO and KEGG pathway analysis, we found that in the M29 subgroup, Citrus medica, Solanum nigrum, Jiangcan, Shan ci mushrooms, and Dillon may exert their therapeutic effects on the symptoms of gastric distension, anorexia, and palpitations by modulating apoptosis and NF-κB signaling pathways. In the M3 subgroup, Smilax glabra, Cyperus rotundus, Angelica sinensis, Conioselinum anthriscoides, and Paeonia lactiflora may act through the NF-κB and JAK-STAT signaling pathways to treat stomach pain, bitter mouth, and yellow urine.
In the M0 subgroup, Scutellaria baicalensis, Smilax glabra, Picrorhiza kurroa, Lilium lancifolium, and Artemisia scoparia may exert their therapeutic effects on poor appetite, stomach pain, vomiting, and nausea through the PI3K-Akt signaling pathway. Conclusion: Based on PSN identification and community detection analysis, CAG population division can provide useful recommendations for clinical CAG treatment. This method is useful for CAG illness classification and genotyping investigations and can be used for other complicated chronic diseases.

9.
J Med Libr Assoc ; 112(2): 81-87, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-39119170

ABSTRACT

Background: NYU Langone Health offers a collaborative research block for PGY3 Primary Care residents that employs a secondary data analysis methodology. As discussions of data reuse and secondary data analysis have grown in the data library literature, we sought to understand what attitudes internal medicine residents at a large urban academic medical center had around secondary data analysis. This case report describes a novel survey on resident attitudes around data sharing. Methods: We surveyed internal medicine residents in three tracks: Primary Care (PC), Categorical, and Clinician-Investigator (CI) tracks as part of a larger pilot study on implementation of a research block. All three tracks are in our institution's internal medicine program. In discussions with residency directors and the chief resident, the term "secondary data analysis" was chosen over "data reuse" due to this being more familiar to clinicians, but examples were given to define the concept. Results: We surveyed a population of 162 residents, and 67 residents responded, representing a 41.36% response rate. Strong majorities of residents exhibited positive views of secondary data analysis. Moreover, in our sample, those with exposure to secondary data analysis research opined that secondary data analysis takes less time and is less difficult to conduct compared to the other residents without curricular exposure to secondary analysis. Discussion: The survey reflects that residents believe secondary data analysis is worthwhile and this highlights opportunities for data librarians. As current residents matriculate into professional roles as clinicians, educators, and researchers, libraries have an opportunity to bolster support for data curation and education.


Subject(s)
Attitude of Health Personnel , Internal Medicine , Internship and Residency , Internship and Residency/statistics & numerical data , Humans , Internal Medicine/education , Surveys and Questionnaires , Male , Female , Adult , Information Dissemination/methods
10.
Front Genet ; 15: 1296797, 2024.
Article in English | MEDLINE | ID: mdl-39036704

ABSTRACT

Objective: Fructose-1,6-bisphosphatase deficiency (FBP1D) is a rare inborn error due to mutations in the FBP1 gene. The genetic spectrum of FBP1D in China is unknown, and nonspecific manifestations complicate diagnosis. We systematically estimated the FBP1D prevalence in the Chinese population and explored genotype-phenotype associations. Methods: We collected 101 FBP1 variants from our cohort and public resources, and manually curated the pathogenicity of these variants. Ninety-seven pathogenic or likely pathogenic variants were used in our cohort to estimate Chinese FBP1D prevalence by three methods: 1) carrier frequency, 2) permutation and combination, 3) Bayesian framework. Allele frequencies (AFs) of these variants in our cohort, China Metabolic Analytics Project (ChinaMAP) and gnomAD were compared to reveal the different hotspots in Chinese and other populations. Clinical and genetic information of 122 FBP1D patients from our cohort and published literature was collected to analyze the genotype-phenotype association. Phenotypes of 68 hereditary fructose intolerance (HFI) patients from our previous study were used to compare the phenotypic differences between these two fructose metabolism diseases. Results: The estimated Chinese FBP1D prevalence was 1/1,310,034. In the Chinese population, c.490G>A and c.355G>A had significantly higher AFs than in the non-Finland European population, and c.841G>A had a significantly lower AF than in the South Asian population (all p values < 0.05). The genotype-phenotype association analyses showed that patients carrying homozygous c.841G>A were more likely to present with increased urinary glycerol, those carrying two CNVs (especially homozygous exon 1 deletion) often had hepatic steatosis, those carrying compound heterozygous variants usually had lethargy, and those carrying homozygous variants usually had ketosis and hepatic steatosis (all p values < 0.05).
Compared with HFI patients, FBP1D patients were more likely to present with hypoglycemia, metabolic acidosis, and seizures (all p values < 0.05). Conclusion: The prevalence of FBP1D in the Chinese population is extremely low. Genetic sequencing could effectively help to diagnose FBP1D.
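
The carrier-frequency approach named in the Methods can be illustrated with a minimal sketch. It assumes Hardy-Weinberg equilibrium and treats the pathogenic alleles as one pooled allele class, so the expected prevalence of an autosomal recessive disease is the square of the summed allele frequencies. The function names and the allele frequencies below are hypothetical and are not values or code from the study, whose exact procedure the abstract does not describe.

```python
def pooled_allele_frequency(allele_freqs):
    """Combined frequency q of all pathogenic alleles (assumed rare and distinct)."""
    return sum(allele_freqs)


def recessive_prevalence(allele_freqs):
    """Expected prevalence under Hardy-Weinberg equilibrium: q squared."""
    q = pooled_allele_frequency(allele_freqs)
    return q * q


def carrier_frequency(allele_freqs):
    """Expected share of heterozygous carriers under Hardy-Weinberg: 2q(1 - q)."""
    q = pooled_allele_frequency(allele_freqs)
    return 2 * q * (1 - q)


# Hypothetical allele frequencies for three pathogenic variants (illustration only).
example_afs = [4e-4, 3e-4, 2e-4]
prevalence = recessive_prevalence(example_afs)
print(f"estimated prevalence: about 1 in {1 / prevalence:,.0f}")
print(f"estimated carrier frequency: {carrier_frequency(example_afs):.5f}")
```

Read in reverse, the same relation implies that a prevalence near 1 in 1.3 million corresponds to a pooled pathogenic allele frequency a little under 0.001, provided the equilibrium assumption holds.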

11.
J Cheminform ; 16(1): 82, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030583

ABSTRACT

PURPOSE: Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need. METHODS: The SynRBL framework addresses this issue with a dual-strategy: a rule-based method for non-carbon compounds, using atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, aimed at aligning reactants and products to infer missing entities. RESULTS: The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively. CONCLUSION: The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvement in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning. SCIENTIFIC CONTRIBUTION: SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. 
By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon unbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open source solution for this problem.
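
The idea behind the rule-based step, using atomic symbols and counts to detect what is missing from one side of a reaction, can be sketched as follows. This is an illustration only, not the SynRBL implementation: the function names are hypothetical, and the parser assumes plain molecular formulas without parentheses, charges, or hydrates.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple molecular formula such as 'C2H6O'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(formulas):
    """Total atom counts over every compound on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(atom_counts(formula))
    return total


def missing_atoms(reactants, products):
    """Atoms present on one side but unaccounted for on the other."""
    left, right = side_counts(reactants), side_counts(products)
    return {
        # Counter subtraction keeps only positive differences.
        "missing_from_products": dict(left - right),
        "missing_from_reactants": dict(right - left),
    }


# Esterification written without its water co-product:
# acetic acid + ethanol -> ethyl acetate
print(missing_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"]))
```

In this example the surplus on the reactant side is two hydrogens and one oxygen, which a lookup rule could map to a missing water molecule. Mapping a surplus to a specific compound is the part that requires curated rules, and carbon-containing surpluses are what the abstract assigns to the MCS-based method instead.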

12.
BMC Med ; 22(1): 288, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987774

ABSTRACT

BACKGROUND: Ethnicity is known to be an important correlate of health outcomes, particularly during the COVID-19 pandemic, where some ethnic groups were shown to be at higher risk of infection and adverse outcomes. The recording of patients' ethnic groups in primary care can support research and efforts to achieve equity in service provision and outcomes; however, the coding of ethnicity is known to present complex challenges. We therefore set out to describe ethnicity coding in detail with a view to supporting the use of this data in a wide range of settings, as part of wider efforts to robustly describe and define methods of using administrative data. METHODS: We describe the completeness and consistency of primary care ethnicity recording in the OpenSAFELY-TPP database, containing linked primary care and hospital records in > 25 million patients in England. We also compared the ethnic breakdown in OpenSAFELY-TPP with that of the 2021 UK census. RESULTS: 78.2% of patients registered in OpenSAFELY-TPP on 1 January 2022 had their ethnicity recorded in primary care records, rising to 92.5% when supplemented with hospital data. The completeness of ethnicity recording was higher for women than for men. The rate of primary care ethnicity recording ranged from 77% in the South East of England to 82.2% in the West Midlands. Ethnicity recording rates were higher in patients with chronic or other serious health conditions. For each of the five broad ethnicity groups, primary care recorded ethnicity was within 2.9 percentage points of the population rate as recorded in the 2021 Census for England as a whole. For patients with multiple ethnicity records, 98.7% of the latest recorded ethnicities matched the most frequently coded ethnicity. Patients whose latest recorded ethnicity was categorised as Other were most likely to have a discordant ethnicity recording (32.2%). 
CONCLUSIONS: Primary care ethnicity data in OpenSAFELY is present for over three quarters of all patients, and combined with data from other sources can achieve a high level of completeness. The overall distribution of ethnicities across all English OpenSAFELY-TPP practices was similar to the 2021 Census, with some regional variation. This report identifies the best available codelist for use in OpenSAFELY and similar electronic health record data.
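
The two measures reported above, completeness of recording and agreement between the latest and the most frequently coded ethnicity, can be sketched on toy data. The record layout, function names, and tie handling below are assumptions made for illustration; they are not taken from the OpenSAFELY codebase or the study's codelists.

```python
from collections import Counter


def completeness(records_by_patient):
    """Share of patients with at least one recorded ethnicity code."""
    recorded = sum(1 for codes in records_by_patient.values() if codes)
    return recorded / len(records_by_patient)


def latest_matches_most_frequent(codes):
    """True if the most recent code is (one of) the most frequently recorded.

    `codes` is assumed to be in chronological order; ties count as a match.
    """
    counts = Counter(codes)
    top = max(counts.values())
    most_frequent = {code for code, n in counts.items() if n == top}
    return codes[-1] in most_frequent


def concordance(records_by_patient):
    """Share of patients with multiple records whose latest code matches the mode."""
    multi = [codes for codes in records_by_patient.values() if len(codes) > 1]
    if not multi:
        return None
    return sum(latest_matches_most_frequent(codes) for codes in multi) / len(multi)


# Hypothetical patients with chronologically ordered ethnicity codes.
patients = {
    "p1": ["White", "White", "Other"],
    "p2": ["Asian", "Asian"],
    "p3": [],
    "p4": ["Black"],
}
print(completeness(patients))
print(concordance(patients))
```

Patients with a single record are left out of the concordance denominator, mirroring the abstract's restriction to "patients with multiple ethnicity records".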


Subject(s)
Ethnicity , Primary Health Care , State Medicine , Adult , Aged , Female , Humans , Male , Middle Aged , Cohort Studies , England , Ethnicity/statistics & numerical data , Primary Health Care/statistics & numerical data , Infant, Newborn , Infant , Child, Preschool , Child , Adolescent , Young Adult , Aged, 80 and over
13.
Front Med (Lausanne) ; 11: 1455319, 2024.
Article in English | MEDLINE | ID: mdl-39045419

ABSTRACT

[This corrects the article DOI: 10.3389/fmed.2024.1365501.].

14.
Health Inf Manag ; : 18333583241256049, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39045683

ABSTRACT

In 2022 the Australian Data Availability and Transparency Act (DATA) commenced, enabling accredited "data users" to access data from "accredited data service providers." However, the DATA Scheme lacks guidance on "trustworthiness" of the data to be utilised for reuse purposes. Objectives: To determine: (i) Do researchers using government health datasets trust the data? (ii) What factors influence their perceptions of data trustworthiness? and (iii) What are the implications for government and data custodians? Method: Authors of published studies (2008-2020) that utilised Victorian government health datasets were surveyed via a case study approach. Twenty-eight trust constructs (identified via literature review) were grouped into data factors, management properties and provider factors. Results: Fifty experienced health researchers responded. Most (88%) believed that Victorian government health data were trustworthy. When grouped, data factors and management properties were more important than data provider factors in building trust. The most important individual trust constructs were: "compliant with ethical regulation" (100%) and "monitoring privacy and confidentiality" (98%). Constructs of least importance were knowledge of "participant consent" (56%) and "major focus of the data provider was research" (50%). Conclusion: Overall, the researchers trusted government health data, but data factors and data management properties were more important than data provider factors in building trust. Implications: Government should ensure the DATA Scheme incorporates mechanisms to validate those data utilised by accredited data users and data providers have sufficient quality (intrinsic and extrinsic) to meet the requirements of "trustworthiness," and that evidentiary documentation is provided to support these "accredited data."

15.
Ophthalmic Genet ; : 1-7, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39016008

ABSTRACT

PURPOSE: The biallelic variant of MAB21L1 has previously been documented in conjunction with the autosomal recessive cerebellar, ocular, craniofacial, and genital syndrome (COFG). The purpose of this study was to investigate the gene-disease association of MAB21L1 and the newly discovered autosomal dominant (AD) microphthalmia. METHODS: We report the presence of an exceptionally rare missense variant in a single allele of the Arg51 codon of MAB21L1 among four individuals from a single family diagnosed with microphthalmia, suggesting an autosomal dominant inheritance pattern. Subsequently, based on a comprehensive literature review, we identified another 13 families that have reported cases of autosomal dominant microphthalmos. RESULTS: Genotype-phenotype analysis revealed that patients with a single allele missense variant in MAB21L1 exhibited solely eye abnormalities. This starkly diverged from the clinical presentation of COFG, typified by the concurrent occurrence of ocular and extraocular symptoms stemming from the biallelic variant in MAB21L1. Our findings revealed that the heterozygous pathogenic variant in MAB21L1 resulted in the emergence of autosomal dominant microphthalmia. Combining this genetic and experimental evidence, the clinical validity of the association between MAB21L1 and the emerging autosomal dominant microphthalmia can be regarded as moderate. CONCLUSION: In summary, there is sufficient convincing evidence to prove that MAB21L1 is a novel pathogenic gene responsible for autosomal dominant microphthalmia, thus offering valuable insights for precise diagnosis and targeted therapeutic interventions in cases of microphthalmia.

16.
Mol Genet Metab ; 142(3): 108514, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38905920

ABSTRACT

Phenylketonuria (PKU) is a genetic disorder caused by variations in the phenylalanine hydroxylase (PAH) gene. Among the 3369 reported PAH variants, 33.7% are missense alterations. Unfortunately, 30% of these missense variants are classified as variants of unknown significance (VUS), posing challenges for genetic risk assessment. In our study, we focused on analyzing 836 missense PAH variants following the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines specified by ClinGen PAH Variant Curation Expert Panel (VCEP) criteria. We utilized and compared variant annotator tools like Franklin and Varsome, conducted 3D structural analysis of PAH, and examined active and regulatory site hotspots. In addition, we assessed the potential splicing effects of apparent missense variants. By evaluating phenotype data from 22962 PKU patients, our aim was to reassess the pathogenicity of missense variants. Our comprehensive approach successfully reclassified 309 VUSs out of 836 missense variants as likely pathogenic or pathogenic (37%), upgraded 370 likely pathogenic variants to pathogenic, and reclassified one previously considered likely benign variant as likely pathogenic. Phenotypic information was available for 636 missense variants, with 441 undergoing 3D structural analysis and active site hotspot identification for 180 variants. After our analysis, only 6% of missense variants were classified as VUSs, and three of them (c.23A>C/p.Asn8Thr, c.59_60delinsCC/p.Gln20Pro, and c.278A>T/p.Asn93Ile) may be influenced by abnormal splicing. Moreover, a pathogenic variant (c.168G>T/p.Glu56Asp) was identified to have a risk exceeding 98% for modifications of the consensus splice site, with high scores indicating a donor loss of 0.94.
The integration of ACMG/AMP guidelines with in silico structural analysis and phenotypic data significantly reduced the number of missense VUSs, providing a strong basis for genetic counseling and emphasizing the importance of metabolic phenotype information in variant curation. This study also sheds light on the current landscape of PAH variants.


Subject(s)
Mutation, Missense , Phenotype , Phenylalanine Hydroxylase , Phenylketonurias , Humans , Phenylalanine Hydroxylase/genetics , Phenylalanine Hydroxylase/chemistry , Phenylketonurias/genetics , Phenylketonurias/pathology , Computer Simulation
17.
J Cheminform ; 16(1): 74, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937840

ABSTRACT

This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction. However, the effectiveness of these models hinges on the integrity of chemical reaction datasets, which are often plagued by inconsistencies like missing reactants, incorrect atom mappings, and outright erroneous reactions. AutoTemplate introduces a two-stage approach to refine these datasets. The first stage involves extracting meaningful reaction transformation rules and formulating generic reaction templates using a simplified SMARTS representation. This simplification broadens the applicability of templates across various chemical reactions. The second stage is template-guided reaction curation, where these templates are systematically applied to validate and correct the reaction data. This process effectively amends missing reactant information, rectifies atom-mapping errors, and eliminates incorrect data entries. A standout feature of AutoTemplate is its capability to concurrently identify and correct false chemical reactions. It operates on the premise that most reactions in datasets are accurate, using these as templates to guide the correction of flawed entries. The protocol demonstrates its efficacy across a range of chemical reactions, significantly enhancing dataset quality. This advancement provides a more robust foundation for developing reliable machine learning models in chemistry, thereby improving the accuracy of forward and retrosynthetic predictions. 
AutoTemplate marks a significant progression in the preprocessing of chemical reaction datasets, bridging a vital gap and facilitating more precise and efficient machine learning applications in organic synthesis. SCIENTIFIC CONTRIBUTION: The proposed automated preprocessing tool for chemical reaction data aims to identify errors within chemical databases. Specifically, if the errors involve atom mapping or the absence of reactant types, corrections can be systematically applied using reaction templates, ultimately elevating the overall quality of the database.
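
The two-stage idea described in this abstract (derive generic templates from the data, then use them to validate and correct records) can be illustrated with a toy sketch. This is not the AutoTemplate implementation: instead of simplified SMARTS patterns it uses hypothetical functional-class labels, and the names `Reaction`, `extract_templates`, `curate`, and the `min_support` threshold are all assumptions made for illustration. It only mirrors the stated premise that most records are correct, so frequently recurring patterns can serve as templates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    # Hypothetical stand-ins for real structures: coarse class labels only.
    reactant_classes: frozenset
    product_class: str


def extract_templates(reactions, min_support=2):
    """Stage 1 (toy): keep (reactants -> product) patterns seen at least min_support times."""
    counts = Counter((r.reactant_classes, r.product_class) for r in reactions)
    return {pattern for pattern, n in counts.items() if n >= min_support}


def curate(reactions, templates):
    """Stage 2 (toy): keep matching records, complete records with missing
    reactants when exactly one template fits, and reject everything else."""
    curated, rejected = [], []
    for r in reactions:
        if (r.reactant_classes, r.product_class) in templates:
            curated.append(r)
            continue
        # A template is a candidate if it gives the same product and the
        # recorded reactants are a proper subset of the template's reactants.
        candidates = [
            t for t in templates
            if t[1] == r.product_class and r.reactant_classes < t[0]
        ]
        if len(candidates) == 1:
            curated.append(Reaction(candidates[0][0], r.product_class))
        else:
            rejected.append(r)
    return curated, rejected


reactions = [
    Reaction(frozenset({"acid", "amine"}), "amide"),
    Reaction(frozenset({"acid", "amine"}), "amide"),
    Reaction(frozenset({"acid"}), "amide"),    # missing reactant
    Reaction(frozenset({"alkane"}), "amide"),  # matches no template
]
templates = extract_templates(reactions)
curated, rejected = curate(reactions, templates)
```

In this toy data the record that lists only an acid is completed from the single matching template, while the record that fits no template is rejected. A real pipeline would operate on atom-mapped reaction structures rather than class labels.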

18.
Genomics Inform ; 22(1): 7, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38907285

ABSTRACT

This study evaluated large language models (LLMs), particularly GPT-4 with vision (GPT-4V) and GPT-4 Turbo, for annotating biomedical figures, focusing on cellular senescence. We assessed the ability of LLMs to categorize and annotate complex biomedical images to enhance their accuracy and efficiency. Our experiments employed prompt engineering with figures from review articles, achieving more than 70% accuracy for label extraction and approximately 80% accuracy for node-type classification. Challenges were noted in the correct annotation of the relationship between directionality and inhibitory processes, which were exacerbated as the number of nodes increased. Using figure legends identified sources and targets more precisely than using captions, but the legends sometimes lacked pathway details. This study underscores the potential of LLMs in decoding biological mechanisms from text and outlines avenues for improving inhibitory relationship representations in biomedical informatics.

19.
Front Neuroinform ; 18: 1385526, 2024.
Article in English | MEDLINE | ID: mdl-38828185

ABSTRACT

There is an increasing desire to study neurodevelopmental disorders (NDDs) together to understand commonalities to develop generic health promotion strategies and improve clinical treatment. Common data elements (CDEs) collected across studies involving children with NDDs afford an opportunity to answer clinically meaningful questions. We undertook a retrospective, secondary analysis of data pertaining to sleep in children with different NDDs collected through various research studies. The objective of this paper is to share lessons learned for data management, collation, and harmonization from a sleep study in children within and across NDDs from large, collaborative research networks in the Ontario Brain Institute (OBI). Three collaborative research networks contributed demographic data and data pertaining to sleep, internalizing symptoms, health-related quality of life, and severity of disorder for children with six different NDDs: autism spectrum disorder; attention deficit/hyperactivity disorder; obsessive compulsive disorder; intellectual disability; cerebral palsy; and epilepsy. Procedures for data harmonization, derivations, and merging were shared and examples pertaining to severity of disorder and sleep disturbances were described in detail. Important lessons emerged from data harmonizing procedures: prioritizing the collection of CDEs to ensure data completeness; ensuring unprocessed data are uploaded for harmonization in order to facilitate timely analytic procedures; the value of maintaining variable naming that is consistent with data dictionaries at time of project validation; and the value of regular meetings with the research networks to discuss and overcome challenges with data harmonization. Buy-in from all research networks involved at study inception and oversight from a centralized infrastructure (OBI) identified the importance of collaboration to collect CDEs and facilitate data harmonization to improve outcomes for children with NDDs.

20.
Front Med (Lausanne) ; 11: 1365501, 2024.
Article in English | MEDLINE | ID: mdl-38813389

ABSTRACT

The emerging European Health Data Space (EHDS) Regulation opens new prospects for large-scale sharing and re-use of health data. Yet, the proposed regulation suffers from two important limitations: it is designed to benefit the whole population with limited consideration for individuals, and the generation of secondary datasets from heterogeneous, unlinked patient data will remain burdensome. AIDAVA, a Horizon Europe project that started in September 2022, proposes to address both shortcomings by providing patients with an AI-based virtual assistant that maximises automation in the integration and transformation of their health data into an interoperable, longitudinal health record. This personal record can then be used to inform patient-related decisions at the point of care, whether this is the usual point of care or a possible cross-border point of care. The personal record can also be used to generate population datasets for research and policymaking. The proposed solution will enable a much-needed paradigm shift in health data management, implementing a 'curate once at patient level, use many times' approach, primarily for the benefit of patients and their care providers, but also for more efficient generation of high-quality secondary datasets. After 15 months, the project shows promising preliminary results in achieving automation in the integration and transformation of heterogeneous data of each individual patient, once the content of the data sources managed by the data holders has been formally described. Additionally, the conceptualization phase of the project identified a set of recommendations for the development of a patient-centric EHDS, significantly facilitating the generation of data for secondary use.
