Results 1 - 20 of 389
1.
Ecol Evol ; 14(9): e70235, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39219570

ABSTRACT

Species-environment relationships have been extensively explored through species distribution models (SDM) and species abundance models (SAM), which have become key components for understanding the spatial ecology and population dynamics that inform biodiversity conservation. Nonetheless, within the internal structure of species' ranges, habitat suitability and species abundance do not always show similar patterns, and using information derived from either SDM or SAM alone could be incomplete and mislead conservation efforts. We gauged support for the abundance-suitability relationship and used the combined information to prioritize the conservation of South American dwarf caimans (Paleosuchus palpebrosus and P. trigonatus). We used 7 environmental predictor sets (surface water, human impact, topography, precipitation, temperature, dynamic habitat indices, soil temperature), 2 regression methods (Generalized Linear Models-GLM, Generalized Additive Models-GAM), and 4 parametric distributions (Binomial, Poisson, Negative binomial, Gamma) to develop distribution and abundance models. We used the best predictive models to define four categories (low, medium, high, very high) to plan species conservation. The best distribution and abundance models for both Paleosuchus species included a combination of all predictor sets, except for the best abundance model for P. trigonatus, which incorporated only temperature, precipitation, surface water, human impact, and topography. We found that environmental suitability had inconsistent and low explanatory power for predicting abundance, which aligns with previous studies relating SDM and SAM. We extracted the most relevant information from each optimal SDM and SAM and created a consensus model (2,790,583 km²) that we categorized as low (39.6%), medium (42.7%), high (14.9%), and very high (2.8%) conservation priorities. We identified 279,338 km² where conservation must be critically prioritized, and only 29% of these areas are under protection. We concluded that optimal models from correlative methods can be used to provide a systematic prioritization scheme to promote conservation and as surrogates to generate insights for quantifying ecological patterns.

2.
Hum Brain Mapp ; 45(13): e70012, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39230061

ABSTRACT

Thompson et al., 2023 (Generalized models for quantifying laterality using functional transcranial Doppler ultrasound. Human Brain Mapping, 44(1), 35-48) introduced generalised model-based analysis methods for determining cerebral lateralisation from functional transcranial Doppler ultrasound (fTCD) data which substantially decreased the uncertainty of individual lateralisation estimates across several large adult samples. We aimed to assess the suitability of these methods for increasing precision in lateralisation estimates for child fTCD data. We applied these methods to adult fTCD data to establish the validity of two child-friendly language and visuospatial tasks. We also applied the methods to fTCD data from 4- to 7-year-old children. For both samples, the laterality estimates from the complex generalised additive model (GAM) approach correlated strongly with the traditional methods while also decreasing individual standard errors compared to the popular period-of-interest averaging method. We recommend future research using fTCD with young children consider using GAMs to reduce the noise in their LI estimates.


Subject(s)
Functional Laterality , Transcranial Doppler Ultrasonography , Humans , Transcranial Doppler Ultrasonography/methods , Transcranial Doppler Ultrasonography/standards , Preschool Child , Child , Female , Male , Functional Laterality/physiology , Adult , Young Adult , Cerebral Cortex/diagnostic imaging , Cerebral Cortex/physiology
3.
Front Vet Sci ; 11: 1416862, 2024.
Article in English | MEDLINE | ID: mdl-39113719

ABSTRACT

Introduction: African swine fever (ASF) is a disease with a high mortality rate and high transmissibility. Identifying high-risk clusters and understanding the transmission characteristics of ASF in advance are essential for preventing its rapid spread. This study investigated the spatial and temporal heterogeneity of ASF in the Republic of Korea by analyzing surveillance data on wild boar carcasses. Methods: We observed a distinct annual propagation pattern, with the occurrence of ASF-infected carcasses trending southward over time. We developed a rank-based statistical model to evaluate risk by estimating the average weekly number of carcasses per district over time, allowing us to analyze and identify risk clusters of ASF. We identified risk clusters for two distinct periods, Late 2022 and Early 2023, using data from ASF-infected carcasses. To address the underestimation of risk and observation error due to incomplete surveillance data, we estimated the number of ASF-infected individuals and accounted for observation error via different surveillance intensities. Results: In Late 2022, the risk clusters identified from the observed and estimated numbers of ASF-infected carcasses were almost identical, particularly in the northwestern Gyeongbuk region, north Chungbuk region, and southwestern Gangwon region. In Early 2023, we observed a similar pattern, with numerous risk clusters identified in the same regions as in Late 2022. Discussion: This approach enhances our understanding of ASF spatial dynamics. Additionally, it contributes to the epidemiology and study of animal infectious diseases by highlighting areas requiring urgent and focused intervention. By providing crucial data for the targeted allocation of resources for disease management and preventive measures, our findings lay vital groundwork for improving ASF management strategies, ultimately aiding in the containment and control of this devastating disease.

4.
Geohealth ; 8(8): e2024GH001092, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39104964

ABSTRACT

The impact of heatwaves (HWs) on human health is a topic of growing interest due to the global magnification of these phenomena and their substantial socio-economic impacts. Like other countries of Southern Europe, Spain is highly affected by heat and its increase under climate change. This is observed in the mean values and the increasing incidence of extreme weather events and associated mortality. Despite the vast knowledge on this topic, it remains unclear whether specific types and characteristics of HW are particularly harmful to the population and whether this shows a regional interdependency. The present study provides a comprehensive analysis of the relationship between HW characteristics and mortality in 12 Spanish cities. We ran a separate time-series analysis for each city, applying a quasi-Poisson regression model and distributed lag linear and non-linear models. Results show an increase in the mortality risk under HW conditions in the cities with a lower HW frequency. However, this increase differs markedly across the cities under study, with no general pattern in the association between HW characteristics and mortality. This relationship is shown to be complex and strongly dependent on the local properties of each city, pointing to the crucial need to examine and understand HW characteristics and the HW-mortality relationship on a local scale for the efficient design and implementation of prevention measures.

5.
Stat Methods Med Res ; : 9622802241267812, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39110407

ABSTRACT

The restricted mean survival time (RMST) is often of direct interest in clinical studies involving censored survival outcomes. It describes the area under the survival curve from time zero to a specified time point. When data are subject to length-biased sampling, as is frequently encountered in observational cohort studies, existing methods cannot estimate the RMST for various restriction times through a single model. In this article, we model the RMST as a continuous function of the restriction time under the setting of length-biased sampling. Two approaches based on estimating equations are proposed to estimate the time-varying effects of covariates. Finally, we establish the asymptotic properties for the proposed estimators. Simulation studies are performed to demonstrate the finite sample performance. Two real-data examples are analyzed by our procedures.
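
As a reader's aid, the definition above (the area under the survival curve from time zero to a restriction time) can be sketched in a few lines. This is a minimal illustration using the ordinary Kaplan-Meier estimator for right-censored data; it is not the estimating-equation method proposed in the article and it makes no correction for length-biased sampling. The function names `kaplan_meier` and `rmst` and the toy data in the usage note are assumptions introduced here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    Plain Kaplan-Meier: no length-bias correction (illustrative assumption).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # final rectangle up to tau
    return area
```

With no censoring, `rmst([1, 2, 3, 4], [1, 1, 1, 1], 4)` equals the sample mean of min(T, 4), which is a convenient sanity check on the step-function integration.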

6.
Bioinformatics ; 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39172488

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) enables comprehensive characterization of the cell state. However, its destructive nature prohibits measuring gene expression changes during dynamic processes such as embryogenesis. Although recent studies integrating scRNA-seq with lineage tracing have provided clonal insights between progenitor and mature cells, challenges remain. Because of their experimental nature, observations are sparse, and cells observed in the early state are not the exact progenitors of cells observed at later time points. To overcome these limitations, we developed LineageVAE, a novel computational methodology that utilizes deep learning based on the property that cells sharing barcodes have identical progenitors. RESULTS: LineageVAE is a deep generative model that transforms scRNA-seq observations with identical lineage barcodes into sequential trajectories toward a common progenitor in a latent cell state space. This method enables the reconstruction of unobservable cell state transitions, historical transcriptomes, and regulatory dynamics at a single-cell resolution. Applied to hematopoiesis and reprogrammed fibroblast datasets, LineageVAE demonstrated its ability to restore backward cell state transitions and infer progenitor heterogeneity and transcription factor activity along differentiation trajectories. AVAILABILITY AND IMPLEMENTATION: The LineageVAE model was implemented in Python using the PyTorch deep learning library. The code is available on GitHub at https://github.com/LzrRacer/LineageVAE/. SUPPLEMENTARY INFORMATION: Available at Bioinformatics online.

7.
J Appl Stat ; 51(11): 2197-2213, 2024.
Article in English | MEDLINE | ID: mdl-39157269

ABSTRACT

In this paper, we study robust estimation and empirical likelihood for the regression parameter in generalized linear models with right-censored data. A robust estimating equation is proposed to estimate the regression parameter, and the resulting estimator is consistent and asymptotically normal. A bias-corrected empirical log-likelihood ratio statistic of the regression parameter is constructed, and it is shown that the statistic converges weakly to a standard χ² distribution. The result can be directly used to construct the confidence region of the regression parameter. We use the bias correction method to directly calibrate the empirical log-likelihood ratio, which does not need to be multiplied by an adjustment factor. We also propose a method for selecting the tuning parameters in the loss function. Simulation studies show that the estimator of the regression parameter is robust and the bias-corrected empirical likelihood is better than the normal approximation method. An example of a real dataset from Alzheimer's disease studies shows that the proposed method can be applied to practical problems.

8.
Neurosci Res ; 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39098768

ABSTRACT

This article presents a mini-review about the progress in inferring monosynaptic connections from spike trains of multiple neurons over the past twenty years. First, we explain a variety of meanings of "neuronal connectivity" in different research areas of neuroscience, such as structural connectivity, monosynaptic connectivity, and functional connectivity. Among these, we focus on the methods used to infer the monosynaptic connectivity from spike data. We then summarize the inference methods based on two main approaches, i.e., correlation-based and model-based approaches. Finally, we describe available source codes for connectivity inference and future challenges. Although inference will never be perfect, the accuracy of identifying the monosynaptic connections has improved dramatically in recent years due to continuous efforts.

9.
Accid Anal Prev ; 207: 107752, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39180851

ABSTRACT

The random parameters Generalized Linear Model (GLM) is frequently used to model speeding characteristics and capture the heterogeneous effects of factors. However, this statistical approach is seldom employed for prediction and generalization due to the challenge of transferring its predefined errors. Recently, the emergence of explainable AI techniques has illuminated a new path for analyzing factors associated with risky driving behaviors. Despite this, there remains a gap in comparing results from machine and deep learning (ML/DL) approaches with those from the random parameters GLM. This study aims to apply the random parameters GLM and explainable deep learning to evaluate the heterogeneous effects of factors on taxis' high-range speeding likelihood. Initially, a Beta GLM with random parameters (BGLM-RP) is developed to model the high-range speeding likelihood among taxi drivers. Additionally, XGBoost, a simple convolutional neural network (Simple-CNN), a deeper CNN (DCNN), and a deeper CNN with self-attention (DCNN-SA) are developed. The quantified explanations and illustrations of the factors' heterogeneous effects from ML/DL models are derived from pseudo coefficients by decomposing factors' SHapley Additive exPlanations (SHAP) values. All the developed statistical, ML, and DL models are compared in terms of mean absolute errors and mean square errors on testing and full data. Results show that DCNN-SA excels in prediction on testing data, indicating its superior generalization capabilities, while BGLM-RP outperforms other models on full data. The DCNN-SA can reveal the heterogeneous effects of factors for both in-sample and out-of-sample data, which is not possible for the random parameters GLM. However, BGLM-RP can reveal larger magnitudes of the factors' heterogeneous effects for in-sample data. The signs and significances are identical between the varying coefficients from BGLM-RP and the pseudo coefficients from the ML/DL models, demonstrating the validity and rationale of using the proposed explanation framework to quantify the factors' effects in ML/DL models. The study also discusses the contributions of various factors to the high-range speeding likelihood of taxi drivers.


Subject(s)
Automobile Driving , Deep Learning , Humans , Traffic Accidents/prevention & control , Linear Models , Computer Neural Networks , Risk-Taking
10.
Biology (Basel) ; 13(8)2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39194554

ABSTRACT

The spatial pattern of diseased forest trees is a product of the spatial pattern of host trees and the disease itself. Previous studies have focused on describing the spatial pattern of diseased host trees, and it remains largely unknown whether an antecedent spatial pattern of host/nonhost trees affects the infection pattern of a disease and how large the effect sizes of the spatial pattern of host/nonhost trees and host size are. The results from trivariate random labeling showed that the antecedent pattern of the host ash tree, Fraxinus mandshurica, but not of nonhost tree species, impacted the infection pattern of a stem fungal disease caused by Inonotus hispidus. To investigate the effect size of the spatial pattern of ash trees, we employed the SADIE (Spatial Analysis by Distance IndicEs) aggregation index and clustering index as predictors in the GLMs. Globally, the spatial pattern (vi index) of ash trees did not affect the infection likelihood of the focal tree; however, the spatial pattern of DBH (diameter at breast height) of ash trees significantly affected the infection likelihood of the focal tree. We sampled a series of circular plots with different radii to investigate the spatial pattern effect of host size on the infection likelihood of the focal tree locally. The results showed that the location (patch/gap) of the DBH of the focal tree, rather than that of the focal tree itself, significantly affected its infection likelihood in most plots of the investigated sizes. A meta-analysis was employed to settle the discrepancy between plots of different sizes, which led to results consistent with those of global studies. The results from meta-regression showed that plot size had no significant effects.

11.
Psychometrika ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38967857

ABSTRACT

Cognitive diagnostic models (CDMs) are a popular family of discrete latent variable models that model students' mastery or deficiency of multiple fine-grained skills. CDMs have been most widely used to model categorical item response data such as binary or polytomous responses. With advances in technology and the emergence of varying test formats in modern educational assessments, new response types, including continuous responses such as response times, and count-valued responses from tests with repetitive tasks or eye-tracking sensors, have also become available. Variants of CDMs have been proposed recently for modeling such responses. However, whether these extended CDMs are identifiable and estimable is entirely unknown. We propose a very general cognitive diagnostic modeling framework for arbitrary types of multivariate responses with minimal assumptions, and establish identifiability in this general setting. Surprisingly, we prove that our general-response CDMs are identifiable under Q-matrix-based conditions similar to those for traditional categorical-response CDMs. Our conclusions set up a new paradigm of identifiable general-response CDMs. We propose an EM algorithm to efficiently estimate a broad class of exponential family-based general-response CDMs. We conduct simulation studies under various response types. The simulation results not only corroborate our identifiability theory, but also demonstrate the superior empirical performance of our estimation algorithms. We illustrate our methodology by applying it to a TIMSS 2019 response time dataset.

12.
Traffic Inj Prev ; : 1-8, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39083758

ABSTRACT

OBJECTIVE: At unsignalized T-intersections, right-turning drivers from major or minor roads have to accept or reject the available gap and cross the intersection. Poor judgment may give rise to a risk of collision with conflicting vehicles on the major road. Subject drivers intending to cross the intersection encounter a wide range of gaps. Drivers accept large gaps and reject small gaps. However, drivers experience a dilemma over a wide range of gaps. This study aims to examine the dilemma experienced by right-turning drivers by modeling gap acceptance and rejection decisions to estimate dilemma zone boundaries on the major road. METHODS: A videographic method was used to collect traffic movement data at unsignalized T-intersections. Traffic video data were collected at three specific unsignalized T-intersections with varying degrees of channelization. Accepted and rejected gaps for right-turning movements, together with traffic characteristics of the offending vehicles (subject drivers) and conflicting vehicles such as speed, distance, and vehicle type, were then extracted from the video data to analyze the drivers' gap acceptance and rejection decisions and to estimate dilemma zone boundaries. RESULTS: Gap decisions (acceptance or rejection) were modeled using a Generalized Linear Model (GLM) as a function of conflicting vehicle type, speed, and distance from the intersection, along with degree of channelization, for major and minor roads. The results showed that gap rejection probability increased as the conflicting vehicle's speed increased and as its distance from the intersection decreased. Dilemma zone boundaries were obtained from the developed GLMs by estimating the conflicting vehicle's distance from the intersection at 90% and 10% gap rejection probabilities. Dilemma zone boundaries were observed to shift farther from the intersection with increases in vehicle speed and vehicle size. The analysis revealed that subject drivers experienced more dilemma when accepting a gap from the major road than from the minor road. CONCLUSIONS: This study showed that channelization plays a major role in mitigating the dilemma of subject drivers. Overall, the study identified dilemma zone boundaries for unsignalized T-intersections, which may help right-turning subject drivers cross the intersection safely.
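
The boundary calculation described in the abstract, finding the conflicting vehicle's distance at which the modeled gap rejection probability equals 90% and 10%, can be illustrated by inverting a fitted model. The sketch below assumes a binomial GLM with a logit link and only two predictors (distance and speed); the abstract does not state the link function, and the coefficients in `HYPOTHETICAL_COEF` are invented for illustration, not taken from the study.

```python
import math

# Hypothetical coefficients (NOT from the study): rejection probability
# falls with distance (negative sign) and rises with speed (positive sign).
HYPOTHETICAL_COEF = {"intercept": 2.0, "distance": -0.10, "speed": 0.05}


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def rejection_probability(distance_m, speed_kmh, coef=HYPOTHETICAL_COEF):
    """P(gap rejected) under an assumed logit-link binomial GLM."""
    eta = (coef["intercept"]
           + coef["distance"] * distance_m
           + coef["speed"] * speed_kmh)
    return 1.0 / (1.0 + math.exp(-eta))


def boundary_distance(p_reject, speed_kmh, coef=HYPOTHETICAL_COEF):
    """Distance at which the modeled rejection probability equals p_reject."""
    return ((logit(p_reject) - coef["intercept"] - coef["speed"] * speed_kmh)
            / coef["distance"])


def dilemma_zone(speed_kmh, coef=HYPOTHETICAL_COEF):
    """Return (near, far) boundaries at 90% and 10% rejection probability."""
    near = boundary_distance(0.90, speed_kmh, coef)
    far = boundary_distance(0.10, speed_kmh, coef)
    return near, far
```

Under these made-up coefficients, calling `dilemma_zone(40)` and `dilemma_zone(60)` shows both boundaries moving away from the intersection as speed rises, which is the qualitative pattern the abstract reports.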

13.
ACS Nano ; 18(29): 19024-19037, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38985736

ABSTRACT

High-entropy nanomaterials exhibit exceptional mechanical, physical, and chemical properties, finding applications in many industries. Peroxidases are metalloenzymes that accelerate the decomposition of hydrogen peroxide. This study uses the high-entropy approach to generate multimetal oxide-based nanozymes with peroxidase-like activity and explores their application as sensors in ex vivo bioassays. A library of 81 materials was produced using a coprecipitation method for rapid synthesis of up to 100 variants in a single plate. The A and B sites of the magnetite structure, (AA')(BB'B'')2O4, were substituted with up to six different cations (Cu/Fe/Zn/Mg/Mn/Cr). Increasing the compositional complexity improved the catalytic performance; however, substitutions of single elements also caused drastic reductions in the peroxidase-like activity. A generalized linear model was developed describing the relationship between material composition and catalytic activity. Binary interactions between elements that acted synergistically or antagonistically were identified, and a single parameter, the mean interaction effect, was observed to correlate highly with catalytic activity, providing a valuable tool for the design of high-entropy-inspired nanozymes.


Subject(s)
Entropy , Immunoassay/methods , Oxides/chemistry , Catalysis , Nanostructures/chemistry , Structure-Activity Relationship , Computer Simulation , Hydrogen Peroxide/chemistry
14.
Entropy (Basel) ; 26(6)2024 May 30.
Article in English | MEDLINE | ID: mdl-38920483

ABSTRACT

Amid the COVID-19 pandemic, understanding the spatial and temporal dynamics of the disease is crucial for effective public health interventions. This study aims to analyze COVID-19 data in Peru using a Bayesian spatio-temporal generalized linear model to elucidate mortality patterns and assess the impact of vaccination efforts. Leveraging data from 194 provinces over 651 days, our analysis reveals heterogeneous spatial and temporal patterns in COVID-19 mortality rates. Higher vaccination coverage is associated with reduced mortality rates, emphasizing the importance of vaccination in mitigating the pandemic's impact. The findings underscore the value of spatio-temporal data analysis in understanding disease dynamics and guiding targeted public health interventions.

15.
Article in English | MEDLINE | ID: mdl-38934649

ABSTRACT

BACKGROUND: Despite the ability of cochlear implants (CIs) to provide children with access to speech, there is considerable variability in spoken language outcomes. Research aimed at identifying factors influencing speech production accuracy is needed. AIMS: To characterize the consonant production accuracy of children with cochlear implants (CWCI) and an age-matched group of children with typical hearing (CWTH) and to explore several factors that potentially affect the ability of both groups to accurately produce consonants. METHODS & PROCEDURES: We administered the Bankson-Bernthal Test of Phonology (BBTOP) to a group of 25 CWCI (mean age = 4;9, SD = 1;6, range = 3;2-8;5) implanted prior to 30 months of age with a mean duration of implant usage of 3;6 and an age-matched group of 25 CWTH (mean age = 5;0, SD = 1;6, range = 3;1-8;6). The recorded results were transcribed, and the accuracy of the target consonants was determined. Expressive vocabulary size estimates were obtained from a language sample using the number of different words (NDW). A parent questionnaire provided information about maternal education, duration of CIs experience and other demographic characteristics of each child. OUTCOMES & RESULTS: The CWCI group demonstrated some similarities to, and some differences from, their hearing peers. The CWCI demonstrated poorer consonant production accuracy overall and in various phonetic categories and word positions. However, both groups produced initial consonants more accurately than final consonants. Whilst CWCI had poorer production accuracy than CWTH for all phonetic categories (stops, nasals, fricatives, affricates, liquids and glides and consonant clusters), both groups exhibited similar error patterns across categories. For CWCI, the factors most related to consonant production accuracy when considered individually were expressive vocabulary size, followed by duration of CI experience, chronological age, maternal education and gender. 
The combination of maternal education and vocabulary size resulted in the best model of consonant production accuracy for this group. For the CWTH, chronological age followed by vocabulary size were most related to consonant production accuracy. No combination of factors yielded an improved model for the CWTH. CONCLUSIONS & IMPLICATIONS: Whilst group differences in production accuracy between the CWCI and CWTH were found, the pattern of errors was similar for the two groups of children, suggesting that the children are at earlier stages of overall consonant production development. Although duration of CI experience was a significant covariate in a single-variable model of consonant production accuracy for CWCI, the best multivariate model of consonant production accuracy for these children was based on the combination of expressive vocabulary size and maternal education. WHAT THIS PAPER ADDS: What is already known on the subject Research has shown that a range of factors is associated with consonant production accuracy by CWCIs, including factors such as the age at implant, duration of implant use, gender, other language skills and maternal education. Despite numerous studies that have examined speech sound production in these children, most have explored a limited number of factors that might explain the variability in scores obtained. Research that examines the potential role of a range of child-related and environmental factors in the same children is needed to determine the predictive role of these factors in speech production outcomes. What this paper adds to the existing knowledge Whilst the consonant production accuracy was lower for the CWCIs than for their typically hearing peers, there were some similarities suggesting that these children are experiencing similar, but delayed, acquisition of consonant production skills to that of their hearing peers. 
Whilst several factors are predictive of consonant production accuracy in children with implants, vocabulary diversity and maternal education, an indirect measure of socio-economic status, were the best combined predictors of consonant production accuracy. What are the potential or actual clinical implications of this work? Understanding the factors that shape individual differences in CWCI speech production is important for effective clinical decision-making and intervention planning. The present findings point to two potentially important factors related to speech sound production beyond the duration of robust hearing in CWCI, namely, lexical diversity and maternal education. This suggests that intervention addressing vocabulary development and speech sound development together is likely to be most efficient. The current findings further suggest the importance of parental involvement and commitment to spoken language development and the importance of receiving early and consistent intervention aimed both at skill development and parental efficacy.

16.
Injury ; 55(8): 111702, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38936227

ABSTRACT

BACKGROUND: Given the huge impact of trauma on hospital systems around the world, several attempts have been made to develop predictive models for the outcomes of trauma victims. The most used, and in many studies the most accurate, predictive model is the "Trauma Score and Injury Severity Score" (TRISS). Although it has proven to be fairly accurate and is widely used, it has faced criticism for its inability to classify more complex cases. In this study, we aimed to develop machine learning models that could predict mortality among severely injured trauma patients better than TRISS, something that has not previously been studied using data from a nationwide register. METHODS: Patient data were collected from the national trauma register in Sweden, SweTrau. The studied period was from the 1st of January 2015 to the 31st of December 2019. After feature selection and multiple imputation of missing data, three machine learning (ML) methods (Random Forest, eXtreme Gradient Boosting, and a Generalized Linear Model) were used to create predictive models. The ML models and TRISS were then tested on predictive ability for 30-day mortality. RESULTS: The ML models were well-calibrated and outperformed TRISS in all the tested measurements. Among the ML models, the eXtreme Gradient Boosting model performed best with an AUC of 0.91 (0.88-0.93). CONCLUSION: This study showed that all the developed ML-based prediction models were superior to TRISS for the prediction of trauma mortality.


Subject(s)
Injury Severity Score , Machine Learning , Registries , Wounds and Injuries , Humans , Sweden/epidemiology , Male , Wounds and Injuries/mortality , Female , Middle Aged , Adult , Predictive Value of Tests , Aged , Trauma Severity Indices
17.
Test (Madr) ; 33(2): 589-608, 2024.
Article in English | MEDLINE | ID: mdl-38868722

ABSTRACT

Generalized linear models (GLMs) are very widely used, but formal goodness-of-fit (GOF) tests for the overall fit of the model seem to be in wide use only for certain classes of GLMs. We develop and apply a new goodness-of-fit test, similar to the well-known and commonly used Hosmer-Lemeshow (HL) test, that can be used with a wide variety of GLMs. The test statistic is a variant of the HL statistic, but we rigorously derive an asymptotically correct sampling distribution using methods of Stute and Zhu (Scand J Stat 29(3):535-545, 2002) and demonstrate its consistency. We compare the performance of our new test with other GOF tests for GLMs, including a naive direct application of the HL test to the Poisson problem. Our test provides competitive or comparable power in various simulation settings and we identify a situation where a naive version of the test fails to hold its size. Our generalized HL test is straightforward to implement and interpret and an R package is publicly available. Supplementary Information: The online version contains supplementary material available at 10.1007/s11749-023-00912-8.

18.
J Korean Med Sci ; 39(22): e176, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38859739

ABSTRACT

BACKGROUND: Malaria elimination strategies in the Republic of Korea (ROK) have decreased malaria incidence but face challenges due to delayed case detection and response. To improve this, machine learning models for predicting malaria, focusing on high-risk areas, have been developed. METHODS: The study targeted the northern region of ROK, near the demilitarized zone, using a 1-km grid to identify areas for prediction. Grid cells without residential buildings were excluded, leaving 8,425 cells. The prediction was based on whether at least one malaria case was reported in each grid cell per month, using spatial data of patient locations. Four algorithms were used: gradient boosted (GBM), generalized linear (GLM), extreme gradient boosted (XGB), and ensemble models, incorporating environmental, sociodemographic, and meteorological data as predictors. The models were trained with data from May to October (2019-2021) and tested with data from May to October 2022. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). RESULTS: The prediction models performed excellently in terms of AUROC (GBM = 0.9243, GLM = 0.9060, XGB = 0.9180, and ensemble model = 0.9301). Previous malaria risk, population size, and meteorological factors were the most influential predictors in GBM and XGB. CONCLUSION: Machine-learning models with properly preprocessed malaria case data can provide reliable predictions. Additional predictors, such as mosquito density, should be included in future studies to improve the performance of models.
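
For readers unfamiliar with the evaluation metric, AUROC can be read as the probability that a randomly chosen positive case (here, a grid cell-month with at least one reported case) receives a higher predicted score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy inputs are assumptions for illustration and have no connection to the study's data or software.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive cases, 0 for negative cases.
    scores: predicted risk for each case (higher means more likely positive).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; a rank-based formulation gives the same value more efficiently on large grids.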


Subject(s)
Machine Learning, Vivax Malaria, Plasmodium vivax, ROC Curve, Republic of Korea/epidemiology, Humans, Vivax Malaria/epidemiology, Plasmodium vivax/isolation & purification, Algorithms, Area Under Curve, Incidence, Risk Factors
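
The AUROC reported in entry 18 can be computed directly from binary labels and predicted scores. The sketch below is only an illustration using the rank (Mann-Whitney) formulation; the function name `auroc` and the toy data are assumptions for this example and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney rank formulation (hypothetical helper).

    labels: iterable of 0/1 outcomes; scores: predicted risks.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # Mann-Whitney U for the positive class, scaled to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (made-up data): 1 = at least one case in the grid cell.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to ranking positives and negatives at random, and 1.0 to perfect separation.
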
19.
J Appl Stat ; 51(7): 1399-1411, 2024.
Article in English | MEDLINE | ID: mdl-38835824

ABSTRACT

The Hosmer-Lemeshow (HL) test is a commonly used global goodness-of-fit (GOF) test that assesses the quality of the overall fit of a logistic regression model. In this paper, we give results from simulations showing that the type I error rate (and hence power) of the HL test decreases as model complexity grows, provided that the sample size remains fixed and binary replicates (multiple Bernoulli trials) are present in the data. We demonstrate that a generalized version of the HL test (GHL) presented in previous work can offer some protection against this power loss. These results are also supported by application of both the HL and GHL tests to a real-life data set. We conclude with a brief discussion explaining the behavior of the HL test, along with some guidance on how to choose between the two tests. In particular, we suggest using the GHL test when there are binary replicates or clusters in the covariate space, provided that the sample size is sufficiently large.
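
For orientation, the classic HL statistic mentioned in entry 19 can be sketched as follows. This is a minimal illustration of the standard test only, not the generalized (GHL) version studied in the paper. The function names, the equal-size grouping by rank, and the closed-form chi-square tail for even degrees of freedom are choices made for this example.

```python
import math


def chi2_sf_even(x, df):
    """Chi-square upper tail probability, valid only for even df.

    Uses the exact identity sf(x) = exp(-x/2) * sum_{i<df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


def hosmer_lemeshow(y, p, groups=10):
    """Classic HL statistic for binary outcomes y and fitted probabilities p.

    Observations are sorted by p and split into `groups` bins of (nearly)
    equal size. Returns (statistic, degrees_of_freedom).
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in the bin
        expected = sum(p[i] for i in idx)   # expected events in the bin
        mean_p = expected / n_g
        variance = n_g * mean_p * (1 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, groups - 2


# Usage with the default 10 groups (df = 8), given outcomes y and fitted p:
# stat, df = hosmer_lemeshow(y, p)
# p_value = chi2_sf_even(stat, df)
```

Note that this rank-based grouping can split tied fitted probabilities across bins, which matters for data with binary replicates, the setting the abstract discusses.
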

20.
Front Comput Neurosci ; 18: 1392655, 2024.
Article in English | MEDLINE | ID: mdl-38841426

ABSTRACT

Introduction: Cross-frequency coupling (CFC) between electrophysiological signals in the brain is a long-studied phenomenon and its abnormalities have been observed in conditions such as Parkinson's disease and epilepsy. More recently, CFC has been observed in stomach-brain electrophysiologic studies and thus becomes an enticing possible target for diseases involving aberrations of the gut-brain axis. However, current methods of detecting coupling, specifically phase-amplitude coupling (PAC), do not attempt to capture the phase and amplitude statistical relationships. Methods: In this paper, we first demonstrate a method of modeling these joint statistics with a flexible parametric approach, where we model the conditional distribution of amplitude given phase using a gamma-distributed generalized linear model (GLM) with a Fourier basis of regressors. We perform model selection with the minimum description length (MDL) principle, demonstrate a method for assessing goodness-of-fit (GOF), and showcase the efficacy of this approach in multiple electroencephalography (EEG) datasets. Secondly, we showcase how we can utilize the mutual information, which operates on the joint distribution, as a canonical measure of coupling, as it is non-negative, and non-zero if and only if the phase and amplitude are not statistically independent. In addition, we build on previous work by Martinez-Cancino et al. and Voytek et al., and show that the information density, evaluated using our method along the given sample path, is a promising measure of time-resolved PAC. Results: Using synthetically generated gut-brain coupled signals, we demonstrate that our method outperforms the existing gold-standard methods for detecting low levels of phase-amplitude coupling through receiver operating characteristic (ROC) curve analysis.
To validate our method, we test on invasive EEG recordings by generating comodulograms, and compare our method to the gold standard PAC measure, Modulation Index, demonstrating comparable performance in exploratory analysis. Furthermore, to showcase its use in joint gut-brain electrophysiology data, we generate topoplots of simultaneous high-density EEG and electrogastrography recordings and reproduce seminal work by Richter et al. that demonstrated the existence of gut-brain PAC. Using simulated data, we validate our method for different types of time-varying coupling and then demonstrate its performance to track time-varying PAC in sleep spindle EEG and mismatch negativity (MMN) datasets. Conclusions: Our new measure of PAC using Gamma GLMs and mutual information demonstrates a promising new way to compute PAC values using the full joint distribution on amplitude and phase. Our measure outperforms the most common existing measures of PAC, and shows promising results in identifying time-varying PAC in electrophysiological datasets. In addition, we provide for using our method with multiple comparisons and show that our measure potentially has more statistical power in electrophysiologic recordings using simultaneous gut-brain datasets.
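
The property of mutual information that entry 20 relies on (it is zero exactly when phase and amplitude are independent, and positive otherwise) can be checked on a small discrete table. The sketch below uses a simple plug-in estimate on a binned joint distribution, which is a stand-in chosen for illustration and not the gamma-GLM-based estimator described in the abstract. The function name and the 2x2 tables are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint table.

    joint[i][j] is the probability (or count) of phase bin i together
    with amplitude bin j; counts are normalized internally.
    """
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]
    p_phase = [sum(row) for row in p]        # marginal over amplitude bins
    p_amp = [sum(col) for col in zip(*p)]    # marginal over phase bins
    mi = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v > 0:  # empty cells contribute nothing
                mi += v * math.log(v / (p_phase[i] * p_amp[j]))
    return mi


# Independent phase and amplitude: the joint factorizes, so MI is zero.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Fully coupled: the amplitude bin is determined by the phase bin.
coupled = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(coupled))
```

In the fully coupled table the result equals log 2 nats, the entropy of either marginal, which is the largest value possible for two binary variables.
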
