Results 1 - 20 of 372
1.
Sci Rep ; 14(1): 20919, 2024 09 09.
Article in English | MEDLINE | ID: mdl-39251695

ABSTRACT

The primary purpose of this article is to examine the problem of estimating the finite population distribution function from auxiliary information, such as the population mean and ranks of the auxiliary variables, which are already known. Two improved estimators are developed to better estimate the distribution function (DF) of a finite population. The bias and mean squared error of the suggested and existing estimators are derived up to the first order of approximation. To assess efficiency, we compare the suggested estimators with their existing counterparts. The numerical results show that the suggested classes of estimators perform well on six real data sets. The strength and generality of the suggested estimators are also verified through a simulation analysis. Based on the results for the real data sets and the simulation study, the suggested estimators outperform all existing estimators considered in this study.


Subject(s)
Statistical Models , Computer Simulation , Humans , Algorithms
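
The abstract does not give the functional form of the proposed estimators, but the general idea of exploiting a known auxiliary distribution function can be conveyed with a hedged sketch. The ratio-type adjustment below is a generic textbook construction, not the paper's estimator, and the population, sample sizes, and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population: auxiliary variable x, study variable y correlated with x
N, n = 5000, 200
x = rng.gamma(shape=2.0, scale=3.0, size=N)
y = 2.0 * x + rng.normal(0, 3, size=N)

t = np.median(y)                      # point at which the DF is estimated
F_true = np.mean(y <= t)              # true finite-population DF at t
t_x = np.median(x)
F_x_pop = np.mean(x <= t_x)           # auxiliary DF, assumed known for the whole population

def one_draw():
    idx = rng.choice(N, size=n, replace=False)      # simple random sample without replacement
    F_y_hat = np.mean(y[idx] <= t)                  # usual sample estimator of the DF
    F_x_hat = np.mean(x[idx] <= t_x)
    F_ratio = F_y_hat * F_x_pop / F_x_hat           # generic ratio-type adjustment using auxiliary info
    return F_y_hat, F_ratio

draws = np.array([one_draw() for _ in range(2000)])
mse = np.mean((draws - F_true) ** 2, axis=0)
print(f"MSE simple: {mse[0]:.5f}   MSE ratio-type: {mse[1]:.5f}")
```
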
2.
Stat Med ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39233370

ABSTRACT

Many clinical trials involve partially clustered data, where some observations belong to a cluster and others can be considered independent. For example, neonatal trials may include infants from single or multiple births. Sample size and analysis methods for these trials have received limited attention. A simulation study was conducted to (1) assess whether existing power formulas based on generalized estimating equations (GEEs) provide an adequate approximation to the power achieved by mixed effects models, and (2) compare the performance of mixed models vs GEEs in estimating the effect of treatment on a continuous outcome. We considered clusters that exist prior to randomization with a maximum cluster size of 2, three methods of randomizing the clustered observations, and simulated datasets with uninformative cluster size and the sample size required to achieve 80% power according to GEE-based formulas with an independence or exchangeable working correlation structure. The empirical power of the mixed model approach was close to the nominal level when sample size was calculated using the exchangeable GEE formula, but was often too high when the sample size was based on the independence GEE formula. The independence GEE always converged and performed well in all scenarios. Performance of the exchangeable GEE and mixed model was also acceptable under cluster randomization, though under-coverage and inflated type I error rates could occur with other methods of randomization. Analysis of partially clustered trials using GEEs with an independence working correlation structure may be preferred to avoid the limitations of mixed models and exchangeable GEEs.
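
As a concrete, hedged illustration of the analysis contrast described above, the sketch below fits a GEE with an independence and an exchangeable working correlation structure to simulated partially clustered data using statsmodels; the data-generating values (cluster sizes, effect size, variances) are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Partially clustered data: most "clusters" are singletons, some are twin pairs
n_clusters = 300
sizes = rng.choice([1, 2], size=n_clusters, p=[0.8, 0.2])
rows = []
for cid, size in enumerate(sizes):
    u = rng.normal(0, 0.5)        # shared cluster effect (only matters when size == 2)
    trt = rng.integers(0, 2)      # cluster randomization: whole cluster gets the same arm
    for _ in range(size):
        y = 1.0 + 0.4 * trt + u + rng.normal(0, 1.0)
        rows.append({"cluster": cid, "trt": trt, "y": y})
df = pd.DataFrame(rows)

X = sm.add_constant(df[["trt"]])
for name, cov in [("independence", sm.cov_struct.Independence()),
                  ("exchangeable", sm.cov_struct.Exchangeable())]:
    fit = sm.GEE(df["y"], X, groups=df["cluster"], cov_struct=cov).fit()
    print(f"{name}: beta_trt = {fit.params['trt']:.3f}, robust SE = {fit.bse['trt']:.3f}")
```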

3.
J Nutr Health Aging ; 28(10): 100361, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39276626

ABSTRACT

BACKGROUND: A more sustainable diet with fewer animal-based products has a lower ecological impact but might lead to a lower protein quantity and quality. The extent to which shifting to more plant-based diets impacts the adequacy of protein intake in older adults needs to be studied. OBJECTIVES: We simulated how a transition towards a more plant-based diet (flexitarian, pescetarian, vegetarian, or vegan) affects protein availability in the diets of older adults. SETTING: Community. PARTICIPANTS: Data from the Dutch National Food Consumption Survey 2019-2021 of community-dwelling older adults (n = 607) were used. MEASUREMENTS: Food consumption data were collected via two 24-h dietary recalls per participant. Protein availability was expressed as total protein, digestible protein, and utilizable protein (based on the digestibility-corrected amino acid score) intake. The percentage below the estimated average requirement (EAR) for utilizable protein was assessed using an adjusted EAR. RESULTS: Compared to the original diet (∼62% animal-based), utilizable protein intake decreased by about 5% in the flexitarian, pescetarian, and vegetarian scenarios. In the vegan scenario, both total protein intake and utilizable protein were lower, leading to nearly 50% less utilizable protein compared to the original diet. In the original diet, the protein intake of 7.5% of men and 11.1% of women did not meet the EAR. This proportion increased slightly in the flexitarian, pescetarian, and vegetarian scenarios. In the vegan scenario, 83.3% (both genders) had a protein intake below the EAR. CONCLUSIONS: Replacing animal-based protein sources with plant-based food products in older adults reduces both protein quantity and quality, albeit minimally in non-vegan plant-rich diets. In a vegan scenario, the risk of an inadequate protein intake is imminent.

4.
Ecol Evol ; 14(9): e70230, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39234160

ABSTRACT

Abundance estimation is frequently an objective of conservation and monitoring initiatives for threatened and other managed populations. While abundance estimation via capture-mark-recapture or spatially explicit capture-recapture is now common, such approaches are logistically challenging and expensive for species such as boreal caribou (Rangifer tarandus), which inhabit remote regions, are widely dispersed, and exist at low densities. Fortunately, the recently developed 'close-kin mark-recapture' (CKMR) framework, which uses the number of kin pairs obtained within a sample to generate an abundance estimate, eliminates the need for multiple sampling events. As a result, some caribou managers are interested in using this method to generate an abundance estimate from a single, non-invasive sampling event for caribou populations. We conducted a simulation study using realistic boreal caribou demographic rates and population sizes to assess how population size and the proportion of the population surveyed impact the accuracy and precision of single-survey CKMR-based abundance estimates. Our results indicated that abundance estimates were biased and highly imprecise when very small proportions of the population were sampled, regardless of the population size. However, the larger the population size, the smaller the required proportion of the population surveyed to generate both accurate and reasonably precise estimates. Additionally, we present a case study in which we used the CKMR framework to generate annual female abundance estimates for a small caribou population in Jasper National Park, Alberta, Canada, from 2006 to 2015 and compared them to existing published capture-mark-recapture-based estimates. Both the accuracy and precision of the annual CKMR-based abundance estimates varied across years and were sensitive to the proportion of pairwise kinship comparisons that yielded a mother-offspring pair. Taken together, our study demonstrates that it is possible to generate CKMR-based abundance estimates from a single sampling event for small caribou populations, so long as a sufficient sampling intensity can be achieved.
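
The abstract does not reproduce the CKMR estimating equations, but the core intuition, that each mother-offspring pair found among the sampled animals carries information about the number of adult females, can be conveyed with a deliberately simplified moment estimator. The sketch below assumes every sampled offspring's mother is alive and equally likely to be any adult female, ignores age structure, mortality, and sampling design, and uses invented numbers; it is not the estimator used in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

N_females = 2000                       # true number of adult females (unknown in practice)
n_offspring, n_females_sampled = 150, 150

# Assign each sampled offspring a mother uniformly at random among adult females,
# sample adult females without replacement, and count mother-offspring pairs (MOPs)
mothers = rng.integers(0, N_females, size=n_offspring)
sampled_females = rng.choice(N_females, size=n_females_sampled, replace=False)
mops = np.isin(mothers, sampled_females).sum()

# Under these toy assumptions each of the n_offspring * n_females_sampled pairwise
# comparisons is a MOP with probability 1 / N_females, giving a moment estimator:
N_hat = n_offspring * n_females_sampled / mops if mops > 0 else np.inf
print(f"observed MOPs = {mops}, toy abundance estimate = {N_hat:.0f}")
```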

5.
Sci Rep ; 14(1): 18027, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098844

ABSTRACT

Ranked set sampling (RSS) is known to increase the efficiency of estimators compared with simple random sampling. Missingness creates a gap in the information that needs to be addressed before proceeding to estimation, yet little work has dealt with missingness under RSS. This paper proposes some logarithmic-type methods of imputation for the estimation of the population mean under RSS using auxiliary information. The properties of the suggested imputation procedures are examined. A simulation study shows that the proposed imputation procedures yield better results than some existing imputation procedures. A few real applications of the proposed imputation procedures are also provided to complement the simulation study.
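
For readers unfamiliar with RSS, the sketch below generates one ranked set sample of set size m with r cycles from a synthetic population; it illustrates only the sampling design, not the paper's logarithmic imputation estimators, and the distributional choices are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.lognormal(mean=1.0, sigma=0.6, size=10_000)

def ranked_set_sample(pop, m=4, cycles=5, rng=rng):
    """Draw an RSS of size m*cycles: in each cycle, form m sets of m units,
    rank each set (here by the actual values), and keep the i-th order
    statistic from the i-th set."""
    sample = []
    for _ in range(cycles):
        for i in range(m):
            s = rng.choice(pop, size=m, replace=False)
            sample.append(np.sort(s)[i])        # i-th judgment order statistic
    return np.array(sample)

rss = ranked_set_sample(population)
srs = rng.choice(population, size=rss.size, replace=False)   # same-size simple random sample
print(f"RSS mean = {rss.mean():.3f}, SRS mean = {srs.mean():.3f}, true mean = {population.mean():.3f}")
```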

6.
Linacre Q ; 91(3): 315-328, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39104463

ABSTRACT

Fertility awareness-based methods (FABMs), also known as natural family planning (NFP), enable couples to identify the days of the menstrual cycle when intercourse may result in pregnancy ("fertile days"), and to avoid intercourse on fertile days if they wish to avoid pregnancy. Thus, these methods are fully dependent on user behavior for effectiveness to avoid pregnancy. For couples and clinicians considering the use of an FABM, one important metric to consider is the highest expected effectiveness (lowest possible pregnancy rate) during the correct use of the method to avoid pregnancy. To assess this, most studies of FABMs have reported a method-related pregnancy rate (a cumulative proportion), which is calculated based on all cycles (or months) in the study. In contrast, the correct use to avoid pregnancy rate (also a cumulative proportion) has the denominator of cycles with the correct use of the FABM to avoid pregnancy. The relationship between these measures has not been evaluated quantitatively. We conducted a series of simulations demonstrating that the method-related pregnancy rate is artificially decreased in direct proportion to the proportion of cycles with intermediate use (any use other than correct use to avoid or targeted use to conceive), which also increases the total pregnancy rate. Thus, as the total pregnancy rate rises (related to intermediate use), the method-related pregnancy rate falls artificially while the correct use pregnancy rate remains constant. For practical application, we propose the core elements needed to assess correct use cycles in FABM studies. Summary: Fertility awareness-based methods (FABMs) can be used by couples to avoid pregnancy, by avoiding intercourse on fertile days. Users want to know what the highest effectiveness (lowest pregnancy rate) would be if they use an FABM correctly and consistently to avoid pregnancy. In this simulation study, we compare two different measures: (1) the method-related pregnancy rate; and (2) the correct use pregnancy rate. We show that the method-related pregnancy rate is biased too low if some users in the study are not using the method consistently to avoid pregnancy, while the correct use pregnancy rate obtains an accurate estimate. Short Summary: In FABM studies, the method-related pregnancy rate is biased too low, but the correct use pregnancy rate is unbiased.
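
The arithmetic behind the dilution effect can be shown with a hedged toy calculation: the numbers below are invented, and cumulative life-table rates are replaced by simple per-cycle proportions for clarity.

```python
# Toy illustration of the denominator effect (invented numbers; per-cycle proportions
# stand in for cumulative life-table rates)
correct_use_cycles = 6000      # cycles with correct use of the FABM to avoid pregnancy
intermediate_cycles = 4000     # any other use (neither correct-use-to-avoid nor targeted-to-conceive)
total_cycles = correct_use_cycles + intermediate_cycles

preg_correct = 6               # method-related pregnancies occurring in correct-use cycles
preg_intermediate = 200        # pregnancies occurring in intermediate-use cycles

# The "method-related" rate uses ALL cycles in the denominator, so it shrinks as the
# share of intermediate-use cycles grows, even though correct use has not changed;
# the correct-use rate keeps its denominator restricted to correct-use cycles.
method_related_rate = preg_correct / total_cycles
correct_use_rate = preg_correct / correct_use_cycles
total_rate = (preg_correct + preg_intermediate) / total_cycles

print(f"method-related: {method_related_rate:.4f}, "
      f"correct-use: {correct_use_rate:.4f}, total: {total_rate:.4f}")
```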

7.
IEEE Trans Comput Imaging ; 10: 69-82, 2024.
Article in English | MEDLINE | ID: mdl-39184532

ABSTRACT

Ultrasound computed tomography (USCT) is an emerging imaging modality that holds great promise for breast imaging. Full-waveform inversion (FWI)-based image reconstruction methods incorporate accurate wave physics to produce high spatial resolution quantitative images of speed of sound or other acoustic properties of the breast tissues from USCT measurement data. However, the high computational cost of FWI reconstruction represents a significant burden for its widespread application in a clinical setting. The research reported here investigates the use of a convolutional neural network (CNN) to learn a mapping from USCT waveform data to speed of sound estimates. The CNN was trained using a supervised approach with a task-informed loss function aiming at preserving features of the image that are relevant to the detection of lesions. A large set of anatomically and physiologically realistic numerical breast phantoms (NBPs) and corresponding simulated USCT measurements was employed during training. Once trained, the CNN can perform real-time FWI image reconstruction from USCT waveform data. The performance of the proposed method was assessed and compared against FWI using a hold-out sample of 41 NBPs and corresponding USCT data. Accuracy was measured using relative mean square error (RMSE), structural similarity index measure (SSIM), and lesion detection performance (DICE score). This numerical experiment demonstrates that a supervised learning model can achieve accuracy comparable to FWI in terms of RMSE and SSIM, and better performance on the lesion detection task, while significantly reducing computational time.
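
The paper's exact task-informed loss is not given in the abstract; as a hedged sketch, one common way to combine a reconstruction term with a lesion-detection term is to add a soft-Dice penalty computed on a lesion mask to the mean-squared error, as below (PyTorch; the weighting, threshold, and scale are invented).

```python
import torch

def soft_dice(pred_mask: torch.Tensor, true_mask: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Differentiable Dice coefficient between a predicted soft mask and a binary mask."""
    inter = (pred_mask * true_mask).sum()
    return (2 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)

def task_informed_loss(sos_pred, sos_true, lesion_mask, alpha=0.1, threshold=1540.0, scale=5.0):
    """MSE on the speed-of-sound image plus a soft-Dice term on voxels above a
    (hypothetical) lesion threshold; alpha, threshold, and scale are illustrative."""
    mse = torch.mean((sos_pred - sos_true) ** 2)
    pred_mask = torch.sigmoid((sos_pred - threshold) / scale)   # soft lesion mask from the prediction
    dice = soft_dice(pred_mask, lesion_mask)
    return mse + alpha * (1.0 - dice)

# Example call with random tensors standing in for reconstructed / true images
sos_pred = torch.rand(1, 1, 64, 64) * 100 + 1500
sos_true = torch.rand(1, 1, 64, 64) * 100 + 1500
lesion_mask = (sos_true > 1560).float()
print(task_informed_loss(sos_pred, sos_true, lesion_mask))
```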

8.
Am J Epidemiol ; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39160637

ABSTRACT

The test-negative design (TND) is a popular method for evaluating vaccine effectiveness (VE). A "classical" TND study includes symptomatic individuals tested for the disease targeted by the vaccine to estimate VE against symptomatic infection. However, recent applications of the TND have attempted to estimate VE against infection by including all tested individuals, regardless of their symptoms. In this article, we use directed acyclic graphs and simulations to investigate potential biases in TND studies of COVID-19 VE arising from the use of this "alternative" approach, particularly when applied during periods of widespread testing. We show that the inclusion of asymptomatic individuals can potentially lead to collider stratification bias, uncontrolled confounding by health and healthcare-seeking behaviors (HSBs), and differential outcome misclassification. While our focus is on the COVID-19 setting, the issues discussed here may also be relevant in the context of other infectious diseases. This may be particularly true in scenarios where there is either a high baseline prevalence of infection, a strong correlation between HSBs and vaccination, different testing practices for vaccinated and unvaccinated individuals, or settings where both the vaccine under study attenuates symptoms of infection and diagnostic accuracy is modified by the presence of symptoms.
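
A minimal simulation of one of the biases described, collider stratification when asymptomatic individuals are tested partly because of their health-seeking behaviour, is sketched below; all probabilities are invented and the vaccine is given no true effect, so any apparent effectiveness among tested individuals is bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

hsb = rng.binomial(1, 0.5, n)                          # health/healthcare-seeking behaviour
vacc = rng.binomial(1, np.where(hsb == 1, 0.7, 0.3))   # HSB increases vaccination uptake
inf = rng.binomial(1, 0.05, n)                         # infection; true VE = 0 by construction

# Testing depends on infection (symptoms) and, among the asymptomatic, on HSB,
# so conditioning on being tested makes "tested" a collider
p_test = np.clip(0.02 + 0.30 * inf + 0.10 * hsb, 0, 1)
tested = rng.binomial(1, p_test).astype(bool)

a = np.sum(tested & (inf == 1) & (vacc == 1))
b = np.sum(tested & (inf == 1) & (vacc == 0))
c = np.sum(tested & (inf == 0) & (vacc == 1))
d = np.sum(tested & (inf == 0) & (vacc == 0))
or_tested = (a * d) / (b * c)
print(f"OR among all tested = {or_tested:.2f} -> spurious 'VE' = {1 - or_tested:.0%} (true VE = 0%)")
```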

9.
BMC Med Res Methodol ; 24(1): 188, 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39198744

ABSTRACT

BACKGROUND AND OBJECTIVES: Comprehending the research dataset is crucial for obtaining reliable and valid outcomes. Health analysts must have a deep comprehension of the data being analyzed. This comprehension allows them to suggest practical solutions for handling missing data in a clinical data source. Accurate handling of missing values is critical for producing precise estimates and making informed decisions, especially in crucial areas like clinical research. With data's increasing diversity and complexity, numerous scholars have developed a range of imputation techniques. To address this, we conducted a systematic review to introduce various imputation techniques based on tabular dataset characteristics, including the mechanism, pattern, and ratio of missingness, to identify the most appropriate imputation methods in the healthcare field. MATERIALS AND METHODS: We searched four databases, namely PubMed, Web of Science, Scopus, and IEEE Xplore, for articles published up to September 20, 2023, that discussed imputation methods for addressing missing values in a clinically structured dataset. Our investigation of selected articles focused on four key aspects: the mechanism, pattern, ratio of missingness, and various imputation strategies. By synthesizing insights from these perspectives, we constructed an evidence map to recommend suitable imputation methods for handling missing values in a tabular dataset. RESULTS: Out of 2955 articles, 58 were included in the analysis. The findings from the development of the evidence map, based on the structure of the missing values and the types of imputation methods used in the extracted items from these studies, revealed that 45% of the studies employed conventional statistical methods, 31% utilized machine learning and deep learning methods, and 24% applied hybrid imputation techniques for handling missing values. CONCLUSION: Considering the structure and characteristics of missing values in a clinical dataset is essential for choosing the most appropriate data imputation technique, especially within conventional statistical methods. Accurately estimating missing values to reflect reality enhances the likelihood of obtaining high-quality and reusable data, contributing significantly to precise medical decision-making processes. This review provides a guideline for choosing the most appropriate imputation methods at the data preprocessing stage for analyses of structured clinical datasets.


Subject(s)
Biomedical Research , Humans , Statistical Data Interpretation , Biomedical Research/methods , Biomedical Research/standards , Biomedical Research/statistics & numerical data , Datasets as Topic
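
As a brief, hedged illustration of two of the families of techniques surveyed, the sketch below compares a conventional statistical fill-in with a model-based imputer; the specific estimators are generic scikit-learn tools chosen for illustration, not recommendations from the review, and the data are invented.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (activates IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer

# Small tabular example with values missing completely at random (invented data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]                     # correlated columns make model-based imputation useful
mask = rng.random(X.shape) < 0.15
X_missing = np.where(mask, np.nan, X)

X_mean = SimpleImputer(strategy="mean").fit_transform(X_missing)                  # conventional statistical
X_iter = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing)   # chained-regression model

for name, Xi in [("mean", X_mean), ("iterative", X_iter)]:
    rmse = np.sqrt(np.mean((Xi[mask] - X[mask]) ** 2))
    print(f"{name} imputation RMSE on masked entries: {rmse:.3f}")
```
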
10.
Sci Total Environ ; 951: 175687, 2024 Nov 15.
Article in English | MEDLINE | ID: mdl-39173773

ABSTRACT

BACKGROUND: Wastewater monitoring data can be used to estimate disease trends to inform public health responses. One commonly estimated metric is the rate of change in pathogen quantity, which typically correlates with clinical surveillance in retrospective analyses. However, the accuracy of rate of change estimation approaches has not previously been evaluated. OBJECTIVES: We assessed the performance of approaches for estimating rates of change in wastewater pathogen loads by generating synthetic wastewater time series data for which rates of change were known. Each approach was also evaluated on real-world data. METHODS: Smooth trends and their first derivatives were jointly sampled from Gaussian processes (GP) and independent errors were added to generate synthetic viral load measurements; the range hyperparameter and error variance were varied to produce nine simulation scenarios representing different potential disease patterns. The directions and magnitudes of the rate of change estimates from four estimation approaches (two established and two developed in this work) were compared to the GP first derivative to evaluate classification and quantitative accuracy. Each approach was also implemented for public SARS-CoV-2 wastewater monitoring data collected January 2021-May 2023 at 25 sites in North Carolina, USA. RESULTS: All four approaches inconsistently identified the correct direction of the trend given by the sign of the GP first derivative. Across all nine simulated disease patterns, between a quarter and a half of all estimates indicated the wrong trend direction, regardless of estimation approach. The proportion of trends classified as plateaus (statistically indistinguishable from zero) for the North Carolina SARS-CoV-2 data varied considerably by estimation method but not by site. DISCUSSION: Our results suggest that wastewater measurements alone might not provide sufficient data to reliably track disease trends in real-time. Instead, wastewater viral loads could be combined with additional public health surveillance data to improve predictions of other outcomes.


Subject(s)
Wastewater , Wastewater/virology , COVID-19/epidemiology , North Carolina/epidemiology , Humans , Environmental Monitoring/methods , SARS-CoV-2 , Viral Load , Wastewater-Based Epidemiological Monitoring
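
A compressed sketch of the synthetic-data idea: sample a smooth trend from a Gaussian process, treat its numerical first derivative as the true rate of change, add independent noise, and check whether a simple rolling-slope estimator recovers the trend direction. The kernel settings, noise level, and slope estimator below are invented stand-ins for the approaches evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

days = np.arange(0, 120)
length_scale, sigma_f, sigma_n = 15.0, 1.0, 0.4

# Squared-exponential (RBF) covariance for the latent log viral load trend
K = sigma_f**2 * np.exp(-0.5 * (days[:, None] - days[None, :])**2 / length_scale**2)
trend = rng.multivariate_normal(np.zeros(days.size), K + 1e-8 * np.eye(days.size))
true_deriv = np.gradient(trend, days)                   # "true" rate of change
y = trend + rng.normal(0, sigma_n, size=days.size)      # noisy measurements

# Simple estimator: slope of a local least-squares line over a 15-day window
window = 15
est_deriv = np.full(days.size, np.nan)
for i in range(window, days.size):
    d, v = days[i - window:i + 1], y[i - window:i + 1]
    est_deriv[i] = np.polyfit(d, v, 1)[0]

ok = np.sign(est_deriv[window:]) == np.sign(true_deriv[window:])
print(f"trend direction classified correctly on {ok.mean():.0%} of days")
```
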
11.
Biom J ; 66(6): e202300271, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39132909

ABSTRACT

Many clinical trials assess time-to-event endpoints. To describe the difference between groups in terms of time to event, we often employ hazard ratios. However, the hazard ratio is only informative in the case of proportional hazards (PHs) over time. There exist many other effect measures that do not require PHs. One of them is the average hazard ratio (AHR). Its core idea is to utilize a time-dependent weighting function that accounts for time variation. Though promoted in methodological research papers, the AHR is rarely used in practice. To facilitate its application, we present approaches for sample size calculation of an AHR test. We assess the reliability of the sample size calculation by extensive simulation studies covering various survival and censoring distributions with proportional as well as nonproportional hazards (N-PHs). The findings suggest that a simulation-based sample size calculation approach can be useful for designing clinical trials with N-PHs. Using the AHR can result in increased statistical power to detect differences between groups with more efficient sample sizes.


Subject(s)
Proportional Hazards Models , Sample Size , Humans , Clinical Trials as Topic , Biometry/methods
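
The AHR test itself is not reproduced here, but the simulation-based sample-size logic the authors advocate can be sketched generically: simulate trials of increasing size under assumed survival and censoring distributions and estimate empirical power. In the hedged sketch below, a log-rank test from lifelines stands in for the AHR test, and the event rates and censoring window are invented.

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(11)

def simulate_power(n_per_arm, n_sim=500, alpha=0.05):
    """Empirical power for exponential survival (rates 0.10 vs 0.07) with uniform censoring."""
    hits = 0
    for _ in range(n_sim):
        t0 = rng.exponential(1 / 0.10, n_per_arm)
        t1 = rng.exponential(1 / 0.07, n_per_arm)
        c0 = rng.uniform(5, 30, n_per_arm)
        c1 = rng.uniform(5, 30, n_per_arm)
        obs0, e0 = np.minimum(t0, c0), (t0 <= c0)
        obs1, e1 = np.minimum(t1, c1), (t1 <= c1)
        p = logrank_test(obs0, obs1, event_observed_A=e0, event_observed_B=e1).p_value
        hits += p < alpha
    return hits / n_sim

# Increase n_per_arm until the empirical power reaches the target (e.g., 80%)
for n in (100, 200, 300, 400):
    print(n, simulate_power(n))
```
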
12.
Heliyon ; 10(12): e32011, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39183875

ABSTRACT

This article proposes and discusses a novel approach for generating trigonometric G-families using hybrid generalizers of distributions. The proposed generalizer is constructed by utilizing the tangent trigonometric function and the distribution function of a base model G(x). The newly proposed family of univariate continuous distributions is named the "Lomax Tangent Generalized Family of Distributions (LT-G)", and its structural, mathematical, and statistical properties are derived. Some special cases and sub-models of the proposed family are also presented. A Weibull-based model, the "Lomax Tangent Weibull (LT-W) Distribution", is discussed, and plots of its density (pdf) and hazard (hrf) functions are presented. Model parameters are estimated by the maximum likelihood estimation (MLE) procedure, and the accuracy of the MLEs is evaluated through Monte Carlo simulation. Finally, to demonstrate the flexibility and potential of the proposed distribution, two real hydrological and strength data sets are analyzed. The obtained results are compared with well-known, competitive, and related existing distributions.
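
The abstract states that the generalizer uses the tangent of the baseline CDF G(x) but does not give the exact LT-G form, so the sketch below uses a simpler, generic tangent-based family, F(x) = tan((π/4)·G(x)) with a Weibull baseline, purely to illustrate how such a family is built and fitted by maximum likelihood; it is not the paper's LT-W model, and the data are synthetic.

```python
import numpy as np
from scipy import optimize, stats

def tan_weibull_logpdf(x, shape, scale):
    """Generic tangent-G log-density with a Weibull baseline:
    F(x) = tan((pi/4) * G(x)),  f(x) = (pi/4) * sec^2((pi/4) * G(x)) * g(x)."""
    G = stats.weibull_min.cdf(x, shape, scale=scale)
    logg = stats.weibull_min.logpdf(x, shape, scale=scale)
    return np.log(np.pi / 4) - 2 * np.log(np.cos(np.pi / 4 * G)) + logg

def neg_loglik(params, data):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return -np.sum(tan_weibull_logpdf(data, shape, scale))

# Fit to synthetic positive data (invented) by maximum likelihood
rng = np.random.default_rng(2)
data = rng.weibull(1.7, size=300) * 4.0
res = optimize.minimize(neg_loglik, x0=[1.0, 1.0], args=(data,), method="Nelder-Mead")
print("MLE (shape, scale):", res.x, " negative log-likelihood:", res.fun)
```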

13.
Article in English | MEDLINE | ID: mdl-39006765

ABSTRACT

Because the conventional binormal ROC curve parameters are in terms of the underlying normal diseased and nondiseased rating distributions, transformations of these values are required for the user to understand what the corresponding ROC curve looks like in terms of its shape and size. In this paper I propose an alternative parameterization in terms of parameters that explicitly describe the shape and size of the ROC curve. The proposed two parameters are the mean-to-sigma ratio and the familiar area under the ROC curve (AUC), which are easily interpreted in terms of the shape and size of the ROC curve, respectively. In addition, the mean-to-sigma ratio describes the degree of improperness of the ROC curve and the AUC describes the ability of the corresponding diagnostic test to discriminate between diseased and nondiseased cases. The proposed parameterization simplifies the sizing of diagnostic studies when conjectured variance components are used and simplifies choosing the binormal a and b parameter values needed for simulation studies.
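
The mapping the author exploits can be made concrete. In the standard binormal model y = Φ(a + b·Φ⁻¹(x)), AUC = Φ(a/√(1+b²)) and, with nondiseased ratings N(0,1) and diseased ratings N(μ, σ²), a = μ/σ and b = 1/σ, so the mean-to-sigma ratio r = μ/(σ−1) equals a/(1−b). The hedged sketch below numerically recovers (a, b) from a chosen (r, AUC) pair; the example values are invented and a proper-ish curve with 0 < b < 1 is assumed.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def binormal_ab_from_r_auc(r, auc):
    """Solve for binormal (a, b) given mean-to-sigma ratio r = a / (1 - b)
    and AUC = Phi(a / sqrt(1 + b^2)); assumes 0 < b < 1."""
    target = norm.ppf(auc)
    f = lambda b: r * (1 - b) / np.sqrt(1 + b**2) - target
    b = brentq(f, 1e-6, 1 - 1e-6)
    a = r * (1 - b)
    return a, b

a, b = binormal_ab_from_r_auc(r=4.0, auc=0.85)
print(f"a = {a:.3f}, b = {b:.3f}, check AUC = {norm.cdf(a / np.sqrt(1 + b**2)):.3f}")
```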

14.
Heliyon ; 10(12): e32203, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38975167

ABSTRACT

Probability distributions are widely utilized in applied sciences, especially in the field of biomedical science. Biomedical data typically exhibit positive skewness, necessitating the use of flexible, skewed distributions to effectively model such phenomena. In this study, we introduce a novel approach to characterize new lifetime distributions, known as the New Flexible Exponent Power (NFEP) Family of distributions. This involves the addition of a new parameter to existing distributions. A specific sub-model within the proposed class, known as the New Flexible Exponent Power Weibull (NFEP-Wei), is derived to illustrate the concept of flexibility. We employ the well-established Maximum Likelihood Estimation (MLE) method to estimate the unknown parameters in this family of distributions. A simulation study is conducted to assess the behavior of the estimators in various scenarios. To gauge the flexibility and effectiveness of the NFEP-Wei distribution, we compare it with the AP-Wei (alpha power Weibull), MO-Wei (Marshall-Olkin Weibull), classical Wei (Weibull), NEP-Wei (new exponent power Weibull), FRLog-Wei (flexible reduced logarithmic Weibull), and Kum-Wei (Kumaraswamy Weibull) distributions by analyzing four distinct biomedical datasets. The results demonstrate that the NFEP-Wei distribution outperforms the compared distributions.

15.
Spat Spatiotemporal Epidemiol ; 49: 100654, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38876557

ABSTRACT

BACKGROUND: Spatial modeling of disease risk using primary care registry data is promising for public health surveillance. However, it remains unclear to what extent challenges such as spatially disproportionate sampling and practice-specific reporting variation affect statistical inference. METHODS: Using lower respiratory tract infection data from the INTEGO registry, modeled with a logistic model incorporating patient characteristics, a spatially structured random effect at municipality level, and an unstructured random effect at practice level, we conducted a case and simulation study to assess the impact of these challenges on spatial trend estimation. RESULTS: Even with spatial imbalance and practice-specific reporting variation, the model performed well. Performance improved with increasing spatial sample balance and decreasing practice-specific variation. CONCLUSION: Our findings indicate that, with correction for reporting efforts, primary care registries are valuable for spatial trend estimation. The diversity of patient locations within practice populations plays an important role.


Subject(s)
Primary Health Care , Registries , Humans , Primary Health Care/statistics & numerical data , Male , Female , Adult , Middle Aged , Spatial Analysis , Respiratory Tract Infections/epidemiology , Aged , Adolescent , Logistic Models , Child , Statistical Models , Young Adult , Preschool Child
16.
J Cardiovasc Dev Dis ; 11(6)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38921675

ABSTRACT

In recent years, the prevalence of and mortality associated with cardiovascular diseases have been rising in most countries and regions. Atrial fibrillation (AF) is the most common arrhythmic condition, and there are several treatment options for AF. Pulmonary vein isolation is an effective treatment for AF and is the cornerstone of current ablation techniques, which have one major limitation: even when diagnosed and treated at a facility that specializes in ablation, patients still face a considerable chance of recurrence. Therefore, there is a need to develop better ablation techniques for the treatment of AF. This article first compares the current cryoablation (CBA) and radiofrequency ablation (RFA) techniques for the treatment of AF and discusses the utility and advantages of the development of pulsed-field ablation (PFA) technology. The current research on PFA is summarized from three perspectives: simulation experiments, animal experiments, and clinical studies. The results of the different stages of experiments are summarized, especially the animal studies, in which pulmonary vein isolation was carried out effectively without injury to the phrenic nerve, esophagus, or pulmonary veins, with higher safety and shorter procedure times. This paper thus reviews the preclinical and clinical studies of this new technique for the treatment of AF.

17.
Materials (Basel) ; 17(12)2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38930362

ABSTRACT

In recent years, the variability in the composition of cement raw materials has increasingly impacted the quality of cement products. However, there has been relatively little research on the homogenization effects of equipment in the cement production process. Existing studies mainly focus on the primary functions of equipment, such as the grinding efficiency of ball mills, the thermal decomposition in cyclone preheaters, and the thermal decomposition in rotary kilns. This study selected four typical pieces of equipment with significant homogenization functions for an in-depth investigation: ball mills, pneumatic homogenizing silos, cyclone preheaters, and rotary kilns. To assess the homogenization efficacy of each apparatus, scaled-down models of these devices were constructed and subjected to simulated experiments. To improve experimental efficiency and realistically simulate actual production conditions in a laboratory setting, this study used the uniformity of the electrical capacitance of mixed powders instead of compositional uniformity to analyze homogenization effects. The test material in the experiment consisted of a mixture of raw meal from a cement factory with a high dielectric constant and Fe3O4 powder. The parallel plate capacitance method was employed to ascertain the capacitance value of the mixed powder prior to and subsequent to treatment by each equipment model. The fluctuation of the input and output curves was analyzed, and the standard deviation (S), coefficient of variation (R), and homogenization multiplier (H) were calculated in order to evaluate the homogenization effect of each equipment model on the raw meal. The findings of the study indicated that the pneumatic homogenizer exhibited an exemplary homogenization effect, followed by the ball mill. For the ball mill, a higher proportion of small balls in the gradation can significantly enhance the homogenization effect without considering the grinding efficiency. The five-stage cyclone preheater also has a better homogenization effect, while the rotary kiln has a less significant homogenization effect on raw meal. Finally, the raw meal processed by each equipment model was used for clinker calcination and the preparation of cement mortar samples. After curing for three days, the compressive and flexural strengths of the samples were tested, thereby indirectly verifying the homogenization effect of each equipment model on the raw meal. This study helps to understand the homogenization process of raw materials by equipment in cement production and provides certain reference and data support for equipment selection, operation optimization, and quality control in the cement production process.
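
The evaluation metrics mentioned can be made explicit with a short, hedged sketch: the homogenization multiplier is taken here as the ratio of input to output standard deviation, a common definition that may differ in detail from the one used in the study, and the capacitance series are invented.

```python
import numpy as np

rng = np.random.default_rng(9)

# Invented capacitance series before (input) and after (output) a homogenizing device;
# the device is assumed to damp fluctuations around the same mean level
cap_in = 10.0 + rng.normal(0, 0.8, size=500)
cap_out = 10.0 + rng.normal(0, 0.2, size=500)

def summary(series):
    s = np.std(series, ddof=1)       # standard deviation S
    r = s / np.mean(series)          # coefficient of variation R
    return s, r

s_in, r_in = summary(cap_in)
s_out, r_out = summary(cap_out)
H = s_in / s_out                     # homogenization multiplier (assumed definition: S_in / S_out)
print(f"S_in={s_in:.3f}, S_out={s_out:.3f}, R_in={r_in:.3%}, R_out={r_out:.3%}, H={H:.1f}")
```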

18.
Biom J ; 66(4): e2200334, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38747086

ABSTRACT

Many data sets exhibit a natural group structure due to contextual similarities or high correlations of variables, such as lipid markers that are interrelated based on biochemical principles. Knowledge of such groupings can be used through bi-level selection methods to identify relevant feature groups and highlight their predictive members. One of the best known approaches of this kind combines the classical Least Absolute Shrinkage and Selection Operator (LASSO) with the Group LASSO, resulting in the Sparse Group LASSO (SGL). We propose the Sparse Group Penalty (SGP) framework, which allows for a flexible combination of different SGL-style shrinkage conditions. Analogous to SGL, we investigated the combination of the Smoothly Clipped Absolute Deviation (SCAD), the Minimax Concave Penalty (MCP) and the Exponential Penalty (EP) with their group versions, resulting in the Sparse Group SCAD, the Sparse Group MCP, and the novel Sparse Group EP (SGE). Those shrinkage operators provide refined control of the effect of group formation on the selection process through a tuning parameter. In simulation studies, SGPs were compared with other bi-level selection methods (Group Bridge, composite MCP, and Group Exponential LASSO) for variable and group selection evaluated with the Matthews correlation coefficient. We demonstrated the advantages of the new SGE in identifying parsimonious models, but also identified scenarios that highlight the limitations of the approach. The performance of the techniques was further investigated in a real-world use case for the selection of regulated lipids in a randomized clinical trial.


Subject(s)
Biometry , Biometry/methods , Humans
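
The Sparse Group LASSO penalty that the SGP framework generalizes can be written down compactly; the sketch below evaluates it for a coefficient vector with a given group structure (generic weights, not the paper's tuning choices).

```python
import numpy as np

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """SGL penalty: alpha*lam*||beta||_1 + (1-alpha)*lam*sum_g sqrt(p_g)*||beta_g||_2,
    where `groups` maps each coefficient to a group label and p_g is the group size."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    l1 = np.sum(np.abs(beta))
    l2_group = sum(np.sqrt(np.sum(groups == g)) * np.linalg.norm(beta[groups == g])
                   for g in np.unique(groups))
    return alpha * lam * l1 + (1 - alpha) * lam * l2_group

# Example: 6 coefficients in 2 groups; alpha moves the penalty between pure Group LASSO (0)
# and pure LASSO (1)
beta = [0.0, 0.5, -1.2, 0.0, 0.0, 0.3]
groups = [1, 1, 1, 2, 2, 2]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(sparse_group_lasso_penalty(beta, groups, lam=0.1, alpha=alpha), 4))
```
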
19.
Front Genet ; 15: 1203577, 2024.
Article in English | MEDLINE | ID: mdl-38818035

ABSTRACT

Cross-sectional data allow the investigation of how genetics influence health at a single time point, but to understand how the genome impacts phenotype development, one must use repeated measures data. Ignoring the dependency inherent in repeated measures can exacerbate false positives and requires the utilization of methods other than general or generalized linear models. Many methods can accommodate longitudinal data, including the commonly used linear mixed model and generalized estimating equation, as well as the less popular fixed-effects model, cluster-robust standard error adjustment, and aggregate regression. We simulated longitudinal data and applied these five methods alongside naïve linear regression, which ignored the dependency and served as a baseline, to compare their power, false positive rate, estimation accuracy, and precision. The results showed that the naïve linear regression and fixed-effects models incurred high false positive rates when analyzing a predictor that is fixed over time, making them unviable for studying time-invariant genetic effects. The linear mixed models maintained low false positive rates and unbiased estimation. The generalized estimating equation was similar to the former in terms of power and estimation, but it had increased false positives when the sample size was low, as did cluster-robust standard error adjustment. Aggregate regression produced biased estimates when predictor effects varied over time. To show how the method choice affects downstream results, we performed longitudinal analyses in an adolescent cohort of African and European ancestry. We examined how developing post-traumatic stress symptoms were predicted by polygenic risk, traumatic events, exposure to sexual abuse, and income using four approaches-linear mixed models, generalized estimating equations, cluster-robust standard error adjustment, and aggregate regression. While the directions of effect were generally consistent, coefficient magnitudes and statistical significance differed across methods. Our in-depth comparison of longitudinal methods showed that linear mixed models and generalized estimating equations were applicable in most scenarios requiring longitudinal modeling, but no approach produced identical results even if fit to the same data. Since result discrepancies can result from methodological choices, it is crucial that researchers determine their model a priori, refrain from testing multiple approaches to obtain favorable results, and utilize as similar as possible methods when seeking to replicate results.
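
Three of the five approaches compared can be sketched on the same simulated repeated-measures data using statsmodels: a linear mixed model, a GEE with exchangeable working correlation, and OLS with cluster-robust standard errors. The data-generating values below are invented and the predictor is time-invariant, mirroring a genetic exposure.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)

n_people, n_waves = 300, 4
pid = np.repeat(np.arange(n_people), n_waves)
x = np.repeat(rng.normal(size=n_people), n_waves)           # time-invariant predictor (e.g., polygenic score)
u = np.repeat(rng.normal(0, 1.0, size=n_people), n_waves)   # person-level random intercept
y = 0.3 * x + u + rng.normal(0, 1.0, size=n_people * n_waves)
df = pd.DataFrame({"y": y, "x": x, "pid": pid})

lmm = smf.mixedlm("y ~ x", df, groups=df["pid"]).fit()
gee = smf.gee("y ~ x", groups="pid", data=df,
              cov_struct=sm.cov_struct.Exchangeable()).fit()
crse = smf.ols("y ~ x", df).fit(cov_type="cluster", cov_kwds={"groups": df["pid"]})

for name, res in [("LMM", lmm), ("GEE", gee), ("OLS + cluster SE", crse)]:
    print(f"{name}: beta_x = {res.params['x']:.3f}, se = {res.bse['x']:.3f}")
```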

20.
Front Psychol ; 15: 1359111, 2024.
Article in English | MEDLINE | ID: mdl-38770253

ABSTRACT

In the social sciences, accurately identifying the dimensionality of measurement scales is crucial for understanding latent constructs such as anxiety, happiness, and self-efficacy. This study presents a rigorous comparison between Parallel Analysis (PA) and Exploratory Graph Analysis (EGA) for assessing the dimensionality of scales, particularly focusing on ordinal data. Through an extensive simulation study, we evaluated the effectiveness of these methods under various conditions, including varying sample size, number of factors and their association, patterns of loading magnitudes, and symmetrical or skewed item distributions with assumed underlying normality or non-normality. Results show that the performance of each method varies across different scenarios, depending on the context. EGA consistently outperforms PA in correctly identifying the number of factors, particularly in complex scenarios characterized by more than a single factor, high inter-factor correlations and low to medium primary loadings. However, for datasets with simpler and stronger factor structures, specifically those with a single factor, high primary loadings, low cross-loadings, and low to moderate interfactor correlations, PA is suggested as the method of choice. Skewed item distributions with assumed underlying normality or non-normality were found to noticeably impact the performance of both methods, particularly in complex scenarios. The results provide valuable insights for researchers utilizing these methods in scale development and validation, ensuring that measurement instruments accurately reflect theoretical constructs.
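
A compact, hedged sketch of the PA side of the comparison: retain factors whose observed correlation-matrix eigenvalues exceed the corresponding eigenvalues of random data of the same size. The version below uses the mean of the random eigenvalues and Pearson correlations; practical implementations differ in using percentiles, polychoric correlations for ordinal items, and other refinements, and the toy loading structure is invented.

```python
import numpy as np

def parallel_analysis(X, n_random=200, rng=None):
    """Return the number of factors whose sample eigenvalues exceed the mean
    eigenvalues of correlation matrices of random normal data of the same shape."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, p = X.shape
    obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand_eig = np.zeros((n_random, p))
    for i in range(n_random):
        R = rng.normal(size=(n, p))
        rand_eig[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]
    return int(np.sum(obs_eig > rand_eig.mean(axis=0)))

# Two-factor toy data: 8 items, 4 loading on each factor (invented loadings)
rng = np.random.default_rng(4)
f = rng.normal(size=(500, 2))
loadings = np.zeros((2, 8)); loadings[0, :4] = 0.7; loadings[1, 4:] = 0.7
X = f @ loadings + rng.normal(0, 0.6, size=(500, 8))
print("suggested number of factors:", parallel_analysis(X))
```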
