Results 1 - 20 of 50
1.
Stat Med ; 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39285135

ABSTRACT

The agreement intra-class correlation coefficient (ICCa) is a suitable statistical index for inter-rater reliability studies. For balanced Gaussian data, we prove the explicit form of the asymptotic normality (ASN) of the ICCa, valid with analysis of variance (ANOVA), maximum likelihood (ML), or restricted ML (REML) estimates. An asymptotic confidence interval is then derived, and its performance is examined by simulation and compared with the most commonly used methods under small, moderate, and large sample size designs. We then deduce sample size formulas for the number of subjects and observers needed to achieve a desired confidence interval width or a desired power for testing against an acceptable ICCa value, and give concrete examples of their use. Finally, we propose a likelihood ratio test (LRT) to compare two ICCas from two distinct subpopulations of patients (or raters), and study its type I error rate and power by simulation. These methods are illustrated using data from two inter-rater reliability studies, one in physiotherapy with 42 patients and 10 raters and the other in neonatology with 80 subjects and 14 raters. In conclusion, we recommend using the proposed confidence interval for medium to large samples, combined with quantifying the minimal required sample size at the planning stage, or the posterior power at the analysis stage, using simple dedicated formulas. Furthermore, with sufficient sample sizes, the proposed LRT seems suitable for comparing inter-rater reliability between two patient subpopulations. Used wisely, this toolbox of methods can remedy common issues in current inter-rater reliability studies.
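As a point of reference, the ANOVA point estimate of the single-rater agreement ICC for a balanced design (n subjects, k raters, one rating per cell) can be computed from the two-way mean squares. The sketch below uses the standard ICC(A,1) formula. It does not implement the asymptotic variance, confidence interval, or LRT derived in the paper, and the function name `icc_agreement` is hypothetical.

```python
import numpy as np

def icc_agreement(y) -> float:
    """ANOVA estimate of the single-rater agreement ICC, ICC(A,1).

    y: array of shape (n_subjects, k_raters), balanced, no missing values.
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    # Two-way ANOVA without replication: subjects (rows) and raters (columns)
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols   # residual sum of squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # Agreement form: systematic rater differences (msc) count as disagreement
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

ratings = np.array([[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]])
print(round(icc_agreement(ratings), 3))
```

When every rater gives each subject the same score, the rater and residual mean squares vanish and the estimate equals 1.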

2.
J Appl Stat ; 51(10): 1961-1975, 2024.
Article in English | MEDLINE | ID: mdl-39071255

ABSTRACT

The main aim of this work is to develop a new goodness-of-fit test for the one-sided Lévy distribution. The proposed test is based on the scale-ratio approach, in which two estimators of the scale parameter of the one-sided Lévy distribution are compared. The asymptotic distribution of the test statistic is obtained under the null hypothesis. The performance of the test is demonstrated using observations simulated from various known distributions. Finally, two real-world datasets are analyzed.
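The scale-ratio idea can be sketched for a Lévy distribution with location 0 and scale c. The abstract does not say which two scale estimators the paper compares, so this sketch assumes two of them. The first is the maximum likelihood estimator c = n / Σ(1/x), which follows because 1/X is Gamma-distributed with shape 1/2 and rate c/2. The second is a median-based estimator, which uses the fact that X has the distribution of c/Z² for standard normal Z. Under the null hypothesis the ratio should be close to 1. The null distribution needed for a formal test is not reproduced, and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Median of Z**2 for standard normal Z: P(|Z| <= q) = 1/2 gives q = Phi^{-1}(3/4)
MEDIAN_CHI2_1 = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549

def levy_scale_mle(x):
    """MLE of c: 1/X ~ Gamma(shape=1/2, rate=c/2), so c_hat = n / sum(1/x)."""
    x = np.asarray(x, dtype=float)
    return x.size / np.sum(1.0 / x)

def levy_scale_median(x):
    """Median-based estimator: median(X) = c / median(Z**2)."""
    return MEDIAN_CHI2_1 * np.median(np.asarray(x, dtype=float))

def scale_ratio(x):
    """Hypothetical statistic: ratio of the two scale estimators (near 1 under H0)."""
    return levy_scale_mle(x) / levy_scale_median(x)

rng = np.random.default_rng(0)
c = 2.0
x = c / rng.standard_normal(20_000) ** 2   # Levy(0, c) sample via c / Z**2
print(scale_ratio(x))
```

For data that are not Lévy-distributed, such as exponential samples, the two estimators disagree and the ratio moves away from 1.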

3.
J Appl Stat ; 51(10): 1894-1918, 2024.
Article in English | MEDLINE | ID: mdl-39071249

ABSTRACT

In this article, we define a mixed predictor and a stochastic restricted ridge predictor for partially linear mixed measurement error models by taking advantage of kernel approximation. Under the matrix mean square error criterion, we compare the superiority of linear combinations of the newly defined predictors. We then investigate the asymptotic normality properties and the case in which the covariance matrix of the measurement errors is unknown. Finally, the study concludes with a Monte Carlo simulation study and an application to COVID-19 data.

4.
Article in English | MEDLINE | ID: mdl-38676427

ABSTRACT

Pairwise likelihood is a limited-information method widely used to estimate latent variable models, including factor analysis of categorical data. It can often avoid evaluating high-dimensional integrals and, thus, is computationally more efficient than relying on the full likelihood. Despite its computational advantage, the pairwise likelihood approach can still be demanding for large-scale problems that involve many observed variables. We tackle this challenge by employing an approximation of the pairwise likelihood estimator, which is derived from an optimization procedure relying on stochastic gradients. The stochastic gradients are constructed by subsampling the pairwise log-likelihood contributions, for which the subsampling scheme controls the per-iteration computational complexity. The stochastic estimator is shown to be asymptotically equivalent to the pairwise likelihood one. However, finite-sample performance can be improved by compounding the sampling variability of the data with the uncertainty introduced by the subsampling scheme. We demonstrate the performance of the proposed method using simulation studies and two real data applications.
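Below is a toy version of the subsampling idea. Each iteration draws a few variable pairs at random and takes a gradient step on their pairwise log-likelihood contributions. The per-iteration cost therefore depends on the number of sampled pairs rather than on all p(p-1)/2 pairs. To keep the example short, it assumes continuous, equicorrelated standard normal data with a single correlation parameter ρ, not the categorical factor models treated in the paper. The step-size schedule, the averaging of the second half of the iterates, and the function names are illustrative assumptions.

```python
import numpy as np

def pair_score(rho, a, b):
    """d/d(rho) of the bivariate standard normal log-density, averaged over observations."""
    s = 1.0 - rho ** 2
    q = a ** 2 - 2.0 * rho * a * b + b ** 2
    return np.mean(rho / s + (a * b * s - rho * q) / s ** 2)

def stochastic_pairwise_rho(x, n_pairs=5, n_iter=2000, seed=0):
    """Stochastic-gradient ascent on the pairwise log-likelihood of an equicorrelation model.

    Each step uses only n_pairs randomly drawn variable pairs, so the per-iteration
    cost is controlled by n_pairs rather than by all p * (p - 1) / 2 pairs.
    """
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    pairs = [(j, k) for j in range(p) for k in range(j + 1, p)]
    rho, kept = 0.0, []
    for t in range(1, n_iter + 1):
        idx = rng.choice(len(pairs), size=n_pairs, replace=False)
        # Unbiased estimate of the average pairwise score from the subsampled pairs
        grad = np.mean([pair_score(rho, x[:, pairs[i][0]], x[:, pairs[i][1]]) for i in idx])
        rho = float(np.clip(rho + 0.5 / t ** 0.6 * grad, -0.95, 0.95))
        if t > n_iter // 2:
            kept.append(rho)
    return float(np.mean(kept))  # average of the second half of the iterates

# Simulated equicorrelated data: corr(x_j, x_k) = 0.5 for j != k
rng = np.random.default_rng(42)
n, p, rho_true = 2000, 10, 0.5
common = rng.standard_normal((n, 1))
x = np.sqrt(rho_true) * common + np.sqrt(1 - rho_true) * rng.standard_normal((n, p))
rho_hat = stochastic_pairwise_rho(x)
```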

5.
Microscopy (Oxf) ; 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37986580

ABSTRACT

Quantifying the number of molecules from fluorescence microscopy measurements is an important topic in cell biology and medical research. In this work, we present a multi-step algorithm for super-resolution (stimulated emission depletion, STED) scanning microscopy that provides molecule counts in automatically generated image segments and offers statistical guarantees in the form of asymptotic confidence intervals. To this end, we first apply a multiscale scanning procedure to STED microscopy measurements of the sample to obtain a system of significant regions, each of which contains at least one molecule with a prescribed uniform probability. This system of regions is typically highly redundant and consists of rectangular building blocks. To choose an informative but non-redundant subset of more naturally shaped regions, we hybridize our system with the result of a generic segmentation algorithm. The diameter of the segments can be of the order of the resolution of the microscope. Using multiple photon coincidence measurements of the same sample in confocal mode, we are then able to estimate the brightness and number of molecules and give uniform confidence intervals on the molecule counts for each previously constructed segment. In other words, we establish a so-called molecular map with uniform error control. The performance of the algorithm is investigated on simulated and real data.

6.
Stat Med ; 42(17): 2982-2998, 2023 07 30.
Article in English | MEDLINE | ID: mdl-37173778

ABSTRACT

In medical studies, composite indices and/or scores are routinely used to predict patients' medical conditions. These indices are usually developed from observed data on certain disease risk factors, and the literature has demonstrated that single-index models can provide a powerful tool for this purpose. In practice, the observed data on disease risk factors are often longitudinal, in the sense that they are collected at multiple time points for individual patients, and several aspects of a patient's medical condition are often of concern. However, most existing single-index models are developed for independent data and a single response variable. This makes them inappropriate for the problem just described, in which within-subject observations are usually correlated and multiple mutually correlated response variables are involved. This paper aims to fill this methodological gap by developing a single-index model for analyzing longitudinal data with multiple responses. Both theoretical and numerical justifications show that the proposed method provides an effective solution to this research problem. The method is also demonstrated using a dataset from the English Longitudinal Study of Aging.


Subject(s)
Longitudinal Studies, Humans, Statistics as Topic

7.
Philos Trans A Math Phys Eng Sci ; 381(2247): 20220142, 2023 May 15.
Article in English | MEDLINE | ID: mdl-36970827

ABSTRACT

Prediction has a central role in the foundations of Bayesian statistics and is now the main focus in many areas of machine learning, in contrast to the more classical focus on inference. We argue that, in the basic setting of random sampling (that is, in the Bayesian approach, exchangeability), uncertainty expressed by the posterior distribution and credible intervals can indeed be understood in terms of prediction. The posterior law on the unknown distribution is centred on the predictive distribution, and we prove that it is marginally asymptotically Gaussian, with variance depending on the predictive updates, i.e. on how the predictive rule incorporates information as new observations become available. This allows one to obtain asymptotic credible intervals based only on the predictive rule (without having to specify the model and the prior law), sheds light on frequentist coverage as related to the predictive learning rule, and, we believe, opens a new perspective towards a notion of predictive efficiency that seems to call for further research. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.

8.
Ann Inst Stat Math ; 75(1): 39-70, 2023.
Article in English | MEDLINE | ID: mdl-35645407

ABSTRACT

In this work, we studied a two-component mixture model with a stochastic dominance constraint, a model that arises naturally in many genetic studies. To model the stochastic dominance, we proposed a semiparametric model for the log of the density ratio. More specifically, when the log of the ratio of the two component densities takes a linear regression form, stochastic dominance is immediately satisfied. For the resulting semiparametric mixture model, we proposed two estimators, the maximum empirical likelihood estimator (MELE) and the minimum Hellinger distance estimator (MHDE), and investigated their asymptotic properties, such as consistency and normality. In addition, to test the validity of the proposed semiparametric model, we developed Kolmogorov-Smirnov type tests based on the two estimators. The finite-sample performance of the two estimators and the tests, in terms of both efficiency and robustness, was examined and compared through thorough Monte Carlo simulation studies and real data analysis. Supplementary Information: The online version contains supplementary material available at 10.1007/s10463-022-00835-5.

9.
Int J Biostat ; 19(1): 53-60, 2023 05 01.
Article in English | MEDLINE | ID: mdl-35320641

ABSTRACT

Li and Greene (A weighting analogue to pair matching in propensity score analysis. Int J Biostat 2013;9:215-34) proposed that estimates derived by the matching weight (MW) estimator are similar to those derived by the one-to-one propensity score matching estimator. The MW estimator has some useful properties; however, certain regularity conditions must be verified to derive its asymptotic distribution, because the MW has a non-differentiable point. In this letter, we confirm the asymptotic distribution of the MW estimator and the sufficient conditions for achieving it.
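For reference, the matching weight for a unit with propensity score e and treatment indicator Z is min(e, 1 - e) divided by the probability of the treatment the unit actually received, Z·e + (1 - Z)(1 - e). The minimum creates the non-differentiable point at e = 1/2 mentioned above. The sketch assumes the propensity scores are already known and computes the weights and the weighted difference in means. Estimating the scores and the asymptotic variance discussed in the letter is outside its scope, and the function names are hypothetical.

```python
import numpy as np

def matching_weights(e, z):
    """MW_i = min(e_i, 1 - e_i) / (z_i * e_i + (1 - z_i) * (1 - e_i))."""
    e = np.asarray(e, dtype=float)
    z = np.asarray(z, dtype=int)
    received = np.where(z == 1, e, 1.0 - e)  # probability of the treatment actually received
    return np.minimum(e, 1.0 - e) / received

def mw_estimate(y, z, e):
    """Weighted difference in mean outcome between treated and control units."""
    y, z = np.asarray(y, dtype=float), np.asarray(z, dtype=int)
    w = matching_weights(e, z)
    treated = np.sum(w * z * y) / np.sum(w * z)
    control = np.sum(w * (1 - z) * y) / np.sum(w * (1 - z))
    return treated - control
```

Units on the "majority" side of their propensity score (for example, treated units with e < 1/2) receive weight 1. Units on the other side are down-weighted, which mimics one-to-one matching.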


Subject(s)
Ethnicity, Humans, Propensity Score, Computer Simulation

10.
Article in English | MEDLINE | ID: mdl-36246572

ABSTRACT

The rapid finding of effective therapeutics requires efficient use of available resources in clinical trials. Covariate adjustment can yield statistical estimates with improved precision, resulting in a reduction in the number of participants required to draw futility or efficacy conclusions. We focus on time-to-event and ordinal outcomes. When more than a few baseline covariates are available, a key question for covariate adjustment in randomised studies is how to fit a model relating the outcome and the baseline covariates to maximise precision. We present a novel theoretical result establishing conditions for asymptotic normality of a variety of covariate-adjusted estimators that rely on machine learning (e.g., ℓ1-regularisation, Random Forests, XGBoost, and Multivariate Adaptive Regression Splines [MARS]), under the assumption that outcome data are missing completely at random. We further present a consistent estimator of the asymptotic variance. Importantly, the conditions do not require the machine learning methods to converge to the true outcome distribution conditional on baseline variables, as long as they converge to some (possibly incorrect) limit. We conducted a simulation study to evaluate the performance of the aforementioned prediction methods in COVID-19 trials. Our simulation is based on resampling longitudinal data from over 1500 patients hospitalised with COVID-19 at Weill Cornell Medicine New York Presbyterian Hospital. We found that using ℓ1-regularisation led to estimators and corresponding hypothesis tests that control type 1 error and are more precise than an unadjusted estimator across all sample sizes tested. We also show that when covariates are not prognostic of the outcome, ℓ1-regularisation remains as precise as the unadjusted estimator, even at small sample sizes (n = 100). We provide an R package, adjrct, that performs model-robust covariate adjustment for ordinal and time-to-event outcomes.

11.
J Am Stat Assoc ; 117(538): 996-1009, 2022.
Article in English | MEDLINE | ID: mdl-36060554

ABSTRACT

Characterizing the asymptotic distributions of eigenvectors for large random matrices poses important challenges yet can provide useful insights into a range of statistical applications. To this end, in this paper we introduce a general framework of asymptotic theory of eigenvectors (ATE) for large spiked random matrices with diverging spikes and heterogeneous variances, and establish the asymptotic properties of the spiked eigenvectors and eigenvalues for the scenario of the generalized Wigner matrix noise. Under some mild regularity conditions, we provide the asymptotic expansions for the spiked eigenvalues and show that they are asymptotically normal after some normalization. For the spiked eigenvectors, we establish asymptotic expansions for the general linear combination and further show that it is asymptotically normal after some normalization, where the weight vector can be arbitrary. We also provide a more general asymptotic theory for the spiked eigenvectors using the bilinear form. Simulation studies verify the validity of our new theoretical results. Our family of models encompasses many popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis, and our general theory can be exploited for statistical inference in these large-scale applications.

12.
Entropy (Basel) ; 24(5)2022 May 12.
Article in English | MEDLINE | ID: mdl-35626567

ABSTRACT

Shannon's entropy is one of the building blocks of information theory and an essential aspect of Machine Learning (ML) methods (e.g., Random Forests). Yet, it is finitely defined only for distributions with fast-decaying tails on a countable alphabet. The unboundedness of Shannon's entropy over the general class of all distributions on an alphabet prevents its potential utility from being fully realized. To fill this void in the foundation of information theory, Zhang (2020) proposed generalized Shannon's entropy, which is finitely defined everywhere. The plug-in estimator, adopted in almost all entropy-based ML method packages, is one of the most popular approaches to estimating Shannon's entropy, and its asymptotic distribution has been well studied in the existing literature. This paper studies the asymptotic properties of the plug-in estimator of generalized Shannon's entropy on countable alphabets. The developed asymptotic properties require no assumptions on the original distribution and allow for interval estimation and statistical tests with generalized Shannon's entropy.
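The plug-in estimator substitutes observed relative frequencies for the unknown probabilities. The sketch below also assumes that generalized Shannon's entropy of order m is the Shannon entropy of the escort distribution p_k^m / Σ_j p_j^m. This is our reading of Zhang (2020) and is stated here only as an assumption. Under it, the same plug-in step applies unchanged. The asymptotic results of the paper are not reproduced, and the function names are hypothetical.

```python
import numpy as np
from collections import Counter

def plugin_shannon(counts):
    """Plug-in Shannon entropy (natural log) from category counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-np.sum(p * np.log(p)))

def plugin_generalized_shannon(counts, m=2):
    """Assumed form: plug-in Shannon entropy of the order-m escort distribution."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    escort = p ** m / np.sum(p ** m)  # escort weights concentrate mass on frequent symbols
    return float(-np.sum(escort * np.log(escort)))

sample = ["a", "b", "a", "c", "a", "b"]
counts = list(Counter(sample).values())
print(plugin_shannon(counts), plugin_generalized_shannon(counts, m=2))
```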

13.
Pharm Stat ; 21(5): 1058-1073, 2022 09.
Article in English | MEDLINE | ID: mdl-35191605

ABSTRACT

Clinical trials usually take a period of time to recruit volunteers, so data accumulate steadily. Traditionally, the sample size of a trial is determined in advance, and data are collected before the analysis proceeds. Over the past decades, many strategies for sample size re-estimation have been proposed and given rigorous theoretical grounding. However, these methodologies have not been well extended to trials with adaptive designs. We therefore aim to fill this gap by proposing a sample size re-estimation procedure for response-adaptive randomized trials. For ethical and economic reasons, we use multiple stopping criteria that allow early termination. Statistical inference is studied for hypothesis testing under the doubly-adaptive biased coin design. We also prove that the test statistics for each stage are asymptotically independent and normally distributed, even though dependence exists between the two stages. We find that under our methods, compared with fixed sample size designs and other commonly used randomization procedures: (1) power is increased in all scenarios with an adjusted sample size; (2) the sample size is reduced by up to 40% when the treatment effect is underestimated; (3) the duration of trials is shortened. These advantages are evidenced by numerical studies and real examples.
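In a doubly-adaptive biased coin design, the next patient is assigned to treatment 1 with probability g(x, ρ). Here x is the current proportion of patients on treatment 1 and ρ is the current estimate of the target allocation. The abstract does not say which allocation function or target the authors use. The sketch therefore assumes the Hu-Zhang allocation function with tuning parameter γ, and holds ρ fixed instead of re-estimating it from responses. The function names are hypothetical.

```python
import numpy as np

def dbcd_allocation_prob(x, rho, gamma=2.0):
    """Hu-Zhang allocation function g(x, rho): probability of assigning treatment 1.

    x: current proportion of patients on treatment 1; rho: (estimated) target proportion.
    """
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    a = rho * (rho / x) ** gamma
    b = (1.0 - rho) * ((1.0 - rho) / (1.0 - x)) ** gamma
    return a / (a + b)

def simulate_allocation(rho=0.6, n=500, gamma=2.0, seed=0):
    """Allocate n patients sequentially toward a fixed target rho (illustration only)."""
    rng = np.random.default_rng(seed)
    n1 = 0
    for i in range(n):
        x = n1 / i if i > 0 else 0.5
        n1 += rng.random() < dbcd_allocation_prob(x, rho, gamma)
    return n1 / n
```

When x exceeds ρ, the probability of treatment 1 drops below ρ, pulling the allocation back toward the target. Larger γ makes this correction stronger.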


Subject(s)
Statistical Models, Research Design, Statistical Data Interpretation, Humans, Randomized Controlled Trials as Topic, Sample Size

14.
Biostatistics ; 23(4): 1074-1082, 2022 10 14.
Article in English | MEDLINE | ID: mdl-34718422

ABSTRACT

There is a great need for statistical methods for analyzing skewed responses in complex sample surveys. Quantile regression is a logical option for addressing this problem but is often accompanied by incorrect variance estimation. We show how the variance can be estimated correctly by incorporating the survey design into the variance estimation process. In a simulation study, we illustrate that the variance estimator for the median regression estimator has very small relative bias and appropriate coverage probability. The motivation for our work stems from the National Health and Nutrition Examination Survey, where we demonstrate the impact of our results on iodine deficiency in females compared with males, adjusting for other covariates.


Subject(s)
Iodine, Bias, Computer Simulation, Female, Humans, Male, Nutrition Surveys, Surveys and Questionnaires

15.
J Appl Stat ; 48(11): 1934-1947, 2021.
Article in English | MEDLINE | ID: mdl-35706435

ABSTRACT

In high-dimensional linear regression, the number of variables exceeds the sample size. In this situation, the traditional variance estimation technique based on ordinary least squares consistently exhibits high bias, even under a sparsity assumption. One of the major reasons is the high spurious correlation between the unobserved realized noise and several predictors. To alleviate this problem, a refitted cross-validation (RCV) method has been proposed in the literature. However, for a complicated model in finite samples, RCV has a lower probability that the selected model includes the true model. This phenomenon may easily result in a large bias in variance estimation. This study therefore proposes a model selection method based on the ranks of the frequency of occurrences in six votes from a blocked 3×2 cross-validation. In practice, the proposed method has a considerably larger probability of including the true model than the RCV method. The variance estimate obtained using the model selected by the proposed method also shows a lower bias and a smaller variance. Furthermore, theoretical analysis proves the asymptotic normality of the proposed variance estimator.
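Here is one way to generate the blocked 3×2 splits. Divide the sample into four equal blocks and pair them in the three possible ways. This yields three two-fold partitions and six halves, and any two halves from different partitions share exactly a quarter of the data. The sketch casts one vote per half for the variables chosen by a selector, then ranks variables by vote count. The selector (top-k absolute marginal correlation) and the helper names are hypothetical stand-ins, not the procedure evaluated in the paper.

```python
import numpy as np

def blocked_3x2_splits(n, seed=0):
    """Return three (half_a, half_b) partitions built from four equal random blocks."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)[: n - n % 4]      # drop a random remainder so blocks are equal
    b = np.split(perm, 4)
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return [(np.concatenate([b[i] for i in left]), np.concatenate([b[j] for j in right]))
            for left, right in pairings]

def top_k_by_correlation(X, y, k):
    """Hypothetical selector: indices of the k predictors most correlated with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(score)[::-1][:k]

def vote_select(X, y, k, seed=0):
    """Six votes (one per half); keep the k variables with the most votes."""
    votes = np.zeros(X.shape[1], dtype=int)
    for half_a, half_b in blocked_3x2_splits(len(y), seed):
        for half in (half_a, half_b):
            votes[top_k_by_correlation(X[half], y[half], k)] += 1
    # stable sort so that ties are broken by column index
    return np.sort(np.argsort(-votes, kind="stable")[:k]), votes
```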

16.
Stat Med ; 40(4): 1034-1058, 2021 02 20.
Article in English | MEDLINE | ID: mdl-33247458

ABSTRACT

This article concerns evaluating the effectiveness of a continuous diagnostic biomarker against a continuous gold standard that is measured with error. Extending the work of Obuchowski (2005, 2016), Wu et al. (2016) suggested an accuracy index and proposed an estimator for the index with an error-prone standard when the reliability coefficient is known. By combining this with additional error-free measurements of the continuous gold standard collected from some subjects, this article proposes two adaptive estimators of the accuracy index when the reliability coefficient is unknown, and establishes the consistency and asymptotic normality of these estimators. Simulation studies are conducted to compare various estimators. Data from an intervention trial on glycemic control among children with type 1 diabetes are used to illustrate the proposed methods.


Subject(s)
Reproducibility of Results, Biomarkers, Child, Computer Simulation, Statistical Data Interpretation, Humans

17.
J Time Ser Anal ; 41(2): 293-311, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32508370

ABSTRACT

In the independent component model, multivariate data are assumed to be a mixture of mutually independent latent components. Independent component analysis (ICA) then aims to estimate these latent components. In this article, we study an ICA method that combines linear and quadratic autocorrelations to enable efficient estimation of various kinds of stationary time series. The statistical properties of the estimator are studied by finding its limiting distribution under general conditions, and the asymptotic variances are derived for the ARMA-GARCH model. We use the asymptotic results and a finite-sample simulation study to compare different choices of a weight coefficient. Because it is often of interest to identify all components that exhibit stochastic volatility features, we suggest a test statistic for this problem. We also show that a slightly modified version of principal volatility component analysis can be seen as an ICA method. Finally, we apply the estimators to a data set consisting of time series of the exchange rates of seven currencies against the US dollar. Supporting information, including proofs of the theorems, is available online.
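The paper's estimator combines linear and quadratic autocorrelations. The sketch below instead shows a much simpler relative that uses linear autocorrelation only: AMUSE, which whitens the data and then diagonalizes a symmetrized lag-τ autocovariance matrix. It separates sources whose autocorrelations at that lag differ. Unlike the method studied in the paper, it cannot exploit stochastic volatility, so it is a swapped-in illustration rather than the authors' estimator.

```python
import numpy as np

def amuse(x, lag=1):
    """AMUSE unmixing matrix W (rows give components s_t = W x_t) for x of shape (T, p)."""
    xc = x - x.mean(axis=0)
    vals, vecs = np.linalg.eigh(xc.T @ xc / len(xc))
    whitener = vecs @ np.diag(vals ** -0.5) @ vecs.T     # symmetric inverse square root
    z = xc @ whitener                                     # whitened data, identity covariance
    cov_lag = z[lag:].T @ z[:-lag] / (len(z) - lag)       # lag autocovariance of z
    cov_lag = 0.5 * (cov_lag + cov_lag.T)                 # symmetrize before eigh
    _, u = np.linalg.eigh(cov_lag)
    return u.T @ whitener

# Three AR(1) sources with distinct lag-1 autocorrelations, mixed linearly
rng = np.random.default_rng(0)
T, phi = 20_000, np.array([0.8, -0.5, 0.1])
e = rng.standard_normal((T, 3))
s = np.zeros((T, 3))
for t in range(1, T):
    s[t] = phi * s[t - 1] + e[t]
A = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.2, 0.1, 1.0]])
x = s @ A.T
components = x @ amuse(x).T
```

The recovered components match the sources only up to order, sign, and scale, as is usual in ICA.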

18.
Biom J ; 62(4): 970-988, 2020 07.
Article in English | MEDLINE | ID: mdl-31995248

ABSTRACT

A recent method for estimating a lower bound of the population size in capture-recapture samples is studied. Specifically, asymptotic properties such as strong consistency and asymptotic normality are provided. The estimator is based on the empirical probability generating function (pgf) of the observed data, and it is consistent for count distributions with a log-convex pgf (the LC class). This is a large family that includes mixed and compound Poisson distributions, as well as their independent sums and finite mixtures. The finite-sample performance of the lower bound estimator is assessed via simulation, showing better behavior than some close competitors. Several application examples are also analyzed and discussed.
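For comparison, a classical lower bound in this setting is Chao's estimator. It uses only the number of units captured exactly once (f1) and exactly twice (f2): N = n + f1²/(2 f2), where n is the number of distinct units observed. The sketch computes it from a list of per-unit capture counts and falls back to the bias-corrected form n + f1(f1 - 1)/2 when f2 = 0. This is not the pgf-based estimator studied in the paper, and the abstract does not say whether Chao's bound is among the competitors used there.

```python
from collections import Counter

def chao_lower_bound(capture_counts):
    """Chao's lower bound for population size from per-unit capture counts (all >= 1)."""
    freq = Counter(capture_counts)
    n = len(capture_counts)            # number of distinct units observed
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 > 0:
        return n + f1 ** 2 / (2 * f2)
    return n + f1 * (f1 - 1) / 2       # bias-corrected form when there are no doubletons

print(chao_lower_bound([1] * 10 + [2] * 5 + [3] * 15))
```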


Subject(s)
Biometry/methods, Population Density, Nonparametric Statistics, Poisson Distribution, Probability

19.
J Multivar Anal ; 171: 382-396, 2019 May.
Article in English | MEDLINE | ID: mdl-31588153

ABSTRACT

By optimizing index functions against different outcomes, we propose a multivariate single-index model (SIM) for developing medical indices that work simultaneously with multiple outcomes. Fitting a multivariate SIM is not fundamentally different from fitting a univariate SIM, as the former can be written as a sum of multiple univariate SIMs with appropriate indicator functions. However, the theoretical properties of the parameter estimators have not been carefully studied. Because of the lack of asymptotic results, no formal inference procedure has been available for multivariate SIMs. In this paper, we examine the asymptotic properties of the multivariate SIM parameter estimators. We show that, under mild regularity conditions, the estimators of the multivariate SIM parameters are indeed √n-consistent and asymptotically normal. We conduct a simulation study to investigate the finite-sample performance of the corresponding estimation and inference procedures. To illustrate its use in practice, we construct an index measure of urine electrolyte markers for assessing the risk of hypertension in individual subjects.

20.
J Nonparametr Stat ; 31(4): 911-931, 2019.
Article in English | MEDLINE | ID: mdl-33013146

ABSTRACT

Nonignorable missing data are common in studies where the outcome is relevant to the subject's behavior. Ibrahim et al. (2001) fitted a logistic regression for a binary outcome subject to nonignorable missing data, and proposed replacing the outcome in the mechanism model with a completely observed auxiliary variable. They had to correctly specify a model for the auxiliary variable; unfortunately, the outcome variable subject to nonignorable missingness is still involved in that model, and how to specify it correctly is unclear. Instead, we propose two unconventional likelihood-based estimation procedures in which the nonignorable missingness mechanism model can be completely bypassed. We apply the proposed methods to the children's mental health study and compare their performance with existing methods. The large-sample properties of the proposed estimators are rigorously justified, and their finite-sample behavior is examined via comprehensive simulation studies.
