Search | VHL Regional Portal

1.

Practical Implications of Sum Scores Being Psychometrics' Greatest Accomplishment.

McNeish, Daniel.

Psychometrika ; 2024 Jul 20.

Article in English | MEDLINE | ID: mdl-39031300

ABSTRACT

This paper reflects on some practical implications of the excellent treatment of sum scoring and classical test theory (CTT) by Sijtsma et al. (Psychometrika 89(1):84-117, 2024). I have no major disagreements about the content they present and found it to be an informative clarification of the properties and possible extensions of CTT. In this paper, I focus on whether sum scores, despite their mathematical justification, are positioned to improve psychometric practice in empirical studies in psychology, education, and adjacent areas. First, I summarize recent reviews of psychometric practice in empirical studies, subsequent calls for greater psychometric transparency and validity, and how sum scores may or may not be positioned to adhere to such calls. Second, I consider limitations of sum scores for prediction, especially in the presence of common features like ordinal or Likert response scales, multidimensional constructs, and moderated or heterogeneous associations. Third, I review previous research outlining potential limitations of using sum scores as outcomes in subsequent analyses where rank ordering is not always sufficient to successfully characterize group differences or change over time. Fourth, I cover potential challenges for providing validity evidence for whether sum scores represent a single construct, particularly if one wishes to maintain minimal CTT assumptions. I conclude with thoughts about whether sum scores, even if mathematically justified, are positioned to improve psychometric practice in empirical studies.

2.

Thinking About Sum Scores Yet Again, Maybe the Last Time, We Don't Know, Oh No . . .: A Comment on.

Widaman, Keith F; Revelle, William.

Educ Psychol Meas ; 84(4): 637-659, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39055096

ABSTRACT

The relative advantages and disadvantages of sum scores and estimated factor scores are issues of concern for substantive research in psychology. Recently, while championing estimated factor scores over sum scores, McNeish offered a trenchant rejoinder to an article by Widaman and Revelle, which had critiqued an earlier paper by McNeish and Wolf. In the recent contribution, McNeish misrepresented a number of claims by Widaman and Revelle, rendering moot his criticisms of Widaman and Revelle. Notably, McNeish chose to avoid confronting a key strength of sum scores stressed by Widaman and Revelle: the greater comparability of results across studies if sum scores are used. Instead, McNeish pivoted to present a host of simulation studies to identify relative strengths of estimated factor scores. Here, we review our prior claims and, in the process, deflect purported criticisms by McNeish. We discuss briefly issues related to simulated data and empirical data that provide evidence of strengths of each type of score. In doing so, we identified a second strength of sum scores: superior cross-validation of results across independent samples of empirical data, at least for samples of moderate size. We close with consideration of four general issues concerning sum scores and estimated factor scores that highlight the contrasts between positions offered by McNeish and by us, issues of importance when pursuing applied research in our field.

3.

Investigating the PHQ-9 With Mokken Scale Analysis and Cognitive Interviews.

Kristófersdóttir, Kristín Hulda; Kristjánsdóttir, Hafrún; Asgeirsdottir, Ragnhildur Lilja; Karlsson, Thorlakur; Vésteinsdóttir, Vaka; Thorsdottir, Fanney.

Assessment ; 31(6): 1332-1355, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38159031

ABSTRACT

Scores on the Patient Health Questionnaire-9 (PHQ-9) are frequently used to assess depression both in research and in clinical practice. The aim was to examine the validity of the PHQ-9 sum score by using Mokken scale analysis (Study I) and cognitive interviews (Study II) on the Icelandic version of PHQ-9. A primary care sample of 618 individuals was used in Study I. The results indicate that the PHQ-9 items are not close enough to perfectly unidimensional for their sum score to accurately order people on the depression severity dimension. In Study II, the sample consisted of 53 individuals, with 28 having a history of depression and 25 not. The findings reveal a number of issues concerning respondents' use of the PHQ-9. No systematic differences were found in the results of the two groups. The PHQ-9 sum score should thus be interpreted and used with great care. We provide scale revision recommendations to improve the quality of PHQ-9.

Subject(s)

Psychometrics; Humans; Male; Female; Middle Aged; Adult; Aged; Reproducibility of Results; Iceland; Interview, Psychological; Patient Health Questionnaire; Depression/diagnosis; Depression/psychology; Young Adult; Adolescent; Primary Health Care; Surveys and Questionnaires

4.

Premature conclusions about the signal-to-noise ratio in structural equation modeling research: A commentary on Yuan and Fang (2023).

Schuberth, Florian; Schamberger, Tamara; Rönkkö, Mikko; Liu, Yide; Henseler, Jörg.

Br J Math Stat Psychol ; 76(3): 682-694, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37070527

ABSTRACT

In a recent article published in this journal, Yuan and Fang (British Journal of Mathematical and Statistical Psychology, 2023) suggest comparing structural equation modeling (SEM), also known as covariance-based SEM (CB-SEM), estimated by normal-distribution-based maximum likelihood (NML), to regression analysis with (weighted) composites estimated by least squares (LS) in terms of their signal-to-noise ratio (SNR). They summarize their findings in the statement that "[c]ontrary to the common belief that CB-SEM is the preferred method for the analysis of observational data, this article shows that regression analysis via weighted composites yields parameter estimates with much smaller standard errors, and thus corresponds to greater values of the [SNR]." In our commentary, we show that Yuan and Fang have made several incorrect assumptions and claims. Consequently, we recommend that empirical researchers not base their methodological choice regarding CB-SEM and regression analysis with composites on the findings of Yuan and Fang as these findings are premature and require further research.

Subject(s)

Research Design; Latent Class Analysis; Signal-To-Noise Ratio; Least-Squares Analysis; Normal Distribution

5.

Bias in Gene-by-Environment Interaction Effects with Sum Scores; An Application to Well-being Phenotypes.

Pelt, Dirk H M; Schwabe, Inga; Bartels, Meike.

Behav Genet ; 53(4): 359-373, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36856918

ABSTRACT

In the current study, we investigated the influence of using skewed sum scores on estimated gene-by-environment interaction effects (GxE) for life satisfaction and happiness with perceived social support. To this end, we analyzed item-level data from a large adult twin sample (Ns between 3610 and 11,305) of the Netherlands Twin Register. Item response theory (IRT) models were incorporated in unmeasured (univariate) GxE models, and measured GxE models (with social support as moderator). We found that skewness introduced spurious GxE effects, with the largest effect for the most skewed variable (social support). Finally, in the IRT model for life satisfaction, but not for happiness, heritability estimates decreased with higher social support, while this was not observed when analyzing sum scores. Together, our results indicate that IRT can be used to address psychometric issues related to the use of sum scores, especially in the context of GxE, for complex traits like well-being.

Subject(s)

Gene-Environment Interaction; Multifactorial Inheritance; Phenotype; Netherlands

6.

Thinking thrice about sum scores, and then some more about measurement and analysis.

Widaman, Keith F; Revelle, William.

Behav Res Methods ; 55(2): 788-806, 2023 Feb.

Article in English | MEDLINE | ID: mdl-35469086

ABSTRACT

Measurement is fundamental to all research in psychology and should be accorded greater scrutiny than typically occurs. Among other claims, McNeish and Wolf (Thinking twice about sum scores. Behavior Research Methods, 52, 2287-2305) argued that use of sum scores (a) implies that a highly constrained latent variable model underlies items comprising a scale, and (b) may misrepresent or bias relations with other criteria. The central claim by McNeish and Wolf that use of sum scores requires the assumption that a parallel test model underlies item responses is incorrect and without psychometric merit. Instead, if a set of items is unidimensional, estimators of reliability are available even if the factor model underlying the set of items does not have a highly constrained form. Thus, dimensionality of a set of items is the key issue, and whether strict constraints on parameter estimates do or do not hold dictates the appropriate way to estimate reliability. McNeish and Wolf also claimed that more precise forms of scoring, such as estimating factor scores, would be preferable to sum scores. We provide analytic bases for reliability estimation and then provide several demonstrations of reliability estimation and the relative advantages of sum scores and factor scores. We contend that several claims by McNeish and Wolf are questionable and that, as a result, multiple recommendations they made and conclusions they drew are incorrect. The upshot is that, once the dimensional structure of a set of items is verified, sum scores often have a solid psychometric basis and therefore are frequently quite adequate for psychological research.

Subject(s)

Wolves; Animals; Reproducibility of Results; Models, Theoretical; Psychometrics; Surveys and Questionnaires

7.

Psychometric properties of sum scores and factor scores differ even when their correlation is 0.98: A response to Widaman and Revelle.

McNeish, Daniel.

Behav Res Methods ; 55(8): 4269-4290, 2023 Dec.

Article in English | MEDLINE | ID: mdl-36394821

ABSTRACT

Commentary in Widaman and Revelle (2022) argued that sum scoring is justified as long as unidimensionality holds because sum score reliability is defined. My response begins with a review of the literature supporting the perspective we adopted in the original article. I then conduct simulation studies to assess the psychometric properties of sum scores created using Widaman and Revelle's justification relative to scores created by the weighted factor score approach in the original article. In my simulations, I generate data where sum and factor scores are correlated at 0.96 or 0.98 because high factor-sum score correlations are often used to support the contention that sum and factor scores have interchangeable psychometric properties. I explore (a) correlations between estimated scores and true scores, (b) classification accuracy of sum and factor scores, and (c) reliability of sum and factor scores. Results show that factor scores have (a) higher correlations with true scores (Δ = 0.02-0.04), (b) higher sensitivity (Δ = 4-8 percentage points), and (c) higher reliability (Δ = 0.04-0.07). Factor score performance metrics also have less sampling variability in most conditions. Psychometric properties of sum scores, even when highly correlated with factor scores, remain less desirable than those of factor scores. Additional considerations like models with multiple factors and measurement invariance are also discussed. Essentially, even if accepting Widaman and Revelle's justification for sum scoring, it is uncertain whether researchers generally would want to sum score after fitting a factor analysis unless sum and factor scores correlate at (and not merely close to) 1.00.

Subject(s)

Psychometrics; Humans; Reproducibility of Results; Factor Analysis, Statistical; Surveys and Questionnaires

8.

Correlation Between Neutrophil-to-Lymphocyte Ratio and Motoric Deterioration in Patients With Guillain-Barre Syndrome.

Sutantoyo, Felisitas Farica; Basuki, Mudjiani; Hamdan, Muhammad.

J Clin Neurol ; 18(6): 671-680, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36367065

ABSTRACT

BACKGROUND AND PURPOSE: Guillain-Barre syndrome (GBS) is a common cause of inflammation-related acute flaccid paralysis, and is characterized by acute onset, rapid progression, and symmetrical weakness. GBS is an emergency with high morbidity and long-term disability rates. It is important to determine the prognostic factors for GBS in order to improve the disease outcomes. This study aimed to identify the correlation between the neutrophil-to-lymphocyte ratio (NLR) on day 1 of hospitalization (D1) and motor deterioration in GBS patients. METHODS: This observational analytical study applied a cross-sectional analysis to the medical records of GBS patients who were hospitalized at Dr. Soetomo General Hospital Surabaya from January 2018 to March 2020. The analysis used the chi-square bivariate test, multivariate analysis with logistic regression, and correlation analysis with the Spearman test. RESULTS: The study included 61 subjects. Statistical tests showed that there was no correlation between NLR and changes in the Medical Research Council sum scores (ΔMRC sum scores) during D1-D3, D1-D7, D1-D14, and D1 to the day of discharge (p>0.05). There was a significant correlation between NLR and the Erasmus GBS outcome score (EGOS) (p=0.006). NLR values differed significantly within each treatment group (p=0.001). Therefore, a subanalysis within each treatment group was conducted, which revealed a significant negative correlation (p<0.05) between NLR and the ΔMRC sum score during D1-D14 in the group treated without immunotherapy. CONCLUSIONS: There was no correlation between NLR and motor deterioration in patients with GBS during hospitalization. However, NLR was significantly correlated with EGOS, and there was a negative correlation between NLR and motor deterioration during D1-D14 in GBS patients treated without immunotherapy.

9.

On Dimensionality, Measurement Invariance, and Suitability of Sum Scores for the PHQ-9 and the GAD-7.

Stochl, Jan; Fried, Eiko I; Fritz, Jessica; Croudace, Tim J; Russo, Debra A; Knight, Clare; Jones, Peter B; Perez, Jesus.

Assessment ; 29(3): 355-366, 2022 Apr.

Article in English | MEDLINE | ID: mdl-33269612

ABSTRACT

In psychiatry, severity of mental health conditions and their change over time are usually measured via sum scores of items on psychometric scales. However, inferences from such scores can be biased if psychometric properties such as unidimensionality and temporal measurement invariance for instruments are not met. Here, we aimed to evaluate these properties for common measures of depression (Patient Health Questionnaire-9) and anxiety (Generalized Anxiety Disorder Assessment-7) in a large clinical sample (N = 22,362) undergoing psychotherapy. In addition, we tested consistency in dimensionality results across different methods (parallel analysis, factor analysis, explained common variance, the partial credit model, and the Mokken model). Results showed that while both Patient Health Questionnaire-9 and Generalized Anxiety Disorder Assessment-7 are multidimensional instruments with highly correlated factors, there is justification for sum scores as measures of severity. Temporal measurement invariance across 10 therapy sessions was evaluated. Strict temporal measurement invariance was established in both scales, allowing researchers to compare sum scores as severity measures across time.

Subject(s)

Depression; Patient Health Questionnaire; Anxiety/psychology; Anxiety Disorders/diagnosis; Anxiety Disorders/psychology; Depression/diagnosis; Depression/psychology; Humans; Psychometrics; Reproducibility of Results

10.

Bayesian and Maximum-Likelihood Modeling and Higher-Level Scores of Interpersonal Problems With Circumplex Structure.

Weide, Anneke C; Scheuble, Vera; Beauducel, André.

Front Psychol ; 12: 761378, 2021.

Article in English | MEDLINE | ID: mdl-34777165

ABSTRACT

Difficulties in interpersonal behavior are often measured by the circumplex-based Inventory of Interpersonal Problems. Its eight scales can be represented by a three-factor structure with two circumplex factors, Dominance and Love, and a general problem factor, Distress. Bayesian confirmatory factor analysis is well-suited to evaluate the higher-level structure of interpersonal problems because circumplex loading priors allow for data-driven adjustments and a more flexible investigation of the ideal circumplex pattern than conventional maximum likelihood confirmatory factor analysis. Using a non-clinical sample from an online questionnaire study (N = 822), we replicated the three-factor structure of the IIP by maximum likelihood and Bayesian confirmatory factor analysis and found great proximity of the Bayesian loadings to perfect circumplexity. We found additional support for the validity of the three-factor model of the IIP by including external criteria (Agreeableness, Extraversion, and Neuroticism from the Big Five and subclinical grandiose narcissism) in the analysis. We also investigated higher-level scores for Dominance, Love, and Distress using traditional regression factor scores and weighted sum scores. We found excellent reliability (with R_tt ≥ 0.90) for Dominance, Love, and Distress for the two scoring methods. We found high congruence of the higher-level scores with the underlying factors and good circumplex properties of the scoring models. The correlational pattern with the external measures was in line with theoretical expectations and similar to the results from the factor analysis. We encourage the use of Bayesian modeling when dealing with circumplex structure and recommend the use of higher-level scores for interpersonal problems as parsimonious, reliable, and valid measures.

11.

The Appropriateness of Sum Scores as Estimates of Factor Scores in the Multiple Factor Analysis of Ordered-Categorical Responses.

Ferrando, Pere J; Lorenzo-Seva, Urbano.

Educ Psychol Meas ; 81(2): 205-228, 2021 Apr.

Article in English | MEDLINE | ID: mdl-37929264

ABSTRACT

Unit-weight sum scores (UWSSs) are routinely used as estimates of factor scores on the basis of solutions obtained with the nonlinear exploratory factor analysis (EFA) model for ordered-categorical responses. Theoretically, this practice results in a loss of information and accuracy, and is expected to lead to biased estimates. However, the practical relevance of these limitations is far from clear. In this article, we adopt an empirical view and propose indices and procedures (some of them new) for assessing the appropriateness of UWSSs in nonlinear EFA applications. A new automated approach for obtaining UWSSs that maximize fidelity and correlational accuracy is proposed. The appropriateness of UWSSs under different conditions and the behavior of the present proposal in comparison with other more common approaches are assessed with a simulation study. A tutorial for interested practitioners is presented using an illustrative example based on a well-known personality questionnaire. All the procedures proposed in the article have been implemented in a well-known noncommercial EFA program.

12.

Psychometric Modelling of Longitudinal Genetically Informative Twin Data.

Schwabe, Inga; Gu, Zhengguo; Tijmstra, Jesper; Hatemi, Pete; Pohl, Steffi.

Front Genet ; 10: 837, 2019.

Article in English | MEDLINE | ID: mdl-31681400

ABSTRACT

The often-used A(C)E model that decomposes phenotypic variance into parts due to additive genetic and environmental influences can be extended to a longitudinal model when the trait has been assessed at multiple occasions. This enables inference about the nature (e.g., genetic or environmental) of the covariance among the different measurement points. In the case that the measurement of the phenotype relies on self-report data (e.g., questionnaire data), often, aggregated scores (e.g., sum-scores) are used as a proxy for the phenotype. However, earlier research based on the univariate ACE model that concerns a single measurement occasion has shown that this can lead to an underestimation of heritability and that instead, one should prefer to model the raw item data by integrating an explicit measurement model into the analysis. This has, however, not been translated to the more complex longitudinal case. In this paper, we first present a latent state twin A(C)E model that combines the genetic twin model with an item response theory (IRT) model as well as its specification in a Bayesian framework. Two simulation studies were conducted to investigate 1) how large the bias is when sum-scores are used in the longitudinal A(C)E model and 2) if using the latent twin model can overcome the potential bias. Results of the first simulation study (e.g., AE model) demonstrated that using a sum-score approach leads to underestimated heritability estimates and biased covariance estimates. Surprisingly, the IRT approach also led to bias, but to a much lesser degree. The amount of bias increased in the second simulation study (e.g., ACE model) under both frameworks, with the IRT approach still being the less biased approach. Since the bias was less severe under the IRT approach than under the sum-score approach and due to other advantages of latent variable modelling, we still advise researchers to adopt the IRT approach. We further illustrate differences between the traditional sum-score approach and the latent state twin A(C)E model by analyzing data of a two-wave twin study, consisting of the answers of 8,016 twins on a scale developed to measure social attitudes related to conservatism.

13.

Sum Scores in Twin Growth Curve Models: Practicality Versus Bias.

Luningham, Justin M; McArtor, Daniel B; Bartels, Meike; Boomsma, Dorret I; Lubke, Gitta H.

Behav Genet ; 47(5): 516-536, 2017 Sep.

Article in English | MEDLINE | ID: mdl-28780665

ABSTRACT

To study behavioral or psychiatric phenotypes, multiple indices of the behavior or disorder are often collected that are thought to best reflect the phenotype. Combining these items into a single score (e.g. a sum score) is a simple and practical approach for modeling such data, but this simplicity can come at a cost in longitudinal studies, where the relevance of individual items often changes as a function of age. Such changes violate the assumptions of longitudinal measurement invariance (MI), and this violation has the potential to obfuscate the interpretation of the results of latent growth models fit to sum scores. The objectives of this study are (1) to investigate the extent to which violations of longitudinal MI lead to bias in parameter estimates of the average growth curve trajectory, and (2) whether absence of MI affects estimates of the heritability of these growth curve parameters. To this end, we analytically derive the bias in the estimated means and variances of the latent growth factors fit to sum scores when the assumption of longitudinal MI is violated. This bias is further quantified via Monte Carlo simulation, and is illustrated in an empirical analysis of aggression in children aged 3-12 years. These analyses show that measurement non-invariance across age can indeed bias growth curve mean and variance estimates, and our quantification of this bias permits researchers to weigh the costs of using a simple sum score in longitudinal studies. Simulation results indicate that the genetic variance decomposition of growth factors is, however, not biased due to measurement non-invariance across age, provided the phenotype is measurement invariant across birth-order and zygosity in twins.

Subject(s)

Models, Statistical; Twin Studies as Topic/methods; Adolescent; Aggression/psychology; Child; Child, Preschool; Female; Humans; Longitudinal Studies; Male; Models, Genetic; Monte Carlo Method; Twins/genetics
