Búsqueda | Portal Regional de la BVS

1.

Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?

Olsen, Robert B; Orr, Larry L; Bell, Stephen H; Petraglia, Elizabeth; Badillo-Goicoechea, Elena; Miyaoka, Atsushi; Stuart, Elizabeth A.

J Res Educ Eff ; 17(1): 184-210, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-38450254

RESUMEN

Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, their ability to accurately predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested modern prediction methods-lasso regression and Bayesian Additive Regression Trees (BART)-using a wide range of moderator variables. The main study findings are that: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced "less inaccurate" predictions than lasso regression or than the Sample Average Treatment Effect. These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modelling using the data commonly collected by multi-site RCTs will be insufficient to explain the variation in impacts across sites and accurately predict impacts for individual sites.

2.

Some Lessons From 50 Years of Multiarm Public Policy Experiments.

Orr, Larry L; Gubits, Daniel.

Eval Rev ; 47(1): 43-70, 2023 02.

Artículo en Inglés | MEDLINE | ID: mdl-33302732

RESUMEN

In this article, we explore the reasons why multiarm trials have been conducted and the design and analysis issues they involve. We point to three fundamental reasons for such designs: (1) Multiarm designs allow the estimation of "response surfaces"-that is, the variation in response to an intervention across a range of one or more continuous policy parameters. (2) Multiarm designs are an efficient way to test multiple policy approaches to the same social problem simultaneously, either to compare the effects of the different approaches or to estimate the effect of each separately. (3) Multiarm designs may allow for the estimation of the separate and combined effects of discrete program components. We illustrate each of these objectives with examples from the history of public policy experimentation over the past 50 years and discuss some design and analysis issues raised by each, including sample allocation, statistical power, multiple comparisons, and alignment of analysis with goals of the evaluation.

Asunto(s)

Política Pública , Proyectos de Investigación

3.

Characteristics of School Districts That Participate in Rigorous National Educational Evaluations.

Stuart, Elizabeth A; Bell, Stephen H; Ebnesajjad, Cyrus; Olsen, Robert B; Orr, Larry L.

J Res Educ Eff ; 10(1): 168-206, 2017.

Artículo en Inglés | MEDLINE | ID: mdl-29276552

RESUMEN

Given increasing interest in evidence-based policy, there is growing attention to how well the results from rigorous program evaluations may inform policy decisions. However, little attention has been paid to documenting the characteristics of schools or districts that participate in rigorous educational evaluations, and how they compare to potential target populations for the interventions that were evaluated. Utilizing a list of the actual districts that participated in 11 large-scale rigorous educational evaluations, we compare those districts to several different target populations of districts that could potentially be affected by policy decisions regarding the interventions under study. We find that school districts that participated in the 11 rigorous educational evaluations differ from the interventions' target populations in several ways, including size, student performance on state assessments, and location (urban/rural). These findings raise questions about whether, as currently implemented, the results from rigorous impact studies in education are likely to generalize to the larger set of school districts-and thus schools and students-of potential interest to policymakers, and how we can improve our study designs to retain strong internal validity while also enhancing external validity.

4.

2014 Rossi Award lecture:* beyond internal validity.

Orr, Larry L.

Eval Rev ; 39(2): 167-78, 2015 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-25805301

RESUMEN

BACKGROUND: For much of the last 40 years, the evaluation profession has been consumed in a battle over internal validity. Today, that battle has been decided. Random assignment, while still far from universal in practice, is almost universally acknowledged as the preferred method for impact evaluation. It is time for the profession to shift its attention to the remaining major flaws in the "standard model" of evaluation: (i) external validity and (ii) the high cost and low hit rate of experimental evaluations as currently practiced. RECOMMENDATIONS: To raise the profession's attention to external validity, the author recommends some simple, easy steps to be taken in every evaluation. The author makes two recommendations to increase the number of interventions found to be effective within existing resources: First, a two-stage evaluation strategy in which a cheap, streamlined Stage 1 evaluation is followed by a more intensive Stage 2 evaluation only for those interventions found to be effective in a Stage 1 trial and, second, use of random assignment to guide the myriad program management decisions that must be made in the course of routine program operations. This article is not intended as a solution to these issues: It is intended to stimulate the evaluation community to take these issues more seriously and to develop innovative solutions.

Asunto(s)

Distinciones y Premios , Estudios de Evaluación como Asunto , Humanos

5.

External Validity in Policy Evaluations that Choose Sites Purposively.

Olsen, Robert B; Orr, Larry L; Bell, Stephen H; Stuart, Elizabeth A.

J Policy Anal Manage ; 32(1): 107-121, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-25152557

RESUMEN

Evaluations of the impact of social programs are often carried out in multiple "sites," such as school districts, housing authorities, local TANF offices, or One-Stop Career Centers. Most evaluations select sites purposively following a process that is nonrandom. Unfortunately, purposive site selection can produce a sample of sites that is not representative of the population of interest for the program. In this paper, we propose a conceptual model of purposive site selection. We begin with the proposition that a purposive sample of sites can usefully be conceptualized as a random sample of sites from some well-defined population, for which the sampling probabilities are unknown and vary across sites. This proposition allows us to derive a formal, yet intuitive, mathematical expression for the bias in the pooled impact estimate when sites are selected purposively. This formula helps us to better understand the consequences of selecting sites purposively, and the factors that contribute to the bias. Additional research is needed to obtain evidence on how large the bias tends to be in actual studies that select sites purposively, and to develop methods to increase the external validity of these studies.

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA