ABSTRACT
The low replicability of scientific studies has become an important issue. One possible cause is the low representativeness of the experimental designs employed. As early as the 1950s, Egon Brunswik pointed out that experimental setups should ideally be based on a random sample of stimuli from the subjects' natural environment, or should at least include the basic features of that environment. Only experimental designs satisfying this criterion, representative designs in Brunswikian terminology, can produce results that generalize beyond the procedure used and to situations outside the laboratory. Such external validity is crucial in preclinical drug studies, for example, and should be important for replicability in general. Popular experimental setups in rodent research, such as the tail suspension test or the Geller-Seifter procedure, do not correspond to contexts the animals are likely to encounter in their habitat. Consequently, results obtained with these kinds of procedures can be generalized neither to other procedures nor to contexts outside the laboratory. Furthermore, many traditional procedures are incompatible with current notions of animal welfare. An approximation of the natural social and physical context can be provided in the laboratory in the form of a seminatural environment. In addition to satisfying the basic demands of a representative design, such environments offer a far higher level of animal welfare than typical small cages. This perspective article briefly discusses the basic principles of the generalizability of experimental results, the virtues of representative designs, and the coincidence of enhanced scientific quality and animal welfare that this kind of design provides.
ABSTRACT
OBJECTIVE: The objective of this study was to evaluate whether infants randomized in the Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network Necrotizing Enterocolitis Surgery Trial differed from eligible infants and whether any differences affected the generalizability of the trial results. STUDY DESIGN: Secondary analysis of infants enrolled in the Necrotizing Enterocolitis Surgery Trial (born 2010-2017, with follow-up through 2019) at 20 US academic medical centers and of an observational data set of eligible infants through 2013. Infants born ≤1000 g and diagnosed with necrotizing enterocolitis or spontaneous intestinal perforation requiring surgical intervention at ≤8 weeks were eligible. The target population included trial-eligible infants (randomized and nonrandomized) born during the first half of the study with available detailed preoperative data. Using model-based weighting methods, we estimated the effect of initial laparotomy vs peritoneal drain had the target population been randomized. RESULTS: The trial included 308 randomized infants. The target population included 382 infants (156 randomized and 226 eligible, nonrandomized). Compared with the target population, fewer randomized infants had necrotizing enterocolitis (31% vs 47%) or died before discharge (27% vs 41%). However, the incidence of the primary composite outcome, death or neurodevelopmental impairment, was similar (69% vs 72%). Effect estimates for initial laparotomy vs drain weighted to the target population were largely unchanged from the original trial after accounting for preoperative diagnosis of necrotizing enterocolitis (adjusted relative risk [95% CI]: 0.85 [0.71-1.03] in target population vs 0.81 [0.64-1.04] in trial) or spontaneous intestinal perforation (1.02 [0.79-1.30] vs 1.11 [0.95-1.31]).
CONCLUSION: Despite differences between randomized and eligible infants, estimated treatment effects in the trial and target population were similar, supporting the generalizability of trial results. TRIAL REGISTRATION: ClinicalTrials.gov ID: NCT01029353.
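The abstract does not specify which weighting model was used. As a rough illustration of the general idea of reweighting trial participants so that they represent a target population, the Python sketch below substitutes the simplest possible case: inverse probability of selection weights estimated within strata of a single discrete covariate, instead of a fitted model. The `Infant` record, the stratum labels, and the toy data are hypothetical, not trial data, and the sketch pools strata into one risk ratio rather than reporting diagnosis-specific estimates as the study did.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Infant:
    randomized: bool  # True if the infant was randomized in the trial
    stratum: str      # hypothetical discrete covariate (e.g. a preoperative characteristic)
    laparotomy: int   # 1 = initial laparotomy, 0 = peritoneal drain (used only if randomized)
    outcome: int      # 1 = death or neurodevelopmental impairment, 0 = neither


def selection_weights(target):
    """Inverse probability of being randomized, estimated within each stratum."""
    total = Counter(i.stratum for i in target)
    randomized = Counter(i.stratum for i in target if i.randomized)
    # Strata with no randomized infants cannot be represented (positivity violation).
    return {s: total[s] / randomized[s] for s in total if randomized[s] > 0}


def risk_ratio(target, weighted=True):
    """Risk ratio (laparotomy vs drain) among randomized infants, optionally reweighted."""
    weights = selection_weights(target)
    events = {0: 0.0, 1: 0.0}
    mass = {0: 0.0, 1: 0.0}
    for infant in target:
        if not infant.randomized:
            continue  # nonrandomized infants contribute only through the weights
        w = weights[infant.stratum] if weighted else 1.0
        events[infant.laparotomy] += w * infant.outcome
        mass[infant.laparotomy] += w
    return (events[1] / mass[1]) / (events[0] / mass[0])


# Hypothetical toy target population: stratum "A" is underrepresented among randomized infants.
toy = [
    Infant(True, "A", 1, 1), Infant(True, "A", 1, 0),
    Infant(True, "A", 0, 1), Infant(True, "A", 0, 1),
    Infant(True, "B", 1, 1), Infant(True, "B", 1, 1),
    Infant(True, "B", 0, 1), Infant(True, "B", 0, 1),
] + [Infant(False, "A", 0, 0)] * 4

print(round(risk_ratio(toy), 3), round(risk_ratio(toy, weighted=False), 3))  # 0.667 0.75
```

In this toy example, upweighting stratum "A" (where laparotomy appears beneficial) moves the pooled estimate away from the unweighted trial estimate. In the study itself, the weighted and trial estimates were similar.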
Subject(s)
Enterocolitis, Necrotizing , Infant, Newborn, Diseases , Infant, Premature, Diseases , Intestinal Perforation , Child , Infant, Newborn , Infant , Humans , Intestinal Perforation/surgery , Enterocolitis, Necrotizing/epidemiology , Enterocolitis, Necrotizing/surgery , Enterocolitis, Necrotizing/complications , Laparotomy/adverse effects , Infant, Premature, Diseases/surgery
ABSTRACT
The rapid embrace of artificial intelligence in psychiatry has the flavor of a "wild west": a multidisciplinary approach that is highly technical and complex, yet seems to produce findings that resonate. These studies are hard to review because the methods are often opaque and it is difficult to find a suitable combination of reviewers. This issue will only become more complex in the absence of a rigorous framework for evaluating such studies and thus nurturing trustworthiness. Our paper therefore discusses the urgent need for the field to develop a framework for evaluating this complex methodology so that evaluation is done honestly, fairly, scientifically, and accurately. Because evaluation is a complicated process, we focus on three issues, namely explainability, transparency, and generalizability, that are critical for establishing the viability of using artificial intelligence in psychiatry. We discuss how defining these three issues helps towards building a framework to ensure trustworthiness, but show how difficult definition can be, as the terms have different meanings in medicine, computer science, and law. We conclude that it is important to start this discussion so that policy on the matter can be called for, and that the community should take extra care when reviewing clinical applications of such models.
Subject(s)
Machine Learning , Models, Theoretical , Psychiatry/methods , Humans , Psychiatry/standards
ABSTRACT
BACKGROUND: The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies from Latin America that measure OSCE reliability with generalizability theory (G-theory). The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and to explore its usefulness for quality improvement. METHODS: An observational cross-sectional study was conducted at the Faculty of Medicine of the National Autonomous University of Mexico (UNAM) in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossed random-effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered a single facet of analysis. RESULTS: The exam was administered to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. Sites and test versions contributed minimal variance. CONCLUSIONS: Our study achieved a G coefficient similar to those found in other reports, which is acceptable for summative tests. G-theory allows the magnitude of multiple sources of error to be estimated and helps decision makers determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
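How a G coefficient follows from variance components, and how a decision (D) study picks the number of stations, can be sketched in Python for the simplest case: a fully crossed students × stations random-effects design. This is a simplifying assumption, since the actual analysis also involved sites and exam versions, which are not modeled here. The variance components in the example are hypothetical values, not the UNAM estimates. The relative coefficient is E(rho^2) = var_p / (var_p + var_ps,e / n_s); the absolute index Phi additionally counts station difficulty var_s as error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    person: float    # var_p: true differences among students
    station: float   # var_s: station difficulty (main effect)
    residual: float  # var_ps,e: person x station interaction confounded with error


def g_coefficient(vc, n_stations):
    """Relative G coefficient E(rho^2), for rank-ordering (norm-referenced) decisions."""
    return vc.person / (vc.person + vc.residual / n_stations)


def phi_coefficient(vc, n_stations):
    """Absolute dependability index Phi: station difficulty also counts as error."""
    return vc.person / (vc.person + (vc.station + vc.residual) / n_stations)


def min_stations(vc, target, coefficient=g_coefficient, max_n=100):
    """D-study: smallest number of stations whose coefficient reaches `target`."""
    for n in range(1, max_n + 1):
        if coefficient(vc, n) >= target:
            return n
    return None  # target not reachable within max_n stations


vc = VarianceComponents(person=0.2, station=0.3, residual=0.4)  # hypothetical values
print(round(g_coefficient(vc, 18), 3))    # 0.9
print(round(phi_coefficient(vc, 18), 3))  # 0.837
print(min_stations(vc, 0.85))             # 12
```

Because error variance is divided by the number of stations, adding stations raises both coefficients with diminishing returns. This is the trade-off a D-study makes explicit when deciding how long a summative OSCE needs to be.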
Subject(s)
Clinical Competence/standards , Education, Medical/methods , Educational Measurement/methods , Models, Theoretical , Cross-Sectional Studies , Humans , Mexico , Reproducibility of Results
ABSTRACT
The purpose of this work was to optimize a new, purpose-built tool for evaluating perceived service quality in Early Intervention Centers (Centros de Atención Temprana), allowing different dimensions of service quality in these centers to be analyzed from the perspective of Generalizability Theory. The aim was to separate true variability from error variability. Variance components were estimated for the facets centers, users, items, and scales. A partially nested multifaceted design structure was used, analyzing different categories independently and in interaction, and yielding excellent reliability and generalizability indices. The measurement design was then optimized to determine optimal sample sizes, which can be regarded as a cost-benefit analysis.
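Separating true variability from error variability begins with estimating variance components. The tool described above used a partially nested multifaceted design. The Python sketch below instead assumes the simplest fully crossed users × items design with one rating per cell, and estimates the components from ANOVA mean squares (the expected-mean-squares method). The function name and data layout are hypothetical, chosen for illustration only.

```python
from statistics import fmean


def estimate_components(scores):
    """ANOVA (expected-mean-squares) estimates for a crossed users x items design.

    `scores[u][i]` is user u's rating on item i, with exactly one rating per cell.
    """
    n_u, n_i = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    user_means = [fmean(row) for row in scores]
    item_means = [fmean(row[i] for row in scores) for i in range(n_i)]

    ms_user = n_i * sum((m - grand) ** 2 for m in user_means) / (n_u - 1)
    ms_item = n_u * sum((m - grand) ** 2 for m in item_means) / (n_i - 1)
    # Residual: user x item interaction confounded with random error.
    ss_res = sum(
        (scores[u][i] - user_means[u] - item_means[i] + grand) ** 2
        for u in range(n_u)
        for i in range(n_i)
    )
    ms_res = ss_res / ((n_u - 1) * (n_i - 1))

    # Negative estimates are set to zero, a common convention in G-theory.
    return {
        "user": max(0.0, (ms_user - ms_res) / n_i),
        "item": max(0.0, (ms_item - ms_res) / n_u),
        "residual": ms_res,
    }


print(estimate_components([[1, 2], [3, 4], [5, 6]]))
# {'user': 4.0, 'item': 0.5, 'residual': 0.0}
```

Nested facets, such as users within centers, change the expected mean squares and which components are estimable. A real analysis of the design described above would therefore need the corresponding nested-design equations or dedicated G-theory software.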