Results 1 - 20 of 22
1.
ArXiv ; 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38463500

ABSTRACT

Identifying which variables influence a response while controlling false positives is a problem that pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid releasing sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results from extensive simulation studies demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease and demonstrate a significant improvement in power.
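For readers unfamiliar with the knockoff framework these methods extend, the selection step shared by knockoff filters can be sketched generically. Below is a minimal illustration of the knockoff+ threshold of Barber and Candès, assuming feature statistics W have already been computed; it is not the authors' summary-statistic construction:

```python
import numpy as np

def knockoff_select(W, q=0.1):
    """Knockoff+ selection: W[j] > 0 favors variable j over its knockoff.
    Returns the indices selected at target FDR level q."""
    ts = np.sort(np.abs(W[W != 0]))  # candidate thresholds
    for t in ts:
        # estimated false discovery proportion at threshold t
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)   # no threshold meets the target
```

The +1 in the numerator is what yields finite-sample FDR control; the paper's contribution lies in constructing valid W statistics from marginal correlations alone.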

2.
Genes (Basel) ; 14(9)2023 09 08.
Article in English | MEDLINE | ID: mdl-37761917

ABSTRACT

Microbiome data are subject to experimental bias caused by DNA extraction and PCR amplification, among other sources, but this important feature is often ignored when developing statistical methods for analyzing microbiome data. McLaren, Willis, and Callahan (2019) proposed a model for how such biases affect the observed taxonomic profiles; the model assumes main effects of bias without taxon-taxon interactions. Our newly developed method for testing the differential abundance of taxa, LOCOM, is the first method to account for experimental bias and is robust to main-effect biases. However, there is also evidence for taxon-taxon interactions. In this report, we formulated a model for interaction biases and used simulations based on this model to evaluate the impact of interaction biases on the performance of LOCOM as well as other available compositional analysis methods. Our simulation results indicate that LOCOM remained robust to a reasonable range of interaction biases. The other methods tended to have an inflated FDR even when there were only main-effect biases. LOCOM maintained the highest sensitivity even when the other methods could not control the FDR. We thus conclude that LOCOM outperforms the other methods for compositional analysis of microbiome data considered here.


Subject(s)
Microbiota, Bias, Computer Simulation, Microbiota/genetics, Polymerase Chain Reaction
3.
Proteomics ; 23(18): e2200406, 2023 09.
Article in English | MEDLINE | ID: mdl-37357151

ABSTRACT

In discovery proteomics, as in many other "omic" approaches, the possibility of testing for the differential abundance of hundreds (or thousands) of features simultaneously is appealing, despite requiring specific statistical safeguards, among which control of the false discovery rate (FDR) has become standard. Moreover, when more than two biological conditions or treatment groups are considered, it has become customary to rely on the one-way analysis of variance (ANOVA) framework, where a first global differential abundance landscape provided by an omnibus test can be subsequently refined using various post-hoc tests (PHTs). However, the interactions between the FDR control procedures and the PHTs are complex, because the two correspond to different types of multiple test corrections (MTCs). This article surveys various ways to orchestrate them in a data processing workflow and discusses their pros and cons.
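As a minimal sketch of one such orchestration (omnibus ANOVA first, Benjamini-Hochberg-adjusted pairwise t-tests second), using hypothetical intensities for a single feature; the article's point is that this is only one of several defensible arrangements:

```python
import numpy as np
from itertools import combinations
from scipy.stats import f_oneway, ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# hypothetical log-intensities for one feature under three conditions
groups = [rng.normal(loc, 0.5, size=6) for loc in (10.0, 10.2, 11.0)]

F, p_omnibus = f_oneway(*groups)  # omnibus ANOVA across all conditions
if p_omnibus < 0.05:              # refine only if the omnibus test rejects
    pairs = list(combinations(range(len(groups)), 2))
    p_pht = [ttest_ind(groups[i], groups[j]).pvalue for i, j in pairs]
    reject, p_adj, _, _ = multipletests(p_pht, alpha=0.05, method="fdr_bh")
```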


Subject(s)
Proteomics, Proteomics/methods, Analysis of Variance
4.
J Affect Disord ; 299: 273-280, 2022 02 15.
Article in English | MEDLINE | ID: mdl-34906640

ABSTRACT

BACKGROUND: Childhood irritability, characterized by low frustration tolerance and developmentally inappropriate temper outbursts, is a transdiagnostic symptom in child psychiatry. Little is known regarding the influences of early experience and environmental exposure on irritability from a perinatal perspective. This study examined the associations between irritability and multiple perinatal and birth factors. METHODS: Drawn from Taiwan's National Epidemiological Study of Child Mental Disorders, 5124 children (2591 females) aged 7.7 to 14.6 years (mean 11.2 years) and their parents completed the Affective Reactivity Index, a well-established irritability measure. Parents also completed a survey on parental, perinatal, and birth characteristics. Multiple linear regression models were used to examine the associations between perinatal and birth characteristics and child irritability reported across informants. RESULTS: Maternal smoking, vaginal bleeding, and pre-eclampsia during pregnancy, as well as phototherapy for jaundice lasting more than 3 days, were associated with high irritability after adjusting for the child's age, sex, and parental characteristics. Findings were consistent across parent- and child-rated irritability. LIMITATIONS: Retrospective assessment of early exposures may be subject to recall bias despite previously established validity and reliability. Longitudinal research with prospective assessments of early-life exposures is recommended to confirm our findings. The exploratory use of multiple survey items also precludes more in-depth assessment of perinatal risks for developing irritability. CONCLUSIONS: This study provides novel evidence suggesting a perinatal link with irritability in a national sample of youths. Given that irritability predicts adverse mental health and life outcomes, identifying its perinatal and birth predictors may inform early etiology, guiding timely assessment and intervention.


Subject(s)
Irritable Mood, Mood Disorders, Adolescent, Female, Humans, Prospective Studies, Reproducibility of Results, Retrospective Studies, Taiwan/epidemiology
5.
Article in English | MEDLINE | ID: mdl-34501892

ABSTRACT

Multiplicity arises when data analysis involves multiple simultaneous inferences, increasing the chance of spurious findings. It is a widespread problem frequently ignored by researchers. In this paper, we perform an exploratory analysis of the Web of Science database for COVID-19 observational studies. We examined the 100 top-cited COVID-19 peer-reviewed articles based on p-values; these included up to 7100 simultaneous tests, with 50% including more than 34 tests and 20% more than 100 tests. We found that the larger the number of tests performed, the larger the number of significant results (r = 0.87, p < 10⁻⁶). The number of p-values in the abstracts was not related to the number of p-values in the papers. However, the number of highly significant results (p < 0.001) in the abstracts was strongly correlated (r = 0.61, p < 10⁻⁶) with the number of p < 0.001 significances in the papers. Furthermore, the abstracts included a higher proportion of significant results (0.91 vs. 0.50), and 80% reported only significant results. Only one reviewed paper addressed multiplicity-induced type I error inflation, pointing to potentially spurious results bypassing the peer-review process. We conclude that special attention must be paid to the increased chance of false discoveries in observational studies, including non-replicated striking discoveries with a potentially large social impact. We propose some easy-to-implement measures to assess and limit the effects of multiplicity.
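The mechanism the paper documents is easy to reproduce: under the null, p-values are uniform, so the count of spurious findings grows linearly with the number of tests, and an FDR adjustment removes essentially all of them. A small sketch:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
m = 100                        # simultaneous tests, all truly null
pvals = rng.uniform(size=m)    # p-values are uniform under the null
print((pvals < 0.05).sum())    # around 5 spurious "discoveries" expected

reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject.sum())            # typically 0 after BH adjustment
```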


Subject(s)
COVID-19, Humans, Peer Review, Probability, SARS-CoV-2
6.
Environ Res ; 201: 111600, 2021 10.
Article in English | MEDLINE | ID: mdl-34214558

ABSTRACT

We analyse the paper "The spread of SARS-CoV-2 in Spain: Hygiene habits, sociodemographic profile, mobility patterns and comorbidities" by Rodríguez-Barranco et al. (2021), published in Environmental Research, vol. 192, January 2021. The study was carried out under challenging conditions and provides original data of great value for exploratory purposes. Nevertheless, we found that the authors did not consider the effect that the multiple hypothesis tests carried out in arriving at the final model have on the occurrence of false discoveries by mere chance. After adjusting the results reported in the paper for the effects of multiple testing, we conclude that only one of the five factors cited as statistically significant and relevant in the article, living with someone who has suffered from COVID-19, remained significantly related to the relative prevalence of COVID-19. Therefore, the preeminent role given in the analysed work to walking the dog as one of the main transmission routes of COVID-19 probably does not correspond to an actual effect. Instead, until replicated by other studies, it should be considered a spurious discovery.


Subject(s)
COVID-19, Animals, Dogs, Humans, SARS-CoV-2, Spain, Walking
7.
Mol Genet Metab Rep ; 25: 100642, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32939338

ABSTRACT

In de novo purine biosynthesis (DNPS), 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase (EC 2.1.2.3)/inosine monophosphate cyclohydrolase (EC 3.5.4.10) (ATIC) catalyzes the last two reactions of the pathway: conversion of 5-aminoimidazole-4-carboxamide ribonucleotide [aka Z-nucleotide monophosphate (ZMP)] to 5-formamido-4-imidazolecarboxamide ribonucleotide (FAICAR) then to inosine monophosphate (IMP). Mutations in ATIC cause an untreatable and devastating inborn error of metabolism in humans. ZMP is an adenosine monophosphate (AMP) mimetic and a known activator of AMP-activated protein kinase (AMPK). Recently, a HeLa cell line null mutant for ATIC was constructed via CRISPR-Cas9 mutagenesis. This mutant, crATIC, accumulates ZMP during purine starvation. Given that the mutant can accumulate ZMP in the absence of treatment with exogenous compounds, crATIC is likely an important cellular model of DNPS inactivation and ZMP accumulation. In the current study, we characterize the crATIC transcriptome versus the HeLa transcriptome in purine-supplemented and purine-depleted growth conditions. We report and discuss transcriptome changes with particular relevance to Alzheimer's disease and in genes relevant to lipid and fatty acid synthesis, neurodevelopment, embryogenesis, cell cycle maintenance and progression, extracellular matrix, immune function, TGFβ and other cellular processes.

8.
Proc Natl Acad Sci U S A ; 117(39): 24117-24126, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32948695

ABSTRACT

We introduce a method to draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.


Subject(s)
Genetic Association Studies, Genetic Techniques, Genetic Variation, Heredity, Phenotype, Humans
9.
Ann Appl Stat ; 13(1): 1-33, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31687060

ABSTRACT

We tackle the problem of selecting from among a large number of variables those that are "important" for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. In this context, to discover that a variable is relevant for the outcome implies discovering that the larger entity it represents is also important. To guarantee meaningful results with high chance of replicability, we suggest controlling the rate of false discoveries for findings at the level of individual variables and at the level of groups. Building on the knockoff construction of Barber and Candès [Ann. Statist. 43 (2015) 2055-2085] and the multilayer testing framework of Barber and Ramdas [J. Roy. Statist. Soc. Ser. B 79 (2017) 1247-1268], we introduce the multilayer knockoff filter (MKF). We prove that MKF simultaneously controls the FDR at each resolution and use simulations to show that it incurs little power loss compared to methods that provide guarantees only for the discoveries of individual variables. We apply MKF to analyze a genetic dataset and find that it successfully reduces the number of false gene discoveries without a significant reduction in power.
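As an illustration of the group layer only, per-variable knockoff statistics can be aggregated into one statistic per group before thresholding; this sketch uses simple summation, whereas the MKF's actual construction and its proof of simultaneous multilayer control involve considerably more:

```python
import numpy as np

def group_knockoff_statistics(W, groups):
    """Aggregate per-variable knockoff statistics W into one statistic
    per group label (here by summation, for illustration only)."""
    labels = np.unique(groups)
    W_group = np.array([W[groups == g].sum() for g in labels])
    return labels, W_group
```

A knockoff threshold of the kind sketched under reference 1 above can then be applied at each layer; the paper proves that the FDRs of both layers are controlled simultaneously.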

10.
Mol Genet Metab Rep ; 21: 100512, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31516833

ABSTRACT

Adenylosuccinate lyase (ADSL) catalyzes two steps in de novo purine synthesis (DNPS). Mutations in ADSL can result in inborn errors of metabolism characterized by developmental delay and disorder phenotypes, with no effective treatment options. Recently, SAICAR, a metabolic substrate of ADSL, has been found to have alternative roles in the cell, complicating our understanding of ADSL's role. crADSL, a CRISPR knockout of ADSL in HeLa cells, was constructed to investigate DNPS and ADSL in a human cell line. Here we employ this cell line in an RNA-seq analysis of the effect of DNPS and ADSL deficiency on the transcriptome, as a first step in establishing a cellular model of ADSL deficiency. We report transcriptome changes in genes relevant to development, vascular development, muscle, and cancer biology, which provide interesting avenues for future research.

11.
BMC Bioinformatics ; 19(1): 323, 2018 Sep 14.
Article in English | MEDLINE | ID: mdl-30217148

ABSTRACT

BACKGROUND: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results. RESULTS: We present a general method for controlling the FDR that capitalizes on the large amount of control data often found in big data studies to avoid these frequently problematic intermediate steps. The method utilizes control data to empirically construct the distribution of the test statistic under the null hypothesis and directly compares this distribution to the empirical distribution of the test data. By not relying on p-values, our control data-based empirical FDR procedure more closely follows the foundational principles of the scientific method: that inference is drawn by comparing test data to control data. The method is demonstrated through application to a problem in structural genomics. CONCLUSIONS: The method described here provides a general statistical framework for controlling the FDR that is specifically tailored for the big data setting. By relying on empirically constructed distributions and control data, it forgoes potentially problematic modeling steps and extrapolation into the unknown tails of the null distribution. This procedure is broadly applicable insofar as controlled experiments or internal negative controls are available, as is increasingly common in the big data setting.
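The core idea admits a compact sketch: the control data supply an empirical null, and the FDR at a threshold is estimated by comparing tails directly, with no p-values in between. A minimal version, not the paper's exact estimator:

```python
import numpy as np

def empirical_fdr(test_stats, control_stats, t):
    """Estimate the FDR at threshold t from an empirical null built
    out of control data (a sketch of the general idea)."""
    null_tail = np.mean(control_stats >= t)        # P(stat >= t | null)
    expected_false = null_tail * len(test_stats)   # expected false discoveries
    n_discoveries = max(np.sum(test_stats >= t), 1)
    return min(expected_false / n_discoveries, 1.0)
```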


Subject(s)
Statistical Models, Bayes Theorem, DNA Repair, Factual Databases, Human Genome, Humans
12.
J Proteome Res ; 17(1): 359-373, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29057651

ABSTRACT

The study of post-translational methylation is hampered by the fact that large-scale LC-MS/MS experiments produce high methylpeptide false discovery rates (FDRs). The use of heavy-methyl stable isotope labeling by amino acids in cell culture (heavy-methyl SILAC) can drastically reduce these FDRs; however, this approach is limited by a lack of heavy-methyl SILAC compatible software. To fill this gap, we recently developed MethylQuant. Here, using an updated version of MethylQuant, we demonstrate its methylpeptide validation and quantification capabilities and provide guidelines for its best use. Using reference heavy-methyl SILAC data sets, we show that MethylQuant predicts with statistical significance the true or false positive status of methylpeptides in samples of varying complexity, degree of methylpeptide enrichment, and heavy to light mixing ratios. We introduce methylpeptide confidence indicators, MethylQuant Confidence and MethylQuant Score, and demonstrate their strong performance in complex samples characterized by a lack of methylpeptide enrichment. For these challenging data sets, MethylQuant identifies 882 of 1165 true positive methylpeptide spectrum matches (i.e., >75% sensitivity) at high specificity (<2% FDR) and achieves near-perfect specificity at 41% sensitivity. We also demonstrate that MethylQuant produces high accuracy relative quantification data that are tolerant of interference from coeluting peptide ions. Together MethylQuant's capabilities provide a path toward routine, accurate characterizations of the methylproteome using heavy-methyl SILAC.


Subject(s)
Methylation, Post-Translational Protein Processing, Proteomics/methods, Binding Sites, Isotope Labeling, Sensitivity and Specificity
13.
Front Microbiol ; 8: 2114, 2017.
Article in English | MEDLINE | ID: mdl-29163406

ABSTRACT

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called a pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make the implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always appropriate, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate the three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet, age group, or environmental exposure). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data. Results: Using extensive simulation studies, we demonstrate that the proposed methodology not only controls the false discovery rate at a desired level of significance but also competes well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test that ignores the underlying simplex structure in the data has an inflated FDR.
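For concreteness, the pseudo-count strategy the paper argues is not ideal amounts to the following, shown here with a toy sample and the centered log-ratio transform often applied downstream:

```python
import numpy as np

counts = np.array([0, 3, 0, 25, 112])         # one sample's taxon counts
pseudo = counts + 1                            # the simple pseudo-count fix
clr = np.log(pseudo) - np.log(pseudo).mean()   # centered log-ratio transform
```

Treating the two zeros identically, whatever their source, is exactly the implicit assumption the proposed methodology avoids by distinguishing the three types of zeros.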

14.
BMC Res Notes ; 10(1): 143, 2017 Apr 04.
Article in English | MEDLINE | ID: mdl-28376847

ABSTRACT

BACKGROUND: Variable selection is frequently carried out during the analysis of many types of high-dimensional data, including those in metabolomics. This study compared the predictive performance of four variable selection methods using stability-based selection, a new secondary selection method implemented in the R package BioMark. Two of these methods were also evaluated using the more familiar false discovery rate (FDR). RESULTS: Simulation studies varied factors relevant to biological data studies, with results based on the median of 200 partial area under the receiver operating characteristic curve (pAUC) values. There was no single top-performing method across all factor settings, but the Student t-test based on stability selection or with FDR adjustment, and the variable importance in projection (VIP) scores from partial least squares regression models obtained using a stability-based approach, tended to perform well in most settings. Similar results were found with a real spiked-in metabolomics dataset. Group sample size, group effect size, number of significant variables, and correlation structure were the most important factors, whereas the percentage of significant variables was the least important. CONCLUSIONS: Researchers can improve prediction scores for their study data by choosing VIP scores based on stability variable selection over the other approaches when the number of variables is small to modest, and by increasing the number of samples even moderately. When the number of variables is high and there is block correlation amongst the significant variables (i.e., true biomarkers), the FDR-adjusted Student t-test performed best. The R package BioMark is an easy-to-use open-source program for variable selection that had excellent performance characteristics for the purposes of this study.
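The study used the R package BioMark; as a language-neutral sketch of the stability-based selection idea it implements, where select_fn is a hypothetical user-supplied base selector (e.g., a t-test or PLS-VIP cutoff) returning a boolean mask over variables:

```python
import numpy as np

def stability_frequencies(X, y, select_fn, n_iter=100, frac=0.5, seed=0):
    """Repeatedly subsample the data, run a base variable selector, and
    record how often each variable is chosen."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        freq += select_fn(X[idx], y[idx])  # boolean mask of length p
    return freq / n_iter  # keep variables whose frequency exceeds a cutoff
```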


Subject(s)
Biomarkers/analysis, Computer Simulation, Metabolomics/statistics & numerical data, Statistics as Topic/methods, Animals, Humans, Least-Squares Analysis, Theoretical Models, Multivariate Analysis, ROC Curve, Reproducibility of Results
15.
J Orthop Res ; 35(10): 2203-2210, 2017 10.
Article in English | MEDLINE | ID: mdl-28169450

ABSTRACT

Dual energy X-ray absorptiometry (DXA) is the reference standard method used to study bone mineral density (BMD) after total hip arthroplasty (THA). However, the subtle, spatially complex changes in bone mass due to strain-adaptive bone remodeling relevant to different prosthesis designs are not readily resolved using conventional DXA analysis. DXA region free analysis (DXA RFA) is a novel computational image analysis technique that provides a high-resolution quantitation of periprosthetic BMD. Here, we applied the technique to quantitate the magnitude and areal size of periprosthetic BMD changes using scans acquired during two previous randomized clinical trials (2004 to 2009): one comparing three cemented prosthesis design geometries, and the other comparing a hip resurfacing versus a conventional cementless prosthesis. DXA RFA resolved subtle differences in magnitude and area of bone remodeling between prosthesis designs not previously identified in conventional DXA analyses. A mean bone loss of 10.3%, 12.1%, and 11.1% occurred for the three cemented prostheses within a bone area fraction of 14.8%, 14.4%, and 6.2%, mostly within the lesser trochanter (p < 0.001). For the cementless prosthesis, a diffuse pattern of bone loss (-14.3%) was observed at the shaft of femur in a small area fraction of 0.6%, versus no significant bone loss for the hip resurfacing prosthesis (p < 0.001). BMD increases were observed consistently at the greater trochanter for all prostheses except the hip resurfacing prosthesis, where the BMD increase was widespread across the metaphysis (p < 0.001). DXA RFA provides high-resolution insights into the effect of prosthesis design on the local strain environment in bone. © 2017 The Authors Journal of Orthopaedic Research published by Wiley Periodicals, Inc. on behalf of Orthopaedic Research Society. J Orthop Res 35:2203-2210, 2017.


Subject(s)
Bone Remodeling, Densitometry/methods, Femur/physiology, Hip Prosthesis, Prosthesis Design, Adult, Aged, Hip Replacement Arthroplasty/instrumentation, Bone Density, Female, Humans, Male, Middle Aged
16.
J Proteome Res ; 14(10): 4099-103, 2015 Oct 02.
Article in English | MEDLINE | ID: mdl-26257019

ABSTRACT

In any high-throughput scientific study, it is often essential to estimate the percentage of findings that are actually incorrect. This percentage is called the false discovery rate (abbreviated "FDR"), and it is an invariant (albeit often unknown) quantity for any well-formed study. In proteomics, it has become common practice to incorrectly conflate the protein FDR (the percentage of identified proteins that are actually absent) with protein-level target-decoy, a particular method for estimating the protein-level FDR. In this manner, the challenges of one approach have been used as the basis for an argument that the field should abstain from protein-level FDR analysis altogether, or even for the suggestion that the very notion of a protein FDR is flawed. As we demonstrate in simple but accurate simulations, not only is the protein-level FDR an invariant concept; when analyzing large data sets, failure to properly acknowledge it or to correct for multiple testing can result in large, unrecognized errors, whereby thousands of absent proteins (potentially every protein in the FASTA database under consideration) are incorrectly identified.
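The abstract's central distinction is between the FDR (a quantity every study has) and target-decoy (one estimator of that quantity). A minimal sketch of the estimator being discussed, for illustration only:

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Classic target-decoy estimate: decoys passing the score threshold
    estimate how many of the passing targets are false."""
    decoys = sum(s >= threshold for s in decoy_scores)
    targets = sum(s >= threshold for s in target_scores)
    return min(decoys / max(targets, 1), 1.0)
```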


Subject(s)
Artifacts, Proteins/isolation & purification, Proteomics/statistics & numerical data, Software, Tandem Mass Spectrometry/statistics & numerical data, Algorithms, Protein Databases, Humans, Proteomics/methods
17.
J Proteomics ; 129: 25-32, 2015 Nov 03.
Article in English | MEDLINE | ID: mdl-26196237

ABSTRACT

Shotgun proteomics generates valuable information from large-scale and targeted protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized, and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues arise in assigning peptides to their originating proteins. The first is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis; the second is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- and large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics.


Subject(s)
Algorithms, Peptide Mapping/methods, Proteome/chemistry, Proteomics/methods, Protein Sequence Analysis/methods, Software, Amino Acid Sequence, Mass Spectrometry/methods, Molecular Sequence Data
18.
Neuroimage ; 108: 251-64, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25498390

ABSTRACT

Detecting local differences between groups of connectomes is a great challenge in neuroimaging, because of the large number of tests that have to be performed and the impact of the multiplicity correction. Any available information should be exploited to increase the power of detecting true between-group effects. We present an adaptive strategy that exploits the data structure and prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and apply screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data-driven algorithm. Given the results of the screening step, filtering is performed to seek real differences at the node/connection level. The proposed strategy can be used to strongly control either the family-wise error rate or the false discovery rate. We show by means of different simulations the benefit of the proposed strategy, and we present a real application comparing connectomes of preschool children and adolescents.
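A caricature of the screen-then-filter idea follows, with hypothetical subject-by-connection arrays for the two groups; it conveys the two stages but does not reproduce the error-rate guarantees proved in the paper:

```python
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def screen_then_filter(A, B, subnetworks, alpha=0.05):
    """A, B: (subjects x connections) arrays for two groups; subnetworks:
    list of index arrays partitioning the connections. Screen each
    subnetwork with one pooled test, then test individual connections
    only inside subnetworks that pass."""
    screen_p = [ttest_ind(A[:, s].mean(axis=1), B[:, s].mean(axis=1)).pvalue
                for s in subnetworks]
    keep = multipletests(screen_p, alpha=alpha, method="fdr_bh")[0]
    hits = []
    for s, passed in zip(subnetworks, keep):
        if passed:
            p = [ttest_ind(A[:, j], B[:, j]).pvalue for j in s]
            rej = multipletests(p, alpha=alpha, method="fdr_bh")[0]
            hits.extend(np.asarray(s)[rej])
    return hits
```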


Subject(s)
Algorithms, Brain/growth & development, Connectome/methods, Neurological Models, Neural Pathways/growth & development, Adolescent, Child, Female, Humans, Male
19.
Technometrics ; 55(2): 150-160, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23908557

ABSTRACT

Quantitative high-throughput screening (qHTS) assays use cells or tissues to screen thousands of compounds in a short period of time. Data generated from qHTS assays are then evaluated using nonlinear regression models, such as the Hill model, and decisions regarding toxicity are made using the estimates of the parameters of the model. For any given compound, the variability in the observed response may either be constant across dose groups (homoscedasticity) or vary with dose (heteroscedasticity). Since thousands of compounds are simultaneously evaluated in a qHTS assay, it is not practically feasible for an investigator to perform residual analysis to determine the variance structure before performing statistical inferences on each compound. Because the variance structure plays an important role in the analysis of linear and nonlinear regression models, it is important to have practically useful and easy-to-interpret methodology that is robust to the variance structure. Furthermore, given the number of chemicals investigated in a qHTS assay, outliers and influential observations are not uncommon. In this article we describe preliminary test estimation (PTE)-based methodology that is robust to the variance structure as well as to potential outliers and influential observations. Performance of the proposed methodology is evaluated in terms of false discovery rate (FDR) and power using a simulation study mimicking real qHTS data. Of the two methods currently in use, our simulation studies suggest that one is extremely conservative, with very small power in comparison to the proposed PTE-based method, whereas the other is very liberal. In contrast, the proposed PTE-based methodology achieves better control of the FDR while maintaining good power. The proposed methodology is illustrated using a data set obtained from the National Toxicology Program (NTP). Additional information, simulation results, data, and computer code are available online as supplementary materials.
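For context, the Hill model referred to in the abstract is a four-parameter dose-response curve; a least-squares fit on hypothetical data looks like the following, whereas the paper's PTE approach additionally runs a preliminary test of the variance structure before choosing the estimator:

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, e0, einf, ec50, h):
    """Four-parameter Hill dose-response model."""
    return e0 + (einf - e0) / (1.0 + (ec50 / dose) ** h)

dose = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # hypothetical doses
resp = np.array([1.2, 1.9, 8.5, 18.9, 20.2])    # hypothetical responses
popt, pcov = curve_fit(hill, dose, resp, p0=[1.0, 20.0, 1.0, 1.0])
```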

20.
Neuroimage ; 80: 416-25, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23631992

ABSTRACT

Brain connectivity can be represented by a network that enables the comparison of different patterns of structural and functional connectivity among individuals. In the literature, two levels of statistical analysis have been considered when comparing brain connectivity across groups and subjects: 1) global comparison, where a single measure summarizing the information of each brain is used in a statistical test; and 2) local analysis, where a single test is performed either for each node/connection, which implies a multiplicity correction, or for each group of nodes/connections, where each subset is summarized by a single test in order to reduce the number of tests and avoid a penalizing multiplicity correction. We comment on the different levels of analysis, present some methods that have been proposed at each scale, and highlight the factors that can influence the statistical results and the questions that have to be addressed in such an analysis.


Subject(s)
Brain/physiology, Connectome/methods, Anatomical Models, Neurological Models, Statistical Models, Nerve Net/physiology, Animals, Statistical Data Interpretation, Humans