Results 1 - 20 of 21
1.
J Cereb Blood Flow Metab ; 44(9): 1608-1617, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38483126

ABSTRACT

A metabolic coupling between glutamate and N-acetylaspartate measured by in vivo magnetic resonance spectroscopy has been recently reported in the literature with inconsistent findings. In this study, confounders originating from Pearson's spurious correlation of ratios and spectral correlation due to overlapping magnetic resonance spectroscopy signals of glutamate and N-acetylaspartate were practically eliminated to facilitate the determination of any metabolic link between glutamate and N-acetylaspartate in the human brain using in vivo magnetic resonance spectroscopy. In both occipital and medial prefrontal cortices of healthy individuals, correlations between glutamate and N-acetylaspartate were found to be insignificant. Our results do not lend support to a recent hypothesis that N-acetylaspartate serves as a significant reservoir for the rapid replenishment of glutamate during signaling or stress.


Subject(s)
Aspartic Acid; Glutamic Acid; Magnetic Resonance Spectroscopy; Humans; Aspartic Acid/analogs & derivatives; Aspartic Acid/metabolism; Glutamic Acid/metabolism; Male; Adult; Female; Magnetic Resonance Spectroscopy/methods; Brain/metabolism; Prefrontal Cortex/metabolism; Young Adult
2.
J Comput Biol ; 30(11): 1240-1245, 2023 11.
Article in English | MEDLINE | ID: mdl-37988394

ABSTRACT

Robust generalization of drug-target affinity (DTA) prediction models is a notoriously difficult problem in computational drug discovery. In this article, we present pydebiaseddta: a computational software package for improving the generalizability of DTA prediction models to novel ligands and/or proteins. pydebiaseddta serves as the practical implementation of the DebiasedDTA training framework, which advocates modifying the training distribution to mitigate the effect of spurious correlations in the training data set that lead to substantially degraded performance for novel ligands and proteins. Written in the Python programming language, pydebiaseddta combines a user-friendly streamlined interface with a feature-rich and highly modifiable architecture. With this article we introduce our software, showcase its main functionalities, and describe practical ways for new users to engage with it.


Subject(s)
Programming Languages; Software; Proteins; Drug Discovery
3.
J Comput Biol ; 30(11): 1226-1239, 2023 11.
Article in English | MEDLINE | ID: mdl-37988395

ABSTRACT

Statistical models that accurately predict the binding affinity of an input ligand-protein pair can greatly accelerate drug discovery. Such models are trained on available ligand-protein interaction data sets, which may contain biases that lead the predictor models to learn data set-specific, spurious patterns instead of generalizable relationships. This leads the prediction performances of these models to drop dramatically for previously unseen biomolecules. Various approaches that aim to improve model generalizability either have limited applicability or introduce the risk of degrading overall prediction performance. In this article, we present DebiasedDTA, a novel training framework for drug-target affinity (DTA) prediction models that addresses data set biases to improve the generalizability of such models. DebiasedDTA relies on reweighting the training samples to achieve robust generalization, and is thus applicable to most DTA prediction models. Extensive experiments with different biomolecule representations, model architectures, and data sets demonstrate that DebiasedDTA achieves improved generalizability in predicting drug-target affinities.


Subject(s)
Models, Statistical; Proteins; Ligands; Proteins/chemistry; Drug Discovery
4.
Ann Bot ; 131(4): 555-568, 2023 04 28.
Article in English | MEDLINE | ID: mdl-36794962

ABSTRACT

BACKGROUND: Relative growth rate (RGR) has a long history of use in biology. In its logged form, RGR = ln[(M + ΔM)/M], where M is size of the organism at the commencement of the study, and ΔM is new growth over time interval Δt. It illustrates the general problem of comparing non-independent (confounded) variables, e.g. (X + Y) vs. X. Thus, RGR depends on what starting M (X) is used even within the same growth phase. Equally, RGR lacks independence from its derived components, net assimilation rate (NAR) and leaf mass ratio (LMR), as RGR = NAR × LMR, so that they cannot legitimately be compared by standard regression or correlation analysis. FINDINGS: The mathematical properties of RGR exemplify the general problem of 'spurious' correlations that compare expressions derived from various combinations of the same component terms X and Y. This is particularly acute when X >> Y, the variance of X or Y is large, or there is little range overlap of X and Y values among datasets being compared. Relationships (direction, curvilinearity) between such confounded variables are essentially predetermined and so should not be reported as if they were a finding of the study. Standardizing by M rather than time does not solve the problem. We propose the inherent growth rate (IGR), lnΔM/lnM, as a simple, robust alternative to RGR that is independent of M within the same growth phase. CONCLUSIONS: Although the preferred alternative is to avoid the practice altogether, we discuss cases where comparing expressions with components in common may still have utility. These may provide insights if (1) the regression slope between pairs yields a new variable of biological interest, (2) the statistical significance of the relationship remains supported using suitable methods, such as our specially devised randomization test, or (3) multiple datasets are compared and found to be statistically different. Distinguishing true biological relationships from spurious ones, which arise from comparing non-independent expressions, is essential when dealing with derived variables associated with plant growth analyses.
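The two growth metrics contrasted in this abstract are simple to compute. A short sketch with made-up numbers (not data from the study) shows the property being claimed: under a common power-law growth rule, RGR = ln((M + ΔM)/M) shifts with the starting size M, while IGR = ln(ΔM)/ln(M) does not.

```python
import math

def rgr(M, dM):
    # Relative growth rate in its logged form: ln((M + dM) / M).
    return math.log((M + dM) / M)

def igr(M, dM):
    # Inherent growth rate proposed as the alternative: ln(dM) / ln(M).
    return math.log(dM) / math.log(M)

# Two plants growing by the same underlying rule dM = M ** k
# (so that ln(dM)/ln(M) = k), but starting from different sizes.
k = 0.8
for M in (10.0, 100.0):
    dM = M ** k
    print(f"M = {M:6.1f}  dM = {dM:6.2f}  RGR = {rgr(M, dM):.4f}  IGR = {igr(M, dM):.4f}")
```

The larger plant reports a lower RGR despite following the same growth law, while IGR returns k = 0.8 in both cases.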


Subject(s)
Plant Development; Plant Leaves
5.
Ecology ; 104(3): e3954, 2023 03.
Article in English | MEDLINE | ID: mdl-36495236

ABSTRACT

Historical resurveys of ecological communities are important for placing the structure of modern ecosystems in context. Rarely, however, are snapshot surveys alone sufficient for providing direct insight into the rates of the ecological processes underlying community functioning, either now or in the past. In this study, I used a statistically reasoned observational approach to estimate the feeding rates of a New Zealand intertidal predator, Haustrum haustorium, using diet surveys performed at several sites by Robert Paine in 1968-1969 and by me in 2004. Comparisons between time periods reveal a remarkable consistency in the predator's prey-specific feeding rates, which contrasts with the changes I observed in prey abundances, the predator's body-size distribution, and the prey's proportional contributions to the predator's apparent diet. Although these and additional changes in the predator's per-capita attack rates seem to show adaptive changes in its prey preferences, they do not. Rather, feeding-rate stability is an inherently statistical consequence of the predator's high among-prey variation in handling times which determine the length of time that feeding events will remain detectable to observers performing diet surveys. Though understudied, similarly high among-prey variation in handling (or digestion) times is evident in many predator species throughout the animal kingdom. The resultant disconnect between a predator's apparent diet and its actual feeding rates suggests that much of the temporal, biogeographic, and seemingly context-dependent variation that is often perceived in community structure, predator diets, and food-web topology may be of less functional consequence than assumed. Qualitative changes in ecological pattern need not represent qualitative changes in ecological process.


Subject(s)
Ecosystem; Predatory Behavior; Animals; Models, Biological; Food Chain; Body Size
6.
Indoor Air ; 32(1): e12924, 2022 01.
Article in English | MEDLINE | ID: mdl-34418165

ABSTRACT

Trends in the elemental composition of fine particulate matter (PM2.5) collected from indoor, outdoor, and personal microenvironments were investigated using two metrics: ng/m3 and mg/kg. Pearson correlations that were positive using one metric commonly disappeared or flipped to become negative when the other metric was applied to the same dataset. For example, the correlation between Mo and S in the outdoor microenvironment was positive using ng/m3 (p < 0.05) but negative using mg/kg (p < 0.05). In general, elemental concentrations (mg/kg) within PM2.5 decreased significantly (p < 0.05) as PM2.5 concentrations (µg/m3) increased, a dilution effect that was observed in all microenvironments and seasons. An exception was S: in the outdoor microenvironment, the correlation between wt% S and PM2.5 flipped from negative in the winter (p < 0.01) to positive (p < 0.01) in the summer, whereas in the indoor microenvironment, this correlation was negative year-round (p < 0.05). Correlation analyses using mg/kg indicated that elemental associations may arise from Fe-Mn oxyhydroxide sorption processes that occur as particles age, with or without the presence of a common anthropogenic source. Application of mass-normalized concentration metrics (mg/kg or wt%), enabled by careful gravimetric analysis, revealed new evidence of the importance of indoor sources of elements in PM2.5.


Subject(s)
Air Pollutants; Air Pollution, Indoor; Air Pollutants/analysis; Air Pollution, Indoor/analysis; Environmental Monitoring; Metals/analysis; Particle Size; Particulate Matter/analysis; Seasons
7.
J Health Econ ; 77: 102452, 2021 05.
Article in English | MEDLINE | ID: mdl-33845407

ABSTRACT

The milk addiction paradox refers to an empirical finding in which consumption of non-addictive commodities such as milk appears to be consistent with the theory of rational addiction. This paradoxical result seems more likely when consumption is persistent and with aggregate data. Using both simulated and real data, we show that the milk addiction paradox disappears when estimating the data using an AR(1) linear specification that describes the saddle-path solution of the rational addiction model, instead of the canonical AR(2) model. The AR(1) specification is able to correctly discriminate between rational addiction and simple persistence in the data, to test for the main features of rational addiction, and to produce unbiased estimates of the short and long-run elasticity of demand. These results hold both with individual and aggregated data, and they imply that the AR(1) model is a better empirical alternative for testing rational addiction than the canonical AR(2) model.


Subject(s)
Behavior, Addictive; Milk; Animals; Behavior, Addictive/epidemiology; Humans
8.
Environ Geochem Health ; 43(2): 949-969, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32588160

ABSTRACT

Despite some research indicating that correlation can be induced by the common-variable effect, plots of an ionic ratio (Na+/Cl-) against an ionic concentration (Cl-) remain popular for interpreting the causes of groundwater salinization. Doubts have been raised about the relevance of spurious correlation in groundwater and about its detection using a randomization process, owing to the fact that groundwater is charge-balanced and randomization would result in abnormal ionic ratios. In this context, this study establishes the relevance of spurious correlation and its detection through randomization of the common variable, which was missing from the literature. The study used qualitative and quantitative tools to detect the possibility of induced correlation and demonstrated the efficiency of the proposed method using published datasets covering a variety of geochemical processes of groundwater salinization. In five of the eight cases examined, the correlations observed in the plots appeared to be induced by the common-variable effect and were therefore deemed unreliable as positive indicators of the stated salinization processes. Even when the correlations appear not to be induced, it is recommended to always support the inferences with other independent evidence.


Subject(s)
Environmental Monitoring/methods; Groundwater/chemistry; Ions/analysis; Water Pollutants, Chemical/analysis; Chlorides/analysis; Salinity; Sodium/analysis
9.
J Econom ; 215(1): 118-130, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32773919

ABSTRACT

This paper develops a new estimation procedure for the ultrahigh dimensional sparse precision matrix, the inverse of the covariance matrix. Regularization methods have been proposed for sparse precision matrix estimation, but they may not perform well with ultrahigh dimensional data due to spurious correlation. We propose a refitted cross validation (RCV) method for sparse precision matrix estimation based on its Cholesky decomposition, which does not require the Gaussian assumption. The proposed RCV procedure can be easily implemented with existing software for ultrahigh dimensional linear regression. We establish the consistency of the proposed RCV estimation and show that the rate of convergence of the RCV estimation without assuming banded structure is the same as that of estimators assuming the banded structure in Bickel and Levina (2008b). Monte Carlo studies were conducted to assess the finite-sample performance of the RCV estimation. Our numerical comparison shows that the RCV estimation outperforms the existing ones in various scenarios. We further apply the RCV estimation to an empirical analysis of asset allocation.

10.
Philos Trans R Soc Lond B Biol Sci ; 375(1797): 20190364, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32146883

ABSTRACT

The Price equation embodies the 'conditions approach' to evolution in which the Darwinian conditions of heritable variation in fitness are represented in equation form. The equation can be applied recursively, leading to a partition of selection at the group and individual levels. After reviewing the well-known issues with the Price partition, as well as issues with a partition based on contextual analysis, we summarize a partition of group and individual selection based on counterfactual fitness, the fitness that grouped cells would have were they solitary. To understand 'group selection' in multi-level selection models, we assume that only group selection can make cells suboptimal when they are removed from the group. Our analyses suggest that there are at least three kinds of selection that can be occurring at the same time: group-specific selection along with two kinds of individual selection, within-group selection and global individual selection. Analyses based on counterfactual fitness allow us to specify how close a group is to being a pseudo-group, and this can be a basis for quantifying progression through an evolutionary transition in individuality (ETI). During an ETI, fitnesses at the two levels, group and individual, become decoupled, in the sense that fitness in a group may be quite high, even as counterfactual fitness goes to zero. This article is part of the theme issue 'Fifty years of the Price equation'.


Subject(s)
Biological Evolution; Genetic Fitness; Models, Genetic; Selection, Genetic; Biological Variation, Individual; Genetics, Population/methods
11.
Hum Vaccin Immunother ; 15(10): 2501-2502, 2019.
Article in English | MEDLINE | ID: mdl-30829122

ABSTRACT

A possible spurious correlation was found between the introduction of human papillomavirus (HPV) vaccination and the change in birth rate in the United States. Thus, the effects of HPV vaccination need to be followed carefully at an international level. The birth-rate change in the US might instead reflect trends in the introduction of new contraception methods and advancing maternal age.


Subject(s)
Birth Rate/trends; Immunization Programs; Papillomavirus Infections/prevention & control; Papillomavirus Vaccines/administration & dosage; Vaccination/statistics & numerical data; Adolescent; Adult; Child; Female; Humans; Parents; Patient Acceptance of Health Care; United States; Young Adult
12.
Eur J Soc Psychol ; 48(7): 970-989, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30555189

ABSTRACT

Previous research indicates that conspiracy thinking is informed by the psychological imposition of order and meaning on the environment, including the perception of causal relations between random events. Four studies indicate that conspiracy belief is driven by readiness to draw implausible causal connections even when events are not random, but instead conform to an objective pattern. Study 1 (N = 195) showed that conspiracy belief was related to the causal interpretation of real-life, spurious correlations (e.g., between chocolate consumption and Nobel prizes). In Study 2 (N = 216), this effect held adjusting for correlates including magical and non-analytical thinking. Study 3 (N = 214) showed that preference for conspiracy explanations was associated with the perception that a focal event (e.g., the death of a journalist) was causally connected to similar, recent events. Study 4 (N = 211) showed that conspiracy explanations for human tragedies were favored when they comprised part of a cluster of similar events (vs. occurring in isolation); crucially, they were independently increased by a manipulation of causal perception. We discuss the implications of these findings for previous, mixed findings in the literature and for the relation between conspiracy thinking and other cognitive processes.

13.
Ann Stat ; 46(3): 989-1017, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29942099

ABSTRACT

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries by such data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions on exogeneity of covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable Y with the best s linear combinations of p covariates X, even when X and Y are independent. When the covariance matrix of X possesses the restricted eigenvalue property, we derive such distributions for both finite s and diverging s, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of X. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where residuals are from regularized fits. Our approach is then applied to construct the upper confidence limit for the maximum spurious correlation and testing exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated by both numerical examples and real data analysis.

14.
Neuroimage ; 173: 610-622, 2018 06.
Article in English | MEDLINE | ID: mdl-29378318

ABSTRACT

Inter-areal functional connectivity (FC), neuronal synchronization in particular, is thought to constitute a key systems-level mechanism for coordination of neuronal processing and communication between brain regions. Evidence to support this hypothesis has been gained largely using invasive electrophysiological approaches. In humans, neuronal activity can be non-invasively recorded only with magneto- and electroencephalography (MEG/EEG), which have been used to assess FC networks with high temporal resolution and whole-scalp coverage. However, even in source-reconstructed MEG/EEG data, signal mixing, or "source leakage", is a significant confounder for FC analyses and network localization. Signal mixing leads to two distinct kinds of false-positive observations: artificial interactions (AI) caused directly by mixing and spurious interactions (SI) arising indirectly from the spread of signals from true interacting sources to nearby false loci. To date, several interaction metrics have been developed to solve the AI problem, but the SI problem has remained largely intractable in MEG/EEG all-to-all source connectivity studies. Here, we advance a novel approach for correcting SIs in FC analyses using source-reconstructed MEG/EEG data. Our approach is to bundle observed FC connections into hyperedges by their adjacency in signal mixing. Using realistic simulations, we show here that bundling yields hyperedges with good separability of true positives and little loss in the true positive rate. Hyperedge bundling thus significantly decreases graph noise by minimizing the false-positive to true-positive ratio. Finally, we demonstrate the advantage of edge bundling in the visualization of large-scale cortical networks with real MEG data. We propose that hypergraphs yielded by bundling represent well the set of true cortical interactions that are detectable and dissociable in MEG/EEG connectivity analysis.


Subject(s)
Brain/physiology; Electroencephalography/methods; Magnetoencephalography/methods; Nerve Net/physiology; Signal Processing, Computer-Assisted; Brain Mapping/methods; Computer Simulation; Humans; Models, Neurological
15.
J Proteome Res ; 16(2): 619-634, 2017 02 03.
Article in English | MEDLINE | ID: mdl-27977202

ABSTRACT

Normalization is a fundamental step in data processing to account for the sample-to-sample variation observed in biological samples. However, data structure is affected by normalization. In this paper, we show how, and to what extent, the correlation structure is affected by the application of 11 different normalization procedures. We also discuss the consequences for data analysis and interpretation, including principal component analysis, partial least-squares discrimination, and the inference of metabolite-metabolite association networks.


Subject(s)
Metabolome/genetics; Principal Component Analysis; Proteome/standards; Proteomics/statistics & numerical data; Animals; Least-Squares Analysis; Proteome/chemistry; Proteome/genetics; Proteomics/standards; Swine; Urine/chemistry
16.
Theory Biosci ; 135(1-2): 21-36, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26762323

ABSTRACT

Correlation is ubiquitously used in gene expression analysis although its validity as an objective criterion is often questionable. If no normalization reflecting the original mRNA counts in the cells is available, correlation between genes becomes spurious. Yet the need for normalization can be bypassed using a relative analysis approach called log-ratio analysis. This approach can be used to identify proportional gene pairs, i.e. a subset of pairs whose correlation can be inferred correctly from unnormalized data due to their vanishing log-ratio variance. To interpret the size of non-zero log-ratio variances, a proposal for a scaling with respect to the variance of one member of the gene pair was recently made by Lovell et al. Here we derive analytically how spurious proportionality is introduced when using a scaling. We base our analysis on a symmetric proportionality coefficient (briefly mentioned in Lovell et al.) that has a number of advantages over their statistic. We show in detail how the choice of reference needed for the scaling determines which gene pairs are identified as proportional. We demonstrate that using an unchanged gene as a reference has huge advantages in terms of sensitivity. We also explore the link between proportionality and partial correlation and derive expressions for a partial proportionality coefficient. A brief data-analysis part puts the discussed concepts into practice.
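The key quantities here, the log-ratio variance and a symmetric proportionality coefficient of the form rho_p = 1 - var(log(x/y)) / (var(log x) + var(log y)), can be sketched in a few lines. The expression values below are made up, and the definitions follow my reading of the abstract rather than the paper's exact notation:

```python
import math
import statistics

def log_ratio_variance(x, y):
    # var(log(x/y)); zero exactly when x and y are perfectly proportional,
    # which is why it can be computed from unnormalized counts.
    return statistics.variance([math.log(a / b) for a, b in zip(x, y)])

def rho_p(x, y):
    # Symmetric proportionality coefficient in [-1, 1]:
    # 1 - var(log(x/y)) / (var(log x) + var(log y)).
    lx = [math.log(a) for a in x]
    ly = [math.log(b) for b in y]
    return 1 - log_ratio_variance(x, y) / (statistics.variance(lx) + statistics.variance(ly))

# A proportional gene pair (y1 = 3x, identical up to scale) versus an
# unrelated pair, using hypothetical expression counts.
x  = [12.0, 40.0, 7.0, 95.0, 33.0]
y1 = [3 * v for v in x]
y2 = [50.0, 8.0, 61.0, 14.0, 29.0]

print(f"proportional pair: vlr = {log_ratio_variance(x, y1):.3f}, rho_p = {rho_p(x, y1):.3f}")
print(f"unrelated pair:    vlr = {log_ratio_variance(x, y2):.3f}, rho_p = {rho_p(x, y2):.3f}")
```

The proportional pair has vanishing log-ratio variance and rho_p = 1; the scaling issue discussed in the paper concerns how non-zero log-ratio variances are judged relative to a reference.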


Subject(s)
Gene Expression Profiling; Gene Expression Regulation; Schizosaccharomyces/genetics; Gene Regulatory Networks; Genes, Fungal; Least-Squares Analysis; Models, Biological; Models, Statistical; RNA, Messenger/metabolism; Sequence Analysis, RNA; Stochastic Processes
17.
Article in English | MEDLINE | ID: mdl-28936128

ABSTRACT

Many data-mining and statistical machine learning algorithms have been developed to select a subset of covariates to associate with a response variable. Spurious discoveries can easily arise in high-dimensional data analysis due to the enormous number of possible selections. How can we know statistically that our discoveries are better than those arising by chance? In this paper, we define a measure of goodness of spurious fit, which shows how well a response variable can be fitted by an optimally selected subset of covariates under the null model, and propose a simple and effective LAMM algorithm to compute it. It coincides with the maximum spurious correlation for linear models and can be regarded as a generalized maximum spurious correlation. We derive the asymptotic distribution of such goodness of spurious fit for generalized linear models and L1-regression. Such an asymptotic distribution depends on the sample size, ambient dimension, the number of variables used in the fit, and the covariance information. It can be consistently estimated by multiplier bootstrapping and used as a benchmark to guard against spurious discoveries. It can also be applied to model selection, which considers only candidate models with goodness of fit better than that obtained from spurious fits. The theory and method are convincingly illustrated by simulated examples and an application to the binary outcomes from German Neuroblastoma Trials.

18.
Natl Sci Rev ; 1(2): 293-314, 2014 Jun.
Article in English | MEDLINE | ID: mdl-25419469

ABSTRACT

Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive paradigm changes in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in high-confidence sets and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

19.
Perspect Psychol Sci ; 6(2): 183-91, 2011 Mar.
Article in English | MEDLINE | ID: mdl-26162137

ABSTRACT

Paradigm-oriented research strategies in experimental psychology have strengths and limitations. On the one hand, experimental paradigms play a crucial epistemic and heuristic role in basic psychological research. On the other hand, empirical research is often limited to the observed effects in a certain paradigm, and theoretical models are frequently tied to the particular features of the given paradigm. A paradigm-driven research strategy therefore jeopardizes the pursuit of research questions and theoretical models that go beyond a specific paradigm. As one example of a more integrative approach, recent research on illusory and spurious correlations has attempted to overcome the limitations of paradigm-specific models in the context of biased contingency perception and social stereotyping. Last but not least, the use of statistical models for the analysis of elementary cognitive functions is a means toward a more integrative terminology and theoretical perspective across different experimental paradigms and research domains.

20.
Oecologia ; 86(1): 147-151, 1991 Mar.
Article in English | MEDLINE | ID: mdl-28313173

ABSTRACT

Ecologists often 'standardize' data through the use of ratios and indices. Such measures are generally employed to remove a 'size effect' induced by some relatively uninteresting variable. The implications of using the resultant data in correlation and regression analyses are poorly recognized. We show that ratios and indices often produce surprising and 'spurious' results owing to their unusual properties. As a solution, we advocate the use of randomization tests to evaluate hypotheses confounded by 'spurious' correlations. In addition, we emphasize that identifying the appropriate null correlation is of utmost importance when statistically evaluating ratios, although this issue is frequently ignored.
