ABSTRACT
We consider unsupervised classification by means of a latent multinomial variable that assigns a scalar response to one of the L components of a mixture model incorporating scalar and functional covariates. This process can be thought of as a hierarchical model in which the first level models the scalar response as a mixture of parametric distributions and the second level models the mixture probabilities by means of a generalized linear model with functional and scalar covariates. The traditional approach of treating functional covariates as vectors suffers from the curse of dimensionality, since functional covariates can be measured at very small intervals, leading to a highly parametrized model, and it also fails to take the functional nature of the data into account. We use basis expansions to reduce the dimensionality and a Bayesian approach to estimate the parameters while providing predictions of the latent classification vector. The method is motivated by two data examples that are not easily handled by existing methods. The first example concerns identifying placebo responders in a clinical trial (normal mixture model) and the second concerns predicting illness in milking cows (zero-inflated Poisson mixture model).
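The two-level structure described above can be written out as follows. This is only a sketch in assumed notation: the symbols $z_i$, $\pi_{il}$, $\eta_{il}$, $\gamma_l$, $\phi_k$ and the multinomial-logit link are illustrative choices, not taken from the abstract, which only states that a generalized linear model is used for the mixture probabilities.

```latex
\begin{align*}
y_i \mid z_i = l &\sim f(y_i \mid \theta_l), \qquad l = 1, \dots, L, \\
\Pr(z_i = l \mid \mathbf{x}_i, X_i) &= \pi_{il}
  = \frac{\exp(\eta_{il})}{\sum_{m=1}^{L} \exp(\eta_{im})}, \\
\eta_{il} &= \alpha_l + \mathbf{x}_i^{\top} \boldsymbol{\beta}_l
  + \int X_i(t)\, \gamma_l(t)\, dt, \\
\gamma_l(t) &\approx \sum_{k=1}^{K} c_{lk}\, \phi_k(t)
  \;\Longrightarrow\;
  \int X_i(t)\, \gamma_l(t)\, dt \approx \sum_{k=1}^{K} c_{lk} \int X_i(t)\, \phi_k(t)\, dt .
\end{align*}
```

Here $y_i$ is the scalar response, $z_i$ the latent class, $\mathbf{x}_i$ the scalar covariates and $X_i(t)$ a functional covariate. Under this sketch each functional coefficient $\gamma_l$ is represented by $K$ basis coefficients $c_{lk}$, however finely $X_i(t)$ is sampled, which is how a basis expansion reduces the dimensionality. One class (for example $\eta_{i1} = 0$) would typically be fixed as a reference for identifiability.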
ABSTRACT
Introduction: Because genomic selection (GS) is a predictive methodology, it must guarantee high prediction accuracy to be useful in practice. However, since many factors affect its prediction performance, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve the prediction performance of this methodology. Methods: When environmental covariates are incorporated as inputs in genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that, across data sets, feature engineering reduces prediction error relative to including the environmental covariates without feature engineering by 761.625% across predictors. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy for incorporating the environmental covariates.
ABSTRACT
Leptospirosis, an infectious disease caused by spirochete bacteria, is a major public health problem worldwide. In Argentina, some regions have climatic and geographical characteristics that favor the habitat of bacteria of the Leptospira genus, whose survival depends strongly on climatic factors that are intensified by climate change, aggravating the associated public health problems. To obtain a method for predicting leptospirosis cases, this paper compares five time series forecasting methods: two parametric (autoregressive integrated moving average, ARIMA, and an alternative that allows covariates, ARIMAX), two nonparametric (the Nadaraya-Watson kernel estimator in its one- and two-kernel versions, NW-1K and NW-2K), and one semiparametric (semi-functional partial linear regression, SFPLR). For this purpose, the number of leptospirosis cases registered from 2009 to 2020 in three important cities of northeastern Argentina is used, together with hydroclimatic covariates related to the presence of cases. According to the results, no method considerably outperforms the rest or can be recommended as the sole tool for leptospirosis prediction. In general, however, the NW-2K method performs better. In addition to using a long-term, high-quality time series, this work enriches the application of statistical models to epidemiological leptospirosis data by incorporating hydroclimatic variables, and further efforts in this line of research are recommended in the context of current climate change.
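To make the nonparametric option concrete, the following is a minimal sketch of a one-kernel Nadaraya-Watson estimator with a Gaussian kernel. It is not the authors' implementation: the function name `nadaraya_watson`, the bandwidth value, the lag-one forecasting setup and the case counts are all hypothetical and serve only to illustrate the technique.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x = x_query]."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Scaled differences between every query point and every training point,
    # shape (n_query, n_train).
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    # Kernel-weighted average of the observed responses for each query point.
    return (weights @ y_train) / weights.sum(axis=1)


# Hypothetical monthly case counts (illustrative only, not data from the study).
cases = np.array([3, 5, 4, 8, 12, 9, 6, 4, 3, 5, 7, 10], dtype=float)

# One-step-ahead setup: predict next month's count from the current count.
x_lag, y_next = cases[:-1], cases[1:]
forecast = nadaraya_watson(x_lag, y_next, cases[-1], bandwidth=2.0)
```

Because the estimate is a weighted average of past responses, it always lies between the smallest and largest observed values. A two-kernel version would multiply in a second kernel, for instance on a hydroclimatic covariate, before normalizing the weights.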
Subjects
Leptospirosis, Humans, Incidence, Seasons, Leptospirosis/epidemiology, Leptospirosis/microbiology, Statistical Models, Disease Outbreaks
ABSTRACT
This article presents data on the carbon (C), nitrogen (N), and sulfur (S) isotopic composition of human hair collected throughout Mexico. The recorded values ranged from -18.3 to -12.8 for δ¹³C, from 6.8 to 10.8 for δ¹⁵N, and from 2.7 to 8.0 for δ³⁴S. Some of the socioeconomic covariates explored in this study showed strong correlations with the recorded isotope values. Furthermore, these three isotope systems provide records of dietary preferences and practices and also showed some spatial variation. This study detected geospatial patterning in the δ¹³C values of hair samples from Mexico as well as significant correlations with socioeconomic factors. No geospatial variation was detected in the δ¹⁵N and δ³⁴S values; however, socioeconomic correlations were found. A δ¹³C isoscape was generated using a GIS approach, which provides a tool to narrow down region-of-origin predictions (in combination with other isotope systems) and to document the travel history of unidentified individuals.
Subjects
Carbon Isotopes/analysis, Hair/chemistry, Nitrogen Isotopes/analysis, Sulfur Isotopes/analysis, Geographic Information Systems, Geography, Humans, Mexico, Poverty Areas, Rain, Socioeconomic Factors, Temperature, Medical Topography
ABSTRACT
Arsenic accumulation in the environment poses ecological and human health risks. Greater knowledge of the variability of soil total As content and its main drivers is strategic for maintaining soil security and for supporting public policies and environmental surveys. Considering the scarcity of As studies in Brazil at the country's geographical scale, this work aimed to generate predictive models of topsoil As content using machine learning (ML) algorithms based on several environmental covariables representing soil forming factors, ranking their importance as explanatory covariables and using them as input for a group analysis. An unprecedented databank based on laboratory analyses (including rare earth elements), proximal and remote sensing, geographical information system operations, and pedological information was compiled. The median soil As content ranged from 0.14 to 41.1 mg kg⁻¹ in reference soils, and from 0.28 to 58.3 mg kg⁻¹ in agricultural soils. Recursive Feature Elimination Random Forest outperformed the other ML algorithms, ranking the following environmental covariables as most important: temperature, soil organic carbon (SOC), clay, sand, and TiO₂. Four natural groups were statistically suggested (As content ± standard error in mg kg⁻¹): G1) with coarser texture, lower SOC, higher temperatures, and the lowest TiO₂ contents, has the lowest As content (2.24 ± 0.50), encompassing different environmental conditions; G2) organic soils located in floodplains, with medium TiO₂ and temperature, whose As content (3.78 ± 2.05) is slightly higher than that of G1 but lower than those of G3 and G4; G3) medium contents of As (7.14 ± 1.30), texture, SOC, TiO₂, and temperature, representing the largest number of points, widespread throughout Brazil; G4) the largest contents of As (11.97 ± 1.62), SOC, and TiO₂, and the lowest sand content, with points located mainly across Southeastern Brazil, where temperatures are milder. In the absence of soil As content data, a common scenario in Brazil and in many Latin American countries, such natural groups could work as environmental indicators.
ABSTRACT
Many important complex diseases are composed of a series of phenotypes, which makes the diagnosis of the disease and its genetic dissection difficult. The standard procedures for determining heritability in such complex diseases either analyse single phenotypes and compare findings across phenotypes, or apply dimension-reduction procedures, such as principal components analysis, to all phenotypes. However, each method has its own problems, and the challenges are even more complex for extended family data and categorical phenotypes. In this paper, we propose a methodology to determine a scale for complex outcomes involving multiple categorical phenotypes in extended pedigrees using item response theory (IRT) models that take all categorical phenotypes into account, allowing informative comparison among individuals. An advantage of the IRT framework is that a straightforward joint heritability parameter can be estimated for categorical phenotypes. Furthermore, our methodology allows many possible extensions, such as the inclusion of covariates and multiple variance components. We use a Markov chain Monte Carlo algorithm for parameter estimation and validate our method on simulated data. As an application, we consider the metabolic syndrome as the multiple-phenotype disease, using data from the Baependi Heart Study consisting of 1,696 individuals in 95 families. We fit IRT models without covariates and with age and age squared as covariates. The results showed that adjusting for covariates yields a higher joint heritability (h²=0.53) than the model without covariates (h²=0.21), indicating that the covariates absorbed some of the error variance.
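The closing remark about covariates absorbing error variance can be illustrated with the usual variance-components definition of heritability. This is a sketch in assumed notation ($\sigma_g^2$ for the genetic variance, $\sigma_e^2$ for the residual variance, $\delta$ for the part of the residual variance explained by covariates), not the paper's own parametrization of the IRT model.

```latex
\begin{align*}
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}, \\
h^2_{\text{cov}} &= \frac{\sigma_g^2}{\sigma_g^2 + (\sigma_e^2 - \delta)}
  \;\ge\; h^2 \qquad \text{for } 0 \le \delta < \sigma_e^2 .
\end{align*}
```

If the covariates leave $\sigma_g^2$ essentially unchanged (an assumption of this sketch), removing $\delta$ from the denominator raises the ratio, which is consistent with the direction of the change reported in the abstract.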
Subjects
Disease/genetics, Genetic Models, Phenotype, Heritable Quantitative Trait, Age Factors, Algorithms, Humans, Markov Chains, Metabolic Syndrome/genetics, Monte Carlo Method, Pedigree
ABSTRACT
Along the Brazilian coast only two haul-outs of South American sea lions (Otaria flavescens) are known: Ilha dos Lobos and Molhe Leste, both located in the southernmost state of Brazil, Rio Grande do Sul. Most sea lions observed at these haul-outs are adult and sub-adult males. It is supposed that the species' presence in these areas is due to food supply and the absence of parental care by males. This study analysed the use of these haul-outs by O. flavescens between 1993 and 2002 based on count data of observed individuals. Bayesian generalised linear mixed models were used to evaluate differences in abundance between areas, long-term trends and seasonal patterns. Results showed that O. flavescens abundance had a long-term trend of increasing average occupancy over the study period, with seasonal variation reaching the highest within-year value in August (Ilha dos Lobos) and October (Molhe Leste). The novel application of this powerful statistical modelling approach resulted in a useful tool to quantify occupancy dynamics.
Along the coast of Brazil only two non-breeding colonies of South American sea lions (Otaria flavescens) are known: Ilha dos Lobos and Molhe Leste, both located in the state of Rio Grande do Sul. Most sea lions observed at these colonies are adult and sub-adult males. The presence of the species in these areas is thought to be related to foraging and to the absence of parental care by males. This study analysed the occupancy dynamics and abundance of O. flavescens at the non-breeding colonies between 1993 and 2002, based on a time series of individual counts. Bayesian generalised linear mixed models were used to evaluate differences in abundance between areas, long-term trends in use and seasonal patterns. The results show that the abundance of O. flavescens varied seasonally, reaching a within-year peak in August (Ilha dos Lobos) and October (Molhe Leste), accompanied by an increase in the average occupancy of the haul-outs over the study period. The novel application of this powerful form of statistical modelling proved useful for quantifying occupancy dynamics.
Subjects
Animals, Sea Lions/classification, Animal Distribution, Linear Models, Brazil
ABSTRACT
Neonatal sepsis is common and is a major cause of morbidity and mortality. Vancomycin is the preferred treatment of several neonatal staphylococcal infections. The aim of this study was to review published data on vancomycin pharmacokinetics in neonates and to provide a critical analysis of the literature. A bibliographic search was performed using PubMed and Embase, and articles with a publication date of August 2011 or earlier were included in the analysis. Vancomycin pharmacokinetic estimates, which are different in neonates compared with adults, also exhibit extensive inter-neonatal variability. In neonates, several vancomycin dosing schedules have been proposed, mainly based on age (i.e., postmenstrual and postnatal), body weight or serum creatinine level. Other covariates [e.g., extracorporeal membrane oxygenation (ECMO), indomethacin or ibuprofen, and growth restriction] of vancomycin pharmacokinetics have been reported in neonates. Finally, vancomycin penetrates cerebrospinal fluid (range = 7-42%). Renal function drives vancomycin pharmacokinetics. Because either age or weight is the most relevant covariate of renal maturation, these covariates should be considered first in neonatal vancomycin dosing guidelines and further adjusted by renal dysfunction indicators (e.g., ECMO and ibuprofen/indomethacin). In addition to the prospective validation of available dosing guidelines, future studies should focus on the relevance of therapeutic drug monitoring and on the value of continuous vancomycin administration in neonates.
Subjects
Humans, Newborn Infant, Anti-Bacterial Agents/pharmacokinetics, Sepsis/metabolism, Vancomycin/pharmacokinetics, Age Factors, Anti-Bacterial Agents/administration & dosage, Drug Monitoring, Premature Infant Diseases/drug therapy, Premature Infant Diseases/metabolism, Kidney/metabolism, Sepsis/drug therapy, Vancomycin/administration & dosage
ABSTRACT
BACKGROUND: Caribbean scholars continue to dichotomise self-reported health status without empirical justification for including or excluding moderate health status in the definition of poor health. AIMS: This study will 1) evaluate which cut-off point should be used for self-reported health status; 2) assess whether dichotomisation of self-reported data should be practised; 3) ascertain any disparity in dichotomisation by some covariates (i.e., marital status, age cohort, social class); and 4) examine the odds of reporting poor or moderate-to-very poor self-reported health status if one has an illness. MATERIALS AND METHODS: The current study used cross-sectional survey data for 2007. The survey used stratified probability sampling techniques to collect the data from Jamaicans. The sample consisted of 6,783 respondents, with a focus on participants aged 46+ years (n=1,583 respondents). Self-reported health status was a 5-item Likert scale question. Two dichotomisations were considered: poor health status versus otherwise, and poor (including moderate) self-reported health versus otherwise. Odds ratios were calculated in order to estimate the effect of the covariates. RESULTS: When moderate self-reported health status was included in poor health status, the cut-off revealed a moderate effect on the specified covariates across the age cohorts for women. However, for men, exponential effects were observed on social class, but not on area of residence or marital status, across the different age cohorts. CONCLUSIONS: The cut-off point in the dichotomisation of self-reported health status does make a difference and must be taken into consideration in the use of self-reported health data for Jamaica.
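How the choice of cut-off can change an odds ratio is easy to see in a small worked example. The sketch below is illustrative only: the counts, the five level names and the helper names `odds_ratio` and `dichotomised_or` are hypothetical and are not taken from the survey.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical counts per self-reported health level (not survey data),
# split by whether the respondent reported an illness.
ill = {"very poor": 10, "poor": 30, "moderate": 60, "good": 70, "very good": 30}
not_ill = {"very poor": 5, "poor": 15, "moderate": 80, "good": 200, "very good": 100}


def dichotomised_or(poor_levels):
    """Odds ratio of 'poor' health (as defined by poor_levels) given illness."""
    a = sum(ill[level] for level in poor_levels)      # ill, poor health
    b = sum(ill.values()) - a                         # ill, otherwise
    c = sum(not_ill[level] for level in poor_levels)  # not ill, poor health
    d = sum(not_ill.values()) - c                     # not ill, otherwise
    return odds_ratio(a, b, c, d)


# Cut-off 1: moderate health counted as "otherwise".
or_narrow = dichotomised_or(["very poor", "poor"])
# Cut-off 2: moderate health counted as "poor".
or_wide = dichotomised_or(["very poor", "poor", "moderate"])

print(or_narrow, or_wide)  # 4.75 3.0
```

With these made-up counts, moving the moderate category into the poor group lowers the odds ratio from 4.75 to 3.0, which shows why the cut-off needs an empirical justification rather than a convention.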