ABSTRACT
Body-mass index (BMI) is a hallmark of adiposity. In contrast with adulthood, the genetic architecture of BMI during childhood is poorly understood. The few genome-wide association studies (GWAS) on children have been performed almost exclusively in Europeans and at single ages. We performed cross-sectional and longitudinal GWAS for BMI-related traits on 904 admixed children with mostly Mapuche Native American and European ancestries. We found regulatory variants of the immune gene HLA-DQB3 strongly associated with BMI at 1.5-2.5 years old. A variant in the sex-determining gene DMRT1 was associated with the age at adiposity rebound (Age-AR) in girls (P = 9.8 × 10⁻⁹). BMI was significantly higher in Mapuche than in Europeans between 5.5 and 16.5 years old. Finally, Age-AR was significantly lower (P = 0.004) by 1.94 years and BMI at AR was significantly higher (P = 0.04) by 1.2 kg/m², in Mapuche children compared with Europeans.
ABSTRACT
High-throughput sequencing (HTS) methods are transforming our capacity to detect pathogens and perform disease diagnosis. Although sequencing advances have enabled accessible and point-of-care HTS, data analysis pipelines have yet to provide robust tools for precise and certain diagnosis, particularly in cases of low sequencing coverage. The lack of standardized metrics and harmonized detection thresholds compounds the problem, impeding the adoption and implementation of these solutions in real-world applications. In this work, we tackle these issues and propose biologically informed viral genome assembly coverage as a method to improve diagnostic certainty. We use the identification of viral replicases, an essential function of viral life cycles, to define genome coverage thresholds at which biological functions can be described. We validate the analysis pipeline, Viroscope, using field samples, synthetic datasets and published datasets, and demonstrate that it provides sensitive and specific viral detection. Furthermore, we developed Viroscope.io, a web service that provides on-demand viral diagnosis from HTS data, to facilitate adoption and implementation by phytosanitary agencies.
ABSTRACT
Puberty is a complex developmental process that varies considerably among individuals and populations. Genetic factors explain a large proportion of the variability of several pubertal traits. Recent genome-wide association studies (GWAS) have identified hundreds of variants involved in traits that result from body growth, like adult height. However, they do not capture many genetic loci involved in growth changes over distinct growth phases. Further, such GWAS have been mostly performed in Europeans, and it is unknown how these findings relate to other continental populations. In this study, we analyzed the genetic basis of three pubertal traits, namely peak height velocity (PV), age at PV (APV) and height at APV (HAPV). We analyzed a cohort of 904 admixed Chilean children and adolescents with European and Mapuche Native American ancestries. Height was measured on roughly a [Formula: see text]month basis from childhood to adolescence between 2006 and 2019. We predict that, on average, HAPV is 4.3 cm higher in European than in Mapuche adolescents (P = 0.042), and APV is 0.73 years later in European compared with Mapuche adolescents (P = 0.023). Further, by performing a GWAS on 774,433 single-nucleotide polymorphisms, we identified a genetic signal harboring three linked variants significantly associated with PV in boys (P [Formula: see text]). This signal has never been associated with growth-related traits.
Subject(s)
Indians, South American/genetics , Puberty/genetics , Adolescent , Adolescent Development , Adult , Aging/genetics , Body Height/genetics , Chile , Cohort Studies , Female , Genetic Variation , Genome-Wide Association Study , Humans , Male , White People/genetics
ABSTRACT
Detection of positive selection signatures in populations around the world is helping to uncover recent human evolutionary history as well as the genetic basis of diseases. Most human evolutionary genomic studies have been performed in European, African, and Asian populations. However, populations with Native American ancestry have been largely underrepresented. Here, we used a genome-wide local ancestry enrichment approach complemented with neutral simulations to identify postadmixture adaptations undergone by admixed Chileans through gene flow from Europeans into local Native Americans. The top significant hits (P = 2.4 × 10⁻⁷) are variants in a region on chromosome 12 comprising multiple regulatory elements. This region includes rs12821256, which regulates the expression of KITLG, a well-known gene involved in lighter hair and skin pigmentation in Europeans as well as in thermogenesis. Another variant from that region is associated with the long noncoding RNA RP11-13A1.1, which has been specifically implicated in the innate immune response against infectious pathogens. Our results suggest that these genes were relevant for adaptation in Chileans following the Columbian exchange.
Subject(s)
Adaptation, Biological/genetics , Chromosomes, Human, Pair 12 , Genome, Human , Pigmentation/genetics , Selection, Genetic , Chile , Female , Gene Flow , Haplotypes , Humans , Hybridization, Genetic , Indians, South American/genetics , Male , Thermogenesis/genetics , White People/genetics
ABSTRACT
Currently, about 20 crystal structures per day are released and deposited in the Protein Data Bank. A significant fraction of these structures is produced by research groups associated with the structural genomics consortium. The biological function of many of these proteins is unknown or has not been validated by experiment. Therefore, a growing need for functional prediction of protein structures has emerged. Here we present an integrated bioinformatics method that combines sequence-based relationships and three-dimensional (3D) structural similarity of transcriptional regulators with computer prediction of their cognate DNA binding sequences. We applied this method to the AraC/XylS family of transcription factors, a large family of transcriptional regulators found in many bacteria that control the expression of genes involved in diverse biological functions. We identified three putative new members of this family with known 3D structure but unknown function, and provide a probable functional classification for each. Our bioinformatics analyses suggest that they could be involved in plant cell wall degradation (Lin2118 protein from Listeria innocua, PDB code 3oou), symbiotic nitrogen fixation (protein from Chromobacterium violaceum, PDB code 3oio), and either metabolism of plant-derived biomass or nitrogen fixation (protein from Rhodopseudomonas palustris, PDB code 3mn2).
Subject(s)
AraC Transcription Factor/classification , Computational Biology/methods , Molecular Sequence Annotation/methods , Transcription Factors/classification , Amino Acid Sequence , AraC Transcription Factor/chemistry , Binding Sites , Cluster Analysis , Databases, Protein , Models, Molecular , Models, Statistical , Molecular Sequence Data , Sequence Alignment , Transcription Factors/chemistry
Subject(s)
Research , Chile , Humans , Public Policy , Research/organization & administration
ABSTRACT
Transposable elements comprise a large proportion of animal genomes. Transposons can have detrimental effects on genome stability but also play positive roles in genome evolution and the regulation of gene expression. A proper balance between the positive and deleterious effects of transposons is crucial for cell homeostasis and requires a mechanism that tightly regulates their expression. Herein we describe the expression of DNA transposons of the Tc1/mariner superfamily during Xenopus development. Sense and antisense transcripts containing complete Tc1-2_Xt were detected in Xenopus embryos. Both transcripts were found in zygotic stages and were mainly localized in Spemann's organizer and neural tissues. In addition, the Tc1-like elements Eagle, Froggy, Jumpy, Maya, Xeminos and TXr were also expressed in zygotic stages but not in oocytes in X. tropicalis. Interestingly, although Tc1-2_Xt transcripts were not detected in Xenopus laevis embryos, transcripts from two other Tc1-like elements (TXr and TXz) presented a similar temporal and spatial pattern during X. laevis development. Deep sequencing analysis of Xenopus tropicalis gastrulae showed that PIWI-interacting RNAs (piRNAs) are specifically derived from several Tc1-like elements. The localized expression of Tc1-like elements in neural tissues suggests that they could play a role during the development of the Xenopus nervous system.
Subject(s)
DNA Transposable Elements/genetics , Gene Expression Regulation, Developmental , Nervous System/embryology , Nervous System/metabolism , Xenopus/embryology , Xenopus/genetics , Animals , Genome/genetics , RNA, Small Interfering/metabolism , Zygote/metabolism
ABSTRACT
The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated based on information gathered from several sources, including PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with a resolution of 2.5 Å or better, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, and the engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.
Subject(s)
DNA-Binding Proteins/chemistry , DNA/chemistry , Databases, Nucleic Acid , Databases, Protein , Binding Sites , Crystallography, X-Ray , DNA/classification , DNA-Binding Proteins/classification
ABSTRACT
BACKGROUND: As in many different areas of science and technology, most important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out through the analysis of their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) constitutes a popular indicator of the performance of a binary classifier. However, the assessment of the statistical significance of the difference between any two classifiers based on this measure is not a straightforward task, since not many freely available tools exist. Most existing software is either not free, difficult to use or not easy to automate when a comparative assessment of the performance of many binary classifiers is intended. This constitutes the typical scenario for the optimization of parameters when developing new classifiers and also for their performance validation through the comparison to previous art. RESULTS: In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers for a common task estimated from paired data or unpaired balanced data. The software is able to perform a pairwise comparison of many classifiers in a single run, without requiring any expert or advanced knowledge to use it. The software relies on a non-parametric test for the difference of the AUCs that accounts for the correlation of the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated and the complete data resulting from the analysis are also available for download, which can be used for further analysis with other software. The software is released as a web server that can be used in any client platform and also as a standalone application for the Linux operating system. 
CONCLUSION: New software for the statistical comparison of ROC curves is released here as a web server and also as a standalone application for the Linux operating system.
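As an illustration of the kind of test described in this abstract, the following is a minimal sketch of a non-parametric comparison of two correlated AUCs on paired data. The abstract does not name the specific test implemented by the software, so the use of a DeLong-style variance estimate here is an assumption, and the function names (`compare_aucs` and its helpers) and the toy scores are hypothetical.

```python
import math

def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive outscores the negative, 0.5 on a tie."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _auc_components(pos, neg):
    """Return the AUC and its per-positive (V10) and per-negative (V01) components."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01

def _cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def compare_aucs(labels, scores_a, scores_b):
    """Two-sided z-test for AUC_a - AUC_b when both classifiers scored the same cases.

    Assumed approach (DeLong-style); needs at least two positives and two negatives.
    Returns (auc_a, auc_b, z, p_value).
    """
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    auc_a, v10a, v01a = _auc_components([scores_a[i] for i in pos],
                                        [scores_a[i] for i in neg])
    auc_b, v10b, v01b = _auc_components([scores_b[i] for i in pos],
                                        [scores_b[i] for i in neg])
    # Variance of the AUC difference; the covariance terms account for the
    # correlation between the two ROC curves.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg))
    if var <= 0:
        raise ValueError("zero variance: the difference cannot be tested")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value

# Toy paired data: eight cases scored by two hypothetical classifiers.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
scores_b = [0.9, 0.3, 0.7, 0.4, 0.5, 0.6, 0.2, 0.1]
print(compare_aucs(labels, scores_a, scores_b))
```

Running the same function over every pair of classifiers gives the many-classifier comparison the abstract mentions; in that setting a multiple-testing correction would normally be applied to the resulting p-values.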
Subject(s)
Algorithms , Data Interpretation, Statistical , Diagnosis, Computer-Assisted/methods , ROC Curve , Software
ABSTRACT
We describe a web server for the accurate mapping of experimental tags in serial analysis of gene expression (SAGE). The core of the server relies on a database of genomic virtual tags built by a recently described method that attempts to reduce the number of ambiguous assignments for those tags that are not unique in the genome. The method provides a complete annotation of potential virtual SAGE tags within a genome, along with an estimate of their confidence for experimental observation, which is used to rank tags that have multiple matches in the genome. The output of the server consists of a table in HTML format that contains links to a graphic representation of the results and to some external servers and databases, facilitating the analysis of gene expression and gene discovery. A table in tab-delimited text format is also produced, allowing the user to export the results into custom databases and software for further analysis. The current server version provides the most accurate and complete SAGE tag mapping source available for yeast. In the near future, this server will also allow the accurate mapping of experimental SAGE tags from other model organisms such as human, mouse, frog and fly. The server is freely available on the web at: http://dna.bio.puc.cl/SAGExplore.html.
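To make the notion of genomic virtual tags concrete, here is a minimal sketch of how such tags could be enumerated and how an experimental tag could be classified as unique or ambiguous. The abstract does not specify the anchoring enzyme, the tag length, or how confidence is estimated, so the `CATG` anchoring site (NlaIII), the 10-nt tag length, and the names `virtual_tags` and `map_tag` are assumptions, and the confidence ranking described above is not reproduced.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed anchoring-enzyme recognition site (NlaIII)
TAG_LEN = 10     # assumed tag length, in nucleotides

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def virtual_tags(genome):
    """Map every virtual tag to the (strand, position) pairs where it occurs.

    Positions are 0-based offsets of the anchor site; for the '-' strand they
    are given in reverse-complement coordinates.
    """
    hits = defaultdict(list)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(ANCHOR)
        while start != -1:
            tag_start = start + len(ANCHOR)
            tag = seq[tag_start:tag_start + TAG_LEN]
            if len(tag) == TAG_LEN:  # skip truncated tags at sequence ends
                hits[tag].append((strand, start))
            start = seq.find(ANCHOR, start + 1)
    return hits

def map_tag(tag, hits):
    """Classify an experimental tag as 'unique', 'ambiguous' or 'unmatched'."""
    locations = hits.get(tag, [])
    if not locations:
        return "unmatched", locations
    return ("unique" if len(locations) == 1 else "ambiguous"), locations

# Toy sequence, not real genomic data.
toy_genome = "TTTTTTTTTTTTCATGACGTACGTAAGGGGGGGGGG"
print(map_tag("ACGTACGTAA", virtual_tags(toy_genome)))
```

Tags classified as ambiguous are the ones for which a ranking by expected experimental observation, as described in the abstract, would be needed to choose among candidate locations.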
Subject(s)
Computational Biology/methods , Gene Expression Regulation, Fungal , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Sequence Tagged Sites , Software , Chromosome Mapping , DNA, Complementary/genetics , Databases, Genetic , Expressed Sequence Tags , Internet , RNA, Fungal/genetics , RNA, Messenger/genetics , RNA, Untranslated/genetics
ABSTRACT
An accurate and robust large-scale melting temperature prediction server for short DNA sequences is presented. The server calculates a consensus melting temperature value using the nearest-neighbor model based on three independent thermodynamic data tables. The consensus method gives an accurate prediction of melting temperature, as recently demonstrated in a benchmark performed using all available experimental data for DNA sequences within the length range of 16-30 nt. This constitutes the first web server that has been implemented to perform a large-scale calculation of melting temperatures in real time (up to 5000 DNA sequences can be submitted in a single run). The expected accuracy of calculations carried out by this server in the range of 50-600 mM monovalent salt concentration is that 89% of the melting temperature predictions will deviate by <5 °C from experimental data. The server can be freely accessed at http://dna.bio.puc.cl/tm.html. Standalone executable versions of this software for Linux, Macintosh and Windows platforms, together with detailed supporting information, are freely available at the same web site.
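The consensus calculation described in this abstract can be sketched as follows. This is a simplified illustration, not the server's implementation: it uses the standard two-state nearest-neighbor relation Tm = ΔH / (ΔS + R ln(C_T/4)) for a non-self-complementary duplex, omits initiation terms and salt corrections, and assumes that the consensus is a plain average over tables. The function names and the strand concentration are hypothetical, and the two tables below contain placeholder numbers of plausible magnitude, not published thermodynamic parameters.

```python
import math
from itertools import product
from statistics import mean

R = 1.987  # gas constant in cal/(mol*K)

def nn_tm(seq, table, strand_conc=1e-6):
    """Two-state nearest-neighbor Tm in degrees Celsius.

    `table` maps each dinucleotide (5'->3') to (dH in kcal/mol, dS in cal/(mol*K)).
    Initiation terms and salt corrections are deliberately left out of this sketch.
    """
    seq = seq.upper()
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        step_dh, step_ds = table[seq[i:i + 2]]
        dh += step_dh
        ds += step_ds
    # Convert dH to cal/mol; C_T/4 applies to non-self-complementary duplexes.
    return 1000.0 * dh / (ds + R * math.log(strand_conc / 4.0)) - 273.15

def consensus_tm(seq, tables, strand_conc=1e-6):
    """Assumed consensus rule: the mean of the per-table predictions."""
    return mean(nn_tm(seq, table, strand_conc) for table in tables)

# PLACEHOLDER tables: every dinucleotide gets the same made-up value.
# Real use requires published nearest-neighbor parameter sets.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
toy_table_1 = {d: (-8.0, -21.0) for d in DINUCLEOTIDES}
toy_table_2 = {d: (-8.5, -22.5) for d in DINUCLEOTIDES}

print(consensus_tm("ACGTTGCAACGTTGCAACGT", [toy_table_1, toy_table_2]))
```

Because the tables are passed in as arguments, the same two functions can be applied to a batch of sequences in a loop, which mirrors the large-scale usage described in the abstract.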