Results 1 - 9 of 9
1.
PLoS One ; 18(5): e0277738, 2023.
Article in English | MEDLINE | ID: mdl-37172042

ABSTRACT

BACKGROUND: Malnutrition imposes enormous costs resulting from lost investments in human capital and increased healthcare expenditures. There is a dearth of research focusing on the prediction of women's body mass index (BMI) and malnutrition outcomes (underweight, overweight, and obesity) in developing countries. This paper attempts to fill this knowledge gap by predicting the BMI and the risks of malnutrition outcomes for Bangladeshi women of childbearing age from their economic, health, and demographic features. METHODS: Data from the 2017-18 Bangladesh Demographic and Health Survey and a series of supervised machine learning (SML) techniques are used. Additionally, this study circumvents the imbalanced distribution problem in obesity classification by utilizing an oversampling approach. RESULTS: The findings demonstrate that the support vector machine and k-nearest neighbor are the two best-performing methods in BMI prediction based on the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The combined predictor algorithms consistently yield top specificity, Cohen's kappa, F1-score, and AUC in classifying malnutrition status, and their performance is robust to alternative standards. The feature importance ranking based on several nonparametric and combined predictors indicates that socioeconomic status, women's age, and breastfeeding status are the most important features in predicting women's nutritional outcomes. Furthermore, the conditional inference trees corroborate that those three features, along with the partner's educational attainment and employment status, significantly predict malnutrition risks. CONCLUSION: To the best of our knowledge, this is the first study to predict BMI and one of the pioneering studies to classify all three malnutrition outcomes for women of childbearing age in Bangladesh, or indeed in any lower-middle-income country, using SML techniques. Moreover, in the context of Bangladesh, this paper is the first to identify and rank features that are critical in predicting nutritional outcomes using several feature selection algorithms. The estimators from this study predict the outcomes of interest more accurately and efficiently than other existing studies in the relevant literature. Therefore, the study findings can aid policymakers in designing policy and programmatic approaches to address the double burden of malnutrition among Bangladeshi women, thereby reducing the country's economic burden.


Subject(s)
Malnutrition , Nutritional Status , Female , Humans , Socioeconomic Factors , Prevalence , Obesity/epidemiology , Malnutrition/diagnosis , Malnutrition/epidemiology , Bangladesh/epidemiology , Thinness , Supervised Machine Learning
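
The abstract above ranks models by R², RMSE, and MAE for BMI prediction and by specificity, F1-score, and Cohen's kappa for classification. As a reading aid, here is a minimal sketch of how those metrics are conventionally computed. The function names are hypothetical and the code is not taken from the study; the classification part assumes binary 0/1 labels, whereas the study classifies several malnutrition outcomes.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def binary_metrics(y_true, y_pred):
    """Return (specificity, F1, Cohen's kappa) for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return specificity, f1, kappa
```

Cohen's kappa corrects raw accuracy for chance agreement, which is one reason it is often reported alongside F1 when classes are imbalanced, as they are for obesity in this data set.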
2.
Sci Rep ; 9(1): 11093, 2019 07 31.
Article in English | MEDLINE | ID: mdl-31366961

ABSTRACT

In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools have shown great accuracy in mapping short reads to the reference genome, a significant number of short reads still remain unmapped and are often excluded from downstream analyses thereby causing nonnegligible information loss in the subsequent variant calling procedure. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed in the original procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel identifies 72,997 novel high-quality indels previously not found, among which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these indels shows significant enrichment of indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these indels are strongly correlated with pathways of cancer and can have high to moderate impact on protein functions. Additionally, some of the indels overlap with the genes that do not have any indel mutations called from the originally mapped reads but have been shown to contribute to the tumorigenesis in multiple carcinomas, further emphasizing the importance of rescuing indels hidden in the unmapped reads in cancer and disease studies.


Subject(s)
Breast Neoplasms/genetics , Human Genome/genetics , Mutation/genetics , Carcinogenesis/genetics , Computational Biology/methods , Female , High-Throughput Nucleotide Sequencing/methods , Humans , DNA Sequence Analysis/methods , Software
3.
Hum Genomics ; 13(1): 9, 2019 02 13.
Article in English | MEDLINE | ID: mdl-30795817

ABSTRACT

BACKGROUND: Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD). RESULTS AND CONCLUSION: We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between hidden states (defined as "SNP," "Ins," "Del," and "Match") of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to the real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs.


Subject(s)
Algorithms , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Markov Chains , Genetic Databases , Haplotypes , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , INDEL Mutation , Linkage Disequilibrium , Single Nucleotide Polymorphism
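
The abstract above describes decoding a path over the hidden states "SNP," "Ins," "Del," and "Match" with the Viterbi algorithm. The following is a minimal, generic Viterbi sketch over those four state names. It is not the vi-HMM implementation: the observation symbols (`agree`, `mismatch`, `extra`, `gap`) and every probability are hypothetical toy values chosen only to make the example run.

```python
import math

STATES = ("Match", "SNP", "Ins", "Del")
SYMBOLS = ("agree", "mismatch", "extra", "gap")  # hypothetical per-base observations

# Toy parameters (assumptions, not estimates from any data set).
TYPICAL = dict(zip(STATES, SYMBOLS))  # each state mostly emits its own symbol
START = {"Match": 0.97, "SNP": 0.01, "Ins": 0.01, "Del": 0.01}
TRANS = {s: dict(START) for s in STATES}  # same transition row from every state
EMIT = {s: {o: (0.997 if o == TYPICAL[s] else 0.001) for o in SYMBOLS} for s in STATES}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # Log space avoids numerical underflow on long sequences.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for being in state s at position t.
            prev = max(states, key=lambda r: score[t - 1][r] + math.log(trans_p[r][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

Calling `viterbi(["agree", "mismatch", "agree"], STATES, START, TRANS, EMIT)` labels each position with one state, so variant calls can be read directly off the decoded path, which is the idea the abstract refers to as a "direct solution."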
4.
Sci Rep ; 7(1): 14106, 2017 10 26.
Article in English | MEDLINE | ID: mdl-29074871

ABSTRACT

Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and thus exhaustive.
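
To make the notion of "equivalent indels" in the abstract above concrete: two indels are equivalent when applying either one to the reference yields the same sequence, which happens readily inside repeats. The sketch below checks that definition directly and also left-shifts a deletion to its leftmost equivalent position. It is an illustration only and does not reproduce the UPS-indel coordinate system; the tuple encoding `(kind, pos, value)` with 0-based positions is an assumption made for this example.

```python
def apply_indel(reference, indel):
    """Apply one indel to a reference string.

    indel is ("ins", pos, inserted_sequence) or ("del", pos, deleted_length),
    with pos 0-based; an insertion is placed before reference[pos].
    """
    kind, pos, value = indel
    if kind == "ins":
        return reference[:pos] + value + reference[pos:]
    if kind == "del":
        return reference[:pos] + reference[pos + value:]
    raise ValueError("unknown indel kind: " + str(kind))


def are_equivalent(reference, indel_a, indel_b):
    """Two indels are equivalent if they produce the same altered sequence."""
    return apply_indel(reference, indel_a) == apply_indel(reference, indel_b)


def left_align_deletion(reference, pos, length):
    """Return the leftmost start of a deletion equivalent to (pos, length)."""
    # Shifting one base left keeps the result unchanged exactly when the base
    # entering the deleted window equals the base leaving it.
    while pos > 0 and reference[pos - 1] == reference[pos + length - 1]:
        pos -= 1
    return pos
```

For the reference `GACACACT`, deleting two bases at position 1, 3, or 5 gives `GACACT` each time, so a database that stores these as three separate entries holds two redundant records.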

5.
BMC Genomics ; 17 Suppl 5: 496, 2016 08 31.
Article in English | MEDLINE | ID: mdl-27585593

ABSTRACT

BACKGROUND: Insertions and deletions (indels) are the most common form of structural variation in the human genome. Indels not only contribute to genetic diversity but also cause diseases. Assessing indels in the human genome has therefore become a topic of interest to the research community. This growing interest in indel calling research has resulted in the development of a good number of indel calling tools. However, all of these tools are command-line based and require computer science (CS) expertise to run, which makes them challenging for researchers from a non-CS background. METHODS: In this paper, we describe an interactive platform named SPAI, which stands for Single Platform for Analyzing Indels. RESULTS: As a graphical user interface (GUI) tool, SPAI lets users run several popular indel calling tools and perform several analyses on the indel calling results without any command-line programming knowledge. CONCLUSIONS: SPAI is written in Java and tested on the Linux operating system.


Subject(s)
DNA Mutational Analysis , INDEL Mutation , Software , Human Chromosome Pair 22/genetics , Human Genome , Humans , Retroelements
6.
Hum Genomics ; 9: 20, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26286629

ABSTRACT

BACKGROUND: Insertions and deletions (indels), a common form of genetic variation, have been shown to cause or contribute to human genetic diseases and cancer. With the advance of next-generation sequencing technology, many indel calling tools have been developed; however, evaluations and comparisons of these tools using large-scale real data are still scarce. Here we evaluated seven popular and publicly available indel calling tools, GATK Unified Genotyper, VarScan, Pindel, SAMtools, Dindel, GATK HaplotypeCaller, and Platypus, using low-coverage data for 78 human genomes from the 1000 Genomes Project. RESULTS: Comparing indels called by these tools with a known set of indels, we found that Platypus outperforms the other tools. In addition, a high percentage of known indels remain undetected, and the number of common indels called by all seven tools is very low. CONCLUSION: All these findings indicate the necessity of improving the existing tools or developing new algorithms to achieve reliable and consistent indel calling results.


Subject(s)
INDEL Mutation/genetics , DNA Sequence Analysis/methods , Software , Algorithms , Human Genome , High-Throughput Nucleotide Sequencing , Human Genome Project , Humans
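
The evaluation in the abstract above rests on three set comparisons: how many known indels each tool recovers, which indels all tools agree on, and which known indels no tool detects. A minimal sketch of those comparisons follows. The function names and the tuple encoding of a call are hypothetical, and the sketch assumes all call sets were already normalized to the same indel representation, since raw coordinates can differ for equivalent indels.

```python
def recall(called, known):
    """Fraction of the known indels that a tool recovered."""
    return len(called & known) / len(known)


def common_calls(call_sets):
    """Indels reported by every tool in the {tool_name: set_of_calls} mapping."""
    return set.intersection(*call_sets.values())


def missed_by_all(call_sets, known):
    """Known indels that none of the tools reported."""
    return known - set.union(*call_sets.values())


def summarize(call_sets, known):
    """Per-tool recall plus the sizes of the agreement and blind-spot sets."""
    return {
        "recall": {tool: recall(calls, known) for tool, calls in call_sets.items()},
        "common": len(common_calls(call_sets)),
        "missed_by_all": len(missed_by_all(call_sets, known)),
    }
```

A small `common` count together with a large `missed_by_all` count corresponds to the pattern the abstract reports: little agreement across tools and many known indels left undetected.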
7.
Pathogens ; 3(1): 36-56, 2014 Jan 13.
Article in English | MEDLINE | ID: mdl-25437607

ABSTRACT

High-throughput sequencing technologies have made it possible to study bacteria by analyzing their genome sequences. For instance, comparative genome sequence analyses can reveal phenomena such as gene loss, gene gain, or gene exchange in a genome. Analyses of pathogenic bacterial genomes show that the pathogenic genomic regions in many pathogenic bacteria were horizontally transferred from other bacteria; these regions are known as pathogenicity islands (PAIs). PAIs have some detectable properties, such as genomic signatures that differ from the rest of the host genome and mobility genes that allow them to be integrated into the host genome. In this review, we discuss various pathogenicity island-associated features and current computational approaches for the identification of PAIs. Existing pathogenicity island databases and related computational resources are also discussed, so that researchers may find them useful for studies of bacterial evolution and pathogenicity mechanisms.

8.
Bioinformation ; 8(4): 203-5, 2012.
Article in English | MEDLINE | ID: mdl-22419842

ABSTRACT

Genomic islands (GIs) are genomic regions that originate from other organisms and are acquired through a process known as horizontal gene transfer (HGT). Detection of GIs plays a significant role in biomedical research since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that combines the results of the existing tools for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that automatically collects the input genomes from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. AVAILABILITY: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.

9.
Bioinformation ; 7(6): 311-4, 2011.
Article in English | MEDLINE | ID: mdl-22355228

ABSTRACT

Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them, and generates consensus prediction results. Performance comparisons between our ensemble algorithm and existing programs show that our ensemble algorithm performs better than any of the other programs. EGID was implemented in Java, and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
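
The abstract above says only that EGID filters the outputs of existing tools and generates a consensus; it does not give the procedure. One simple way to build a consensus from interval predictions is per-position voting, sketched below. This is a generic illustration, not the EGID algorithm: the function name, the half-open `(start, end)` interval encoding, and the `min_votes` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def consensus_islands(predictions, min_votes):
    """Return regions predicted as islands by at least min_votes tools.

    predictions: one list per tool of half-open (start, end) intervals.
    Assumes the intervals from a single tool do not overlap each other,
    so each tool contributes at most one vote per position.
    """
    # Sweep line: +1 where an interval opens, -1 where it closes.
    deltas = defaultdict(int)
    for tool_intervals in predictions:
        for start, end in tool_intervals:
            deltas[start] += 1
            deltas[end] -= 1

    result = []
    votes = 0
    open_at = None  # start of the consensus region currently being built
    for position in sorted(deltas):
        votes += deltas[position]
        if votes >= min_votes and open_at is None:
            open_at = position
        elif votes < min_votes and open_at is not None:
            result.append((open_at, position))
            open_at = None
    return result
```

Raising `min_votes` trades recall for precision: a threshold of 1 returns the union of all predictions, while a threshold equal to the number of tools keeps only regions that every tool reports.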
