Search | BVS Regional Portal

Mutual Information-Driven Feature Reduction for Hyperspectral Image Classification.

Islam, Md Rashedul; Ahmed, Boshir; Hossain, Md Ali; Uddin, Md Palash.

Sensors (Basel); 23(2), 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36679453

ABSTRACT

A hyperspectral image (HSI), which contains a number of contiguous and narrow spectral wavelength bands, is a valuable source of data for ground cover examinations. Classification using the entire original HSI suffers from the "curse of dimensionality" problem because (i) the image bands are highly correlated both spectrally and spatially, (ii) not every band carries equal information, (iii) training samples are insufficient for some classes, and (iv) the overall computational cost is high. Therefore, effective feature (band) reduction through feature extraction (FE) and/or feature selection (FS) is necessary to improve classification in a cost-effective manner. Principal component analysis (PCA) is a frequently adopted unsupervised FE method in HSI classification. Nevertheless, its performance worsens when the dataset is noisy, and its computational cost becomes high. Consequently, this study first proposed an efficient FE approach using a normalized mutual information (NMI)-based band grouping strategy, where classical PCA was applied to each band subgroup for intrinsic FE. Finally, the subspace of the most effective features was generated by the NMI-based minimum redundancy and maximum relevance (mRMR) FS criteria. The subspace of features was then classified using the kernel support vector machine. Two real HSIs collected by the AVIRIS and HYDICE sensors were used in an experiment. The experimental results demonstrated that the proposed feature reduction approach significantly improved the classification performance. It achieved the highest overall classification accuracy of 94.93% for the AVIRIS dataset and 99.026% for the HYDICE dataset. Moreover, the proposed approach reduced the computational cost compared with the studied methods.

Subject(s)

Support Vector Machine; Principal Component Analysis
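
As an illustration of the normalized mutual information (NMI) measure named in the abstract above, here is a minimal sketch in plain Python. It assumes band values have already been quantized to discrete levels and uses the geometric-mean normalization I(X;Y) / sqrt(H(X) H(Y)). The abstract does not state which normalization or binning the authors used, so both choices, the function names, and the toy bands are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def normalized_mi(x, y):
    """NMI with geometric-mean normalization (an assumed choice)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant band carries no information
    return mutual_information(x, y) / math.sqrt(hx * hy)


# Hypothetical quantized pixel values for two bands
band_a = [0, 0, 1, 1, 2, 2, 3, 3]
band_b = [0, 0, 1, 1, 2, 2, 3, 2]
print(normalized_mi(band_a, band_b))
```

A band grouping step could then, for example, place adjacent bands in the same group while their pairwise NMI stays above a chosen threshold; that rule is only one possible reading of the strategy described in the abstract.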

PLP_FS: prediction of lysine phosphoglycerylation sites in protein using support vector machine and fusion of multiple F_Score feature selection.

Sohrawordi, Md; Hossain, Md Ali; Hasan, Md Al Mehedi.

Brief Bioinform; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35929355

ABSTRACT

A newly identified post-translational modification (PTM), phosphoglycerylation, has been shown to play an essential role in the construction and functional properties of proteins and in dangerous human diseases. Hence, understanding the molecular mechanism behind the phosphoglycerylation process is urgently needed to develop drugs for related diseases. However, accurately identifying phosphoglycerylation sites in a protein sequence in the laboratory is a difficult and challenging task, so an efficient computational model is greatly sought for this purpose. Only a few computational models are currently available for identifying phosphoglycerylation sites, and their prediction capability has not reached a satisfactory level. Therefore, an effective predictor named PLP_FS was designed and constructed in this study to identify phosphoglycerylation sites. For training, an optimal number of features was obtained by fusing multiple F_Score feature selection techniques applied to the features generated by three types of sequence-based feature extraction methods, and the selected features were fitted with the support vector machine classification technique to build the prediction model. In addition, the k-neighbor near cleaning and SMOTE methods were implemented to balance the benchmark dataset. In 10-fold cross-validation, the suggested model obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75%, which are better than those of other currently available predictors of phosphoglycerylation sites.

Subject(s)

Lysine; Support Vector Machine; Algorithms; Amino Acid Sequence; Computational Biology/methods; Humans; Lysine/metabolism; Protein Processing, Post-Translational; Proteins/metabolism
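
The F-score criterion mentioned in the abstract above can be sketched as follows. This uses the commonly cited two-class F-score (between-class separation of the means divided by the sum of within-class sample variances). The fusion of multiple F_Score variants described by the authors is not reproduced here, and the function names and toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature.

    pos / neg hold the feature's values in the positive and negative class;
    each class needs at least two samples for the sample variance.
    """
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(rows[0])
    scores = []
    for k in range(n_features):
        pos = [r[k] for r, y in zip(rows, labels) if y == 1]
        neg = [r[k] for r, y in zip(rows, labels) if y == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(n_features), key=lambda k: scores[k], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not
rows = [[1, 5], [2, 4], [3, 6], [7, 5], [8, 6], [9, 4]]
labels = [1, 1, 1, 0, 0, 0]
print(rank_features(rows, labels))
```

Keeping only the top-ranked indices before training a classifier is the usual way such a ranking is used; how many to keep would be tuned, for instance by cross-validation.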

Prediction of lysine formylation sites using support vector machine based on the sample selection from majority classes and synthetic minority over-sampling techniques.

Sohrawordi, Md; Hossain, Md Ali.

Biochimie; 192: 125-135, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34627982

ABSTRACT

Lysine formylation is a newly discovered type of post-translational modification (PTM) of great current interest that is generally found on core and linker histone proteins of prokaryotes and eukaryotes and plays important roles in the regulation of various cellular mechanisms. Hence, properly identifying formylation sites in proteins is urgently needed to understand the molecular mechanism of formylation in depth and to design drugs for relevant diseases. Because experimental identification of formylation sites using traditional processes is expensive and time-consuming, a simple and fast mathematical model for accurately predicting lysine formylation sites is highly desired. A computational model named PLF_SVM is designed and proposed in this study, using the binary encoding (BE), amino acid composition (AAC), reverse position relative incidence matrix (RPRIM), position relative incidence matrix (PRIM), and position specific amino acid propensity (PSAAP) feature generation methods to predict formylated and non-formylated lysine sites. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and a proposed sample selection strategy named EnSVM are applied to handle the imbalanced training dataset problem. Thereafter, the optimal number of features is selected by the F-score method to train the model. Finally, PLF_SVM outperforms the state-of-the-art approaches in validation and in an independent test, with accuracies of 98.61% and 98.77%, respectively. A user-friendly web tool for identifying formylation sites is also provided at https://plf-svm.herokuapp.com/. Therefore, the proposed method may be a helpful guide for the analysis and prediction of formylated lysine and for understanding the process of cellular regulation.

Subject(s)

Histones/chemistry; Lysine/chemistry; Protein Processing, Post-Translational; Support Vector Machine; Acylation; Animals; Histones/metabolism; Humans; Lysine/metabolism
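
Two of the sequence encodings named in the abstract above, amino acid composition (AAC) and binary encoding (BE), are simple enough to sketch. The version below assumes the 20 standard amino acids in alphabetical one-letter order and maps any other symbol to an all-zero block; the abstract does not specify the alphabet, the window length, or the handling of gaps, so those details and the example peptide are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def amino_acid_composition(peptide):
    """Fraction of each standard amino acid in the peptide (20 values)."""
    n = len(peptide)
    return [peptide.count(aa) / n for aa in AMINO_ACIDS]


def binary_encoding(peptide):
    """One-hot encode each residue; returns 20 * len(peptide) values.

    Symbols outside the 20-letter alphabet become an all-zero block
    (an assumed convention).
    """
    vec = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            one_hot[idx] = 1
        vec.extend(one_hot)
    return vec


# Hypothetical sequence window centred on a lysine (K)
window = "AGKLS"
features = amino_acid_composition(window) + binary_encoding(window)
print(len(features))  # 20 AAC values + 20 * 5 one-hot values
```

Concatenating the per-method vectors, as in the last lines, is one straightforward way to combine several encodings into a single feature vector before feature selection.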

Network-Based Genetic Profiling Reveals Cellular Pathway Differences Between Follicular Thyroid Carcinoma and Follicular Thyroid Adenoma.

Hossain, Md Ali; Asa, Tania Akter; Rahman, Md Mijanur; Uddin, Shahadat; Moustafa, Ahmed A; Quinn, Julian M W; Moni, Mohammad Ali.

Int J Environ Res Public Health; 17(4), 2020 Feb 20.

Article in English | MEDLINE | ID: mdl-32093341

ABSTRACT

Molecular mechanisms underlying the pathogenesis and progression of malignant thyroid cancers, such as follicular thyroid carcinomas (FTCs), and how these differ from benign thyroid lesions, are poorly understood. In this study, we employed network-based integrative analyses of FTC and benign follicular thyroid adenoma (FTA) lesion transcriptomes to identify key genes and pathways that differ between them. We first analysed a microarray gene expression dataset (Gene Expression Omnibus GSE82208, n = 52) obtained from FTC and FTA tissues to identify differentially expressed genes (DEGs). Pathway analyses of these DEGs were then performed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) resources to identify potentially important pathways, and protein-protein interactions (PPIs) were examined to identify pathway hub genes. Our data analysis identified 598 DEGs: 133 genes with higher and 465 genes with lower expression in FTCs. We identified four significant pathways (one carbon pool by folate, p53 signalling, progesterone-mediated oocyte maturation signalling, and cell cycle pathways) connected to DEGs with high FTC expression; eight pathways were connected to DEGs with lower relative FTC expression. Ten GO groups were significantly connected with FTC-high expression DEGs and 80 with low-FTC expression DEGs. PPI analysis then identified 12 potential hub genes based on degree and betweenness centrality; namely, TOP2A, JUN, EGFR, CDK1, FOS, CDKN3, EZH2, TYMS, PBK, CDH1, UBE2C, and CCNB2. Moreover, transcription factors (TFs) were identified that may underlie gene expression differences observed between FTC and FTA, including FOXC1, GATA2, YY1, FOXL1, E2F1, NFIC, SRF, TFAP2A, HINFP, and CREB1. We also identified microRNAs (miRNAs) that may affect transcript levels of DEGs; these included hsa-mir-335-5p, -26b-5p, -124-3p, -16-5p, -192-5p, -1-3p, -17-5p, -92a-3p, -215-5p, and -20a-5p. Thus, our study identified DEGs, molecular pathways, TFs, and miRNAs that reflect molecular mechanisms that differ between FTC and benign FTA. Given the general similarities of these lesions and their common tissue origin, some of these differences may reflect malignant progression potential and may include useful candidate biomarkers for FTC as well as factors important for FTC pathogenesis.

Subject(s)

Adenocarcinoma, Follicular/genetics; Adenoma/genetics; Thyroid Neoplasms/genetics; Adenocarcinoma, Follicular/diagnosis; Adenoma/diagnosis; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Ontology; Gene Regulatory Networks; Humans; MicroRNAs/genetics; Thyroid Neoplasms/diagnosis
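
The abstract above selects hub genes by degree and betweenness centrality in a PPI network. The sketch below computes both measures for a small undirected, unweighted graph in plain Python, using Brandes' algorithm for betweenness. The node names and edges are made up for illustration and are not the interactions from the study; the tool and any score thresholds the authors used are not given in the abstract.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical star-shaped network: HUB interacts with three other proteins
ppi = {"HUB": {"P1", "P2", "P3"}, "P1": {"HUB"}, "P2": {"HUB"}, "P3": {"HUB"}}
print(degree_centrality(ppi))
print(betweenness_centrality(ppi))
```

In this toy network the central node lies on every shortest path between the other three, so it ranks first on both measures, which is the intuition behind calling such nodes hubs.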

Machine learning and bioinformatics models to identify gene expression patterns of ovarian cancer associated with disease progression and mortality.

Hossain, Md Ali; Saiful Islam, Sheikh Muhammad; Quinn, Julian M W; Huq, Fazlul; Moni, Mohammad Ali.

J Biomed Inform; 100: 103313, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31655274

ABSTRACT

Ovarian cancer (OC) is a common cause of cancer death among women worldwide, so there is a pressing need to identify factors influencing OC mortality. Much OC patient clinical data is publicly accessible via the Broad Institute Cancer Genome Atlas (TCGA) datasets, which include patient age, cancer site, stage and subtype, and patient survival, as well as OC gene transcription profiles. These allow studies correlating OC patient survival (and other clinical variables) with gene expression to identify new OC biomarkers that predict patient mortality. We integrated clinical and tissue transcriptome data from patients available from the TCGA portal. We determined OC mRNA expression levels (compared to normal ovarian tissue) of 41 genes already implicated in OC progression, and assessed how their OC tissue expression levels predict patient survival. We employed Cox Proportional Hazard regression models to analyse clinical factors and transcriptomic information and determine the relative effect on survival associated with each factor. Multivariate analysis of combined data (clinical and gene mRNA expression) found that age and ovary tumour site were significantly correlated with patient survival. The univariate analysis also confirmed significant differences in patient survival time when altered transcription levels of TLR4, BSCL2, CDH1, ERBB2, and SCGB2A1 were evident, while multivariate analysis that considered the 41 genes simultaneously revealed a significant relationship of survival with the TLR4, BSCL2, CDH1, ERBB2 and PTPRE genes. However, analyses that considered all 41 genes together with clinical variables identified the genes TLR4, BSCL2, CDH1, ERBB2, BRCA2 and SCGB2A1 as independently related to survival in OC. These studies indicate that the latter genes influence OC patient survival, i.e., expression levels of these genes provide mechanistic and predictive information in addition to that of the clinical traits. Our study provides strong evidence that these genes are important prognostic indicators of patient survival that give clues to biological processes that underlie OC progression and mortality.

Subject(s)

Computational Biology; Computer Simulation; Gene Expression Regulation, Neoplastic; Machine Learning; Ovarian Neoplasms/genetics; Ovarian Neoplasms/mortality; Datasets as Topic; Disease Progression; Female; Humans; Ovarian Neoplasms/pathology; Survival Analysis
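
The Cox proportional hazards model used in the abstract above is fitted by maximizing a partial likelihood over the coefficient vector. As a minimal sketch of that objective, the function below evaluates the partial log-likelihood for a single covariate, assuming no tied event times. The toy survival data and variable names are hypothetical, and a real analysis would use an established survival library rather than this hand-rolled function.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    x      : covariate value per patient (e.g. an expression level)
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients only contribute through risk sets
        # risk set: everyone still under observation at time t_i
        risk = sum(
            math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        ll += beta * x[i] - math.log(risk)
    return ll


# Toy data: patients with a higher covariate value die earlier
times = [1, 2, 3]
events = [1, 1, 1]
x = [2.0, 1.0, 0.0]
print(cox_partial_loglik(0.0, times, events, x))
print(cox_partial_loglik(1.0, times, events, x))
```

On this toy data the log-likelihood is higher at beta = 1 than at beta = 0, consistent with a positive association between the covariate and the hazard; the fitted coefficient is whichever beta maximizes this function.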
