Results 1 - 20 of 68
1.
Neural Netw ; 180: 106572, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39173200

ABSTRACT

Person Re-identification (Re-ID) aims to match person images across non-overlapping cameras. Existing approaches formulate this task as fine-grained representation learning with deep neural networks: image features are extracted with a deep convolutional network and then mapped into a discriminative space by another, smaller network, in order to make full use of all possible cues. However, recent Re-ID methods that strive to capture every cue and make the space more discriminative have produced longer features, ranging from 1024 to 14,336 dimensions, leading to higher time (distance computation) and space (feature storage) complexities. There are two potential solutions: reduction-after-training methods (such as Principal Component Analysis and Linear Discriminant Analysis) and reduction-during-training methods (such as 1 × 1 convolution). The former rely on a statistical approach aiming for a global optimum but lack end-to-end optimization over large data and deep neural networks. The latter lack theoretical guarantees and may be vulnerable to training noise such as dataset noise or the initialization seed. To address these limitations, we propose Euclidean-Distance-Preserving Feature Reduction (EDPFR), which combines the strengths of both families. EDPFR first formulates feature reduction as a matrix decomposition and derives a condition for preserving the Euclidean distance between features, thus ensuring accuracy in theory. The method then integrates the matrix decomposition into a deep neural network to enable end-to-end optimization and batch training while maintaining the theoretical guarantee. EDPFR thus reduces the dimensions of features f_a and f_b to f_a' and f_b' while preserving their Euclidean distance, i.e., L2(f_a, f_b) = L2(f_a', f_b'). Beyond its distance-preserving capability, EDPFR also features a novel feature-level distillation loss. One of the main challenges in knowledge distillation is dimension mismatch: previous distillation losses usually project the mismatched features into matched class-level, spatial-level, or similarity-level spaces, which can lose information and reduce the flexibility and efficiency of distillation. Our proposed feature-level distillation leverages the Euclidean-distance-preserving property and performs distillation directly in the feature space, resulting in a more flexible and efficient approach. Extensive experiments on three Re-ID datasets, Market-1501, DukeMTMC-reID, and MSMT, demonstrate the effectiveness of the proposed Euclidean-Distance-Preserving Feature Reduction.
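A minimal sketch of the distance-preserving condition (not the authors' code): a reduction matrix W with orthonormal rows satisfies L2(W f_a, W f_b) = L2(f_a, f_b) exactly for features lying in the row space of W, which is the kind of condition a matrix-decomposition formulation can enforce. The dimensions and random features below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 2048, 256                     # original and reduced dimensions

# Semi-orthogonal reduction matrix: rows are orthonormal, so W @ W.T = I_d.
W = np.linalg.qr(rng.standard_normal((D, d)))[0].T    # shape (d, D)

# Two features constructed to lie in the row space of W.
fa = W.T @ rng.standard_normal(d)
fb = W.T @ rng.standard_normal(d)

fa_r, fb_r = W @ fa, W @ fb          # reduced features of dimension d

print(np.linalg.norm(fa - fb))       # L2(fa, fb)
print(np.linalg.norm(fa_r - fb_r))   # L2(fa', fb') -- matches exactly
```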

2.
Heliyon ; 10(15): e34911, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39144958

ABSTRACT

Background: Patients' hand-drawn Archimedes spirals are widely used in the neurological community to grade tremors. These spirals are either drawn on paper and Xeroxed/scanned into digital images, or drawn on digitizing tablets. This process introduces artifacts such as variable widths of the drawn lines with varying pixel grey-scale values; Xeroxing introduces additional artifacts resulting from paper misalignments. These artifacts, and the presence of the reference spiral in the image, complicate the automatic extraction of a mathematical spiral signal from the image. New methods: We introduce a mathematical mapping that transforms the image pixels of the patient's hand-drawn spiral into a one-dimensional discrete signal suitable for mathematical analysis. Results: A cohort of 18 hand-drawn spirals with various artifacts is used to validate our method. We extract the parameters of the discrete signals and show that the signals can be represented by truncating to as few as 150 parameters, with a truncation RMS error of 6.26% across the cohort. Using only 150 features makes machine learning a viable option for future applications. Furthermore, our method can be used to evaluate the frequency and the amplitude of the tremor. Comparison with existing methods: In existing methods, the patient draws the spiral on a digitizing tablet, and features are extracted from this data for machine learning. We recognize that the vast majority of hospitals still use the pencil-and-paper approach, and there is an abundance of ready-to-be-mined tremor-related data already stored as paper or digitized drawings. Our procedure is equally applicable to Xeroxed documents and to files generated from digital tablets. Conclusions: We have validated a new procedure, requiring minimal user intervention, that automatically extracts a patient's hand-drawn spiral as a discrete, one-dimensional mathematical signal from a scanned image or a digital-tablet file.
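One simple way to realize such a pixel-to-signal mapping (an illustrative sketch, not the paper's exact procedure) is to convert foreground pixels to polar coordinates about the spiral centre and unwrap the winding angle; for an Archimedean spiral the radius grows monotonically with the angle, so sorting by radius recovers the drawing order. The binary mask and centre coordinates are assumed inputs.

```python
import numpy as np

def spiral_to_signal(mask: np.ndarray, cx: float, cy: float):
    """Map foreground pixels of a binary spiral image to a 1-D signal r(theta)."""
    ys, xs = np.nonzero(mask)                # foreground pixel coordinates
    r = np.hypot(xs - cx, ys - cy)
    phi = np.arctan2(ys - cy, xs - cx)
    order = np.argsort(r)                    # radius grows with angle on an
    r, phi = r[order], phi[order]            # Archimedean spiral
    theta = np.unwrap(phi)                   # continuous winding angle
    return theta, r                          # discrete 1-D signal r(theta)
```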

3.
J Food Sci ; 89(7): 4403-4418, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38957090

ABSTRACT

The improper storage of seeds can compromise agricultural productivity, leading to reduced crop yields. Assessing seed viability before sowing is therefore of paramount importance. Although numerous techniques exist for evaluating seed condition, this research leveraged hyperspectral imaging (HSI) technology as an innovative, rapid, clean, and precise nondestructive testing method. The study aimed to determine the most effective classification model for watermelon seeds. Initially, purchased watermelon seeds were segregated into two groups: one underwent sterilization in a dehydrator machine at 40°C for 36 h, whereas the other batch was stored under favorable conditions. Spectral images of the watermelon seeds were captured using an HSI system with a charge-coupled device camera covering 400-1000 nm, and the segmented regions of all samples were measured. Preprocessing techniques and wavelength selection methods were applied to manage the spectral data workload, followed by the implementation of a support vector machine (SVM) model. The initial hybrid SVM model achieved a predictive accuracy of 100%, with a test-set accuracy of 92.33%. Subsequently, artificial bee colony (ABC) optimization was introduced to enhance model precision. The results indicated that, with kernel parameters (c, g) set to 13.17 and 0.01, respectively, and a runtime of 4.19328 s, training and evaluation of the dataset achieved an accuracy of 100%. Hence, it is practical to combine HSI technology with the PCA-ABC-SVM model to detect different watermelon seeds. These findings introduce a novel technique for accurately forecasting seed viability, intended for use in agricultural industrial multispectral imaging. PRACTICAL APPLICATION: Traditional methods for determining the condition of seeds primarily emphasize appearance, rely on subjective assessment, are time-consuming, and require substantial labor. HSI, as a green technology, was employed to alleviate these problems. This work contributes to the field of industrial multispectral imaging by enhancing the capacity to discern various types of seeds and agricultural crop products.
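A minimal sketch of the PCA + SVM stage in scikit-learn; the paper tunes the RBF kernel parameters (c, g) with artificial bee colony optimization, for which a plain grid search stands in here, and the synthetic X/y below are placeholders for the seed spectra and viability labels.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=120, random_state=0)  # stand-in spectra

pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
params = {"svc__C": [1, 13.17, 100], "svc__gamma": [0.01, 0.1, 1.0]}  # grid replaces ABC
grid = GridSearchCV(pipe, params, cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)
```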


Subject(s)
Citrullus, Hyperspectral Imaging, Machine Learning, Seeds, Spectroscopy, Near-Infrared, Citrullus/chemistry, Seeds/chemistry, Hyperspectral Imaging/methods, Spectroscopy, Near-Infrared/methods, Support Vector Machine, Algorithms
4.
J Neurosci Methods ; 409: 110183, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38834145

ABSTRACT

BACKGROUND: The significance of diagnosing illnesses associated with brain cognition and gait-freezing patterns has led to a recent surge of interest in the study of gait for mental disorders. A more precise and effective way to characterize and classify common gait problems, such as foot and brain pulse disorders, can improve prognosis evaluation and treatment options for Parkinson's patients. Nonetheless, the primary clinical technique for assessing gait abnormalities at present is visual inspection, which depends on the subjectivity of the observer and can be inaccurate. RESEARCH QUESTION: This study investigates whether it is possible to differentiate between a gait-related brain disorder and the typical walking pattern using supervised machine learning techniques and data obtained from inertial measurement unit sensors for brain, hip, and leg rehabilitation. METHOD: The proposed method uses the Daphnet Freezing of Gait dataset, consisting of 237 instances with 9 attributes, and applies machine learning and feature reduction approaches to leg and hip gait recognition. RESULTS: From the obtained results, it is concluded that among all classifiers RF achieved the highest accuracy (98.9%) and Perceptron the lowest (70.4%). Using LDA as the feature reduction approach, KNN, RF, and NB also achieved promising accuracy and F1-scores in comparison with the SVM and LR classifiers. SIGNIFICANCE: The integration of different machine learning algorithms offers a viable and prospective solution for distinguishing between the gait disorders associated with freezing/non-freezing of brain tissue and normal walking gait patterns. This research implies the need for an impartial approach to support clinical judgment.
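A minimal sketch (with a synthetic stand-in matching the dataset's 237 × 9 shape) of LDA feature reduction feeding the compared classifiers:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in with the dataset's shape: 237 instances, 9 attributes, binary target.
X, y = make_classification(n_samples=237, n_features=9, n_informative=5, random_state=0)

for name, clf in [("RF", RandomForestClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier()),
                  ("NB", GaussianNB())]:
    pipe = make_pipeline(LinearDiscriminantAnalysis(), clf)  # LDA = feature reduction
    print(name, cross_val_score(pipe, X, y, cv=5).mean())
```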


Subject(s)
Gait Disorders, Neurologic, Machine Learning, Humans, Gait Disorders, Neurologic/diagnosis, Gait Disorders, Neurologic/physiopathology, Gait Disorders, Neurologic/etiology, Male, Female, Supervised Machine Learning, Middle Aged, Algorithms, Gait Analysis/methods, Aged, Adult, Gait/physiology
5.
Front Hum Neurosci ; 18: 1362135, 2024.
Article in English | MEDLINE | ID: mdl-38505099

ABSTRACT

Introduction: Brain-computer interfaces (BCIs) are systems that acquire the brain's electrical activity and provide control of external devices. Since electroencephalography (EEG) is the simplest non-invasive method of capturing the brain's electrical activity, EEG-based BCIs are very popular designs. Aside from classifying extremity movements, recent BCI studies have focused on accurately coding the finger movements of the same hand through classification with machine learning techniques. State-of-the-art studies coded five finger movements while neglecting the brain's idle case (i.e., the state in which the brain is not performing any mental task). This can easily cause more false positives and dramatically degrade classification performance, and thus the performance of BCIs. This study aims to propose a more realistic system that decodes the movements of five fingers plus the no-mental-task (NoMT) case from EEG signals. Methods: This study utilizes a novel praxis for feature extraction. Features for classification are extracted from Proper Rotational Components (PRCs) computed via Intrinsic Time-Scale Decomposition (ITD), which has recently been applied successfully to various biomedical signals. These features were then fed to well-known classifiers and their different implementations to discriminate among the six classes. The highest classifier performances obtained in both subject-independent and subject-dependent cases are reported. In addition, ANOVA-based feature selection was examined to determine whether statistically significant features improve classifier performance. Results: The Ensemble Learning classifier achieved the highest accuracy, 55.0%, among the tested classifiers, and ANOVA-based feature selection increased the performance of classifiers on five-finger-movement determination in EEG-based BCI systems. Discussion: Compared with similar studies, the proposed praxis achieved a modest yet significant improvement in classification performance, even though the number of classes was incremented by one (i.e., NoMT).
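A minimal sketch of the ANOVA-based selection stage feeding an ensemble classifier; the PRC/ITD feature extraction itself is not reproduced, the six-class synthetic data are stand-ins, and a bagging ensemble is an assumed instantiation of "Ensemble Learning".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Six classes: five fingers plus the no-mental-task (NoMT) case.
X, y = make_classification(n_samples=600, n_features=64, n_classes=6,
                           n_informative=12, random_state=0)

pipe = make_pipeline(SelectKBest(f_classif, k=20),   # ANOVA-based selection
                     BaggingClassifier(random_state=0))
print(cross_val_score(pipe, X, y, cv=5).mean())
```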

6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493341

ABSTRACT

Kinase fusion genes are the most active fusion gene group among human cancer fusion genes. To help choose clinically significant kinases so that cancer patients harboring fusion genes can be better diagnosed, we need a metric for assessing kinases in pan-cancer fusion genes rather than relying on the frequency of samples expressing the fusion genes. Notably, multiple studies have assessed human kinases as drug targets using multiple types of genomic and clinical information, but none has used kinase fusion genes. Assessment studies that omit kinase fusion events can miss one of the mechanisms that enhance kinase function in cancer. To fill this gap, in this study we suggest a novel way of assessing genes using a network propagation approach that infers how strongly individual kinases influence the kinase fusion gene network, composed of ~5,000 kinase fusion gene pairs. To select better propagation seeds, we chose the top genes via dimensionality reduction, such as principal components or latent-layer representations, of six features of individual genes in pan-cancer fusion genes. Our approach may provide a novel way to assess human kinases in cancer.
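A minimal sketch of the propagation idea on a toy graph: personalized PageRank is one common instantiation of network propagation, and the gene names, edges, and seed scores below are illustrative, not the study's data.

```python
import networkx as nx

# Toy kinase fusion gene network: nodes are genes, edges are fusion pairs.
G = nx.Graph([("ALK", "EML4"), ("ALK", "NPM1"), ("RET", "KIF5B"),
              ("RET", "CCDC6"), ("BRAF", "KIAA1549")])

# Seeds, e.g. top genes chosen via dimensionality reduction of per-gene features.
seeds = {"ALK": 1.0, "RET": 0.5}

scores = nx.pagerank(G, personalization=seeds)   # propagate seed influence
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```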


Subject(s)
Gene Regulatory Networks, Neoplasms, Humans, Neoplasms/genetics, Gene Fusion
7.
Neuropsychiatr Dis Treat ; 20: 137-148, 2024.
Article in English | MEDLINE | ID: mdl-38282834

ABSTRACT

Purpose: While previous studies have suggested a close association between students' psychological variables and their higher-order cognitive abilities, such studies have largely been lacking for third-world countries like India, with their unique socio-economic-cultural challenges. We aimed to investigate the relationship between psychological variables (depression, anxiety, and stress) and cognitive functions among Indian students, and to predict cognitive performance as a function of these variables. Patients and Methods: Four hundred and thirteen university students were selected using purposive sampling. Widely used and validated offline questionnaires were used to assess their psychological and cognitive status. Correlational analyses were conducted to examine the associations between these variables. An Artificial Neural Network (ANN) model was applied to predict cognitive levels from the scores of the psychological variables. Results: Correlational analyses revealed negative correlations between emotional distress and cognitive functioning. Principal Component Analysis (PCA) reduced the dimensionality of the input data, effectively capturing the variance with fewer features. The feature-weight analysis indicated a balanced contribution of each mental health symptom, with particular emphasis on one of the symptoms. The ANN model demonstrated moderate predictive performance, explaining a portion of the variance in cognitive levels based on the psychological variables. Conclusion: The study confirms significant associations between the emotional status of university students and their cognitive abilities. Specifically, we provide evidence for the first time that, in Indian students, self-reported higher levels of stress, anxiety, and depression are linked to lower performance on cognitive tests. The application of PCA and feature-weight analysis provided deeper insights into the structure of the predictive model. Notably, the ANN model provided insights into predicting these cognitive domains as a function of the emotional attributes. Our results emphasize the importance of addressing mental health concerns and implementing interventions to enhance cognitive function in university students.
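A minimal sketch of the PCA → ANN pipeline, with synthetic stand-ins for the three psychological scores and a toy cognitive outcome (the network size and the negative relationship built into the toy target are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(413, 3))                    # depression, anxiety, stress scores
y = -0.4 * X.sum(axis=1) + rng.normal(scale=0.5, size=413)   # toy cognitive score

pipe = make_pipeline(StandardScaler(), PCA(n_components=2),
                     MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
print(cross_val_score(pipe, X, y, cv=5, scoring="r2").mean())
```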

8.
Brain Inform ; 11(1): 3, 2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38219249

ABSTRACT

In the field of audiology, achieving accurate discrimination of auditory impairments remains a formidable challenge. Conditions such as deafness and tinnitus exert a substantial impact on patients' overall quality of life, emphasizing the urgent need for precise and efficient classification methods. This study introduces an innovative approach utilizing multi-view brain network data acquired from three distinct cohorts: 51 deaf patients, 54 patients with tinnitus, and 42 normal controls. Electroencephalogram (EEG) data were meticulously recorded from 70 electrodes grouped into 10 regions of interest (ROIs) and synergistically integrated with machine learning algorithms. To tackle the inherently high-dimensional nature of brain connectivity data, principal component analysis (PCA) is employed for feature reduction, enhancing interpretability. The proposed approach is evaluated using ensemble learning techniques, including Random Forest, Extra Trees, Gradient Boosting, and CatBoost. Model performance is scrutinized across a comprehensive set of metrics encompassing cross-validation accuracy (CVA), precision, recall, F1-score, Kappa, and the Matthews correlation coefficient (MCC). The proposed models demonstrate statistical significance and effectively diagnose auditory disorders, contributing to early detection and personalized treatment and thereby enhancing patient outcomes and quality of life. Notably, they exhibit reliability and robustness, characterized by high Kappa and MCC values. This research represents a significant advancement at the intersection of audiology, neuroimaging, and machine learning, with transformative implications for clinical practice and care.
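A minimal sketch of the PCA + ensemble evaluation with the reported kappa/MCC metrics; the 147 synthetic samples stand in for the three cohorts' connectivity features, and CatBoost is omitted here as an external dependency.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 147 stand-in subjects (51 deaf, 54 tinnitus, 42 controls), 300 connectivity features.
X, y = make_classification(n_samples=147, n_features=300, n_classes=3,
                           n_informative=30, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

for clf in (RandomForestClassifier(random_state=0), ExtraTreesClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)):
    pred = make_pipeline(PCA(n_components=20), clf).fit(Xtr, ytr).predict(Xte)
    print(type(clf).__name__, cohen_kappa_score(yte, pred), matthews_corrcoef(yte, pred))
```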

9.
Heliyon ; 10(1): e23571, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38187288

ABSTRACT

Feature selection is a critical component of machine learning and data mining that addresses challenges such as irrelevance, noise, and redundancy in large-scale data, which often result in the curse of dimensionality. This study employs a K-nearest-neighbour wrapper to implement feature selection using six nature-inspired algorithms derived from human-behaviour- and mammal-inspired techniques. Evaluated on six real-world datasets, the study compares the performance of these algorithms in terms of accuracy, feature count, fitness, convergence, and computational cost. The findings underscore the efficacy of the Human Learning Optimization, Poor and Rich Optimization, and Grey Wolf Optimizer algorithms across multiple performance metrics. For mean fitness, for instance, Human Learning Optimization outperforms the others, followed by Poor and Rich Optimization and Harmony Search. The study suggests the potential of human-inspired algorithms, particularly Poor and Rich Optimization, for robust feature selection without compromising classification accuracy.
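A minimal sketch of the wrapper objective such searchers optimize: a KNN cross-validation score over a binary feature mask, with a light penalty on the number of selected features. The weighting scheme is an assumption, and the nature-inspired search loop itself is omitted.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask: np.ndarray, X: np.ndarray, y: np.ndarray, alpha: float = 0.99) -> float:
    """KNN-wrapper fitness of a binary feature mask (higher is better)."""
    if not mask.any():
        return 0.0                                   # empty subsets are invalid
    acc = cross_val_score(KNeighborsClassifier(),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    # Reward accuracy; lightly penalise the fraction of features kept.
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)
```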

10.
J Digit Imaging ; 36(6): 2602-2612, 2023 12.
Article in English | MEDLINE | ID: mdl-37532925

ABSTRACT

Breast cancer is the second most common cancer among women worldwide, and diagnosis by pathologists is a time-consuming and subjective procedure. Computer-aided diagnosis frameworks are utilized to relieve pathologist workload by classifying the data automatically, and deep convolutional neural networks (CNNs) are effective solutions. The features extracted from the activation layer of pre-trained CNNs are called deep convolutional activation features (DeCAF). In this paper, we show that not all DeCAF features necessarily lead to higher accuracy in the classification task and that dimension reduction plays an important role. We propose reduced DeCAF (R-DeCAF) for this purpose and apply different dimension reduction methods to achieve an effective combination of features that captures the essence of DeCAF features. The framework uses pre-trained CNNs such as AlexNet, VGG-16, and VGG-19 as feature extractors in transfer-learning mode. The DeCAF features are extracted from the first fully connected layer of the mentioned CNNs, and a support vector machine is used for classification. Among linear and nonlinear dimensionality reduction algorithms, linear approaches such as principal component analysis (PCA) yield a better combination of deep features and lead to higher classification accuracy using a small number of features, given a specific amount of cumulative explained variance (CEV). The proposed method is validated on the BreakHis and ICIAR datasets. Comprehensive results show an improvement in classification accuracy of up to 4.3% with a feature vector size (FVS) of 23 and a CEV of 0.15.
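A minimal sketch of the R-DeCAF pipeline pieces in PyTorch/scikit-learn (assumes a recent torchvision with the AlexNet_Weights API): features from AlexNet's first fully connected layer, PCA keeping CEV = 0.15, then a linear SVM. The random image batch and labels are stand-ins for real, normalised histology patches.

```python
import torch
from torchvision.models import alexnet, AlexNet_Weights
from sklearn.decomposition import PCA
from sklearn.svm import SVC

model = alexnet(weights=AlexNet_Weights.DEFAULT).eval()
# Truncate at the first fully connected layer (classifier[1]) to get DeCAF features.
fc1 = torch.nn.Sequential(model.features, model.avgpool, torch.nn.Flatten(),
                          *list(model.classifier.children())[:2])

images = torch.randn(8, 3, 224, 224)      # stand-in batch for normalised patches
labels = [0, 1, 0, 1, 0, 1, 0, 1]         # stand-in benign/malignant labels

with torch.no_grad():
    decaf = fc1(images).numpy()           # (8, 4096) DeCAF feature vectors

reduced = PCA(n_components=0.15).fit_transform(decaf)   # keep CEV = 0.15
clf = SVC(kernel="linear").fit(reduced, labels)
print(reduced.shape)
```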


Subject(s)
Breast Neoplasms, Humans, Female, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/pathology, Neural Networks, Computer, Algorithms, Diagnosis, Computer-Assisted, Support Vector Machine
11.
J Med Signals Sens ; 13(2): 165-172, 2023.
Article in English | MEDLINE | ID: mdl-37448546

ABSTRACT

Magnetic resonance imaging (MRI) has long been used to detect brain diseases, and many useful techniques have been developed for this task. However, there is still potential to further improve the classification of brain diseases and increase confidence in the results. In this research we present, for the first time, a non-linear feature extraction method applied to the MRI sub-images obtained from the three levels of the two-dimensional dual-tree complex wavelet transform (2D DT-CWT) in order to classify multiple brain diseases. After extracting the non-linear features from the sub-images, we use the spectral regression discriminant analysis (SRDA) algorithm to reduce the classifying features. Instead of deep neural networks, which are computationally expensive, we propose a hybrid RBF network that uses the k-means and recursive least squares (RLS) algorithms simultaneously in its structure for classification. To evaluate the performance of RBF networks with hybrid learning algorithms, we classify nine brain diseases based on MRI processing using these networks and compare the results with previously presented classifiers, including support vector machines (SVM) and K-nearest neighbours (KNN). Comprehensive comparisons are made with recently proposed approaches by extracting various types and numbers of features. Our aim is to reduce complexity and improve classification results with the hybrid RBF classifier, and the results show 100% classification accuracy both in the two-class case and in multi-class classification of brain diseases into 8 and 10 classes. In summary, we provide a computationally light and precise method for brain MRI disease classification; the results show that the proposed method is not only accurate but also computationally reasonable.
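A minimal sketch of such a hybrid RBF network: k-means supplies the centres and a batch least-squares solve stands in for the recursive least-squares update of the output weights. The hyperparameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

class HybridRBFNet:
    """RBF network: k-means picks the centres, least squares fits the output weights."""

    def __init__(self, n_centers: int = 20, gamma: float = 1.0):
        self.n_centers, self.gamma = n_centers, gamma

    def _phi(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-self.gamma * d2)              # Gaussian RBF activations

    def fit(self, X, y):
        km = KMeans(n_clusters=self.n_centers, n_init=10, random_state=0).fit(X)
        self.centers = km.cluster_centers_
        Y = np.eye(int(y.max()) + 1)[y]              # one-hot class targets
        self.W, *_ = np.linalg.lstsq(self._phi(X), Y, rcond=None)
        return self

    def predict(self, X):
        return (self._phi(X) @ self.W).argmax(axis=1)
```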

12.
Comput Biol Med ; 163: 107208, 2023 09.
Article in English | MEDLINE | ID: mdl-37421737

ABSTRACT

Accurate segmentation of liver tumors is a prerequisite for the early diagnosis of liver cancer. Segmentation networks that extract features continuously at a single scale cannot adapt to the variation of liver tumor volume in computed tomography (CT). Hence, a multi-scale feature attention network (MS-FANet) for liver tumor segmentation is proposed in this paper. The novel residual attention (RA) block and multi-scale atrous downsampling (MAD) are introduced in the encoder of MS-FANet to sufficiently learn variable tumor features and to extract tumor features at different scales simultaneously. The dual-path feature (DF) filter and dense upsampling (DU) are introduced in the feature reduction process to distill effective features for the accurate segmentation of liver tumors. On the public LiTS and 3DIRCADb datasets, MS-FANet achieved average Dice scores of 74.2% and 78.0%, respectively, outperforming most state-of-the-art networks, which strongly demonstrates its excellent liver tumor segmentation performance and its ability to learn features at different scales.


Subject(s)
Liver Neoplasms, Humans, Liver Neoplasms/diagnostic imaging, Learning, Tomography, X-Ray Computed, Image Processing, Computer-Assisted
13.
PeerJ Comput Sci ; 9: e1240, 2023.
Article in English | MEDLINE | ID: mdl-37346554

ABSTRACT

Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering is a difficult task due to linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant/confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information when building classification models. Previous studies have suggested that synset-based representation strategies can successfully address the synonymy and polysemy problems. Complementarily, hyponymy/hypernymy relations can be exploited to implement dimensionality reduction strategies. These strategies unify textual terms to model the intentions of the document without losing information (e.g., bringing together the synsets "viagra", "ciallis", "levitra", and others representing similar drugs under "virility drug", which is a hypernym of all of them). Such feature reduction schemes are known as lossless strategies, as the information is not removed but only generalised. However, in some types of text classification problems (such as spam filtering) it may not be worthwhile to keep all the information, and dimensionality reduction algorithms may be allowed to discard information that is irrelevant or confusing. In this work, we introduce feature reduction as a multi-objective optimisation problem to be solved using a Multi-Objective Evolutionary Algorithm (MOEA). Our algorithm allows, with minor modifications, the implementation of lossless (using only semantic-based synset grouping), low-loss (discarding irrelevant information and using semantic-based synset grouping), or lossy (discarding only irrelevant information) strategies. The contribution of this study is two-fold: (i) to introduce the different dimensionality reduction methods (lossless, low-loss, and lossy) as an optimization problem that can be solved using a MOEA, and (ii) to provide an experimental comparison of lossless and low-loss schemes for text representation. The results obtained support the usefulness of the low-loss method for improving the efficiency of classifiers.
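A minimal sketch of the hypernym-based grouping using NLTK's WordNet interface (assumes the WordNet corpus is downloaded; the first-synset choice and depth parameter are illustrative simplifications of proper word-sense handling):

```python
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet") once

def generalize(word: str, depth: int = 1) -> str:
    """Replace a term by (a lemma of) its hypernym, if one exists."""
    synsets = wn.synsets(word)
    if not synsets:
        return word                     # out-of-vocabulary terms pass through
    syn = synsets[0]                    # naive sense choice
    for _ in range(depth):
        hypernyms = syn.hypernyms()
        if not hypernyms:
            break
        syn = hypernyms[0]
    return syn.lemmas()[0].name()

print(generalize("viagra"))             # -> a drug hypernym such as 'virility_drug'
```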

14.
BMC Bioinformatics ; 24(1): 178, 2023 May 01.
Article in English | MEDLINE | ID: mdl-37127563

ABSTRACT

BACKGROUND: The field of epigenomics holds great promise for understanding and treating disease, with advances in machine learning (ML) and artificial intelligence being vitally important in this pursuit. Increasingly, research utilises DNA methylation measures at cytosine-guanine dinucleotides (CpG) to detect disease and estimate biological traits such as aging. Given the challenge of the high dimensionality of DNA methylation data, feature-selection techniques are commonly employed to reduce dimensionality and identify the most important subset of features. In this study, our aim was to test and compare a range of feature-selection methods and ML algorithms in the development of a novel DNA methylation-based telomere length (TL) estimator. We utilised both nested cross-validation and two independent test sets for the comparisons. RESULTS: We found that principal component analysis in advance of elastic net regression led to the overall best-performing estimator when evaluated using a nested cross-validation analysis and two independent test cohorts. This approach achieved a correlation between estimated and actual TL of 0.295 (83.4% CI [0.201, 0.384]) on the EXTEND test data set. By contrast, the baseline model of elastic net regression with no prior feature-reduction stage generally performed less well, suggesting that a prior feature-selection stage may have important utility. A previously developed TL estimator, DNAmTL, achieved a correlation of 0.216 (83.4% CI [0.118, 0.310]) on the EXTEND data. Additionally, we observed that different DNA methylation-based TL estimators, which have few CpGs in common, are associated with many of the same biological entities. CONCLUSIONS: The variance in performance across the tested approaches shows that estimators are sensitive to data set heterogeneity, and the development of an optimal DNA methylation-based estimator should benefit from the robust methodological approach used in this study. Moreover, our methodology, which utilises a range of feature-selection approaches and ML algorithms, could be applied to other biological markers and disease phenotypes to examine their relationship with DNA methylation and predictive value.
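A minimal sketch of the winning combination, PCA ahead of elastic net, in scikit-learn; the uniform matrix below stands in for CpG beta values and the toy target for measured telomere length, and the component count is an illustrative choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 5000))        # stand-in CpG beta values (samples x CpGs)
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=300)   # toy telomere length

pipe = make_pipeline(PCA(n_components=50), ElasticNetCV(cv=5))
print(cross_val_score(pipe, X, y, cv=5, scoring="r2").mean())
```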


Subject(s)
DNA Methylation, Epigenomics, Telomere Homeostasis, Algorithms, Epigenomics/methods, Regression Analysis, Machine Learning, Humans
15.
Phys Med Biol ; 68(12)2023 06 08.
Article in English | MEDLINE | ID: mdl-37201539

ABSTRACT

Aiming at accurate survival prediction for glioblastoma (GBM) patients following radiation therapy, we developed a subregion-based survival prediction framework via a novel feature construction method on multi-sequence MRIs. The proposed method consists of two main steps: (1) a feature-space optimization algorithm that determines the most appropriate matching relation between multi-sequence MRIs and tumor subregions, to use multimodal image data more sensibly; and (2) a clustering-based feature bundling and construction algorithm that compresses the high-dimensional extracted radiomic features into a smaller but effective feature set for accurate prediction model construction. For each tumor subregion, 680 radiomic features were extracted from one MRI sequence using Pyradiomics. An additional 71 geometric features and clinical information were collected, resulting in an extremely high-dimensional feature space of 8,231 features used to train and evaluate 1-year survival prediction and the more challenging overall survival prediction. The framework was developed on 98 GBM patients from the BraTS 2020 dataset under five-fold cross-validation and tested on an external cohort of 19 GBM patients randomly selected from the same dataset. We identified the best matching relationship between each subregion and its corresponding MRI sequence, and a subset of 235 features (out of 8,231) was generated by the proposed feature bundling and construction framework. The subregion-based framework achieved AUCs of 0.998 and 0.983 on the training and independent test cohorts, respectively, for 1-year survival prediction, compared with AUCs of 0.940 and 0.923 when the 8,231 initially extracted features were used for the training and validation cohorts. Finally, we constructed an effective stacking-structure ensemble regressor to predict overall survival, with a C-index of 0.872. The proposed subregion-based survival prediction framework allows us to better stratify patients toward personalized treatment of GBM.
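A minimal sketch of clustering-based feature bundling: hierarchically cluster features by absolute correlation and replace each cluster with its mean. The linkage threshold and the synthetic block-correlated matrix are illustrative, not the paper's exact algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def bundle_features(X: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Cluster features by |correlation| and replace each cluster with its mean."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)                        # highly correlated => close
    Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
    labels = fcluster(Z, t=threshold, criterion="distance")
    return np.column_stack([X[:, labels == c].mean(axis=1)
                            for c in np.unique(labels)])

rng = np.random.default_rng(0)
base = rng.normal(size=(98, 40))                     # 40 latent signals, 98 "patients"
X = np.repeat(base, 10, axis=1) + 0.1 * rng.normal(size=(98, 400))
print(bundle_features(X).shape)                      # -> (98, ~40) bundled features
```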


Subject(s)
Brain Neoplasms, Glioblastoma, Humans, Glioblastoma/diagnostic imaging, Glioblastoma/pathology, Brain Neoplasms/diagnostic imaging, Brain Neoplasms/pathology, Magnetic Resonance Imaging/methods, Algorithms, Area Under Curve
16.
Genes (Basel) ; 14(3)2023 02 28.
Article in English | MEDLINE | ID: mdl-36980878

ABSTRACT

DNA synthesis is widely used in synthetic biology to construct and assemble sequences ranging from short RBSs to ultra-long synthetic genomes. Many sequence features, such as GC content and repeat sequences, are known to affect synthesis difficulty and, consequently, synthesis cost. In addition, there are latent sequence features, especially local characteristics of the sequence, which might affect the DNA synthesis process as well. Reliable prediction of the synthesis difficulty of a given sequence is important for reducing cost, but it remains a challenge. In this study, we propose a new automated machine learning (AutoML) approach to predict DNA synthesis difficulty; it achieves an F1 score of 0.930 and outperforms the current state-of-the-art model. We found local sequence features that were neglected in previous methods and that might also affect the difficulty of DNA synthesis. Experimental validation based on ten genes of Escherichia coli strain MG1655 shows that our model achieves 80% accuracy, which is also better than the state of the art. Moreover, for the convenience of end users, we developed the cloud platform SCP4SSD using an entirely cloud-based serverless architecture.
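A minimal sketch of two of the interpretable sequence features named above, GC content and a crude repeated-k-mer check; the k-mer length is an illustrative choice, and the AutoML stage itself is not reproduced.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def has_repeated_kmer(seq: str, k: int = 12) -> bool:
    """True if any k-mer occurs more than once (a crude repeat indicator)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(set(kmers)) < len(kmers)

print(gc_content("ATGCGGCCTA"))                  # 0.6
print(has_repeated_kmer("ATGATGATGATG", k=6))    # True
```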


Subject(s)
Escherichia coli, Machine Learning, Base Sequence, Escherichia coli/genetics, Base Composition, DNA/genetics
17.
Front Digit Health ; 5: 1064936, 2023.
Article in English | MEDLINE | ID: mdl-36778102

ABSTRACT

Disease phenotypes are characterized by signs (what a physician observes during the examination of a patient) and symptoms (the complaints of a patient to a physician). Large repositories of disease phenotypes are accessible through the Online Mendelian Inheritance in Man, Human Phenotype Ontology, and Orphadata initiatives. Many of the diseases in these datasets are neurologic. In each repository, the phenotype of a neurologic disease is represented as a list of concepts of variable length, where the concepts are selected from a restricted ontology. Visualizations of these concept lists are not provided. We address this limitation by using subsumption to reduce the number of descriptive features from 2,946 classes to thirty superclasses. Phenotype feature lists of variable length were converted into fixed-length vectors. Phenotype vectors were aggregated into matrices and visualized as heat maps that allow side-by-side disease comparisons. Individual diseases (rows in the matrix) were visualized as word clouds. We illustrate the utility of this approach by visualizing the neuro-phenotypes of 32 dystonic diseases from Orphadata. Subsumption can collapse phenotype features into superclasses, phenotype lists can be vectorized, and phenotype vectors can be visualized as heat maps and word clouds.
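A minimal sketch of the subsumption-and-vectorization step: a lookup table maps ontology concepts to superclasses, turning a variable-length concept list into a fixed-length count vector. The HPO identifiers and superclass names here are illustrative, not the paper's thirty superclasses.

```python
import numpy as np

# Illustrative subsumption table: ontology concept -> superclass.
SUPERCLASS = {"HP:0001337": "movement", "HP:0002072": "movement",
              "HP:0001260": "speech", "HP:0000716": "behavior"}
SUPERS = sorted(set(SUPERCLASS.values()))

def phenotype_vector(concepts):
    """Collapse a variable-length concept list into fixed-length superclass counts."""
    vec = np.zeros(len(SUPERS))
    for c in concepts:
        if c in SUPERCLASS:
            vec[SUPERS.index(SUPERCLASS[c])] += 1
    return vec

print(SUPERS)                                              # ['behavior', 'movement', 'speech']
print(phenotype_vector(["HP:0001337", "HP:0002072", "HP:0001260"]))   # [0. 2. 1.]
```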

18.
J Hazard Mater ; 448: 130906, 2023 04 15.
Article in English | MEDLINE | ID: mdl-36764252

ABSTRACT

A novel analytical workflow for suspect screening of organic acidic contaminants in drinking water is presented, featuring selective extraction by silica-based strong anion-exchange solid-phase extraction, mixed-mode liquid chromatography-high-resolution accurate mass spectrometry (LC-HRMS), peak detection, feature reduction, and compound identification. The novel use of an ammonium bicarbonate-based elution solvent extended the applicability of strong anion-exchange solid-phase extraction to LC-HRMS of strong acids. This approach performed with consistently higher recovery and repeatability (88 ± 7% at 500 ng L-1), improved selectivity, and lower matrix interference (mean = 12%) compared with a generic mixed-mode weak anion-exchange SPE method. In addition, a novel filter for reducing full-scan features arising from fulvic and humic acids was successfully introduced, reducing workload and the potential for false positives. The workflow was then applied to 10 London municipal drinking water samples, revealing the presence of 22 confirmed and 37 tentatively identified substances. Several poorly investigated and potentially harmful compounds were found, including halogenated hydroxy-cyclopentene-diones and dibromomethanesulfonic acid. Some of these compounds have been reported as mutagenic in test systems, and thus their presence here requires further investigation. Overall, this approach demonstrated that employing selective extraction improved detection and helped shortlist suspects and potentially toxic chemical contaminants with higher confidence.


Subject(s)
Drinking Water, Water Pollutants, Chemical, Drinking Water/chemistry, Tandem Mass Spectrometry/methods, Chromatography, Liquid/methods, Solid Phase Extraction/methods, Water Pollutants, Chemical/chemistry
19.
Foods ; 12(1)2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36613425

ABSTRACT

Spectroscopy data are useful for modelling biological systems such as predicting quality parameters of horticultural products. However, using the full spectrum of wavelengths is not practical in a production setting. Such data are high-dimensional, and they tend to result in complex models that are not easily understood. Furthermore, collinearity between different wavelengths means that some of the data variables are redundant and may even contribute noise. The use of variable selection methods is one efficient way to obtain an optimal model, and this was the aim of this work. Taking advantage of a non-contact spectrometer, near-infrared spectral data in the range of 800-2500 nm were used to classify bruise damage in three apple cultivars, namely 'Golden Delicious', 'Granny Smith', and 'Royal Gala'. Six prominent machine learning classification algorithms were employed, and two variable selection methods were used to determine the most relevant wavelengths for distinguishing between bruised and non-bruised fruit. The selected wavelengths clustered around 900 nm, 1300 nm, 1500 nm, and 1900 nm. The best results were achieved using linear regression and support vector machines based on up to 40 wavelengths: these methods reached precision values in the range of 0.79-0.86, all comparable (within error bars) to a classifier based on the entire range of frequencies. The results also provide an open-source framework that is useful for the development of multi-spectral applications, such as rapid grading of apples based on mechanical damage, and it can also be emulated and applied to other types of defects in fresh produce.
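A minimal sketch of wavelength selection ahead of an SVM, capped at 40 bands as above; recursive feature elimination stands in for the paper's variable selection methods, and the synthetic spectra and labels are placeholders.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
wavelengths = np.arange(800, 2500, 10)           # nm grid, 170 stand-in bands
X = rng.normal(size=(240, wavelengths.size))     # stand-in NIR spectra
y = (X[:, 10] + X[:, 70] > 0).astype(int)        # toy bruised / non-bruised labels

selector = RFE(LinearSVC(max_iter=5000), n_features_to_select=40)
pipe = make_pipeline(selector, LinearSVC(max_iter=5000))
print(cross_val_score(pipe, X, y, cv=5).mean())
```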

20.
Sensors (Basel) ; 23(2)2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36679453

ABSTRACT

A hyperspectral image (HSI), which contains a number of contiguous and narrow spectral wavelength bands, is a valuable source of data for ground cover examinations. Classification using the entire original HSI suffers from the "curse of dimensionality" problem because (i) the image bands are highly correlated both spectrally and spatially, (ii) not every band can carry equal information, (iii) there is a lack of enough training samples for some classes, and (iv) the overall computational cost is high. Therefore, effective feature (band) reduction is necessary through feature extraction (FE) and/or feature selection (FS) for improving the classification in a cost-effective manner. Principal component analysis (PCA) is a frequently adopted unsupervised FE method in HSI classification. Nevertheless, its performance worsens when the dataset is noisy, and the computational cost becomes high. Consequently, this study first proposed an efficient FE approach using a normalized mutual information (NMI)-based band grouping strategy, where the classical PCA was applied to each band subgroup for intrinsic FE. Finally, the subspace of the most effective features was generated by the NMI-based minimum redundancy and maximum relevance (mRMR) FS criteria. The subspace of features was then classified using the kernel support vector machine. Two real HSIs collected by the AVIRIS and HYDICE sensors were used in an experiment. The experimental results demonstrated that the proposed feature reduction approach significantly improved the classification performance. It achieved the highest overall classification accuracy of 94.93% for the AVIRIS dataset and 99.026% for the HYDICE dataset. Moreover, the proposed approach reduced the computational cost compared with the studied methods.
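A minimal sketch of the FE stage described above: a greedy NMI-based grouping of adjacent (ordered) bands followed by classical PCA within each group. The bin count, NMI threshold, grouping strategy, and random stand-in data are illustrative, and the mRMR selection and kernel SVM stages are omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import normalized_mutual_info_score

def group_bands(X: np.ndarray, n_bins: int = 16, threshold: float = 0.5):
    """Greedy grouping of ordered bands: start a new group when NMI drops."""
    edges = np.quantile(X, np.linspace(0, 1, n_bins)[1:-1])
    B = np.digitize(X, edges)                        # discretised bands for NMI
    groups, start = [], 0
    for j in range(1, X.shape[1] + 1):
        if j == X.shape[1] or normalized_mutual_info_score(B[:, start], B[:, j]) < threshold:
            groups.append(list(range(start, j)))
            start = j
    return groups

def grouped_pca(X: np.ndarray, groups, k: int = 2) -> np.ndarray:
    """Classical PCA applied inside each band subgroup, then concatenated."""
    return np.hstack([PCA(n_components=min(k, len(g))).fit_transform(X[:, g])
                      for g in groups])

X = np.random.default_rng(0).normal(size=(200, 50))  # stand-in pixels x bands
groups = group_bands(X)
print(len(groups), grouped_pca(X, groups).shape)
```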


Subject(s)
Support Vector Machine, Principal Component Analysis