Results 1 - 20 of 407
1.
J Cell Mol Med ; 28(17): e70046, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39228010

ABSTRACT

PIWI-interacting RNAs (piRNAs) are a typical class of small non-coding RNAs that are essential for gene regulation, genome stability, and other processes. Accumulating studies have revealed that piRNAs have significant potential as biomarkers and therapeutic targets for a variety of diseases. However, current computational methods face challenges in effectively capturing piRNA-disease associations (PDAs) from limited data. In this study, we propose a novel method, MRDPDA, for predicting PDAs based on limited data from multiple sources. Specifically, MRDPDA integrates a deep factorization machine (deepFM) model with regularizations derived from multiple yet limited datasets, utilizing separate Laplacians instead of a simple averaged similarity network. Moreover, a unified objective function that combines the embedding losses over the similarities is proposed to ensure that the embedding is suitable for the prediction task. In addition, a balanced benchmark dataset based on piRPheno is constructed, and a deep autoencoder is applied to create a reliable negative set from the unlabeled data. Compared with three recent methods, MRDPDA achieves the best performance on the piRPheno dataset in both the five-fold cross-validation test and on the independent test set, and case studies further demonstrate the effectiveness of MRDPDA.
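
The abstract's key design choice, separate Laplacian regularizers per similarity source rather than one averaged network, can be illustrated compactly. Below is a minimal, hypothetical numpy sketch of such a multi-Laplacian embedding penalty; the function names, the two similarity sources, and the weights are assumptions for illustration, not MRDPDA's actual implementation.

```python
import numpy as np

def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S

def multi_laplacian_penalty(E, similarity_mats, weights):
    """Sum of weighted tr(E^T L_k E) terms, one Laplacian per similarity
    source, instead of the Laplacian of one averaged similarity network."""
    return sum(w * np.trace(E.T @ laplacian(S) @ E)
               for w, S in zip(weights, similarity_mats))

# toy example: 3-d embeddings of 5 piRNAs, two similarity sources
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))                          # learned embeddings
S1 = rng.uniform(size=(5, 5)); S1 = (S1 + S1.T) / 2  # e.g. sequence similarity
S2 = rng.uniform(size=(5, 5)); S2 = (S2 + S2.T) / 2  # e.g. functional similarity
print(multi_laplacian_penalty(E, [S1, S2], weights=[1.0, 0.5]))
```

Keeping one trace term per source lets each similarity network pull the embedding toward its own neighborhood structure, which a pre-averaged network cannot do.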


Subject(s)
Computational Biology , RNA, Small Interfering , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Humans , Computational Biology/methods , Algorithms , Genetic Predisposition to Disease , Deep Learning , Piwi-Interacting RNA
2.
Physiol Meas ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39231477

ABSTRACT

OBJECTIVE: Accurate prediction of unmeasured muscle excitations can reduce the number of wearable surface electromyography (sEMG) sensors required, which is a critical factor in physiological measurement studies. Synergy extrapolation uses synergy excitations as building blocks to reconstruct muscle excitations. However, the practical application of synergy extrapolation is still limited, as the extrapolation process utilizes the very unmeasured muscle excitations it seeks to reconstruct. This paper aims to propose and derive methods that provide an avenue for the practical application of synergy extrapolation with non-negative matrix factorization (NMF) methods. APPROACH: Specifically, a tunable Gaussian-Laplacian mixture distribution NMF (GLD-NMF) method and related multiplicative update rules are derived to yield synergy excitations appropriate for extrapolation. Furthermore, a template-based extrapolation structure (TBES) is proposed to extrapolate unmeasured muscle excitations based on synergy weighting matrix templates extracted entirely from measured sEMG datasets, improving extrapolation performance. Moreover, we applied the proposed GLD-NMF method and TBES to selected muscle excitations acquired from a series of single-leg stance (SLS) tests, walking tests, and upper-limb reaching tests. MAIN RESULTS: Experimental results show that the proposed GLD-NMF and TBES extrapolate unmeasured muscle excitations accurately. Moreover, introducing synergy weighting matrix templates reduced the number of sEMG sensors needed in a series of experiments. In addition, verification results demonstrate the feasibility of applying synergy extrapolation with NMF methods. SIGNIFICANCE: With the TBES method, synergy extrapolation can play a significant role in reducing the data dimensions of sEMG sensors, which will improve the portability of sEMG-based systems and promote applications of sEMG signals in human-machine interface scenarios.
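
For context, GLD-NMF's update rules generalize the classical multiplicative NMF updates. The sketch below shows the standard Lee-Seung updates for the Frobenius loss, the baseline that such mixture-distribution variants modify; the factor dimensions and the toy data are assumptions, not the paper's experimental setup.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Classical Lee-Seung multiplicative updates for X ~ W @ H under the
    Frobenius loss. GLD-NMF replaces this loss with a tunable
    Gaussian-Laplacian mixture, which changes the update ratios but keeps
    the same multiplicative, nonnegativity-preserving form."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(size=(n, k))   # synergy weighting matrix (muscles x synergies)
    H = rng.uniform(size=(k, m))   # synergy excitations (synergies x time)
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy sEMG-like envelope matrix: 8 channels x 500 samples, rank-3 ground truth
rng = np.random.default_rng(1)
X = rng.uniform(size=(8, 3)) @ rng.uniform(size=(3, 500))
W, H = nmf_multiplicative(X, k=3)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # small relative error
```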

3.
Sci Prog ; 107(3): 368504241275417, 2024.
Article in English | MEDLINE | ID: mdl-39275848

ABSTRACT

An intuitionistic fuzzy rough model is a powerful tool for dealing with complex uncertainty and imprecision in graph-based models, combining the strengths of intuitionistic fuzzy sets and rough sets. In this research, the correlation coefficient serves as an established tool for measuring the strength of the relationship between two intuitionistic fuzzy rough graphs, since correlation coefficients are well suited to processing and interpreting data. Furthermore, the intuitionistic fuzzy rough environment is integrated with attribute decision-making based on correlation coefficients. To measure the correlation between two intuitionistic fuzzy rough graphs, we utilise the concepts of the correlation coefficient and the weighted correlation coefficient. To handle decision-making problems formulated with intuitionistic fuzzy rough preference relations, the Laplacian energy and a new correlation coefficient of intuitionistic fuzzy rough graphs are calculated in this study. We propose a new approach to computing the relative weights of alternatives by adjusting the correlation coefficient between one decision-maker's intuitionistic fuzzy rough preference relation and the others, as well as the uncertain information in the intuitionistic fuzzy rough preference relation. This paper determines the ranking order of all alternatives, and the best one, by using the correlation coefficient between each option and the ideal choice. Finally, a worked example shows how the approach improves decision-making for robotic vacuum cleaners by effectively handling uncertain and imprecise data, thereby optimising cleaning performance.
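
As a concrete reference point, the classical (Gerstenkorn-Manko style) correlation coefficient for plain intuitionistic fuzzy sets fits in a few lines. The sketch below is a simplification under that assumption; the paper's weighted, rough-graph version would extend it with edge weights and lower/upper approximations.

```python
import numpy as np

def ifs_correlation(mu_a, nu_a, mu_b, nu_b):
    """Correlation coefficient between two intuitionistic fuzzy sets,
    each given by membership (mu) and non-membership (nu) grades over
    the same universe: C(A,B) / sqrt(C(A,A) * C(B,B))."""
    c_ab = np.sum(mu_a * mu_b + nu_a * nu_b)
    c_aa = np.sum(mu_a**2 + nu_a**2)
    c_bb = np.sum(mu_b**2 + nu_b**2)
    return c_ab / np.sqrt(c_aa * c_bb)

mu_a = np.array([0.6, 0.7, 0.5]); nu_a = np.array([0.3, 0.2, 0.4])
mu_b = np.array([0.5, 0.8, 0.4]); nu_b = np.array([0.4, 0.1, 0.5])
print(ifs_correlation(mu_a, nu_a, mu_b, nu_b))  # near 1 => strong agreement
```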

4.
Sci Rep ; 14(1): 18000, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39097655

ABSTRACT

Group decision-making (GDM) is crucial in various components of graph theory, management science, and operations research. In particular, in an intuitionistic fuzzy group decision-making problem, the experts communicate their preferences using intuitionistic fuzzy preference relations (IFPRs). In this approach, decision-makers rank or select the most desirable alternatives by gathering criteria-based information to estimate the best alternatives using a wider range of knowledge and experience. This article proposes a new statistical measure for a fuzzy environment in which the data are ambiguous or unreliable. The study uses the variation coefficient measure combined with intuitionistic fuzzy graphs (IFGs) and Laplacian energy (LE) to solve a GDM problem that utilizes IFPRs to select a reliable alliance partner. Initially, the Laplacian energy determines the weight of each individual criterion, and the resulting weighted average yields the overall criterion weight vector. We establish the authority criteria weights using the variation coefficient measure and then rank the alternatives for each criterion using the same measure. We examine four distinct companies, Alpha, Beta, Delta, and Zeta, in a realistic GDM setting to choose the ideal alliance partner. We successfully implemented the suggested technique, determining that Alpha satisfies the company standards and is ranked first among the other companies. Moreover, this technique is applicable to all kinds of intuitionistic fuzzy group decision-making problems for selecting the optimal alternative.
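
The Laplacian-energy weighting step lends itself to a short sketch. Below is a hypothetical numpy version for crisp weighted graphs; LE for intuitionistic fuzzy graphs replaces the scalar edge weights with membership/non-membership pairs, so this is only the skeleton of the computation.

```python
import numpy as np

def laplacian_energy(W):
    """Laplacian energy of a weighted graph: sum_i |lambda_i - tr(L)/n|,
    with lambda_i the eigenvalues of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    lam = np.linalg.eigvalsh(L)
    return float(np.sum(np.abs(lam - np.trace(L) / len(W))))

# one preference graph per criterion; normalized energies act as criterion weights
rng = np.random.default_rng(0)
graphs = []
for _ in range(4):
    W = rng.uniform(size=(5, 5)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
    graphs.append(W)
energies = np.array([laplacian_energy(W) for W in graphs])
print(energies / energies.sum())   # criterion weight vector
```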

5.
Sci Rep ; 14(1): 19650, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39179791

ABSTRACT

In real-life complex traffic environments, vehicles are often occluded by extraneous background objects and other vehicles, leading to severe degradation of object detector performance. To address this issue, we propose a method named YOLO-OVD (YOLO for occluded vehicle detection) and a dataset for effectively handling vehicle occlusion in various scenarios. To focus the model's attention on the unobstructed regions of vehicles, we design a novel grouped orthogonal attention (GOA) module to achieve maximal information extraction between channels. We utilize grouping and channel shuffling to address the initialization and computational issues of the original orthogonal filters, followed by spatial attention to enhance spatial features in vehicle-visible regions. We introduce a CIoU-based repulsion term into the loss function to improve the network's localization accuracy in scenarios involving densely packed vehicles. Moreover, we explore the effect of the knowledge-based Laplacian Pyramid on OVD performance, which contributes to fast convergence in training and ensures more detailed and comprehensive feature retention. We conduct extensive experiments on the established occluded vehicle detection dataset, which demonstrate that the proposed YOLO-OVD model significantly outperforms 14 representative object detectors. Notably, it achieves improvements of 4.7% in Precision, 3.6% in AP@0.5, and 1.9% in AP@0.5:0.95 compared to the YOLOv5 baseline.
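
Two of the ingredients named above, channel shuffling across groups and spatial attention over vehicle-visible regions, have standard building-block forms. The PyTorch sketch below shows those generic forms only; the actual GOA module, its orthogonal filters, and the CIoU repulsion term are not specified in the abstract, so this is an assumed approximation rather than the paper's architecture.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """ShuffleNet-style channel shuffle: mixes information across groups."""
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).reshape(b, c, h, w))

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: reweights locations, which is one
    generic way to emphasize vehicle-visible regions."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

x = torch.randn(2, 64, 32, 32)
y = SpatialAttention()(channel_shuffle(x, groups=4))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```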

6.
JMIR Med Inform ; 12: e52896, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39087585

ABSTRACT

Background: The application of machine learning in health care often necessitates the use of hierarchical codes such as the International Classification of Diseases (ICD) and Anatomical Therapeutic Chemical (ATC) systems. These codes classify diseases and medications, respectively, thereby forming extensive data dimensions. Unsupervised feature selection tackles the "curse of dimensionality" and helps to improve the accuracy and performance of supervised learning models by reducing the number of irrelevant or redundant features and avoiding overfitting. Techniques for unsupervised feature selection, such as filter, wrapper, and embedded methods, are implemented to select the most important features with the most intrinsic information. However, they face challenges due to the sheer volume of ICD and ATC codes and the hierarchical structures of these systems. Objective: The objective of this study was to compare several unsupervised feature selection methods on ICD and ATC code databases of patients with coronary artery disease, in terms of performance and complexity, and to select the best set of features representing these patients. Methods: We compared several unsupervised feature selection methods for 2 ICD and 1 ATC code databases of 51,506 patients with coronary artery disease in Alberta, Canada. Specifically, we used the Laplacian score, unsupervised feature selection for multicluster data, autoencoder-inspired unsupervised feature selection, principal feature analysis, and concrete autoencoders with and without ICD or ATC tree weight adjustment to select the 100 best features from over 9000 ICD and 2000 ATC codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. We also compared the complexity of the selected features by mean code level in the ICD or ATC tree and the interpretability of the features in the mortality prediction task using Shapley analysis. Results: In feature space reconstruction and mortality prediction, the concrete autoencoder-based methods outperformed other techniques. Particularly, a weight-adjusted concrete autoencoder variant demonstrated improved reconstruction accuracy and significant predictive performance enhancement, confirmed by DeLong and McNemar tests (P<.05). Concrete autoencoders preferred more general codes, and they consistently reconstructed all features accurately. Additionally, features selected by weight-adjusted concrete autoencoders yielded higher Shapley values in mortality prediction than most alternatives. Conclusions: This study scrutinized 5 feature selection methods in ICD and ATC code data sets in an unsupervised context. Our findings underscore the superiority of the concrete autoencoder method in selecting salient features that represent the entire data set, offering a potential asset for subsequent machine learning research. We also present a novel weight adjustment approach for the concrete autoencoders specifically tailored for ICD and ATC code data sets to enhance the generalizability and interpretability of the selected features.
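
Among the compared techniques, the Laplacian score is the most compact to illustrate. Below is a hypothetical numpy sketch of the standard Laplacian score (He et al., 2005) on a patient-similarity graph; the random stand-in data and the RBF similarity are assumptions, not the study's actual ICD/ATC pipeline.

```python
import numpy as np

def laplacian_score(X, S):
    """Laplacian score per feature; smaller = more important.
    X: samples x features (e.g. code indicator columns),
    S: sample-similarity matrix."""
    d = S.sum(axis=1)
    D = np.diag(d)
    L = D - S
    scores = []
    for f in X.T:
        f_t = f - (f @ d) / d.sum()          # remove the trivial constant part
        scores.append((f_t @ L @ f_t) / (f_t @ D @ f_t + 1e-12))
    return np.array(scores)

# select the 100 "smoothest" code features on a patient-similarity graph
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(120, 300)).astype(float)   # stand-in indicators
dist2 = np.linalg.norm(X[:, None] - X[None], axis=2) ** 2
S = np.exp(-dist2 / X.shape[1])
top100 = np.argsort(laplacian_score(X, S))[:100]
```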

7.
Heliyon ; 10(12): e32235, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39183868

ABSTRACT

Consider a simple undirected connected graph G, with $D(G)$ and $A(G)$ representing its degree and adjacency matrices, respectively. Furthermore, $L(G) = D(G) - A(G)$ is the Laplacian matrix of G, and $H_t = \exp(-tL(G))$ is the heat kernel (HK) of G, with $t > 0$ denoting the time variable. For a vertex $u \in V(G)$, the $u$th diagonal element of the HK is defined as $H_t(u,u) = \left(\exp(-tL(G))\right)_{uu} = \sum_{k=0}^{\infty} \frac{\left((-tL(G))^k\right)_{uu}}{k!}$, and $HE(G) = \sum_{i=1}^{n} e^{-t\lambda_i} = \sum_{u=1}^{n} H_t(u,u)$ is the HK trace of G, where $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the eigenvalues of $L(G)$. This study provides new computational formulas for the HK diagonal entries of graphs using an almost equitable partition and the Schur complement technique. We also provide bounds for the HK trace of graphs.
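
Since $L(G)$ is symmetric, the diagonal entries and trace of $H_t$ follow directly from the spectral decomposition, which makes the definitions above easy to check numerically. A small numpy sketch (a generic computation, not the paper's partition-based formulas):

```python
import numpy as np

def heat_kernel_diag_and_trace(W, t):
    """Diagonal entries H_t(u,u) and trace of the heat kernel exp(-t L)
    for a graph with adjacency matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)               # L = U diag(lam) U^T
    Ht = (U * np.exp(-t * lam)) @ U.T        # exp(-tL) via the spectral theorem
    return np.diag(Ht), np.exp(-t * lam).sum()

# path graph on 4 vertices
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
diag, trace = heat_kernel_diag_and_trace(W, t=1.0)
print(diag, trace, np.isclose(diag.sum(), trace))  # the trace identity holds
```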

8.
Animals (Basel) ; 14(11)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38891736

ABSTRACT

Understanding the feeding dynamics of aquatic animals is crucial for aquaculture optimization and ecosystem management. This paper proposes a novel framework for analyzing fish feeding behavior based on a fusion of spectrogram-extracted features and a deep learning architecture. Raw audio waveforms are first transformed into Log Mel Spectrograms, and a fusion of features such as the Discrete Wavelet Transform, the Gabor filter, the Local Binary Pattern, and the Laplacian High Pass Filter, followed by a well-adapted deep model, is proposed to capture crucial spectral and temporal information that can help distinguish between the various forms of fish feeding behavior. An Involutional Neural Network (INN)-based deep learning model is used for classification, achieving an accuracy of up to 97% across various temporal segments. The proposed methodology is shown to be effective in accurately classifying the feeding intensities of Oplegnathus punctatus, enabling insights pertinent to aquaculture enhancement and ecosystem management. Future work may include additional feature extraction modalities and multi-modal data integration to further our understanding and contribute towards the sustainable management of marine resources.
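
The first two stages, the Log Mel Spectrogram and a Laplacian-style high-pass channel, can be sketched with standard libraries. The snippet below uses a librosa example clip as a stand-in for feeding audio; the mel-band count, the sigma, and the choice of scipy's Gaussian-Laplace filter are assumptions, and the DWT, Gabor, and LBP channels are omitted.

```python
import numpy as np
import librosa
from scipy.ndimage import gaussian_laplace

# stand-in audio clip; any mono waveform of feeding sounds would be used here
y, sr = librosa.load(librosa.ex('trumpet'))
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(S)                     # Log Mel Spectrogram

# a Laplacian-style high-pass response that accentuates transients
# (sigma is an assumed parameter)
high_pass = gaussian_laplace(log_mel, sigma=1.0)

features = np.stack([log_mel, high_pass])            # channels for the deep model
print(features.shape)
```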

9.
ArXiv ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38883239

ABSTRACT

AlphaFold 3 (AF3), the latest version of the protein structure prediction software, goes beyond its predecessors by predicting protein-protein complexes. It could revolutionize drug discovery and protein engineering, marking a major step towards comprehensive, automated protein structure prediction. However, independent validation of AF3's predictions is necessary. Evaluated on the SKEMPI 2.0 database, which comprises 317 protein-protein complexes and 8338 mutations, AF3 complex structures give rise to a very good Pearson correlation coefficient of 0.86 for predicting protein-protein binding free energy changes upon mutation, slightly less than the 0.88 achieved earlier with Protein Data Bank (PDB) structures. Nonetheless, AF3 complex structures led to an 8.6% increase in prediction RMSE compared to the original PDB complex structures. Additionally, some of AF3's complex structures have large errors, which were not captured by its ipTM performance metric. Finally, we find that AF3's complex structures are not reliable for intrinsically flexible regions or domains.

10.
Results Math ; 79(5): 187, 2024.
Article in English | MEDLINE | ID: mdl-38895155

ABSTRACT

We derive various eigenvalue estimates for the Hodge Laplacian acting on differential forms on weighted Riemannian manifolds. Our estimates unify and extend various results from the literature and provide a number of geometric applications. In particular, we derive an inequality which relates the eigenvalues of the Jacobi operator for f-minimal hypersurfaces and the spectrum of the Hodge Laplacian.

11.
Small ; : e2401630, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38837314

ABSTRACT

With the growing demand for nanodevices, there is a concerted effort to improve the design flexibility of nanostructures, thereby expanding the capabilities of nanophotonic devices. In this work, a Laplacian-weighted binary search (LBS) algorithm is proposed to generate a unidirectional transmission metasurface from a high-dimensional design space, offering an increased degree of design freedom. The LBS algorithm incorporates topological continuity based on the Laplacian, effectively circumventing the common issue of high structural complexity in designing high-dimensional nanostructures. As a result, metasurfaces developed using the LBS algorithm in a high-dimensional design space exhibit reduced complexity, which is advantageous for experimental fabrication. An all-dielectric metasurface with unidirectional transmission, designed from the high-dimensional space using the LBS method, demonstrates the successful application of these design principles in experiments. The metasurface exhibits high optical performance in unidirectional transmission, as measured by a high-resolution angle-resolved micro-spectroscopy system, achieving forward transmissivity above 90% (400-700 nm) and backward transmissivity below 20% (400-500 nm) within the targeted wavelength range. This work provides a feasible approach for advancing high-dimensional metasurface applications, as the LBS design method takes topological continuity into account during experimental processing. Compared to traditional direct binary search (DBS) methods, the LBS method not only improves information-processing efficiency but also maintains the topological continuity of structures. Beyond unidirectional transmission, the LBS-based design method is general and flexible enough to accommodate almost all physical scenarios in metasurface design, enabling a multitude of complex functions and applications.
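
The relationship between DBS and a Laplacian-weighted variant can be sketched abstractly: flip one binary pixel at a time and keep the flip only if a figure of merit, here penalized by a Laplacian roughness term standing in for topological continuity, improves. Everything in this snippet (the penalty form, the toy figure of merit, the sweep count) is an assumption; the actual LBS algorithm and the electromagnetic solver behind the figure of merit are not described at this level in the abstract.

```python
import numpy as np
from scipy.ndimage import laplace

def lbs_optimize(pattern, fom, lam=0.1, n_sweeps=3, seed=0):
    """Binary-search style optimization of a 0/1 metasurface pattern.
    A pixel flip is kept only if it improves the figure of merit minus a
    Laplacian roughness penalty that discourages isolated pixels."""
    rng = np.random.default_rng(seed)
    score = lambda p: fom(p) - lam * np.abs(laplace(p.astype(float))).mean()
    best = score(pattern)
    for _ in range(n_sweeps):
        for idx in rng.permutation(pattern.size):
            i, j = divmod(idx, pattern.shape[1])
            pattern[i, j] ^= 1                     # trial flip
            s = score(pattern)
            if s > best:
                best = s                           # keep the improvement
            else:
                pattern[i, j] ^= 1                 # revert
    return pattern

# toy figure of merit standing in for a transmissivity simulation
fom = lambda p: 4 * p.mean() * (1 - p.mean())
pattern = lbs_optimize(np.random.default_rng(1).integers(0, 2, (16, 16)), fom)
```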

12.
J Imaging ; 10(5)2024 May 15.
Article in English | MEDLINE | ID: mdl-38786575

ABSTRACT

In graph theory, the weighted Laplacian matrix is the most utilized technique for interpreting the local and global properties of a complex graph structure within computer vision applications. However, as the number of graph nodes increases, the dimensionality of the Laplacian matrix increases accordingly, so there is always the "curse of dimensionality". In response to this challenge, this paper introduces a new approach to reducing the dimensionality of the weighted Laplacian matrix by utilizing the Gershgorin circle theorem: the weighted Laplacian matrix is transformed into a strictly diagonally dominant form, and rough eigenvalue inclusions of the matrix are then estimated. The estimated inclusions are represented as reduced features, termed GC features. The proposed Gershgorin circle feature extraction (GCFE) method was evaluated using three publicly accessible computer vision datasets, varying image patch sizes, and three different graph types, and was compared with eight distinct studies. The GCFE demonstrated a notable positive Z-score compared to other feature extraction methods such as I-PCA, kernel PCA, and spectral embedding. Specifically, it achieved an average Z-score of 6.953 with the 2D grid graph type and 4.473 with the pairwise graph type, particularly on the E_Balanced dataset. Furthermore, while the accuracy of most major feature extraction methods declined with smaller image patch sizes, the GCFE maintained consistent accuracy across all tested patch sizes. When applied to the E_MNIST dataset using the K-NN graph type, the GCFE confirmed its consistent accuracy, evidenced by a low standard deviation (SD) of 0.305, notably lower than that of other methods such as Isomap (SD 1.665) and LLE (SD 1.325). The GCFE outperformed most feature extraction methods in terms of classification accuracy and computational efficiency. The GCFE method also requires fewer training parameters for deep-learning models than the traditional weighted Laplacian method, establishing its potential for more effective and efficient feature extraction in computer vision tasks.
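
The Gershgorin circle theorem states that every eigenvalue of a matrix lies in at least one disc centered at a diagonal entry with radius equal to the absolute off-diagonal row sum. A plausible minimal reading of GC features, sketched below with assumed details (the paper's strictly-diagonally-dominant transform is omitted), is to use the disc centers and radii of the weighted Laplacian as low-cost eigenvalue-inclusion features:

```python
import numpy as np
import networkx as nx

def gershgorin_features(W):
    """Disc centers and radii of the weighted Laplacian L = D - W.
    By the Gershgorin circle theorem, every eigenvalue of L lies in some
    interval [center_i - radius_i, center_i + radius_i]."""
    L = np.diag(W.sum(axis=1)) - W
    centers = np.diag(L)
    radii = np.abs(L).sum(axis=1) - np.abs(centers)
    return np.concatenate([centers, radii])

# toy 2D grid graph standing in for an image-patch graph
W = nx.to_numpy_array(nx.grid_2d_graph(3, 3))
feats = gershgorin_features(W)
print(feats.shape, feats[:9])   # 9 centers (vertex degrees) + 9 radii
```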

13.
Front Neuroinform ; 18: 1395916, 2024.
Article in English | MEDLINE | ID: mdl-38817244

ABSTRACT

Recently, graph theory has become a promising tool for biomedical signal analysis, wherein signals are transformed into a graph network and represented as either adjacency or Laplacian matrices. However, as the size of the time series increases, the dimensions of the transformed matrices also expand, leading to a significant rise in the computational demand of the analysis. Therefore, there is a critical need for efficient feature extraction methods with low computational time. This paper introduces a new feature extraction technique for biomedical signals based on the Gershgorin circle theorem, termed Gershgorin Circle Feature Extraction (GCFE). The study makes use of two publicly available datasets: one comprising synthetic neural recordings, and the other consisting of EEG seizure data. The efficacy of GCFE is compared across two distinct visibility graphs and tested against seven other feature extraction methods. In the GCFE method, features are extracted from a specially modified weighted Laplacian matrix derived from the visibility graphs. The method was applied to classify three different types of neural spikes in one dataset, and to distinguish between seizure and non-seizure events in the other. GCFE resulted in superior performance when compared to the seven other algorithms, achieving a positive average accuracy difference of 2.67% across all experimental datasets; that is, GCFE consistently outperformed the other methods in terms of accuracy. Furthermore, the GCFE method was more computationally efficient than the other feature extraction techniques. The GCFE method can also be employed in real-time biomedical signal classification tasks where visibility graphs are utilized, such as EKG signal classification.
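
The visibility-graph step that precedes GCFE has a standard definition (Lacasa et al., 2008) that fits in a few lines; the sketch below builds the natural visibility graph of a short signal with a simple, non-optimized double loop. The toy signal is illustrative, not data from either dataset.

```python
import numpy as np

def natural_visibility_graph(x):
    """Natural visibility graph: samples a and b are connected if every
    intermediate sample lies strictly below the straight line joining
    (a, x[a]) and (b, x[b])."""
    n = len(x)
    A = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            ts = np.arange(a + 1, b)
            line = x[b] + (x[a] - x[b]) * (b - ts) / (b - a)
            if np.all(x[ts] < line):
                A[a, b] = A[b, a] = 1
    return A

# tiny EEG-like segment -> graph whose (modified) Laplacian would feed GCFE
x = np.array([1.0, 0.5, 2.0, 0.3, 1.5, 0.8])
print(natural_visibility_graph(x))
```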

14.
Comput Biol Med ; 175: 108497, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38678944

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, the main workhorse of dimensionality reduction, lacks the ability to capture the geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited to analysis at a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1-norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing filtration by varying a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of the proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and show that they outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins. For example, tPCA provides up to 628%, 78%, and 149% improvements over UMAP, tSNE, and NMF, respectively, on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements over UMAP, tSNE, and NMF, respectively, on clustering in the ARI metric.
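
The kNN filtration idea, varying neighborhood size instead of a distance threshold, is easy to demonstrate on its own: build a kNN graph for each k and track how the Laplacian spectrum evolves. This hypothetical sketch stops at ordinary graph Laplacian spectra; the persistent Laplacian additionally tracks boundary relations between consecutive filtration steps.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian_spectra(X, k_values):
    """Spectra of kNN-graph Laplacians across a filtration in k: vary the
    neighborhood size instead of a distance threshold."""
    spectra = {}
    for k in k_values:
        A = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
        A = np.asarray(((A + A.T) > 0).todense(), dtype=float)  # symmetrize
        L = np.diag(A.sum(axis=1)) - A
        spectra[k] = np.linalg.eigvalsh(L)
    return spectra

# toy "cells x reduced-genes" matrix standing in for preprocessed scRNA-seq data
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
spec = knn_laplacian_spectra(X, k_values=[3, 5, 8, 13])
print({k: round(float(v[1]), 3) for k, v in spec.items()})  # Fiedler values per k
```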


Subject(s)
Principal Component Analysis , Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Algorithms , RNA-Seq/methods
15.
Discov Med ; 36(183): 730-738, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38665022

ABSTRACT

BACKGROUND: Current research on radiomics for the diagnosis and prognosis of acute pancreatitis predominantly revolves around model development and testing, with a notable absence of interpretation and analysis of the physical significance of the resulting models and features, and little extensive exploration of the visual information within the images. This limitation hinders the broad applicability of radiomics findings. This study aims to address this gap by analyzing filtered Computed Tomography (CT) image features of acute pancreatitis to identify meaningful visual markers in the pancreas and peripancreatic area. METHODS: Numerous filtered CT images were obtained through pyradiomics. The window width and window level were fine-tuned to emphasize the pancreas and peripancreatic regions. Subsequently, the LightGBM algorithm was employed to conduct embedded feature screening, followed by statistical analysis to identify features with statistical significance (p-value < 0.01). For each filtering method, features of high importance to the preceding prediction model were incorporated into the analysis. The corresponding visual markers in the images were then traced back systematically, and their medical interpretation was undertaken where possible. RESULTS: In Laplacian of Gaussian filtered images of the pancreatic region, severe acute pancreatitis (SAP) exhibited fewer small areas with repetitive greyscale patterns. Conversely, in the peripancreatic region, SAP displayed greater irregularity in both area size and the distribution of greyscale levels. In logarithmic images, SAP demonstrated reduced low-greyscale connectivity in the pancreatic region, while showing a higher average variation in greyscale between adjacent pixels in the peripancreatic region. Moreover, in gradient images, SAP presented decreased repetition of adjacent pixel greyscales within the pancreatic region, alongside increased inhomogeneity in the size of same-greyscale regions within the δ range in the peripancreatic region. CONCLUSIONS: Different filtered images convey distinct physical significance and properties. Selecting the appropriate filtered image according to the characteristics of the Region of Interest (ROI) enables a more comprehensive capture of the heterogeneity of the disease.
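
Of the filters discussed, the Laplacian of Gaussian is the most standard: it band-passes the image at a scale set by sigma, so small sigmas emphasize fine texture and large sigmas coarse, blob-like structure. Below is a hedged sketch using scipy rather than the pyradiomics pipeline, with made-up stand-in data in place of a windowed CT slice.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_filter_bank(ct_slice, sigmas=(1.0, 2.0, 4.0)):
    """Laplacian-of-Gaussian responses at several scales, analogous to the
    pyradiomics LoG filter: each sigma selects a different texture scale
    within the (peri)pancreatic ROI."""
    return {s: gaussian_laplace(ct_slice.astype(float), sigma=s)
            for s in sigmas}

# stand-in for a windowed CT slice cropped to the pancreatic region
rng = np.random.default_rng(0)
ct_slice = rng.normal(40, 20, size=(128, 128))       # HU-like toy values
responses = log_filter_bank(ct_slice)
print({s: round(float(np.std(r)), 2) for s, r in responses.items()})
```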


Subject(s)
Algorithms , Pancreatitis , Tomography, X-Ray Computed , Humans , Pancreatitis/diagnostic imaging , Pancreatitis/diagnosis , Pancreatitis/pathology , Tomography, X-Ray Computed/methods , Acute Disease , Male , Pancreas/diagnostic imaging , Pancreas/pathology , Female , Middle Aged , Radiomics
16.
Heliyon ; 10(5): e26436, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38449626

ABSTRACT

Effectively utilizing information from multiple sources, together with fewer labeled operating-condition samples from a sucker-rod pumping system for oil production, can improve recognition performance and engineering practicability. Nevertheless, this remains a challenging research subject in energy-related scientific applications, and this study therefore proposes an operating-state recognition scheme that relies on multisource nonlinear kernel learning and p-Laplacian high-order manifold regularized logistic regression. Specifically, three measured features are selected and extracted, i.e., the wellhead temperature signal, the electrical power signal, and ground dynamometer cards, based on mechanism analysis, expert experience, and prior knowledge. Finally, we establish the operating-condition recognition model using the multisource p-Laplacian regularized kernel logistic regression algorithm. The experimental data are derived from 60 wells of a common high-pressure, low-permeability thin oil reservoir block of an oil field in China. The corresponding trials highlight that our scheme outperforms traditional recognition methods that exploit single-source, multiple-feature data. In the context of fewer labeled samples, the proposed method has better recognition performance, engineering practicability, and model robustness than existing schemes based on other high-order manifold learning, verifying our method's effectiveness.
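
The p-Laplacian regularizer at the heart of the scheme has a compact discrete form: a graph penalty $\sum_{ij} w_{ij} |f_i - f_j|^p$ that reduces to the usual Laplacian quadratic form at p = 2. The numpy sketch below shows just this term with toy data; the full kernel logistic regression objective and the multisource kernels are not reproduced here, and the similarity matrix is an assumption.

```python
import numpy as np

def p_laplacian_regularizer(f, W, p=1.5):
    """Graph p-Laplacian regularizer 0.5 * sum_{ij} w_ij |f_i - f_j|^p.
    p = 2 recovers the ordinary Laplacian term f^T L f; other p change how
    strongly large differences between similar wells are penalized."""
    diff = np.abs(f[:, None] - f[None, :])
    return 0.5 * float(np.sum(W * diff**p))

# toy: decision scores for 6 wells and a similarity matrix that would be
# built from wellhead temperature / power / dynamometer-card features
rng = np.random.default_rng(0)
f = rng.normal(size=6)
W = rng.uniform(size=(6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
print(p_laplacian_regularizer(f, W, p=1.5))
```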

17.
J Comput Appl Math ; 445, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38464901

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of the resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). Using a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for visualization with the popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
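
In the spirit of the description above, a graph-regularized NMF objective with the single Laplacian replaced by a sum over filtration-indexed persistent Laplacians might be written as follows; this is a hedged reconstruction of the general form, not the paper's exact objective:

```latex
% NMF with a persistent-Laplacian penalty summed over filtration steps t
\min_{W \ge 0,\; H \ge 0}\;
  \lVert X - W H \rVert_F^2
  \;+\; \lambda \sum_{t} \operatorname{tr}\!\left( H \, L_t \, H^{\mathsf{T}} \right)
```

Here X is the genes-by-cells expression matrix, H holds the low-dimensional cell coordinates, and each L_t is a Laplacian drawn from the filtration; restricting to a single t recovers ordinary graph-regularized NMF.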

18.
BMC Bioinformatics ; 25(1): 69, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38350879

ABSTRACT

BACKGROUND: Technological advances have enabled the generation of unique and complementary types of data or views (e.g. genomics, proteomics, metabolomics) and opened up a new era in multiview learning research, with the potential to lead to new biomedical discoveries. RESULTS: We propose iDeepViewLearn (Interpretable Deep Learning Method for Multiview Learning) to learn nonlinear relationships in data from multiple views while achieving feature selection. iDeepViewLearn combines deep learning flexibility with the statistical benefits of data- and knowledge-driven feature selection, giving interpretable results. Deep neural networks are used to learn view-independent low-dimensional embeddings through an optimization problem that minimizes the difference between observed and reconstructed data, while imposing a regularization penalty on the reconstructed data. The normalized Laplacian of a graph is used to model bilateral relationships between variables in each view, thereby encouraging the selection of related variables. iDeepViewLearn is tested on simulated and three real-world datasets for classification, clustering, and reconstruction tasks. For the classification tasks, iDeepViewLearn had competitive classification results with state-of-the-art methods in various settings. For the clustering task, we detected molecular clusters that differed in their 10-year survival rates for breast cancer. For the reconstruction task, we were able to reconstruct handwritten images using a few pixels while achieving competitive classification accuracy. The results of our real data application and simulations with small to moderate sample sizes suggest that iDeepViewLearn may be a useful method for small-sample-size problems compared to other deep learning methods for multiview learning. CONCLUSION: iDeepViewLearn is an innovative deep learning model capable of capturing nonlinear relationships between data from multiple views while achieving feature selection. It is fully open source and is freely available at https://github.com/lasandrall/iDeepViewLearn .
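
The normalized Laplacian used in the penalty has the standard form $I - D^{-1/2} A D^{-1/2}$. A small numpy sketch with a toy variable-association graph (in practice the graph per view would come from domain knowledge or the data itself):

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}, the form
    used to encode within-view variable relationships in a penalty."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# toy variable-association graph for one view (e.g. gene co-expression)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
print(np.round(np.linalg.eigvalsh(L), 3))  # spectrum lies in [0, 2]
```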


Subject(s)
Deep Learning , Cluster Analysis , Genomics , Knowledge , Metabolomics
19.
Physiol Meas ; 45(3)2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38350132

ABSTRACT

Objective. We aimed to fuse the outputs of different electrocardiogram-derived respiration (EDR) algorithms to create one higher-quality EDR signal. Methods. We viewed each EDR algorithm as a software sensor that records breathing activity from a different vantage point, identified high-quality software sensors based on the respiratory signal quality index, aligned the highest-quality EDRs with a phase synchronization technique based on the graph connection Laplacian, and finally fused those aligned, high-quality EDRs. We refer to the output as the sync-ensembled EDR signal. The proposed algorithm was evaluated on two large-scale databases of whole-night polysomnograms. We evaluated the performance of the proposed algorithm using three respiratory signals recorded from different hardware sensors, and compared it with other existing EDR algorithms. A sensitivity analysis was carried out for a total of five cases: fusion by taking the mean of the EDR signals, and the four cases of EDR signal alignment with and without synchronization and with and without signal quality selection. Results. The sync-ensembled EDR algorithm outperforms existing EDR algorithms when evaluated by the synchronized correlation (γ-score), optimal transport (OT) distance, and estimated average respiratory rate score, all with statistical significance. The sensitivity analysis shows that signal quality selection and EDR signal alignment are both critical to the performance, both with statistical significance. Conclusion. The sync-ensembled EDR provides robust respiratory information from the electrocardiogram. Significance. Phase synchronization is not only theoretically rigorous but also practical for designing a robust EDR.
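
The phase-alignment step is conceptually close to classical angular synchronization: collect pairwise phase-difference estimates between channels into a Hermitian matrix and read the per-channel phases off its leading eigenvector. The sketch below shows that spectral step on toy data; the paper's graph-connection-Laplacian construction for real EDR signals is considerably more involved, so treat this as an assumed simplification.

```python
import numpy as np

def angular_synchronization(H):
    """Recover phase offsets (up to a global shift) from pairwise estimates
    via the leading eigenvector of H[i, j] ~ exp(1j * (theta_i - theta_j))."""
    vals, vecs = np.linalg.eigh(H)
    return np.angle(vecs[:, -1])     # eigenvector of the largest eigenvalue

# toy: 4 EDR channels with true phases and noisy pairwise phase differences
rng = np.random.default_rng(0)
theta = np.array([0.0, 0.4, -0.7, 1.1])
noise = 0.05 * rng.normal(size=(4, 4)); noise = noise - noise.T  # keep H Hermitian
H = np.exp(1j * (theta[:, None] - theta[None, :] + noise))
est = angular_synchronization(H)
# wrap-corrected error of recovered phase differences: ~0 up to the noise level
print(np.round(np.angle(np.exp(1j * (est - est[0] - (theta - theta[0])))), 3))
```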


Subject(s)
Respiration , Signal Processing, Computer-Assisted , Software , Respiratory Rate , Algorithms , Electrocardiography/methods
20.
Res Sq ; 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38405777

ABSTRACT

Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. However, in computational biology, essentially all Transformers are built upon biological sequences, which ignores vital stereochemical information and may result in crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformers and natural language processing (NLP) models in general. This work addresses this foundational challenge with a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP with a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into topological sequences. TopoFormer outperforms conventional algorithms and recent deep learning variants, achieving exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks on a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.
