Results 1 - 20 of 622
1.
Sensors (Basel) ; 24(17)2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39275421

ABSTRACT

The cold-start problem in sequence recommendations presents a critical and challenging issue for portable sensing devices. Existing content-aware approaches often struggle to effectively distinguish the relative importance of content features and typically lack generalizability when processing new data. To address these limitations, we propose a content-aware few-shot meta-learning (CFSM) model to enhance the accuracy of cold-start sequence recommendations. Our model incorporates a double-tower network (DT-Net) that learns user and item representations through a meta-encoder and a mutual attention encoder, effectively mitigating the impact of noisy data on auxiliary information. By framing the cold-start problem as few-shot meta-learning, we employ a model-agnostic meta-optimization strategy to train the model across a variety of tasks during the meta-learning phase. Extensive experiments conducted on three real-world datasets-ShortVideos, MovieLens, and Book-Crossing-demonstrate the superiority of our model in cold-start recommendation scenarios. Compared to MetaCs-DNN, the second-best approach, CFSM achieves improvements of 1.55%, 1.34%, and 2.42% under the AUC metric on the three datasets, respectively.
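
As a rough illustration of the model-agnostic meta-optimization step described above, here is a minimal first-order MAML-style sketch on a toy regression task. The task sampler, linear scorer, and learning rates are illustrative assumptions and not the CFSM architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Squared-error loss and gradient for a linear scorer (stand-in task)."""
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

def sample_task(n=10, d=5):
    """Hypothetical toy task: a small (support, query) regression split."""
    w_true = rng.normal(size=d)
    X = rng.normal(size=(2 * n, d))
    y = X @ w_true + 0.1 * rng.normal(size=2 * n)
    return (X[:n], y[:n]), (X[n:], y[n:])

w = np.zeros(5)                  # meta-parameters
alpha, beta = 0.05, 0.01         # inner / outer learning rates
for step in range(200):
    meta_grad = np.zeros_like(w)
    for _ in range(8):           # batch of tasks
        (Xs, ys), (Xq, yq) = sample_task()
        _, g = loss_and_grad(w, Xs, ys)
        w_task = w - alpha * g               # one inner adaptation step on the support set
        _, gq = loss_and_grad(w_task, Xq, yq)
        meta_grad += gq                      # first-order MAML approximation
    w -= beta * meta_grad / 8                # outer (meta) update
```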

2.
Sensors (Basel) ; 24(17)2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39275527

ABSTRACT

Anomaly detection has gained significant attention with the advancements in deep neural networks. Effective training requires both normal and anomalous data, but this often leads to a class imbalance, as anomalous data is scarce. Traditional augmentation methods struggle to maintain the correlation between anomalous patterns and their surroundings. To address this, we propose an adjacent augmentation technique that generates synthetic anomaly images, preserving object shapes while distorting contours to enhance correlation. Experimental results show that adjacent augmentation captures high-quality anomaly features, achieving superior AU-ROC and AU-PR scores compared to existing methods. Additionally, our technique produces synthetic normal images, aiding in learning detailed normal data features and reducing sensitivity to minor variations. Our framework considers all training images within a batch as positive pairs, pairing them with synthetic normal images as positive pairs and with synthetic anomaly images as negative pairs. This compensates for the lack of anomalous features and effectively distinguishes between normal and anomalous features, mitigating class imbalance. Using the ResNet50 network, our model achieved perfect AU-ROC and AU-PR scores of 100% in the bottle category of the MVTec-AD dataset. We are also investigating the relationship between anomalous pattern size and detection performance.

3.
Quant Imaging Med Surg ; 14(9): 6294-6310, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39281155

ABSTRACT

Background: Resting-state brain networks represent the interconnectivity of different brain regions during rest. Utilizing brain network analysis methods to model these networks can enhance our understanding of how different brain regions collaborate and communicate without explicit external stimuli. However, analyzing resting-state brain networks faces challenges due to high heterogeneity and noise correlation between subjects. This study proposes a brain structure learning-guided multi-view graph representation learning method to address the limitations of current brain network analysis and improve the diagnostic accuracy (ACC) of mental disorders. Methods: We first used multiple thresholds to generate different sparse levels of brain networks. Subsequently, we introduced graph pooling to optimize the brain network representation by reducing noise edges and data inconsistency, thereby providing more reliable input for subsequent graph convolutional networks (GCNs). Following this, we designed a multi-view GCN to comprehensively capture the complexity and variability of brain structure. Finally, we employed an attention-based adaptive module to adjust the contributions of different views, facilitating their fusion. Considering that the Smith atlas offers superior characterization of resting-state brain networks, we utilized the Smith atlas to construct the graph network. Results: Experiments on two mental disorder datasets, the Autism Brain Imaging Data Exchange (ABIDE) dataset and the Mexican Cocaine Use Disorders (SUDMEX CONN) dataset, show that our model outperforms the state-of-the-art methods, achieving nearly 75% ACC and 70% area under the receiver operating characteristic curve (AUC) on both datasets. Conclusions: These findings demonstrate that our method of combining multi-view graph learning and brain structure learning can effectively capture crucial structural information in brain networks while facilitating the acquisition of feature information from diverse perspectives, thereby improving the performance of brain network analysis.
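
The attention-based adaptive module that weights and fuses the per-view representations can be sketched roughly as follows; the scoring vector, shapes, and softmax weighting below are illustrative stand-ins for the learned attention module, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(view_embeddings, w_att):
    """Fuse per-view graph embeddings with attention weights.

    view_embeddings: (n_views, batch, dim) view-specific representations
    w_att:           (dim,) scoring vector (stand-in for the attention module)
    """
    scores = view_embeddings @ w_att              # (n_views, batch)
    alpha = softmax(scores, axis=0)               # per-sample weight for each view
    return (alpha[..., None] * view_embeddings).sum(axis=0)   # (batch, dim)

views = np.random.randn(3, 16, 64)   # e.g., three sparsity thresholds of the brain network
fused = attention_fuse(views, np.random.randn(64))
```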

4.
Int J Mol Sci ; 25(17)2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39273521

ABSTRACT

The vast corpus of heterogeneous biomedical data stored in databases, ontologies, and terminologies presents a unique opportunity for drug design. Integrating and fusing these sources is essential to develop data representations that can be analyzed using artificial intelligence methods to generate novel drug candidates or hypotheses. Here, we propose Non-Negative Matrix Tri-Factorization as an invaluable tool for integrating and fusing data, as well as for representation learning. Additionally, we demonstrate how representations learned by Non-Negative Matrix Tri-Factorization can effectively be utilized by traditional artificial intelligence methods. While this approach is domain-agnostic and applicable to any field with vast amounts of structured and semi-structured data, we apply it specifically to computational pharmacology and drug repurposing. This field is poised to benefit significantly from artificial intelligence, particularly in personalized medicine. We conducted extensive experiments to evaluate the performance of the proposed method, yielding exciting results, particularly compared to traditional methods. Novel drug-target predictions have also been validated in the literature, further confirming their validity. Additionally, we tested our method to predict drug synergism, where constructing a classical matrix dataset is challenging. The method demonstrated great flexibility, suggesting its applicability to a wide range of tasks in drug design and discovery.
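
As a minimal sketch of Non-Negative Matrix Tri-Factorization under the usual Frobenius-norm objective, the following multiplicative-update loop factorizes a non-negative association matrix R into three non-negative factors. The ranks, iteration count, and the drug-target interpretation of R are assumptions for illustration.

```python
import numpy as np

def nmtf(R, k1, k2, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative-update sketch of Non-Negative Matrix Tri-Factorization:
    R (n x m) is approximated as F (n x k1) @ S (k1 x k2) @ G.T (k2 x m),
    with all factors kept non-negative."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(n_iter):
        F *= (R @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (R.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ R @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

# e.g., R could be a drug-target association matrix (hypothetical shapes)
R = np.abs(np.random.randn(100, 80))
F, S, G = nmtf(R, k1=10, k2=8)
```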


Subject(s)
Drug Repositioning , Drug Repositioning/methods , Humans , Artificial Intelligence , Computational Biology/methods , Machine Learning , Algorithms , Drug Discovery/methods , Multiomics
5.
Nan Fang Yi Ke Da Xue Xue Bao ; 44(8): 1561-1570, 2024 Aug 20.
Article in Chinese | MEDLINE | ID: mdl-39276052

ABSTRACT

OBJECTIVE: To evaluate the performance of a magnetic resonance imaging (MRI) multi-sequence feature imputation and fusion mutual-aid model based on sequence deletion in differentiating high-grade glioma (HGG) from low-grade glioma (LGG). METHODS: We retrospectively collected multi-sequence MR images from 305 glioma patients, including 189 HGG patients and 116 LGG patients. The regions of interest (ROIs) of T1-weighted images (T1WI), T2-weighted images (T2WI), T2 fluid attenuated inversion recovery (T2_FLAIR) and post-contrast enhancement T1WI (CE_T1WI) were delineated to extract the radiomics features. A mutual-aid model of MRI multi-sequence feature imputation and fusion based on sequence deletion was used for imputation and fusion of the feature matrix with missing data. The discriminative ability of the model was evaluated using the 5-fold cross-validation method and by assessing the accuracy, balanced accuracy, area under the ROC curve (AUC), specificity, and sensitivity. The proposed model was quantitatively compared with other non-holonomic multimodal classification models for discriminating HGG and LGG. Class separability experiments were performed on the latent features learned by the proposed feature imputation and fusion methods to observe the classification effect of the samples in a two-dimensional plane. Convergence experiments were used to verify the feasibility of the model. RESULTS: For differentiation of HGG from LGG with a missing rate of 10%, the proposed model achieved accuracy, balanced accuracy, AUC, specificity, and sensitivity of 0.777, 0.768, 0.826, 0.754 and 0.780, respectively. The fused latent features showed excellent performance in the class separability experiment, and the algorithm could be iterated to convergence with superior classification performance over other methods at missing rates of 30% and 50%. CONCLUSION: The proposed model has excellent performance in the classification task of HGG and LGG and outperforms other non-holonomic multimodal classification models, demonstrating its potential for efficient processing of non-holonomic multimodal data.


Subject(s)
Brain Neoplasms , Glioma , Magnetic Resonance Imaging , Humans , Glioma/diagnostic imaging , Glioma/pathology , Magnetic Resonance Imaging/methods , Retrospective Studies , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Algorithms , Neoplasm Grading , ROC Curve , Sensitivity and Specificity
6.
Magn Reson Med ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39233507

ABSTRACT

PURPOSE: To develop and evaluate a novel method for computationally efficient reconstruction from noisy MR spectroscopic imaging (MRSI) data. METHODS: The proposed method features (a) a novel strategy that jointly learns a nonlinear low-dimensional representation of high-dimensional spectroscopic signals and a neural-network-based projector to recover the low-dimensional embeddings from noisy/limited data; (b) a formulation that integrates the forward encoding model, a regularizer exploiting the learned representation, and a complementary spatial constraint; and (c) a highly efficient algorithm enabled by the learned projector within an alternating direction method of multipliers (ADMM) framework, circumventing the computationally expensive network inversion subproblem. RESULTS: The proposed method has been evaluated using simulations as well as in vivo ¹H and ³¹P MRSI data, demonstrating improved performance over state-of-the-art methods, with about 6× fewer averages needed than standard Fourier reconstruction for similar metabolite estimation variances and up to a 100× reduction in processing time compared to a prior neural network constrained reconstruction method. Computational and theoretical analyses were performed to offer further insights into the effectiveness of the proposed method. CONCLUSION: A novel method was developed for fast, high-SNR spatiospectral reconstruction from noisy MRSI data. We expect our method to be useful for enhancing the quality of MRSI or other high-dimensional spatiospectral imaging data.
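
A rough plug-and-play-style sketch of the ADMM structure described in (c), with the learned projector plugged into the splitting-variable update, is shown below. The quadratic data term, the identity placeholder projector, and the penalty parameter are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def admm_reconstruct(y, A, project, n_iter=50, rho=1.0):
    """Plug-and-play-style ADMM sketch: minimize ||A x - y||^2 while keeping x
    near a learned low-dimensional manifold. `project` stands in for the trained
    network that maps a noisy estimate onto that representation."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Aty = A.T @ A, A.T @ y
    lhs = AtA + rho * np.eye(n)
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, Aty + rho * (z - u))  # data-consistency step
        z = project(x + u)                             # learned-projector step
        u = u + x - z                                  # dual update
    return x

# toy usage with an identity "projector" (a real one would be the trained network)
A = np.random.randn(40, 30)
y = A @ np.random.randn(30)
x_hat = admm_reconstruct(y, A, project=lambda v: v)
```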

7.
Neural Netw ; 180: 106672, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39236409

ABSTRACT

Over the past decades, massive Electronic Health Records (EHRs) have been accumulated in Intensive Care Units (ICUs) and many other healthcare scenarios. The rich and comprehensive information recorded presents an exceptional opportunity for patient outcome prediction. Nevertheless, due to the diversity of data modalities, EHRs exhibit a heterogeneous characteristic, making it difficult to organically leverage information from various modalities. There is thus an urgent need to capture the underlying correlations among different modalities. In this paper, we propose a novel framework named Multimodal Fusion Network (MFNet) for ICU patient outcome prediction. First, we incorporate multiple modality-specific encoders to learn different modality representations. Notably, a graph-guided encoder is designed to capture underlying global relationships among medical codes, and a text encoder with a pre-fine-tuning strategy is adopted to extract appropriate text representations. Second, we propose to pairwise merge multimodal representations with a tailored hierarchical fusion mechanism. The experiments conducted on the eICU-CRD dataset validate that MFNet achieves superior performance on mortality prediction and Length of Stay (LoS) prediction compared with various representative and state-of-the-art baselines. Moreover, a comprehensive ablation study demonstrates the effectiveness of each component of MFNet.
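
A toy sketch of the pairwise, hierarchical merging of modality representations might look like the following; the tanh projection, the shared weight matrix, and the three modalities are hypothetical stand-ins for MFNet's tailored fusion mechanism.

```python
import numpy as np

def fuse_pair(a, b, W):
    """Toy pairwise merge: concatenate two modality vectors and project back."""
    return np.tanh(W @ np.concatenate([a, b]))

# hypothetical per-modality representations for one ICU stay
codes, notes, vitals = (np.random.randn(32) for _ in range(3))
W = np.random.randn(32, 64)
h1 = fuse_pair(codes, notes, W)   # first-level pairwise merge
h2 = fuse_pair(h1, vitals, W)     # hierarchical: merge the result with the next modality
```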

8.
Neural Netw ; 180: 106651, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39217862

ABSTRACT

Graph neural networks (GNNs) have achieved state-of-the-art performance in graph representation learning. Message passing neural networks, which learn representations through recursively aggregating information from each node and its neighbors, are among the most commonly-used GNNs. However, a wealth of structural information of individual nodes and full graphs is often ignored in such process, which restricts the expressive power of GNNs. Various graph data augmentation methods that enable the message passing with richer structure knowledge have been introduced as one main way to tackle this issue, but they are often focused on individual structure features and difficult to scale up with more structure features. In this work we propose a novel approach, namely collective structure knowledge-augmented graph neural network (CoS-GNN), in which a new message passing method is introduced to allow GNNs to harness a diverse set of node- and graph-level structure features, together with original node features/attributes, in augmented graphs. In doing so, our approach largely improves the structural knowledge modeling of GNNs in both node and graph levels, resulting in substantially improved graph representations. This is justified by extensive empirical results where CoS-GNN outperforms state-of-the-art models in various graph-level learning tasks, including graph classification, anomaly detection, and out-of-distribution generalization.
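
The idea of augmenting node attributes with structure features before message passing can be illustrated with a small sketch; the particular statistics (degree and triangle counts) and the mean-aggregation layer are simple stand-ins for the richer node- and graph-level features used by CoS-GNN.

```python
import numpy as np

def structure_augment(A, X):
    """Append simple node-level structure statistics (degree, triangle count)
    to the original node attributes before message passing."""
    deg = A.sum(axis=1, keepdims=True)
    tri = np.diag(np.linalg.matrix_power(A, 3))[:, None] / 2.0  # triangles per node
    return np.hstack([X, deg, tri])

def message_pass(A, H, W):
    """One mean-aggregation message-passing layer."""
    deg = np.clip(A.sum(axis=1, keepdims=True), 1, None)
    return np.tanh((A @ H) / deg @ W)

# random undirected toy graph with 20 nodes and 8 node attributes
A = (np.random.rand(20, 20) < 0.2).astype(float)
A = np.triu(A, 1)
A = A + A.T
X = np.random.randn(20, 8)
H0 = structure_augment(A, X)
H1 = message_pass(A, H0, np.random.randn(H0.shape[1], 16))
```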

9.
Digit Health ; 10: 20552076241256730, 2024.
Article in English | MEDLINE | ID: mdl-39114113

ABSTRACT

Objective: Social anxiety disorder (SAD) is characterized by heightened sensitivity to social interactions or settings, which disrupts daily activities and social relationships. This study aimed to explore the feasibility of utilizing digital phenotypes for predicting the severity of these symptoms and to elucidate how the main predictive digital phenotypes differed depending on the symptom severity. Method: We collected 511 behavioral and physiological data samples over 7 to 13 weeks from 27 SAD and 31 healthy individuals using smartphones and smartbands, from which we extracted 76 digital phenotype features. To reduce data dimensionality, we employed an autoencoder, an unsupervised machine learning model that transformed these features into low-dimensional latent representations. Symptom severity was assessed with three social anxiety-specific and nine additional psychological scales. For each symptom, we developed individual classifiers to predict the severity and applied integrated gradients to identify critical predictive features. Results: Classifiers targeting social anxiety symptoms outperformed baseline accuracy, achieving mean accuracy and F1 scores of 87% (with both metrics in the range of 84-90%). For secondary psychological symptoms, classifiers demonstrated mean accuracy and F1 scores of 85%. Application of integrated gradients revealed key digital phenotypes with substantial influence on the predictive models, differentiated by symptom types and levels of severity. Conclusions: Leveraging digital phenotypes through feature representation learning could effectively classify symptom severities in SAD. It identifies distinct digital phenotypes associated with the cognitive, emotional, and behavioral dimensions of SAD, thereby advancing the understanding of SAD. These findings underscore the potential utility of digital phenotypes in informing clinical management.
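
Integrated gradients, used here to surface the most influential digital phenotypes, can be sketched with a numerical path integral over a toy scalar classifier; the baseline, step count, and logistic model below are illustrative assumptions.

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=64):
    """Numerical integrated-gradients sketch for a scalar model f(x):
    attribution_i = (x_i - baseline_i) * average of dF/dx_i along the path."""
    alphas = np.linspace(0, 1, steps)
    grads = np.zeros_like(x, dtype=float)
    eps = 1e-5
    for a in alphas:
        p = baseline + a * (x - baseline)
        for i in range(len(x)):                  # central finite differences
            d = np.zeros_like(x, dtype=float)
            d[i] = eps
            grads[i] += (f(p + d) - f(p - d)) / (2 * eps)
    return (x - baseline) * grads / steps

# toy scalar "classifier" over a digital-phenotype feature vector (hypothetical)
w = np.array([0.8, -0.3, 1.2])
f = lambda v: 1 / (1 + np.exp(-(v @ w)))
attr = integrated_gradients(f, x=np.array([1.0, 2.0, 0.5]), baseline=np.zeros(3))
```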

10.
Comput Biol Med ; 180: 108974, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39096613

ABSTRACT

Promoters are DNA sequences that bind with RNA polymerase to initiate transcription, regulating this process through interactions with transcription factors. Accurate identification of promoters is crucial for understanding gene expression regulation mechanisms and developing therapeutic approaches for various diseases. However, experimental techniques for promoter identification are often expensive, time-consuming, and inefficient, necessitating the development of accurate and efficient computational models for this task. Enhancing the model's ability to recognize promoters across multiple species and improving its interpretability pose significant challenges. In this study, we introduce a novel interpretable model based on graph neural networks, named GraphPro, for multi-species promoter identification. Initially, we encode the sequences using k-tuple nucleotide frequency pattern, dinucleotide physicochemical properties, and dna2vec. Subsequently, we construct two feature extraction modules based on convolutional neural networks and graph neural networks. These modules aim to extract specific motifs from the promoters, learn their dependencies, and capture the underlying structural features of the promoters, providing a more comprehensive representation. Finally, a fully connected neural network predicts whether the input sequence is a promoter. We conducted extensive experiments on promoter datasets from eight species, including Human, Mouse, and Escherichia coli. The experimental results show that the average Sn, Sp, Acc and MCC values of GraphPro are 0.9123, 0.9482, 0.8840 and 0.7984, respectively. Compared with previous promoter identification methods, GraphPro not only achieves better recognition accuracy on multiple species, but also outperforms all previous methods in cross-species prediction ability. Furthermore, by visualizing GraphPro's decision process and analyzing the sequences matching the transcription factor binding motifs captured by the model, we validate its significant advantages in biological interpretability. The source code for GraphPro is available at https://github.com/liuliwei1980/GraphPro.
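
The k-tuple nucleotide frequency encoding mentioned as one of GraphPro's three input encodings can be sketched as follows (sequences are assumed to contain only A/C/G/T for simplicity).

```python
from itertools import product
import numpy as np

def ktuple_frequencies(seq, k=3):
    """k-tuple nucleotide frequency encoding of a DNA sequence:
    the normalized count of every possible k-mer."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {m: i for i, m in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1
    return counts / max(1, len(seq) - k + 1)

vec = ktuple_frequencies("TATAATGCGCGCATTACG", k=3)   # 64-dimensional frequency vector
```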


Subject(s)
Neural Networks, Computer , Promoter Regions, Genetic , Humans , Animals , Computational Biology/methods , Sequence Analysis, DNA/methods , Mice , Software
11.
Neural Netw ; 180: 106572, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39173200

ABSTRACT

Person Re-identification (Re-ID) aims to match person images across non-overlapping cameras. The existing approaches formulate this task as fine-grained representation learning with deep neural networks, which involves extracting image features using a deep convolutional network, followed by mapping the features into a discriminative space through another smaller network, in order to make full use of all possible cues. However, recent Re-ID methods that strive to capture every cue and make the space more discriminative have resulted in longer features, ranging from 1024 to 14336 dimensions, leading to higher time (distance computation) and space (feature storage) complexities. There are two potential solutions: reduction-after-training methods (such as Principal Component Analysis and Linear Discriminant Analysis) and reduction-during-training methods (such as 1 × 1 Convolution). The former utilizes a statistical approach aiming for a global optimum but lacks end-to-end optimization over large data and deep neural networks. The latter lacks theoretical guarantees and may be vulnerable to training noise such as dataset noise or the initialization seed. To address these limitations, we propose a method called Euclidean-Distance-Preserving Feature Reduction (EDPFR) that combines the strengths of both reduction-after-training and reduction-during-training methods. EDPFR first formulates the feature reduction process as a matrix decomposition and derives a condition to preserve the Euclidean distance between features, thus ensuring accuracy in theory. Furthermore, the method integrates the matrix decomposition process into a deep neural network to enable end-to-end optimization and batch training, while maintaining the theoretical guarantee. The result of EDPFR is a reduction of the features f_a and f_b to lower-dimensional f_a' and f_b' while preserving their Euclidean distance, i.e., L2(f_a, f_b) = L2(f_a', f_b'). In addition to its Euclidean-distance-preserving capability, EDPFR also features a novel feature-level distillation loss. One of the main challenges in knowledge distillation is dimension mismatch. While previous distillation losses usually project the mismatched features to matched class-level, spatial-level, or similarity-level spaces, this can result in a loss of information and decrease the flexibility and efficiency of distillation. Our proposed feature-level distillation leverages the benefits of the Euclidean-distance-preserving property and performs distillation directly in the feature space, resulting in a more flexible and efficient approach. Extensive experiments on three Re-ID datasets, Market-1501, DukeMTMC-reID and MSMT, demonstrate the effectiveness of our proposed Euclidean-Distance-Preserving Feature Reduction.
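
The exact preservation condition derived in the paper is not reproduced here, but the basic geometric idea, that a reduction matrix with orthonormal columns preserves pairwise Euclidean distances for features lying in its column space, can be checked numerically with a toy example; the dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 2048, 256, 100

# toy features that actually live in a k-dimensional subspace of R^d
B, _ = np.linalg.qr(rng.normal(size=(d, k)))      # orthonormal basis of the subspace
F = rng.normal(size=(n, k)) @ B.T                 # high-dimensional features (n x d)

W = B                                             # reduction matrix with W.T @ W = I
F_red = F @ W                                     # reduced features (n x k)

i, j = 3, 7
# both norms agree up to floating-point error
print(np.linalg.norm(F[i] - F[j]), np.linalg.norm(F_red[i] - F_red[j]))
```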

12.
Stud Health Technol Inform ; 316: 690-694, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176889

ABSTRACT

BACKGROUND: Urothelial Bladder Cancer (UBC) is a common cancer with a high risk of recurrence, which is influenced by the TNM classification, grading, age, and other factors. Recent studies demonstrate reliable and accurate recurrence prediction using Machine Learning (ML) algorithms and even outperform traditional approaches. However, most ML algorithms cannot process categorical input features, which must first be encoded into numerical values. Choosing the appropriate encoding strategy has a significant impact on the prediction quality. OBJECTIVE: We investigate the impact of encoding strategies for ordinal features in the prediction quality of ML algorithms. METHOD: We compare three different encoding strategies namely one-hot, ordinal, and entity embedding in predicting the 2-year recurrence in UBC patients using an artificial neural network. We use ordered categorical and numerical data of UBC patients provided by the Cancer Registry Rhineland-Palatinate. RESULTS: We show superior prediction quality using entity embedding encoding with 84.6% precision, an overall accuracy of 73.8%, and 68.9% AUC on testing data over 100 epochs after 30 runs compared to one-hot and ordinal encoding. CONCLUSION: We confirm the superiority of entity embedding encoding as it could provide a more detailed and accurate representation of ordinal features in numerical scales. This can lead to enhanced generalizability, resulting in significantly improved prediction quality.
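
A minimal sketch of the three encodings compared in the study is given below for a hypothetical four-level ordinal feature; in the actual model the entity-embedding table is trained jointly with the network rather than fixed at random.

```python
import numpy as np

class EntityEmbedding:
    """Minimal entity-embedding encoder: each category level gets a dense vector
    (randomly initialized here; in practice learned jointly with the network)."""
    def __init__(self, n_levels, dim, seed=0):
        self.table = np.random.default_rng(seed).normal(size=(n_levels, dim))
    def __call__(self, codes):
        return self.table[np.asarray(codes)]

# hypothetical ordinal feature: tumor grade with 4 levels
grade_emb = EntityEmbedding(n_levels=4, dim=3)
X_grade = grade_emb([0, 2, 3, 1])            # (4, 3) dense representation fed to the network

# contrast with one-hot and plain ordinal encodings of the same values
one_hot = np.eye(4)[[0, 2, 3, 1]]
ordinal = np.array([0, 2, 3, 1]).reshape(-1, 1)
```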


Subject(s)
Machine Learning , Neoplasm Recurrence, Local , Urinary Bladder Neoplasms , Humans , Neural Networks, Computer , Algorithms
13.
Adv Sci (Weinh) ; : e2407013, 2024 08 19.
Article in English | MEDLINE | ID: mdl-39159140

ABSTRACT

The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints are similar to grammars and syntaxes in human languages and can be modeled by advanced natural language techniques such as Transformers, which have been very effective in modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model is then fine-tuned for specific downstream tasks such as identifying RBP binding sites, m6A RNA modification sites, and predicting RNA sub-cellular localizations. Benchmark results show that 3UTRBERT generally outperformed other contemporary methods in each of these tasks. More importantly, the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationship between sequence elements and effectively identifies regions with important regulatory potential. It is expected that the 3UTRBERT model can serve as a foundational tool to analyze various sequence labeling tasks within the 3'UTR field, thus enhancing the decipherability of post-transcriptional regulatory mechanisms.
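
Sequence models of this kind typically turn a nucleotide sequence into a sentence of overlapping k-mer tokens before BERT-style pre-training; a tiny sketch of that tokenization step is shown below (the exact 3UTRBERT preprocessing may differ).

```python
def kmer_tokens(utr_seq, k=3):
    """Overlapping k-mer tokenization: one token per window of length k."""
    return [utr_seq[i:i + k] for i in range(len(utr_seq) - k + 1)]

tokens = kmer_tokens("AAUAAAGCUGCUUUA")   # ['AAU', 'AUA', 'UAA', 'AAA', ...]
```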

14.
Neural Netw ; 179: 106479, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39146716

ABSTRACT

Multi-Modal Entity Alignment (MMEA), aiming to discover matching entity pairs between two multi-modal knowledge graphs (MMKGs), is an essential task in knowledge graph fusion. Through mining the feature information of MMKGs, entities are aligned to tackle the issue that an individual MMKG cannot be effectively integrated. Recent attempts at neighbor and attribute fusion mainly focus on aggregating multi-modal attributes, neglecting the effect of graph structure combined with multi-modal attributes on entity alignment. This paper proposes an innovative approach, namely TriFac, that exploits embedding refinement to factorize the original multi-modal knowledge graphs through a two-stage MMKG factorization. Notably, we propose triplet-aware graph neural networks to aggregate multi-relational features. We propose multi-modal fusion for aggregating multiple features and design three novel metrics to measure knowledge graph factorization performance in the unified factorized latent space. Empirical results indicate the effectiveness of TriFac, surpassing previous state-of-the-art models on two MMEA datasets and a power system dataset.


Subject(s)
Algorithms , Neural Networks, Computer , Data Mining/methods , Humans , Knowledge
15.
J Imaging ; 10(8)2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39194985

ABSTRACT

In recent years, contrastive learning has been a highly favored method for self-supervised representation learning, which significantly improves the unsupervised training of deep image models. Self-supervised learning is a subset of unsupervised learning in which the learning process is supervised by creating pseudolabels from the data themselves. Using supervised final adjustments after unsupervised pretraining is one way to take the most valuable information from a vast collection of unlabeled data and teach from a small number of labeled instances. This study aims firstly to compare contrastive learning with other traditional learning models; secondly to demonstrate by experimental studies the superiority of contrastive learning during classification; thirdly to fine-tune performance using pretrained models and appropriate hyperparameter selection; and finally to address the challenge of using contrastive learning techniques to produce data representations with semantic meaning that are independent of irrelevant factors like position, lighting, and background. Relying on contrastive techniques, the model efficiently captures meaningful representations by discerning similarities and differences between modified copies of the same image. The proposed strategy, involving unsupervised pretraining followed by supervised fine-tuning, improves the robustness, accuracy, and knowledge extraction of deep image models. The results show that even with a modest 5% of data labeled, the semisupervised model achieves an accuracy of 57.72%. However, the use of supervised learning with a contrastive approach and careful hyperparameter tuning increases accuracy to 85.43%. Further adjustment of the hyperparameters resulted in an excellent accuracy of 88.70%.

16.
Med Image Anal ; 97: 103299, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39146702

ABSTRACT

Recently, vision-language representation learning has made remarkable advancements in building up medical foundation models, holding immense potential for transforming the landscape of clinical research and medical care. The underlying hypothesis is that the rich knowledge embedded in radiology reports can effectively assist and guide the learning process, reducing the need for additional labels. However, these reports tend to be complex and sometimes even consist of redundant descriptions that make the representation learning too challenging to capture the key semantic information. This paper develops a novel iterative vision-language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method. Particularly, raw radiology reports are refined to highlight the key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics. The iterative framework is designed to progressively learn, starting from gaining a general understanding of the patient's condition based on raw reports and gradually refines and extracts critical information essential to the fine-grained analysis tasks. The effectiveness of the proposed framework is validated on various downstream medical image analysis tasks, including disease classification, region-of-interest segmentation, and phrase grounding. Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings, demonstrating its encouraging potential for different clinical applications.


Subject(s)
Semantics , Humans , Machine Learning , Algorithms , Image Interpretation, Computer-Assisted/methods
17.
Neural Netw ; 179: 106579, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39096749

ABSTRACT

How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrast, prediction, and reconstruction have shown the ability for task-relevant information extraction. However, due to the lack of appropriate mechanisms for the extraction of task information in the prediction, contrast, and reconstruction-related approaches and the limitations of bisimulation-related methods in domains with sparse rewards, it is still difficult for these methods to be effectively extended to environments with distractions. To alleviate these problems, in the paper, the action sequences, which contain task-intensive signals, are incorporated into representation learning. Specifically, we propose a Sequential Action-induced invariant Representation (SAR) method, which decouples the controlled part (i.e., task-relevant information) and the uncontrolled part (i.e., task-irrelevant information) in noisy observations through sequential actions, thereby extracting effective representations related to decision tasks. To achieve it, the characteristic function of the action sequence's probability distribution is modeled to specifically optimize the state encoder. We conduct extensive experiments on the distracting DeepMind Control suite while achieving the best performance over strong baselines. We also demonstrate the effectiveness of our method at disregarding task-irrelevant information by applying SAR to real-world CARLA-based autonomous driving with natural distractions. Finally, we provide the analysis results of generalization drawn from the generalization decay and t-SNE visualization. Code and demo videos are available at https://github.com/DMU-XMU/SAR.git.
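
The abstract notes that the characteristic function of the action-sequence distribution is modeled to optimize the state encoder; an empirical version of that quantity can be sketched as follows, with the action dimensionality and evaluation grid chosen arbitrarily and the full SAR training objective omitted.

```python
import numpy as np

def empirical_cf(samples, t_grid):
    """Empirical characteristic function phi(t) = E[exp(i * t . a)] of a batch
    of action(-sequence) vectors, sketched independently of the SAR objective.

    samples: (n, d) action vectors; t_grid: (m, d) evaluation points."""
    return np.exp(1j * samples @ t_grid.T).mean(axis=0)   # (m,) complex values

actions = np.random.randn(256, 4)   # e.g., 4-dimensional continuous control actions
t_grid = np.random.randn(32, 4)
phi = empirical_cf(actions, t_grid)
```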


Subject(s)
Reinforcement, Psychology , Humans , Neural Networks, Computer , Algorithms
18.
Neural Netw ; 179: 106584, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39142174

ABSTRACT

Contrastive learning has emerged as a cornerstone in unsupervised representation learning. Its primary paradigm involves an instance discrimination task utilizing InfoNCE loss where the loss has been proven to be a form of mutual information. Consequently, it has become a common practice to analyze contrastive learning using mutual information as a measure. Yet, this analysis approach presents difficulties due to the necessity of estimating mutual information for real-world applications. This creates a gap between the elegance of its mathematical foundation and the complexity of its estimation, thereby hampering the ability to derive solid and meaningful insights from mutual information analysis. In this study, we introduce three novel methods and a few related theorems, aimed at enhancing the rigor of mutual information analysis. Despite their simplicity, these methods can carry substantial utility. Leveraging these approaches, we reassess three instances of contrastive learning analysis, illustrating the capacity of the proposed methods to facilitate deeper comprehension or to rectify pre-existing misconceptions. The main results can be summarized as follows: (1) While small batch sizes influence the range of training loss, they do not inherently limit learned representation's information content or affect downstream performance adversely; (2) Mutual information, with careful selection of positive pairings and post-training estimation, proves to be a superior measure for evaluating practical networks; and (3) Distinguishing between task-relevant and irrelevant information presents challenges, yet irrelevant information sources do not necessarily compromise the generalization of downstream tasks.
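
For reference, a plain InfoNCE computation over a batch of positive pairs looks like the sketch below; the comment on the log(N) relation reflects the standard bound discussed in analyses of this kind, and the random inputs are only for demonstration.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for a batch of positive pairs (z1[i], z2[i]); all other
    pairings in the batch serve as negatives. Mutual information is lower-bounded
    roughly by log(N) - loss, which is why batch size caps the measured value
    without necessarily capping the information in the learned representation."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives on the diagonal

N, d = 128, 64
z_a, z_b = np.random.randn(N, d), np.random.randn(N, d)
loss = info_nce(z_a, z_b)   # roughly log(N) for random, unrelated representations
```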


Subject(s)
Neural Networks, Computer , Humans , Algorithms , Learning/physiology , Unsupervised Machine Learning
19.
Neural Netw ; 179: 106629, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39153401

ABSTRACT

Domain Generalization (DG) focuses on the Out-Of-Distribution (OOD) generalization, which is able to learn a robust model that generalizes the knowledge acquired from the source domain to the unseen target domain. However, due to the existence of the domain shift, domain-invariant representation learning is challenging. Guided by fine-grained knowledge, we propose a novel paradigm Mask-Shift-Inference (MSI) for DG based on the architecture of Convolutional Neural Networks (CNN). Different from relying on a series of constraints and assumptions for model optimization, this paradigm novelly shifts the focus to feature channels in the latent space for domain-invariant representation learning. We put forward a two-branch working mode of a main module and multiple domain-specific sub-modules. The latter can only achieve good prediction performance in its own specific domain but poor predictions in other source domains, which provides the main module with the fine-grained knowledge guidance and contributes to the improvement of the cognitive ability of MSI. Firstly, during the forward propagation of the main module, the proposed MSI accurately discards unstable channels based on spurious classifications varying across domains, which have domain-specific prediction limitations and are not conducive to generalization. In this process, a progressive scheme is adopted to adaptively increase the masking ratio according to the training progress to further reduce the risk of overfitting. Subsequently, our paradigm enters the compatible shifting stage before the formal prediction. Based on maximizing semantic retention, we implement the domain style matching and shifting through the simple transformation through Fourier transform, which can explicitly and safely shift the target domain back to the source domain whose style is closest to it, requiring no additional model updates and reducing the domain gap. Eventually, the paradigm MSI enters the formal inference stage. The updated target domain is predicted in the main module trained in the previous stage with the benefit of familiar knowledge from the nearest source domain masking scheme. Our paradigm is logically progressive, which can intuitively exclude the confounding influence of domain-specific spurious information along with mitigating domain shifts and implicitly perform semantically invariant representation learning, achieving robust OOD generalization. Extensive experimental results on PACS, VLCS, Office-Home and DomainNet datasets verify the superiority and effectiveness of the proposed method.
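
The compatible shifting stage relies on a Fourier-based style match between the target image and its nearest source domain. A common low-frequency amplitude-swap recipe, which the description resembles, is sketched below; the swap radius and the assumption that this matches the paper's exact variant are mine.

```python
import numpy as np

def fourier_style_shift(target_img, source_img, beta=0.05):
    """Shift the style of a target-domain image toward a source-domain image by
    swapping low-frequency FFT amplitudes while keeping the target's phase."""
    ft_t = np.fft.fft2(target_img, axes=(0, 1))
    ft_s = np.fft.fft2(source_img, axes=(0, 1))
    amp_t, pha_t = np.abs(ft_t), np.angle(ft_t)
    amp_s = np.abs(ft_s)

    h, w = target_img.shape[:2]
    b = int(min(h, w) * beta)                       # half-size of the swapped low-frequency block
    amp_t_shift = np.fft.fftshift(amp_t, axes=(0, 1))
    amp_s_shift = np.fft.fftshift(amp_s, axes=(0, 1))
    c_h, c_w = h // 2, w // 2
    amp_t_shift[c_h - b:c_h + b, c_w - b:c_w + b] = amp_s_shift[c_h - b:c_h + b, c_w - b:c_w + b]
    amp_t = np.fft.ifftshift(amp_t_shift, axes=(0, 1))

    shifted = np.fft.ifft2(amp_t * np.exp(1j * pha_t), axes=(0, 1))
    return np.real(shifted)

target = np.random.rand(64, 64, 3)   # unseen-domain image (toy data)
source = np.random.rand(64, 64, 3)   # nearest-style source-domain image
aligned = fourier_style_shift(target, source)
```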


Subject(s)
Neural Networks, Computer , Humans , Generalization, Psychological/physiology , Algorithms , Machine Learning
20.
Front Neurorobot ; 18: 1423848, 2024.
Article in English | MEDLINE | ID: mdl-39144485

ABSTRACT

To address the problem that existing methods insufficiently handle background-noise interference in underwater fish images, a contrastive learning method that ignores the background, called CLIB, is proposed to improve the accuracy and robustness of underwater fish image classification. First, CLIB effectively separates the subject from the background in the image through an extraction module and applies it to contrastive learning by composing three complementary views with the original image. To further improve the adaptive ability of CLIB on complex underwater images, we propose a multi-view-based contrastive loss function, whose core idea is to enhance the similarity between the original image and the subject and maximize the difference between the subject and the background, making CLIB focus more on learning the core features of the subject during training and effectively ignore the interference of background noise. Experiments on the Fish4Knowledge, Fish-gres, WildFish-30, and QUTFish-89 public datasets show that our method performs well, with improvements of 1.43-6.75%, 8.16-8.95%, 13.1-14.82%, and 3.92-6.19%, respectively, compared with the baseline model, further validating the effectiveness of CLIB.
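
The multi-view contrastive objective described above, pulling the original image toward the subject view while pushing the subject away from the background view, can be sketched roughly as follows; the cosine-plus-hinge form and margin are illustrative choices, not the actual CLIB loss.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def clib_style_loss(z_orig, z_subj, z_back, margin=0.5):
    """Sketch of the multi-view idea: pull the original-image and subject-view
    embeddings together, push the subject away from the background view."""
    pull = 1.0 - cosine(z_orig, z_subj)                       # original <-> subject similarity
    push = max(0.0, cosine(z_subj, z_back) - (1 - margin))    # subject <-> background dissimilarity
    return pull + push

loss = clib_style_loss(np.random.randn(128), np.random.randn(128), np.random.randn(128))
```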
