Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
- URL: http://arxiv.org/abs/2410.06993v1
- Date: Wed, 9 Oct 2024 15:40:04 GMT
- Title: Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
- Authors: Ivan Butakov, Alexander Sememenko, Alexander Tolmachev, Andrey Gladkov, Marina Munkhoeva, Alexey Frolov
- Abstract summary: We enhance Deep InfoMax (DIM) to enable automatic matching of learned representations to a selected prior distribution.
We show that this modification allows for learning uniformly and normally distributed representations.
The results indicate a moderate trade-off between performance on the downstream tasks and the quality of DM.
- Score: 73.03684002513218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep InfoMax (DIM) is a well-established method for self-supervised representation learning (SSRL) based on maximizing the mutual information between the input and the output of a deep neural network encoder. Although DIM, and contrastive SSRL in general, are well explored, the task of learning representations that conform to a specific distribution (i.e., distribution matching, DM) remains under-addressed. Motivated by the importance of DM to several downstream tasks (including generative modeling, disentanglement, outlier detection, and others), we enhance DIM to enable automatic matching of learned representations to a selected prior distribution. To achieve this, we propose injecting independent noise into the normalized outputs of the encoder while keeping the same InfoMax training objective. We show that this modification allows for learning uniformly and normally distributed representations, as well as representations following other absolutely continuous distributions. Our approach is tested on various downstream tasks. The results indicate a moderate trade-off between performance on the downstream tasks and the quality of DM.
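The core mechanism described in the abstract (normalize the encoder outputs, add independent noise, keep the InfoMax objective) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes three things: per-dimension batch standardization as the normalization, isotropic Gaussian noise of scale `sigma`, and a SimCLR-style InfoNCE loss with a cosine-similarity critic as the InfoMax estimator. The function names, including `noisy_dim_loss`, are hypothetical.

```python
import numpy as np


def normalize(z: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each embedding dimension over the batch (zero mean, unit variance).

    Assumption: the paper's exact normalization may differ from this choice.
    """
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)


def inject_noise(z: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add independent Gaussian noise of scale `sigma` to the normalized embeddings."""
    return z + sigma * rng.standard_normal(z.shape)


def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss: row i of `a` and row i of `b` form a positive pair;
    all other rows of `b` act as negatives (cosine-similarity critic, assumed)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the true (diagonal) pairs.
    return float(-np.mean(np.diag(log_probs)))


def noisy_dim_loss(z1: np.ndarray, z2: np.ndarray, sigma: float,
                   rng: np.random.Generator, temperature: float = 0.1) -> float:
    """Hypothetical loss: normalize both views, inject noise, then apply InfoNCE."""
    y1 = inject_noise(normalize(z1), sigma, rng)
    y2 = inject_noise(normalize(z2), sigma, rng)
    return info_nce(y1, y2, temperature)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1 = rng.standard_normal((64, 8))              # stand-in encoder outputs, view 1
    z2 = z1 + 0.05 * rng.standard_normal((64, 8))  # correlated second view
    print(noisy_dim_loss(z1, z2, sigma=0.1, rng=rng))
```

The training objective itself is unchanged; only the noise step differs from a plain contrastive setup. The noise scale `sigma` is presumably the knob one would vary to explore the downstream-performance versus DM-quality trade-off that the abstract reports. The temperature of 0.1 is a common contrastive default, assumed here.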
Related papers
- Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models [3.1923251959845214]
Disentangled representation learning (DRL) aims to break down observed data into core intrinsic factors for a profound understanding of the data.
Recently, there have been limited explorations of utilizing diffusion models (DMs) for unsupervised DRL.
We propose Dynamic Gaussian Anchoring to enforce attribute-separated latent units for more interpretable DRL.
We also propose Skip Dropout technique, which easily modifies the denoising U-Net to be more DRL-friendly.
(2024-10-31T11:05:09Z)
- Robust Multimodal Learning via Representation Decoupling [6.7678581401558295]
Multimodal learning has attracted increasing attention due to its practicality.
Existing methods tend to address it by learning a common subspace representation for different modality combinations.
We propose a novel Decoupled Multimodal Representation Network (DMRNet) to assist robust multimodal learning.
(2024-07-05T12:09:33Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
(2024-05-06T11:02:02Z)
- Balancing Act: Distribution-Guided Debiasing in Diffusion Models [31.38505986239798]
Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability.
DMs reflect the biases present in the training datasets.
We present a method for debiasing DMs without relying on additional data or model retraining.
(2024-02-28T09:53:17Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to adapt a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
(2023-12-19T15:34:52Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and its effectiveness at recovering original information from masked images.
We propose a decision-based MIM that uses reinforcement learning (RL) to automatically search for the optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
(2023-10-06T10:40:46Z)
- From Points to Functions: Infinite-dimensional Representations in Diffusion Models [23.916417852496608]
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution.
We show that a combination of information content from different time steps gives a strictly better representation for the downstream task.
(2022-10-25T05:30:53Z)
- f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation [56.04628143914542]
Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains.
We propose f-DM, a generalized family of DMs which allows progressive signal transformation.
We apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations.
(2022-10-10T18:49:25Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
(2021-12-13T08:58:03Z)
- Input-Output Balanced Framework for Long-tailed LiDAR Semantic Segmentation [12.639524717464509]
We propose an input-output balanced framework to handle the issue of long-tailed class distributions.
For the input space, we synthesize tail-class instances from mesh models and faithfully simulate the position and density distribution of LiDAR scans.
For the output space, a multi-head block is proposed to group different categories based on their shapes and instance counts.
(2021-03-26T05:42:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.