Partitioning Image Representation in Contrastive Learning
- URL: http://arxiv.org/abs/2203.10454v1
- Date: Sun, 20 Mar 2022 04:55:39 GMT
- Title: Partitioning Image Representation in Contrastive Learning
- Authors: Hyunsub Lee and Heeyoul Choi
- Abstract summary: We introduce a new representation, partitioned representation, which can learn both common and unique features of the anchor and positive samples in contrastive learning.
We show that our approach can separate the two types of information in the VAE framework and outperforms the conventional BYOL in linear separability and on a few-shot learning downstream task.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contrastive learning in the image domain, the anchor and positive samples
are forced to have as close representations as possible. However, forcing the
two samples to have the same representation could be misleading because the
data augmentation techniques make the two samples different. In this paper, we
introduce a new representation, partitioned representation, which can learn
both common and unique features of the anchor and positive samples in
contrastive learning. The partitioned representation consists of two parts: the
content part and the style part. The content part represents common features of
the class, and the style part represents the features unique to each sample,
which can reflect the data augmentation applied to that sample. We can achieve
the partitioned representation simply by decomposing a loss function of
contrastive learning into two terms on the two separate representations,
respectively. To evaluate our two-part representation, we adopt two framework
models, Variational AutoEncoder (VAE) and Bootstrap Your Own Latent (BYOL), to
show the separability of content and style and to confirm the generalization
ability in classification, respectively. Based on the
experiments, we show that our approach can separate the two types of information
in the VAE framework and outperforms the conventional BYOL in linear separability
and on a few-shot learning downstream task.
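The core mechanism described in the abstract is a split of each embedding into a content part and a style part, with the contrastive attraction applied only to the content part. Below is a minimal sketch of how such a decomposition could look in a BYOL-style setup; the function name, the split point `content_dim`, the `style_weight` coefficient, and the choice of a repulsion term for the style part are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def partitioned_contrastive_loss(z_anchor, z_positive, content_dim, style_weight=1.0):
    """Sketch of a contrastive loss decomposed over a partitioned representation.

    Assumptions (not from the paper): the first `content_dim` dimensions hold
    the content part, the rest hold the style part, and the style term is a
    simple repulsion weighted by `style_weight`.
    """
    # Partition each representation into content (class-level, shared) and
    # style (sample-specific, augmentation-related) parts.
    c_a, s_a = z_anchor[:, :content_dim], z_anchor[:, content_dim:]
    c_p, s_p = z_positive[:, :content_dim], z_positive[:, content_dim:]

    # Content term: pull the common features of anchor and positive together
    # (BYOL-style negative cosine similarity).
    content_loss = 2.0 - 2.0 * F.cosine_similarity(c_a, c_p, dim=-1).mean()

    # Style term: do not force the sample-specific features to match; here we
    # penalize alignment so the style part is free to encode the augmentation.
    style_loss = F.cosine_similarity(s_a, s_p, dim=-1).mean()

    return content_loss + style_weight * style_loss

# Example: batch of 8 samples, 256-dim embeddings, first 128 dims as content.
z1, z2 = torch.randn(8, 256), torch.randn(8, 256)
loss = partitioned_contrastive_loss(z1, z2, content_dim=128)
```

The same split idea carries over to the VAE latent code, though the abstract does not specify the exact loss terms used in that framework.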
Related papers
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Disentangling Multi-view Representations Beyond Inductive Bias [32.15900989696017]
We propose a novel multi-view representation disentangling method that ensures both interpretability and generalizability of the resulting representations.
Our experiments on four multi-view datasets demonstrate that our proposed method outperforms 12 comparison methods in terms of clustering and classification performance.
arXiv Detail & Related papers (2023-08-03T09:09:28Z)
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
- Reflection Invariance Learning for Few-shot Semantic Segmentation [53.20466630330429]
Few-shot semantic segmentation (FSS) aims to segment objects of unseen classes in query images with only a few annotated support images.
This paper proposes a fresh few-shot segmentation framework to mine the reflection invariance in a multi-view matching manner.
Experiments on both PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-01T15:14:58Z)
- Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph [89.52990975155579]
An unsupervised framework, named Retriever, is proposed to learn such representations.
Being modal-agnostic, the proposed Retriever is evaluated in both speech and image domains.
arXiv Detail & Related papers (2022-02-24T19:00:03Z)
- Cross-Modal Discrete Representation Learning [73.68393416984618]
We present a self-supervised learning framework that learns a representation that captures finer levels of granularity across different modalities.
Our framework relies on a discretized embedding space created via vector quantization that is shared across different modalities.
arXiv Detail & Related papers (2021-06-10T00:23:33Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy via expanding the views generated by a single image to Cross-samples and Multi-level representation.
Our method, termed as CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- Support-set bottlenecks for video-text representation learning [131.4161071785107]
The dominant paradigm for learning video-text representations -- noise contrastive learning -- is too strict.
We propose a novel method that alleviates this by leveraging a generative model to naturally push these related samples together.
Our proposed method outperforms others by a large margin on MSR-VTT, VATEX and ActivityNet, and MSVD for video-to-text and text-to-video retrieval.
arXiv Detail & Related papers (2020-10-06T15:38:54Z)
- Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases [34.02639091680309]
Recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class.
We demonstrate that approaches like MoCo and PIRL learn occlusion-invariant representations.
We also demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like ImageNet.
arXiv Detail & Related papers (2020-07-28T00:11:31Z)
- Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods and report its performance.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)