Inter-Instance Similarity Modeling for Contrastive Learning
- URL: http://arxiv.org/abs/2306.12243v3
- Date: Thu, 29 Jun 2023 12:14:59 GMT
- Title: Inter-Instance Similarity Modeling for Contrastive Learning
- Authors: Chengchao Shen, Dawei Liu, Hao Tang, Zhe Qu, Jianxin Wang
- Abstract summary: We propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT).
Compared to the existing sample mix methods, our PatchMix can flexibly and efficiently mix more than two images.
Our proposed method significantly outperforms the previous state-of-the-art on both ImageNet-1K and CIFAR datasets.
- Score: 22.56316444504397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The existing contrastive learning methods widely adopt one-hot instance
discrimination as the pretext task for self-supervised learning, which inevitably
neglects the rich inter-instance similarities among natural images, thereby leading to
potential representation degeneration. In this paper, we propose a novel image
mix method, PatchMix, for contrastive learning in Vision Transformer (ViT), to
model inter-instance similarities among images. Following the nature of ViT, we
randomly mix multiple images from a mini-batch at the patch level to construct mixed
image patch sequences for ViT. Compared to the existing sample mix methods, our
PatchMix can flexibly and efficiently mix more than two images and simulate
more complicated similarity relations among natural images. In this manner, our
contrastive framework can significantly reduce the gap between contrastive
objective and ground truth in reality. Experimental results demonstrate that
our proposed method significantly outperforms the previous state-of-the-art on
both ImageNet-1K and CIFAR datasets, e.g., 3.0% linear accuracy improvement on
ImageNet-1K and 8.7% kNN accuracy improvement on CIFAR100. Moreover, our method
achieves the leading transfer performance on downstream tasks, object detection
and instance segmentation on COCO dataset. The code is available at
https://github.com/visresearch/patchmix
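The patch-level mixing described in the abstract can be sketched in a few lines. The following is a minimal illustration of mixing mini-batch images at the patch level, not the authors' actual implementation; the `num_mix` parameter and the roll-based choice of source image are assumptions (see the linked repository for the real code):

```python
import torch

def patchmix(patches, num_mix=4):
    """Randomly mix up to `num_mix` mini-batch images at the patch level.

    patches: (B, N, D) batch of ViT patch token sequences.
    Returns the mixed sequences and the per-patch source offsets,
    from which soft mixing targets could be derived.
    """
    B, N, D = patches.shape
    # For each patch position, draw which rolled copy of the batch
    # supplies the patch; rolling keeps mixing within the mini-batch.
    shifts = torch.randint(0, num_mix, (B, N))  # (B, N) source offset per patch
    mixed = patches.clone()
    for s in range(1, num_mix):
        mask = shifts == s  # patches taken from the batch rolled by s
        mixed[mask] = torch.roll(patches, shifts=s, dims=0)[mask]
    return mixed, shifts
```

Each output sequence thus contains patches from up to `num_mix` different images at their original positions, which is what lets a single mixed sample express similarity to several instances at once.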
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is to augment the training set with synthetic images generated by text-to-image (T2I) models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z)
- Mixing Histopathology Prototypes into Robust Slide-Level Representations for Cancer Subtyping [19.577541771516124]
Whole-slide image analysis by means of computational pathology often relies on processing tessellated gigapixel images with only slide-level labels available.
Applying multiple instance learning-based methods or transformer models is computationally expensive because, for each image, all instances have to be processed simultaneously.
The Mixer is an under-explored alternative model to common vision transformers, especially for large-scale datasets.
arXiv Detail & Related papers (2023-10-19T14:15:20Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images based on cross-domain mix-up and build the pretext task based on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup to transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
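The linear interpolation at the heart of mixup, as summarized above, can be written down directly. This is a generic sketch of the technique; the function name, the `alpha` default, and the NumPy implementation are illustrative choices, not taken from the paper:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: convex combination of two examples and their one-hot labels."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient lam in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

Because the same coefficient interpolates both inputs and labels, the model is trained to behave linearly between examples, which is what makes the resulting soft labels informative.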
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
- Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning [108.999497144296]
Recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations.
This work aims to introduce the concept of distance in label space into unsupervised learning and make the model aware of the soft degree of similarity between positive and negative pairs.
Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.