UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
- URL: http://arxiv.org/abs/2203.06965v1
- Date: Mon, 14 Mar 2022 10:04:04 GMT
- Title: UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
- Authors: Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying
Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
- Abstract summary: We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet, outperforming BYOL by 2.5% in linear probing with the same number of pre-training epochs.
- Score: 50.87603616476038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) holds promise in leveraging large amounts of unlabeled data. However, the success of popular SSL methods has been limited to single-centric-object images like those in ImageNet, and these methods ignore the correlation between the scene and its instances, as well as the semantic differences among instances within the scene. To address these problems, we propose Unified Self-supervised Visual Pre-training (UniVIP), a novel self-supervised framework that learns versatile visual representations on either single-centric-object or non-iconic datasets. The framework takes representation learning into account at three levels: 1) the similarity of scene-scene pairs, 2) the correlation of scene-instance pairs, and 3) the discrimination of instance-instance pairs. During learning, we adopt an optimal transport algorithm to automatically measure the discrimination of instances. Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance on a variety of downstream tasks, such as image classification, semi-supervised learning, object detection and segmentation. Furthermore, our method can also exploit single-centric-object datasets such as ImageNet, outperforming BYOL by 2.5% in linear probing with the same number of pre-training epochs and surpassing current self-supervised object detection methods on the COCO dataset, demonstrating its universality and potential.
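To make the three-level objective concrete, the sketch below shows one plausible way to combine scene-scene, scene-instance, and instance-instance terms, with a Sinkhorn-style optimal transport step standing in for the paper's instance-discrimination measure. This is a minimal PyTorch illustration, not the authors' implementation; the function names (sinkhorn, univip_loss), the loss weights, and the exact form of each term are assumptions for exposition only.

import torch
import torch.nn.functional as F

def sinkhorn(cost, n_iters=3, eps=0.05):
    # Entropy-regularized optimal transport: turn a pairwise cost matrix into a
    # soft assignment by alternating row/column normalization (Sinkhorn iterations).
    Q = torch.exp(-cost / eps)
    for _ in range(n_iters):
        Q = Q / Q.sum(dim=1, keepdim=True)  # normalize rows
        Q = Q / Q.sum(dim=0, keepdim=True)  # normalize columns
    return Q

def univip_loss(scene_a, scene_b, inst_a, inst_b,
                lam_ss=1.0, lam_si=1.0, lam_ii=1.0):
    # Hypothetical combination of the three UniVIP levels; the lam_* weights are
    # illustrative, not values reported in the paper.
    scene_a, scene_b = F.normalize(scene_a, dim=-1), F.normalize(scene_b, dim=-1)
    inst_a, inst_b = F.normalize(inst_a, dim=-1), F.normalize(inst_b, dim=-1)

    # 1) scene-scene similarity: BYOL-style negative cosine similarity
    #    between the two augmented views of the same scene.
    loss_ss = -(scene_a * scene_b).sum()

    # 2) scene-instance correlation: pull the scene embedding towards the
    #    aggregate of the instance embeddings from the other view.
    loss_si = -(scene_a * F.normalize(inst_b.mean(dim=0), dim=-1)).sum()

    # 3) instance-instance discrimination: use optimal transport over the
    #    cosine-distance matrix to softly match instances across views.
    cost = 1.0 - inst_a @ inst_b.t()
    plan = sinkhorn(cost)
    loss_ii = (plan * cost).sum()

    return lam_ss * loss_ss + lam_si * loss_si + lam_ii * loss_ii

# Toy usage: one scene per view, four instance crops, 128-d embeddings.
scene_a, scene_b = torch.randn(128), torch.randn(128)
inst_a, inst_b = torch.randn(4, 128), torch.randn(4, 128)
print(univip_loss(scene_a, scene_b, inst_a, inst_b))

In a real pipeline the scene and instance embeddings would come from overlapping crops processed by an online/target encoder pair (as in BYOL); random tensors are used here only to keep the sketch self-contained.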
Related papers
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z) - Visual Self-supervised Learning Scheme for Dense Prediction Tasks on X-ray Images [3.782392436834913]
Self-supervised learning (SSL) has led to considerable progress in natural language processing (NLP).
The incorporation of contrastive learning into visual SSL models has likewise produced considerable gains, often surpassing supervised counterparts.
Here, we focus on dense prediction tasks using security inspection x-ray images to evaluate our proposed model, Segment Localization (SegLoc).
Based upon the Instance localization (InsLoc) model, SegLoc addresses one of the key challenges of contrastive learning, i.e., false negative pairs of query embeddings.
arXiv Detail & Related papers (2023-10-12T15:42:17Z) - Heuristic Vision Pre-Training with Self-Supervised and Supervised
Multi-Task Learning [0.0]
We propose a novel pre-training framework that adopts both self-supervised and supervised visual pretext tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z) - Masked Momentum Contrastive Learning for Zero-shot Semantic
Understanding [39.424931953675994]
Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data.
This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks.
arXiv Detail & Related papers (2023-08-22T13:55:57Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene
Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods [4.680881326162484]
Self-supervised learning (SSL) algorithms based on instance discrimination have shown promising results.
We propose an approach to identify those images with similar semantic content and treat them as positive instances.
We run experiments on three benchmark datasets: ImageNet, STL-10 and CIFAR-10 with different instance discrimination SSL approaches.
arXiv Detail & Related papers (2023-06-28T11:47:08Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - Unsupervised Object-Level Representation Learning from Scene Images [97.07686358706397]
Object-level Representation Learning (ORL) is a new self-supervised learning framework for scene images.
Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence.
ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
arXiv Detail & Related papers (2021-06-22T17:51:24Z) - Distribution Alignment: A Unified Framework for Long-tail Visual
Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weighting method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z) - Unsupervised Feature Learning by Cross-Level Instance-Group
Discrimination [68.83098015578874]
We integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination.
CLD effectively brings unsupervised learning closer to natural data and real-world applications.
CLD sets a new state-of-the-art on self-supervision, semi-supervision, and transfer learning benchmarks, and beats MoCo v2 and SimCLR on every reported metric.
arXiv Detail & Related papers (2020-08-09T21:13:13Z)