Understanding Self-Supervised Pretraining with Part-Aware Representation
Learning
- URL: http://arxiv.org/abs/2301.11915v2
- Date: Tue, 23 Jan 2024 04:00:25 GMT
- Title: Understanding Self-Supervised Pretraining with Part-Aware Representation
Learning
- Authors: Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang
Wang, Wenyu Liu, Leye Wang, Jingdong Wang
- Abstract summary: We study the capability of self-supervised representation pretraining methods to learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
- Score: 88.45460880824376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study self-supervised pretraining by examining the
capability of self-supervised representation pretraining methods to learn
part-aware representations. The study is mainly motivated by the observation
that random views, used in contrastive learning, and random masked (visible)
patches, used in masked image modeling, often cover only object parts.
We explain that contrastive learning is a part-to-whole task: the projection
layer hallucinates the whole object representation from the object part
representation learned from the encoder, and that masked image modeling is a
part-to-part task: the masked patches of the object are hallucinated from the
visible patches. The explanation suggests that the self-supervised pretrained
encoder is required to understand the object part. We empirically compare the
off-the-shelf encoders pretrained with several representative methods on
object-level recognition and part-level recognition. The results show that the
fully-supervised model outperforms self-supervised models for object-level
recognition, and most self-supervised contrastive learning and masked image
modeling methods outperform the fully-supervised method for part-level
recognition. It is observed that the combination of contrastive learning and
masked image modeling further improves the performance.
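The motivation in the abstract, that random views and visible patches often cover only part of an object, can be illustrated with a small NumPy sketch. This is not the authors' code: the crop scale range (0.08 to 1.0), the 14x14 patch grid, the 75% mask ratio, the centered rectangular "object", and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def random_crop_box(rng, height, width, scale=(0.08, 1.0)):
    """Sample a crop box covering a random fraction of the image area.

    The scale range is an assumed, commonly used default, not a value
    taken from the paper.
    """
    area_frac = rng.uniform(*scale)
    crop_h = max(1, int(round(height * np.sqrt(area_frac))))
    crop_w = max(1, int(round(width * np.sqrt(area_frac))))
    top = int(rng.integers(0, height - crop_h + 1))
    left = int(rng.integers(0, width - crop_w + 1))
    return top, left, crop_h, crop_w


def random_patch_mask(rng, grid=14, mask_ratio=0.75):
    """Return a boolean (grid, grid) array; True marks a masked patch."""
    num_patches = grid * grid
    num_masked = int(round(num_patches * mask_ratio))
    flat = np.zeros(num_patches, dtype=bool)
    flat[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return flat.reshape(grid, grid)


def object_coverage(object_mask, view_mask):
    """Fraction of object pixels that fall inside the view."""
    return float((object_mask & view_mask).sum() / object_mask.sum())


def demo(seed=0, size=224, patch=16):
    """Compare how much of a toy object a random crop and the visible
    patches of a random mask actually contain."""
    rng = np.random.default_rng(seed)

    # Hypothetical object: a centered rectangle in a size x size image.
    object_mask = np.zeros((size, size), dtype=bool)
    object_mask[56:168, 56:168] = True

    # Contrastive-style view: a single random crop.
    top, left, crop_h, crop_w = random_crop_box(rng, size, size)
    crop_view = np.zeros((size, size), dtype=bool)
    crop_view[top:top + crop_h, left:left + crop_w] = True

    # Masked-image-modeling view: only the unmasked patches are visible.
    masked = random_patch_mask(rng, grid=size // patch)
    visible = ~np.repeat(np.repeat(masked, patch, axis=0), patch, axis=1)

    return object_coverage(object_mask, crop_view), object_coverage(object_mask, visible)


if __name__ == "__main__":
    crop_cov, visible_cov = demo()
    print(f"object fraction inside random crop:     {crop_cov:.2f}")
    print(f"object fraction inside visible patches: {visible_cov:.2f}")
```

Under these assumptions, a coverage below 1.0 means the encoder only sees part of the object, which is the situation the part-to-whole and part-to-part explanations describe.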
Related papers
- Masked Image Modeling Boosting Semi-Supervised Semantic Segmentation [38.55611683982936]
We introduce a novel class-wise masked image modeling that independently reconstructs different image regions according to their respective classes.
We develop a feature aggregation strategy that minimizes the distances between features corresponding to the masked and visible parts within the same class.
In semantic space, we explore the application of masked image modeling to enhance regularization.
(2024-11-13T16:42:07Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
(2024-08-17T10:37:07Z)
- A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability [10.79834380458689]
Self-supervised learning confronts significant privacy concerns, especially in vision.
We propose a unified membership inference method called PartCrop.
We conduct extensive attacks on self-supervised models with different training protocols and structures.
To defend against PartCrop, we evaluate two common approaches, i.e., early stop and differential privacy, and propose a tailored method called shrinking crop scale range.
(2024-04-03T05:04:55Z)
- Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction [6.798515070856465]
We present a novel self-supervised approach for representation learning, particularly for the task of Visual Relationship Detection (VRD).
Motivated by the effectiveness of Masked Image Modeling (MIM), we propose Masked Bounding Box Reconstruction (MBBR).
(2023-11-08T16:59:26Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
(2023-10-11T14:06:04Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
(2022-10-18T17:01:35Z)
- Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
(2022-08-16T10:33:13Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
(2021-07-30T19:24:07Z)
- Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
(2021-01-16T23:44:09Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
(2020-03-31T12:22:51Z)