LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition
- URL: http://arxiv.org/abs/2403.04066v1
- Date: Wed, 6 Mar 2024 21:36:38 GMT
- Authors: Jialu Shi, Zhiqiang Wei, Jie Nie, Lei Huang
- Abstract summary: We propose incorporating subtle local fine-grained feature learning into global self-supervised contrastive learning.
A novel pretext task called Local Discrimination (LoDisc) is proposed to explicitly supervise the self-supervised model's focus on pivotal local regions.
We show that the Local Discrimination pretext task can effectively enhance fine-grained clues in important local regions, and that the global-local framework further refines the fine-grained feature representations of images.
- Score: 18.442966979622717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised contrastive learning has attracted remarkable
attention due to its exceptional representation-learning ability. However,
current contrastive learning tends to learn global coarse-grained
representations of the image that benefit generic object recognition, whereas
such coarse-grained features are insufficient for fine-grained visual
recognition. In this paper, we propose incorporating subtle local
fine-grained feature learning into global self-supervised contrastive learning
through a purely self-supervised global-local fine-grained contrastive learning
framework. Specifically, a novel pretext task called Local Discrimination
(LoDisc) is proposed to explicitly supervise the self-supervised model's focus
on pivotal local regions, which are captured by a simple but effective
location-wise mask sampling strategy. We show that the Local Discrimination
pretext task can effectively enhance fine-grained clues in important local
regions, and that the global-local framework further refines the fine-grained feature
representations of images. Extensive experimental results on different
fine-grained object recognition tasks demonstrate that the proposed method
yields decent improvements across different evaluation settings. Moreover, the
proposed method is also effective in general object recognition tasks.
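The global-local framework described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The abstract does not say how pivotal regions are scored, so `scores` stands in for any per-patch importance (for a ViT, class-token attention is one plausible choice). The keep ratio, masked mean pooling, temperature, and weight `lam` are all hypothetical. Both the global and local terms use a standard InfoNCE loss with in-batch negatives.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(q, k, temperature=0.2):
    """InfoNCE loss: q[i] and k[i] are two views of image i; every other
    key in the batch acts as a negative."""
    q, k = l2_normalize(q), l2_normalize(k)
    logits = q @ k.T / temperature                    # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives lie on the diagonal

def location_wise_mask(scores, keep_ratio=0.25):
    """Hypothetical location-wise sampling: keep the top-scoring patch
    locations of each image and mask out the rest. scores: (N, P)."""
    n, p = scores.shape
    k = max(1, int(round(keep_ratio * p)))
    top = np.argsort(-scores, axis=1)[:, :k]          # indices of the k highest scores
    mask = np.zeros((n, p), dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

def masked_mean_pool(patch_feats, mask):
    """Average patch features (N, P, D) over the kept locations only."""
    w = mask[..., None].astype(patch_feats.dtype)
    return (patch_feats * w).sum(axis=1) / np.maximum(w.sum(axis=1), 1.0)

def global_local_loss(g_a, g_b, l_a, l_b, lam=1.0):
    """Assumed total objective: global contrastive term + lam * local term."""
    return info_nce(g_a, g_b) + lam * info_nce(l_a, l_b)

# Toy usage with random stand-ins for encoder outputs (no real model involved).
rng = np.random.default_rng(0)
N, P, D = 8, 196, 64
feats_a = rng.normal(size=(N, P, D))                    # patch features of view A
feats_b = feats_a + 0.1 * rng.normal(size=(N, P, D))    # slightly perturbed view B
scores = rng.random((N, P))                             # stand-in per-patch importance
mask = location_wise_mask(scores, keep_ratio=0.25)
loss = global_local_loss(feats_a.mean(axis=1), feats_b.mean(axis=1),
                         masked_mean_pool(feats_a, mask),
                         masked_mean_pool(feats_b, mask))
print(mask.sum(axis=1))  # each image keeps 49 of its 196 locations
print(loss)
```

Pooling only the kept locations is what makes the second term "local": its positives agree only on the sampled regions, while the first term still contrasts whole-image embeddings.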
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
Date: 2024-08-17T10:37:07Z
- Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization.
It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
Date: 2024-04-15T06:02:09Z
- Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition [54.334773598942775]
Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition (CD-FER).
We propose an Adaptive Global-Local Representation Learning and Selection framework.
Date: 2024-01-20T02:21:41Z
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
Date: 2023-03-03T02:07:40Z
- Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity [61.05259660910437]
We propose a global consistency and complementarity network (CoCoNet) to learn representations from multiple views.
On the global stage, we posit that the crucial knowledge is implicitly shared among views, and that enhancing the encoder to capture such knowledge can improve the discriminability of the learned representations.
Lastly, on the local stage, we propose a complementarity factor, which joins cross-view discriminative knowledge and guides the encoders to learn not only view-wise discriminability but also cross-view complementary information.
Date: 2022-09-16T09:24:00Z
- Deep face recognition with clustering based domain adaptation [57.29464116557734]
We propose a new clustering-based domain adaptation method designed for face recognition tasks in which the source and target domains do not share any classes.
Our method effectively learns discriminative target features by aligning the feature domains globally while distinguishing the target clusters locally.
Date: 2022-05-27T12:29:11Z
- Learning Consistency from High-quality Pseudo-labels for Weakly Supervised Object Localization [7.602783618330373]
We propose a two-stage approach to learn more consistent localization.
In the first stage, we propose a mask-based pseudo label generator algorithm, and use the pseudo-supervised learning method to initialize an object localization network.
In the second stage, we propose a simple and effective method for evaluating the confidence of pseudo-labels based on classification discrimination.
Date: 2022-03-18T09:05:51Z
- Spatially Consistent Representation Learning [12.120041613482558]
We propose a spatially consistent representation learning algorithm (SCRL) for multi-object and location-specific tasks.
We devise a novel self-supervised objective that tries to produce coherent spatial representations of a randomly cropped local region.
On various downstream localization tasks with benchmark datasets, the proposed SCRL shows significant performance improvements.
Date: 2021-03-10T15:23:45Z
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to perform class-specific visualization.
Date: 2020-12-03T18:48:39Z
- Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation [28.721376937882958]
Gait recognition is one of the most important biometric technologies and has been applied in many fields.
Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans.
We propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition.
Date: 2020-11-03T04:07:13Z
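The "Learning Common Rationale" entry above builds its detector on the GradCAM induced from the SSL objective. The snippet below is a minimal sketch of the standard Grad-CAM map only, assuming the feature map of one convolutional layer and the gradients of the SSL loss with respect to it are already available (for example, from an autograd framework). How that paper turns the map into a trained screening mechanism is not shown, and the function name `grad_cam` is illustrative.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map for one layer.

    activations, gradients: arrays of shape (K, H, W); gradients hold
    d(objective)/d(activations). Returns an (H, W) map scaled to [0, 1]."""
    # Channel weights: spatially averaged gradients, one scalar per channel k.
    alpha = gradients.mean(axis=(1, 2))
    # Weighted sum of channels, then ReLU keeps only positively contributing locations.
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy usage: channel 0 fires top-left and helps the objective; channel 1 fires
# bottom-right and hurts it, so only the top-left location survives the ReLU.
A = np.zeros((2, 2, 2)); A[0, 0, 0] = 1.0; A[1, 1, 1] = 1.0
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam(A, G))
```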