Mugs: A Multi-Granular Self-Supervised Learning Framework
- URL: http://arxiv.org/abs/2203.14415v1
- Date: Sun, 27 Mar 2022 23:42:05 GMT
- Title: Mugs: A Multi-Granular Self-Supervised Learning Framework
- Authors: Pan Zhou and Yichen Zhou and Chenyang Si and Weihao Yu and Teck Khim
Ng and Shuicheng Yan
- Abstract summary: We propose an effective MUlti-Granular Self-supervised learning (Mugs) framework to explicitly learn multi-granular visual features.
Mugs has three complementary granular supervisions: 1) an instance discrimination supervision (IDS), 2) a novel local-group discrimination supervision (LGDS), and 3) a group discrimination supervision (GDS)
- Score: 114.34858365121725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In self-supervised learning, multi-granular features are highly
desired though rarely investigated, as different downstream tasks (e.g., general
and fine-grained classification) often require different or multi-granular
features, e.g.~fine- or coarse-grained ones or their mixture. In this work, for
the first time, we propose an effective MUlti-Granular Self-supervised learning
(Mugs) framework to explicitly learn multi-granular visual features. Mugs has
three complementary granular supervisions: 1) an instance discrimination
supervision (IDS), 2) a novel local-group discrimination supervision (LGDS),
and 3) a group discrimination supervision (GDS). IDS distinguishes different
instances to learn instance-level fine-grained features. LGDS aggregates the
features of an image and its neighbors into a local-group feature, pulls
together the local-group features of different crops of the same image, and
pushes them away from those of other images. It provides complementary instance
supervision to IDS via an extra alignment on local neighbors, and scatters
different local groups apart to increase discriminability. Accordingly, it
helps learn high-level fine-grained features at the local-group level. Finally,
to prevent similar
local-groups from being scattered randomly or far away, GDS brings similar
samples close and thus pulls similar local-groups together, capturing
coarse-grained features at a (semantic) group level. Consequently, Mugs can
capture three granular features that often enjoy higher generality on diverse
downstream tasks over single-granular features, e.g.~instance-level
fine-grained features in contrastive learning. By pretraining only on
ImageNet-1K, Mugs sets a new SoTA linear probing accuracy of 82.1$\%$ on
ImageNet-1K, improving on the previous SoTA by $1.1\%$. It also surpasses SoTAs
on other tasks, e.g.~transfer learning, detection, and segmentation.
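The local-group step (LGDS) above is described only in prose. As a rough,
hypothetical sketch in plain NumPy: a crop's feature is averaged with its
top-k cosine neighbors drawn from a buffer of past features to form a
local-group feature, which is then contrasted InfoNCE-style against other
groups. The buffer, the choice of k, and all function names here are
illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Project features onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def local_group_feature(z, memory, k=3):
    """Aggregate a crop feature z (d,) with its k nearest neighbors from a
    memory buffer of past features (N, d), as a stand-in for LGDS aggregation."""
    z = l2_normalize(z)
    mem = l2_normalize(memory)
    sims = mem @ z                          # cosine similarity to every buffered feature
    idx = np.argsort(-sims)[:k]             # indices of the k most similar neighbors
    group = np.vstack([z[None, :], mem[idx]]).mean(axis=0)
    return l2_normalize(group)

def info_nce(anchor, positive, negatives, tau=0.2):
    """Generic InfoNCE loss: pull anchor toward positive, push it from negatives."""
    logits = np.concatenate([[anchor @ positive], negatives @ anchor]) / tau
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])

# Illustrative use: local-group features of two crops of the same image become
# the positive pair; local-group features of other images act as negatives.
rng = np.random.default_rng(0)
memory = rng.normal(size=(32, 8))           # assumed buffer of 32 past 8-d features
crop_a = rng.normal(size=8)
crop_b = crop_a + 0.1 * rng.normal(size=8)  # a second crop: a perturbed view
g_a = local_group_feature(crop_a, memory)
g_b = local_group_feature(crop_b, memory)
negatives = l2_normalize(rng.normal(size=(5, 8)))
loss = info_nce(g_a, g_b, negatives)
```

Averaging over retrieved neighbors is what distinguishes this from plain
instance discrimination: the contrast acts on a small neighborhood rather than
a single instance, matching the abstract's "extra alignment on local
neighbors".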
Related papers
- GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot
Learning [24.075034737719776]
This paper investigates the challenging problem of zero-shot learning in the multi-label scenario (MLZSL).
We propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection.
Experiments on the large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-09-02T12:07:21Z) - Learning Common Rationale to Improve Self-Supervised Representation for
Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z) - A Task-aware Dual Similarity Network for Fine-grained Few-shot Learning [19.90385022248391]
A Task-aware Dual Similarity Network (TDSNet) is proposed to explore global invariant features and discriminative local details.
TDSNet achieves competitive performance compared with other state-of-the-art algorithms.
arXiv Detail & Related papers (2022-10-22T04:24:55Z) - Semantic-diversity transfer network for generalized zero-shot learning
via inner disagreement based OOD detector [26.89763840782029]
Zero-shot learning (ZSL) aims to recognize objects from unseen classes, where the kernel problem is to transfer knowledge from seen classes to unseen classes.
The knowledge transfer in many existing works is limited mainly because the widely used visual features are global ones that are not fully consistent with semantic attributes.
We propose a Semantic-diversity transfer Network (SetNet) to address these limitations, where 1) a multiple-attention architecture and a diversity regularizer are proposed to learn multiple local visual features that are more consistent with semantic attributes, and 2) a projector ensemble geometrically takes diverse local features as inputs.
arXiv Detail & Related papers (2022-03-17T01:31:27Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL) to tackle this issue.
WCL achieves 65% and 72% ImageNet Top-1 Accuracy using ResNet50, which is even higher than SimCLRv2 with ResNet101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z) - Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counter-part of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z) - Unsupervised Feature Learning by Cross-Level Instance-Group
Discrimination [68.83098015578874]
We integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination.
CLD effectively brings unsupervised learning closer to natural data and real-world applications.
It sets a new state-of-the-art on self-supervision, semi-supervision, and transfer learning benchmarks, and beats MoCo v2 and SimCLR on every reported metric.
arXiv Detail & Related papers (2020-08-09T21:13:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.