Self-Supervised Learning for Fine-Grained Visual Categorization
- URL: http://arxiv.org/abs/2105.08788v1
- Date: Tue, 18 May 2021 19:16:05 GMT
- Title: Self-Supervised Learning for Fine-Grained Visual Categorization
- Authors: Muhammad Maaz, Hanoona Abdul Rasheed, Dhanalaxmi Gaddam
- Abstract summary: We study the usefulness of SSL for Fine-Grained Visual Categorization (FGVC).
FGVC aims to distinguish objects of visually similar subcategories within a general category.
Our baseline achieves $86.36\%$ top-1 classification accuracy on the CUB-200-2011 dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research in self-supervised learning (SSL) has shown its capability in
learning useful semantic representations from images for classification tasks.
Through our work, we study the usefulness of SSL for Fine-Grained Visual
Categorization (FGVC). FGVC aims to distinguish objects of visually similar
subcategories within a general category. The small inter-class but large
intra-class variations within the dataset make it a challenging task. The
limited availability of annotated labels for such fine-grained data
motivates the need for SSL, where additional supervision can boost learning
without the cost of extra annotations. Our baseline achieves $86.36\%$ top-1
classification accuracy on the CUB-200-2011 dataset by utilizing random crop
augmentation during training and center crop augmentation during testing. In
this work, we explore the usefulness of various pretext tasks, specifically,
rotation, pretext invariant representation learning (PIRL), and deconstruction
and construction learning (DCL) for FGVC. Rotation as an auxiliary task
encourages the model to learn global features, diverting it from the subtle
details. PIRL, which uses jigsaw patches, attempts to focus on discriminative
local regions but struggles to localize them accurately. DCL helps in learning
local discriminative features and outperforms the baseline, achieving
$87.41\%$ top-1 accuracy. Deconstruction learning forces the model to focus on
local object parts, while construction learning helps in learning the
correlations between the parts. We perform extensive experiments to support
our findings. Our code is available at
https://github.com/mmaaz60/ssl_for_fgvc.
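The four-way rotation pretext task discussed above can be sketched in a few lines. This is a minimal illustration of the general technique (the helper name `make_rotation_batch` is ours, not from the paper's repository), not the authors' implementation:

```python
import numpy as np

def make_rotation_batch(image):
    """Return the four rotated views (0, 90, 180, 270 degrees) of one
    H x W x C image together with their pretext labels 0-3. A network
    trained to predict the label is pushed toward global structure,
    which is why rotation alone can hurt fine-grained recognition."""
    views = [np.rot90(image, k, axes=(0, 1)) for k in range(4)]
    labels = np.arange(4)
    return views, labels

# Tiny demo on a random "image".
img = np.random.rand(32, 32, 3)
views, labels = make_rotation_batch(img)
# One more 90-degree turn brings the 270-degree view back to the original.
assert np.allclose(np.rot90(views[3], 1, axes=(0, 1)), img)
```

In practice each rotated view would be fed to the backbone and the rotation label predicted by an auxiliary head alongside the main classification loss.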
Related papers
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Segment Anything Model is a Good Teacher for Local Feature Learning
Local feature detection and description play an important role in many computer vision tasks.
Data-driven local feature learning methods need to rely on pixel-level correspondence for training.
We propose SAMFeat to introduce SAM as a teacher to guide local feature learning.
arXiv Detail & Related papers (2023-09-29T05:29:20Z)
- FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization
This work addresses the task of class-incremental weakly supervised object localization (CI-WSOL)
The goal is to incrementally learn object localization for novel classes using only image-level annotations while retaining the ability to localize previously learned classes.
We first present a strong baseline method for CI-WSOL by adapting the strategies of class-incremental classifiers to catastrophic forgetting.
We then propose the feature drift compensation network to compensate for the effects of feature drifts on class scores and localization maps.
arXiv Detail & Related papers (2023-09-17T01:10:45Z)
- Deep Active Learning Using Barlow Twins
The generalisation performance of a convolutional neural network (CNN) is largely determined by the quantity, quality, and diversity of its training images.
The goal of active learning is to draw the most informative samples from the unlabeled pool.
We propose Deep Active Learning using Barlow Twins (DALBT), an active learning method applicable across datasets.
arXiv Detail & Related papers (2022-12-30T12:39:55Z)
- Self-Supervised Learning for Fine-Grained Image Classification
Fine-grained datasets usually provide bounding box annotations along with class labels to aid the process of classification.
On the other hand, self-supervised learning exploits the freely available data to generate supervisory signals which act as labels.
Our idea is to leverage self-supervision such that the model learns useful representations of fine-grained image classes.
arXiv Detail & Related papers (2021-07-29T14:01:31Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze dataset and automatic gaze estimation algorithms on high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- TAFSSL: Task-Adaptive Feature Sub-Space Learning for Few-Shot Classification
We show that the Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost the performance in Few-Shot Learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than $5\%$.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.