Transductive Decoupled Variational Inference for Few-Shot Classification
- URL: http://arxiv.org/abs/2208.10559v1
- Date: Mon, 22 Aug 2022 19:27:09 GMT
- Title: Transductive Decoupled Variational Inference for Few-Shot Classification
- Authors: Anuj Singh, Hadi Jamali-Rad
- Abstract summary: Few-shot learning is an endeavour to extend the human capability of learning from a handful of samples to machines.
We propose a novel variational inference network for few-shot classification (coined as TRIDENT)
We exploit information across both query and support images of a few-shot task using a novel built-in attention-based transductive feature extraction module (we call AttFEX)
- Score: 2.538209532048867
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The versatility to learn from a handful of samples is the hallmark of human
intelligence. Few-shot learning is an endeavour to extend this capability to
machines. Inspired by the promise and power of probabilistic deep
learning, we propose a novel variational inference network for few-shot
classification (coined as TRIDENT) to decouple the representation of an image
into semantic and label latent variables, and simultaneously infer them in an
intertwined fashion. To induce task-awareness, as part of the inference
mechanics of TRIDENT, we exploit information across both query and support
images of a few-shot task using a novel built-in attention-based transductive
feature extraction module (we call AttFEX). Our extensive experimental results
corroborate the efficacy of TRIDENT and demonstrate that, using the simplest of
backbones, it sets a new state-of-the-art in the most commonly adopted datasets
miniImageNet and tieredImageNet (offering up to 4% and 5% improvements,
respectively), as well as for the recent challenging cross-domain miniImageNet
--> CUB scenario offering a significant margin (up to 20% improvement) beyond
the best existing cross-domain baselines. Code and experimentation can be found
in our GitHub repository: https://github.com/anujinho/trident
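The core idea of decoupling an image representation into separate semantic and label latent variables can be sketched in a few lines. This is a minimal NumPy illustration of sampling two independent Gaussian latents via the reparameterization trick and computing their KL regularizers; all names, dimensions, and the toy "encoder output" are hypothetical stand-ins, not TRIDENT's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Hypothetical encoder outputs for one image: two decoupled Gaussian
# posteriors, one for semantics (z_s) and one for the label (z_l).
feat = rng.standard_normal(16)           # stand-in for extracted features
mu_s, log_var_s = feat[:8], np.zeros(8)  # semantic-latent parameters
mu_l, log_var_l = feat[8:], np.zeros(8)  # label-latent parameters

z_s = reparameterize(mu_s, log_var_s, rng)
z_l = reparameterize(mu_l, log_var_l, rng)

# A VAE-style objective would penalize both posteriors toward the prior.
kl_total = kl_to_standard_normal(mu_s, log_var_s) + \
           kl_to_standard_normal(mu_l, log_var_l)
print(z_s.shape, z_l.shape, kl_total >= 0)  # prints: (8,) (8,) True
```

In the paper the two latents are inferred jointly in an intertwined fashion; the sketch above only shows the basic two-latent sampling and regularization machinery.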
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry training requirements of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition [24.406654146411682]
Vision Transformer (ViT) is the research base for this paper.
Our goal is to leverage ViT's patch tokens and self-attention mechanism to mine rich instances in multi-label images.
We propose a weakly supervised object localization-based approach to extract multi-scale local features.
arXiv Detail & Related papers (2022-04-22T14:38:40Z)
- Dynamic Relevance Learning for Few-Shot Object Detection [6.550840743803705]
We propose a dynamic relevance learning model, which utilizes the relationship between all support images and Region of Interest (RoI) on the query images to construct a dynamic graph convolutional network (GCN)
The proposed model achieves the best overall performance, demonstrating its effectiveness at learning more generalized features.
arXiv Detail & Related papers (2021-08-04T18:29:42Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images [79.34600869202373]
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes.
Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations.
Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need? [34.95314059362982]
We show that the way inference is performed in few-shot segmentation tasks has a substantial effect on performances.
We introduce a transductive inference for a given query image, leveraging the statistics of its unlabeled pixels.
We show that our method brings about 5% and 6% improvements over the state-of-the-art, in the 5- and 10-shot scenarios.
arXiv Detail & Related papers (2020-12-11T07:11:19Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
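The channel-wise attention across network branches described in the ResNeSt entry can be sketched as follows. This is a minimal NumPy illustration of split-attention-style weighting: branch features are pooled into a global context, per-branch logits are softmaxed over the branches per channel, and the branches are combined by those weights. The random projection standing in for the block's learned fully connected layers is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_attention(branches, rng):
    """Combine R branch feature maps of shape (R, C, H, W) with
    per-channel softmax attention over the branches."""
    R, C = branches.shape[0], branches.shape[1]
    # Global context: sum the branches, then global-average-pool per channel.
    gap = branches.sum(axis=0).mean(axis=(1, 2))          # shape (C,)
    # Hypothetical per-branch logits from the pooled context
    # (stands in for the block's small learned FC layers).
    W = rng.standard_normal((R, C, C)) * 0.1
    logits = np.einsum('rij,j->ri', W, gap)               # shape (R, C)
    attn = np.exp(logits) / np.exp(logits).sum(axis=0)    # softmax over R
    # Per-channel weighted sum of the branches.
    return np.einsum('rc,rchw->chw', attn, branches)

x = rng.standard_normal((2, 4, 8, 8))   # R=2 branches, C=4 channels
out = split_attention(x, rng)
print(out.shape)  # prints (4, 8, 8)
```

The attention weights sum to one across branches for each channel, so the output stays on the same scale as a single branch.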
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.