Retrieval Augmented Classification for Long-Tail Visual Recognition
- URL: http://arxiv.org/abs/2202.11233v1
- Date: Tue, 22 Feb 2022 23:40:51 GMT
- Title: Retrieval Augmented Classification for Long-Tail Visual Recognition
- Authors: Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak
Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton van den Hengel
- Abstract summary: We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module.
RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets.
We demonstrate that RAC's retrieval module, without prompting, attains high accuracy on tail classes.
- Score: 143.2716893535358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Retrieval Augmented Classification (RAC), a generic approach to
augmenting standard image classification pipelines with an explicit retrieval
module. RAC consists of a standard base image encoder fused with a parallel
retrieval branch that queries a non-parametric external memory of pre-encoded
images and associated text snippets. We apply RAC to the problem of long-tail
classification and demonstrate a significant improvement over previous
state-of-the-art on Places365-LT and iNaturalist-2018 (14.5% and 6.7%
respectively), despite using only the training datasets themselves as the
external information source. We demonstrate that RAC's retrieval module,
without prompting, attains high accuracy on tail classes. This, in
turn, frees the base encoder to focus on common classes, improving its
performance on them. RAC represents an alternative approach to utilizing large,
pretrained models without requiring fine-tuning, as well as a first step
towards more effectively making use of external memory within common computer
vision architectures.
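The retrieval-augmented design described above can be sketched as follows: a base classifier's logits are fused with logits from a retrieval branch that performs k-NN lookup over a non-parametric memory of pre-encoded images. This is an illustrative sketch only; the variable names, the similarity-weighted voting, and the additive fusion rule are assumptions, not the paper's exact architecture (which uses learned encoders and retrieved text snippets).

```python
# Minimal sketch of retrieval-augmented classification, assuming
# cosine-similarity k-NN retrieval and additive logit fusion.
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, memory_size, k = 5, 16, 100, 3

# External memory: pre-encoded image embeddings with class labels.
memory_keys = rng.normal(size=(memory_size, dim))
memory_keys /= np.linalg.norm(memory_keys, axis=1, keepdims=True)
memory_labels = rng.integers(0, num_classes, size=memory_size)

# Stand-in for the base encoder's embedding of one query image.
query = rng.normal(size=dim)
query /= np.linalg.norm(query)

# Base branch: a linear classifier over the query embedding.
W = rng.normal(size=(num_classes, dim))
base_logits = W @ query

# Retrieval branch: k nearest neighbours in the memory vote for
# their labels, weighted by cosine similarity.
sims = memory_keys @ query
topk = np.argsort(sims)[-k:]
retrieval_logits = np.zeros(num_classes)
for i in topk:
    retrieval_logits[memory_labels[i]] += sims[i]

# Fusion: simple sum of the two branches (an assumed fusion rule).
fused = base_logits + retrieval_logits
pred = int(np.argmax(fused))
print(pred)
```

Because the memory is non-parametric, tail classes can be handled by lookup rather than by the base classifier's weights, which is the intuition the abstract describes.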
Related papers
- Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-01-10T08:39:06Z)
- Why-So-Deep: Towards Boosting Previously Trained Models for Visual Place Recognition [12.807343105549409]
We present an intelligent method, MAQBOOL, to amplify the power of pre-trained models for better image recall.
We achieve comparable image retrieval results at a low descriptor dimension (512-D), compared to the high descriptor dimension (4096-D) of state-of-the-art methods.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-03-22T23:58:38Z)
- Instance-level Image Retrieval using Reranking Transformers [18.304597755595697]
Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image.
We propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images.
RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-19T10:54:51Z)
- Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval [3.6526118822907594]
We show that it is not trivial to pick features that respond well to a panel of variations and semantic content.
Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations.
arXiv Detail & Related papers (2021-03-19T10:54:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.