Exploring Long Tail Visual Relationship Recognition with Large
Vocabulary
- URL: http://arxiv.org/abs/2004.00436v7
- Date: Sat, 25 Sep 2021 04:13:23 GMT
- Title: Exploring Long Tail Visual Relationship Recognition with Large
Vocabulary
- Authors: Sherif Abdelkarim, Aniket Agarwal, Panos Achlioptas, Jun Chen, Jiaji
Huang, Boyang Li, Kenneth Church, Mohamed Elhoseiny
- Abstract summary: We make the first large-scale study concerning the task of Long-Tail Visual Relationship Recognition (LTVRR)
LTVRR aims at improving the learning of structured visual relationships that come from the long-tail.
We introduce two LTVRR-related benchmarks, dubbed VG8K-LT and GQA-LT, built upon the widely used Visual Genome and GQA datasets.
- Score: 40.51076584921913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several approaches have been proposed in recent literature to alleviate the
long-tail problem, mainly in object classification tasks. In this paper, we
make the first large-scale study concerning the task of Long-Tail Visual
Relationship Recognition (LTVRR). LTVRR aims at improving the learning of
structured visual relationships that come from the long-tail (e.g., "rabbit
grazing on grass"). In this setup, the subject, relation, and object classes
each follow a long-tail distribution. To begin our study and establish a
benchmark for the community, we introduce two LTVRR-related benchmarks, dubbed
VG8K-LT and GQA-LT, built upon the widely used Visual Genome and GQA datasets.
We use these benchmarks to study the performance of several state-of-the-art
long-tail models on the LTVRR setup. Lastly, we propose a visiolinguistic
hubless (VilHub) loss and a Mixup augmentation technique adapted to the LTVRR
setup, dubbed RelMix. Both VilHub and RelMix can be easily integrated on top of
existing models and, despite being simple, our results show that they markedly
improve performance, especially on tail classes. Benchmarks,
code, and models have been made available at:
https://github.com/Vision-CAIR/LTVRR.
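
The abstract does not spell out either technique, so the following is only a
minimal sketch of the two ideas as described above: a Mixup-style interpolation
over subject/relation/object features and labels (in the spirit of RelMix) and a
"hubless"-style penalty that pushes the average predicted probability toward
uniform across classes (in the spirit of VilHub). Function names, tensor shapes,
and the exact formulations are assumptions, not the repository's API.

    import torch
    import torch.nn.functional as F

    def relmix(feats, labels, num_classes, alpha=0.4):
        """Illustrative Mixup-style augmentation over relationship triplets.

        feats:  dict with 'subj', 'rel', 'obj' feature tensors, each [B, D]
        labels: dict with 'subj', 'rel', 'obj' integer label tensors, each [B]
        num_classes: dict with the vocabulary size for each of the three roles
        Returns mixed features and soft labels (assumed formulation).
        """
        B = feats['subj'].size(0)
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(B)
        mixed_feats, soft_labels = {}, {}
        for k in ('subj', 'rel', 'obj'):
            mixed_feats[k] = lam * feats[k] + (1 - lam) * feats[k][perm]
            one_hot = F.one_hot(labels[k], num_classes[k]).float()
            soft_labels[k] = lam * one_hot + (1 - lam) * one_hot[perm]
        return mixed_feats, soft_labels

    def hubless_penalty(logits):
        """Illustrative 'hubless'-style regularizer: penalize deviation of the
        batch-averaged class probabilities from uniform, so a few head classes
        do not act as prediction hubs. An assumption, not the exact VilHub loss.
        """
        mean_prob = F.softmax(logits, dim=1).mean(dim=0)         # [C]
        uniform = torch.full_like(mean_prob, 1.0 / logits.size(1))
        return ((mean_prob - uniform) ** 2).sum()

In training, the mixed features would be scored by the classifier and trained
against the soft labels with a cross-entropy-style loss, with the hubless
penalty added as a weighted extra term.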
Related papers
- Large Margin Prototypical Network for Few-shot Relation Classification with Fine-grained Features [30.11073476165794]
Relation classification (RC) plays a pivotal role in both natural language understanding and knowledge graph completion.
Conventional approaches to RC, whether based on feature engineering or deep learning, can obtain promising performance on categorizing common types of relations.
In this paper, we argue that few-shot learning is of great practical significance to RC and thus improve a modern metric-learning framework for few-shot RC.
arXiv Detail & Related papers (2024-09-06T03:28:38Z)
- LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content [17.022005679738733]
Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories.
We propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition via leveraging generated content.
arXiv Detail & Related papers (2024-03-09T09:52:15Z) - The All-Seeing Project V2: Towards General Relation Comprehension of the Open World [58.40101895719467]
We present the All-Seeing Project V2, a new model and dataset designed for understanding object relations in images.
We propose the All-Seeing Model V2 that integrates the formulation of text generation, object localization, and relation comprehension into a relation conversation task.
Our model excels not only in perceiving and recognizing all objects within the image but also in grasping the intricate relation graph between them.
arXiv Detail & Related papers (2024-02-29T18:59:17Z) - Rethink Long-tailed Recognition with Vision Transformers [18.73285611631722]
Vision Transformers (ViT) are hard to train with long-tailed data.
ViT learns generalized features in an unsupervised manner.
Predictive Distribution Calibration (PDC) is proposed as a novel metric for long-tailed recognition.
arXiv Detail & Related papers (2023-02-28T03:36:48Z) - Improving Tail-Class Representation with Centroid Contrastive Learning [145.73991900239017]
We propose interpolative centroid contrastive learning (ICCL) to improve long-tailed representation learning.
ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolated image can be used to retrieve the centroids of both source classes.
Our results show a significant accuracy gain of 2.8% on the iNaturalist 2018 dataset, which has a real-world long-tailed distribution.
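
As a rough, runnable illustration of the sentence above (not the paper's exact
recipe), the core ICCL step can be sketched as follows; the pixel-level mixing,
temperature, and how class centroids are maintained are assumptions made only
to keep the example self-contained.

    import torch
    import torch.nn.functional as F

    def iccl_loss(encoder, img_uniform, y_uniform, img_classwise, y_classwise,
                  centroids, alpha=0.5, temperature=0.1):
        """Sketch of interpolative centroid contrastive learning (ICCL).

        One batch comes from a class-agnostic (instance-uniform) sampler and one
        from a class-aware sampler; the interpolated image's representation is
        trained to retrieve the centroids of BOTH source classes.
        centroids: [C, D] tensor of class centroids (e.g. EMA-updated elsewhere).
        """
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        mixed = lam * img_uniform + (1 - lam) * img_classwise
        z = F.normalize(encoder(mixed), dim=1)                        # [B, D]
        logits = z @ F.normalize(centroids, dim=1).t() / temperature  # [B, C]
        log_p = F.log_softmax(logits, dim=1)
        # Soft target split between the two source classes by the mixing weight.
        loss = -(lam * log_p.gather(1, y_uniform[:, None])
                 + (1 - lam) * log_p.gather(1, y_classwise[:, None])).mean()
        return loss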
arXiv Detail & Related papers (2021-10-19T15:24:48Z) - Learning of Visual Relations: The Devil is in the Tails [59.737494875502215]
Visual relation learning is a long-tailed problem, due to the nature of joint reasoning about groups of objects.
In this paper, we explore an alternative hypothesis, denoted the Devil is in the Tails.
Under this hypothesis, better performance is achieved by keeping the model simple but improving its ability to cope with long-tailed distributions.
arXiv Detail & Related papers (2021-08-22T08:59:35Z) - RelTransformer: Balancing the Visual Relationship Detection from Local
Context, Scene and Memory [24.085223165006212]
arXiv Detail & Related papers (2021-04-24T12:04:04Z)
- ResLT: Residual Learning for Long-tailed Recognition [64.19728932445523]
We propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space.
We design an effective residual fusion mechanism: one main branch is optimized to recognize images from all classes, while two residual branches are gradually fused and optimized to enhance recognition of medium+tail classes and tail classes, respectively.
We test our method on several benchmarks, i.e., long-tailed versions of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018.
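
A minimal sketch of what such a residual-fusion head could look like, assuming
a shared feature extractor and known index sets for the medium+tail and tail
classes; the masking and the gradual fusion/optimization schedule of the actual
paper are simplified away here.

    import torch
    import torch.nn as nn

    class ResLTStyleHead(nn.Module):
        """Sketch of a ResLT-style classifier head (assumed structure).

        A main branch covers all classes; two residual branches specialize on
        medium+tail and tail classes and are added onto the main logits, so the
        extra capacity for rare classes lives in a residual part of parameter
        space.
        """
        def __init__(self, feat_dim, num_classes, medium_tail_idx, tail_idx):
            super().__init__()
            self.main = nn.Linear(feat_dim, num_classes)
            self.res_medium_tail = nn.Linear(feat_dim, num_classes)
            self.res_tail = nn.Linear(feat_dim, num_classes)
            # Masks confine each residual branch to its class subset.
            mask_mt = torch.zeros(num_classes)
            mask_mt[medium_tail_idx] = 1.0
            mask_t = torch.zeros(num_classes)
            mask_t[tail_idx] = 1.0
            self.register_buffer('mask_mt', mask_mt)
            self.register_buffer('mask_t', mask_t)

        def forward(self, feats):
            logits = self.main(feats)                              # all classes
            logits = logits + self.res_medium_tail(feats) * self.mask_mt
            logits = logits + self.res_tail(feats) * self.mask_t
            return logits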
arXiv Detail & Related papers (2021-01-26T08:43:50Z)
- The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation [93.17367076148348]
We investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
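
The bi-level sampling idea in the entry above can be illustrated with a small,
self-contained sketch: level one picks a class uniformly, level two picks an
image containing that class, so rare classes are visited as often as frequent
ones when calibrating the classification head. The data layout and function
name are hypothetical.

    import random
    from collections import defaultdict

    def bilevel_class_balanced_sample(annotations, num_samples, seed=None):
        """Toy bi-level class-balanced sampler (assumed data layout).

        annotations: dict mapping image_id -> iterable of class ids in that image.
        Returns a list of image ids sampled class-first, then image-within-class.
        """
        rng = random.Random(seed)
        images_per_class = defaultdict(list)
        for image_id, class_ids in annotations.items():
            for c in class_ids:
                images_per_class[c].append(image_id)
        classes = list(images_per_class.keys())
        batch = []
        for _ in range(num_samples):
            c = rng.choice(classes)                          # level 1: pick a class
            batch.append(rng.choice(images_per_class[c]))    # level 2: pick an image
        return batch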
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.