Related papers: Few-Shot Learning from Augmented Label-Uncertain Queries in Bongard-HOI

Few-Shot Learning from Augmented Label-Uncertain Queries in Bongard-HOI

URL: http://arxiv.org/abs/2312.10586v1
Date: Sun, 17 Dec 2023 02:18:10 GMT
Title: Few-Shot Learning from Augmented Label-Uncertain Queries in Bongard-HOI
Authors: Qinqian Lei, Bo Wang, Robby T. Tan
Abstract summary: We introduce novel label-uncertain query augmentation techniques to enhance the diversity of the query inputs. Our method sets a new state-of-the-art (SOTA) performance by achieving 68.74% accuracy on the Bongard-HOI benchmark. In our evaluation on HICO-FS, our method achieves 73.27% accuracy, outperforming the previous SOTA of 71.20% in the 5-way 5-shot task.
Score: 23.704284537118543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Detecting human-object interactions (HOI) in a few-shot setting remains a challenge. Existing meta-learning methods struggle to extract representative features for classification due to the limited data, while existing few-shot HOI models rely on HOI text labels for classification. Moreover, some query images may display visual similarity to those outside their class, such as similar backgrounds between different HOI classes. This makes learning more challenging, especially with limited samples. Bongard-HOI (Jiang et al. 2022) epitomizes this HOI few-shot problem, making it the benchmark we focus on in this paper. In our proposed method, we introduce novel label-uncertain query augmentation techniques to enhance the diversity of the query inputs, aiming to distinguish the positive HOI class from the negative ones. As these augmented inputs may or may not have the same class label as the original inputs, their class label is unknown. Those belonging to a different class become hard samples due to their visual similarity to the original ones. Additionally, we introduce a novel pseudo-label generation technique that enables a mean teacher model to learn from the augmented label-uncertain inputs. We propose to augment the negative support set for the student model to enrich the semantic information, fostering diversity that challenges and enhances the student's learning. Experimental results demonstrate that our method sets a new state-of-the-art (SOTA) performance by achieving 68.74% accuracy on the Bongard-HOI benchmark, a significant improvement over the existing SOTA of 66.59%. In our evaluation on HICO-FS, a more general few-shot recognition dataset, our method achieves 73.27% accuracy, outperforming the previous SOTA of 71.20% in the 5-way 5-shot task.

Related papers

VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification [23.08368823707528]
We present a novel human annotation-free method for pathology image classification by leveraging pre-trained Vision-Language Models (VLMs) We introduce VLM-CPL, a novel approach based on consensus pseudo labels that integrates two noisy label filtering techniques with a semi-supervised learning strategy. Experimental results showed that our method obtained an accuracy of 87.1% and 95.1% on the HPH and LC25K datasets, respectively.
arXiv Detail & Related papers (2024-03-23T13:24:30Z)
Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. We present a framework for understanding how DA interacts with class-level learning dynamics. We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
Few-shot Image Classification based on Gradual Machine Learning [6.935034849731568]
Few-shot image classification aims to accurately classify unlabeled images using only a few labeled samples. We propose a novel approach based on the non-i.i.d paradigm of gradual machine learning (GML) We show that the proposed approach can improve the SOTA performance by 1-5% in terms of accuracy.
arXiv Detail & Related papers (2023-07-28T12:30:41Z)
HomE: Homography-Equivariant Video Representation Learning [62.89516761473129]
We propose a novel method for representation learning of multi-view videos. Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views. On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101 dataset, better than most state-of-the-art self-supervised learning methods.
arXiv Detail & Related papers (2023-06-02T15:37:43Z)
Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation [12.362400851574872]
Weakly incremental learning for semantic segmentation (WILSS) is a novel and attractive task. We propose a novel and data-efficient framework for WILSS, named FMWISS.
arXiv Detail & Related papers (2023-02-28T02:21:42Z)
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery [39.03732147384566]
Generalized Novel Category Discovery (GNCD) setting aims to categorize unlabeled training data coming from known and novel classes. We propose Contrastive Affinity Learning method with auxiliary visual Prompts, dubbed PromptCAL, to address this challenging problem. Our approach discovers reliable pairwise sample affinities to learn better semantic clustering of both known and novel classes for the class token and visual prompts.
arXiv Detail & Related papers (2022-12-11T20:06:14Z)
Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
Bag of Instances Aggregation Boosts Self-supervised Learning [122.61914701794296]
We propose a simple but effective distillation strategy for unsupervised learning. Our method, termed as BINGO, targets at transferring the relationship learned by the teacher to the student. BINGO achieves new state-of-the-art performance on small scale models.
arXiv Detail & Related papers (2021-07-04T17:33:59Z)
Neighborhood Contrastive Learning for Novel Class Discovery [79.14767688903028]
We build a new framework, named Neighborhood Contrastive Learning, to learn discriminative representations that are important to clustering performance. We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-06-20T17:34:55Z)
Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images [79.34600869202373]
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes. Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations. Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z)
Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
Unsupervised Representation Learning by InvariancePropagation [34.53866045440319]
In this paper, we propose Invariance propagation to focus on learning representations invariant to category-level variations. With a ResNet-50 as the backbone, our method achieves 71.3% top-1 accuracy on ImageNet linear classification and 78.2% top-5 accuracy fine-tuning on only 1% labels. We also achieve state-of-the-art performance on other downstream tasks, including linear classification on Places205 and Pascal VOC, and transfer learning on small scale datasets.
arXiv Detail & Related papers (2020-10-07T13:00:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.