Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object
Interactions
- URL: http://arxiv.org/abs/2205.13803v2
- Date: Thu, 13 Apr 2023 07:29:12 GMT
- Title: Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object
Interactions
- Authors: Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun
Zhu, Anima Anandkumar
- Abstract summary: Bongard-HOI is a new visual reasoning benchmark that focuses on compositional learning of human-object interactions from natural images.
It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning.
Bongard-HOI presents a substantial challenge to today's visual recognition models.
- Score: 138.49522643425334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A significant gap remains between today's visual pattern recognition models
and human-level visual cognition especially when it comes to few-shot learning
and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new
visual reasoning benchmark that focuses on compositional learning of
human-object interactions (HOIs) from natural images. It is inspired by two
desirable characteristics from the classical Bongard problems (BPs): 1)
few-shot concept learning, and 2) context-dependent reasoning. We carefully
curate the few-shot instances with hard negatives, where positive and negative
images only disagree on action labels, making mere recognition of object
categories insufficient to complete our benchmarks. We also design multiple
test sets to systematically study the generalization of visual learning models,
where we vary the overlap of the HOI concepts between the training and test
sets of few-shot instances, from partial to no overlaps. Bongard-HOI presents a
substantial challenge to today's visual recognition models. The
state-of-the-art HOI detection model achieves only 62% accuracy on few-shot
binary prediction while even amateur human testers on MTurk have 91% accuracy.
With the Bongard-HOI benchmark, we hope to further advance research efforts in
visual reasoning, especially in holistic perception-reasoning systems and
better representation learning.
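The few-shot binary prediction protocol measured above can be pictured as follows: a model sees a small set of positive support images sharing a latent HOI concept plus a set of hard-negative supports, and must label each query image as containing the concept or not. The sketch below is a minimal illustration of that setup and is not the benchmark's actual data format or official evaluation code; the 6+6 support split, the file names, and the dummy predictor are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# One Bongard-HOI-style few-shot instance: positive supports all depict one HOI
# concept (e.g. "ride bicycle"), negative supports show the same object category
# but a different action (hard negatives), and each query image must be labeled
# as positive or negative with respect to the latent concept.
@dataclass
class FewShotInstance:
    positive_support: List[str]   # paths to positive support images
    negative_support: List[str]   # paths to negative support images
    queries: List[str]            # paths to query images
    query_labels: List[int]       # 1 = concept present, 0 = absent

def binary_accuracy(
    instances: Sequence[FewShotInstance],
    predict: Callable[[List[str], List[str], str], int],
) -> float:
    """Fraction of query images classified correctly across all instances."""
    correct = total = 0
    for inst in instances:
        for img, label in zip(inst.queries, inst.query_labels):
            pred = predict(inst.positive_support, inst.negative_support, img)
            correct += int(pred == label)
            total += 1
    return correct / max(total, 1)

if __name__ == "__main__":
    # Illustrative instance with a 6+6 support split and two queries;
    # a constant predictor stands in for a real model.
    toy = FewShotInstance(
        positive_support=[f"pos_{i}.jpg" for i in range(6)],
        negative_support=[f"neg_{i}.jpg" for i in range(6)],
        queries=["query_pos.jpg", "query_neg.jpg"],
        query_labels=[1, 0],
    )
    always_positive = lambda pos, neg, q: 1
    print(f"toy accuracy: {binary_accuracy([toy], always_positive):.2f}")  # 0.50
```

Under this protocol a constant or chance-level predictor sits at 50%, which is the reference point for the 62% reported for the state-of-the-art HOI detection model and the 91% reached by amateur human testers on MTurk.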
Related papers
- Towards A Unified Neural Architecture for Visual Recognition and Reasoning [40.938279131241764]
We propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both.
Our framework enables the investigation of how different visual recognition tasks, datasets, and inductive biases can help enable principled temporal reasoning capabilities.
arXiv Detail & Related papers (2023-11-10T20:27:43Z)
- Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World [57.832261258993526]
Bongard-OpenWorld is a new benchmark for evaluating real-world few-shot reasoning for machine vision.
It already poses a significant challenge to current few-shot reasoning algorithms.
arXiv Detail & Related papers (2023-10-16T09:19:18Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pretext tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
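(A minimal zero-shot scoring sketch, under stated assumptions, appears after this related-papers list.)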
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Object-Centric Diagnosis of Visual Reasoning [118.36750454795428]
This paper presents a systematic object-centric diagnosis of visual reasoning on grounding and robustness.
We develop a diagnostic model, namely Graph Reasoning Machine.
Our model replaces the purely symbolic visual representation with a probabilistic scene graph and then applies teacher-forcing training to the visual reasoning module.
arXiv Detail & Related papers (2020-12-21T18:59:28Z)
- Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning [78.13740873213223]
Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems.
We propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
arXiv Detail & Related papers (2020-10-02T03:19:46Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset, CompGuessWhat?!, as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
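Returning to the CLIP entry above (Exploring CLIP for Assessing the Look and Feel of Images): as a rough illustration of zero-shot perceptual scoring with CLIP, the sketch below contrasts an image against a pair of prompts and reads the softmax over the two prompts as a score. The checkpoint name, prompt wording, and scoring rule are assumptions for illustration and are not taken from that paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a public CLIP checkpoint (assumed for illustration; the paper's exact
# backbone and prompt design may differ).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def zero_shot_score(image_path: str,
                    prompts=("a good photo", "a bad photo")) -> float:
    """Return the probability mass on the first prompt as a crude 'look' score."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=list(prompts), images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_prompts)
    return logits.softmax(dim=-1)[0, 0].item()

# Example: score = zero_shot_score("photo.jpg"); values near 1.0 favor "a good photo".
```

Swapping the prompt pair (for example, toward "feel"-oriented wording) changes what the score measures, which is the main knob such zero-shot assessment turns on.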