Support-Set Context Matters for Bongard Problems
- URL: http://arxiv.org/abs/2309.03468v2
- Date: Sun, 01 Dec 2024 00:09:44 GMT
- Title: Support-Set Context Matters for Bongard Problems
- Authors: Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas
- Abstract summary: Bongard problems are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images. Current machine learning methods struggle to solve them. We show substantial gains over prior works, leading to new state-of-the-art accuracy on Bongard-LOGO and Bongard-HOI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, most existing methods have reached at best 69% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not adapt image features given information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because the "key concept" in a typical Bongard problem can often only be distinguished using multiple positives and multiple negatives. We explore simple methods to incorporate this context and show substantial gains over prior works, leading to new state-of-the-art accuracy on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) compared to methods with equivalent vision backbone architectures and strong performance on the original Bongard problem set (60.8%).
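The abstract's central point, that image features should be adapted using the support set as a whole, can be illustrated with a small sketch. The abstract does not say which context mechanism the authors use, so everything below is an assumption made for illustration: the feature vectors are toy numbers, the adaptation is a per-dimension standardization over all supports, the classifier is a nearest-prototype rule, and the function names (`standardize_with_support`, `classify_query`) are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize_with_support(support, query, eps=1e-6):
    """Rescale each feature dimension with the mean and standard deviation
    computed over the whole support set (positives and negatives together).
    Illustrative assumption, not necessarily the paper's method."""
    dims = len(support[0])
    mu = [mean(v[d] for v in support) for d in range(dims)]
    sigma = [pstdev([v[d] for v in support]) + eps for d in range(dims)]

    def norm(v):
        return [(v[d] - mu[d]) / sigma[d] for d in range(dims)]

    return [norm(v) for v in support], norm(query)


def prototype(vectors):
    """Mean feature vector of a group of supports."""
    dims = len(vectors[0])
    return [mean(v[d] for v in vectors) for d in range(dims)]


def classify_query(positives, negatives, query, use_context=True):
    """Return True if the query is closer to the positive prototype."""
    support = positives + negatives
    if use_context:
        support, query = standardize_with_support(support, query)
    pos = prototype(support[:len(positives)])
    neg = prototype(support[len(positives):])
    return math.dist(query, pos) < math.dist(query, neg)


# Toy features: dimension 0 is a large-scale nuisance, dimension 1 carries the concept.
positives = [[100.0, 0.9], [-100.0, 1.1], [50.0, 1.0]]
negatives = [[-50.0, 0.1], [100.0, -0.1], [-100.0, 0.0]]
query = [-80.0, 0.95]  # concept value matches the positives

with_context = classify_query(positives, negatives, query, use_context=True)
without_context = classify_query(positives, negatives, query, use_context=False)
print(with_context, without_context)
```

In this toy setup the raw features are dominated by the nuisance dimension, so the query is assigned to the wrong side; once the features are rescaled with statistics from the full support set, the concept dimension decides the outcome.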
Related papers
- Bongards at the Boundary of Perception and Reasoning: Programs or Language? [18.717928534727864]
Humans possess the puzzling ability to deploy their visual reasoning abilities in radically new situations. We present a neurosymbolic approach to solving the Bongard problems. We evaluate our method on classifying Bongard problem images given the ground truth rule, as well as on solving the problems from scratch.
arXiv Detail & Related papers (2026-02-03T03:04:27Z) - FS-IQA: Certified Feature Smoothing for Robust Image Quality Assessment [4.135467749401761]
We propose a novel certified defense method for Image Quality Assessment (IQA) models. It is based on randomized smoothing with noise applied in the feature space rather than the input space. Our results demonstrate consistent improvements in correlation with subjective quality scores by up to 30.9%.
arXiv Detail & Related papers (2025-08-07T15:47:55Z) - Few-Shot Learning from Augmented Label-Uncertain Queries in Bongard-HOI [23.704284537118543]
We introduce novel label-uncertain query augmentation techniques to enhance the diversity of the query inputs.
Our method sets a new state-of-the-art (SOTA) performance by achieving 68.74% accuracy on the Bongard-HOI benchmark.
In our evaluation on HICO-FS, our method achieves 73.27% accuracy, outperforming the previous SOTA of 71.20% in the 5-way 5-shot task.
arXiv Detail & Related papers (2023-12-17T02:18:10Z) - Improved Visual Grounding through Self-Consistent Explanations [58.51131933246332]
We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model.
SelfEQ is a weakly-supervised strategy on visual explanation maps for paraphrases that encourages self-consistency.
arXiv Detail & Related papers (2023-12-07T18:59:22Z) - Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World [57.832261258993526]
Bongard-OpenWorld is a new benchmark for evaluating real-world few-shot reasoning for machine vision.
It already imposes a significant challenge to current few-shot reasoning algorithms.
arXiv Detail & Related papers (2023-10-16T09:19:18Z) - Sample Less, Learn More: Efficient Action Recognition via Frame Feature
Restoration [59.6021678234829]
We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
arXiv Detail & Related papers (2023-07-27T13:52:42Z) - PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z) - Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval [27.751399400911932]
We introduce an attribute-guided multi-level attention network (AG-MAN) for fine-grained fashion retrieval.
Specifically, we first enhance the pre-trained feature extractor to capture multi-level image embedding.
Then, we propose a classification scheme where images with the same attribute, albeit with different values, are categorized into the same class.
arXiv Detail & Related papers (2022-12-27T05:28:38Z) - Cross-Modal Contrastive Learning for Robust Reasoning in VQA [76.1596796687494]
Multi-modal reasoning in visual question answering (VQA) has witnessed rapid progress recently.
Most reasoning models heavily rely on shortcuts learned from training data.
We propose a simple but effective cross-modal contrastive learning strategy to get rid of the shortcut reasoning.
arXiv Detail & Related papers (2022-11-21T05:32:24Z) - CobNet: Cross Attention on Object and Background for Few-Shot Segmentation [0.0]
Few-shot segmentation aims to segment images containing objects from previously unseen classes using only a few annotated samples.
Background information can also be useful to distinguish objects from their surroundings.
We propose CobNet which utilises information about the background that is extracted from the query images without annotations of those images.
arXiv Detail & Related papers (2022-10-21T13:49:46Z) - "John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility [19.47954905054217]
This work focuses on a simple commonsense ability, reasoning about when an action (or its effect) is feasible.
We show that even state-of-the-art models such as GPT-3 struggle to answer the feasibility questions correctly.
arXiv Detail & Related papers (2022-10-14T02:46:06Z) - Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions [138.49522643425334]
Bongard-HOI is a new visual reasoning benchmark that focuses on compositional learning of human-object interactions from natural images.
It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning.
Bongard-HOI presents a substantial challenge to today's visual recognition models.
arXiv Detail & Related papers (2022-05-27T07:36:29Z) - Learning Compositional Representation for Few-shot Visual Question Answering [93.4061107793983]
Current methods of Visual Question Answering perform well on answers with abundant training data but have limited accuracy on novel ones with few examples.
We propose to extract the attributes from the answers with enough data, which are later composed to constrain the learning of the few-shot ones.
Experimental results on the VQA v2.0 validation dataset demonstrate the effectiveness of our proposed attribute network.
arXiv Detail & Related papers (2021-02-21T10:16:24Z) - Overcoming Language Priors with Self-supervised Learning for Visual Question Answering [62.88124382512111]
Most Visual Question Answering (VQA) models suffer from the language prior problem.
We introduce a self-supervised learning framework to solve this problem.
Our method can significantly outperform the state-of-the-art.
arXiv Detail & Related papers (2020-12-17T12:30:12Z) - Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)