Bongards at the Boundary of Perception and Reasoning: Programs or Language?
- URL: http://arxiv.org/abs/2602.03038v1
- Date: Tue, 03 Feb 2026 03:04:27 GMT
- Title: Bongards at the Boundary of Perception and Reasoning: Programs or Language?
- Authors: Cassidy Langenfeld, Claas Beger, Gloria Geng, Wasu Top Piriyakulkij, Keya Hu, Yewen Pu, Kevin Ellis
- Abstract summary: Humans possess the puzzling ability to deploy their visual reasoning abilities in radically new situations. We present a neurosymbolic approach to solving the Bongard problems. We evaluate our method on classifying Bongard problem images given the ground truth rule, as well as on solving the problems from scratch.
- Score: 18.717928534727864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) have made great strides in everyday visual tasks, such as captioning a natural image, or answering commonsense questions about such images. But humans possess the puzzling ability to deploy their visual reasoning abilities in radically new situations, a skill rigorously tested by the classic set of visual reasoning challenges known as the Bongard problems. We present a neurosymbolic approach to solving these problems: given a hypothesized solution rule for a Bongard problem, we leverage LLMs to generate parameterized programmatic representations for the rule and perform parameter fitting using Bayesian optimization. We evaluate our method on classifying Bongard problem images given the ground truth rule, as well as on solving the problems from scratch.
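The parameter-fitting step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the rule ("the shape is large"), the single `area` feature per image, the example values, and the names `rule_program`, `accuracy`, and `fit_threshold`. The abstract specifies Bayesian optimization; this sketch swaps in a plain grid search so that it runs with the standard library alone.

```python
from typing import Sequence

# Hypothetical features: each image is reduced to one number, the fraction
# of the panel covered by its shape. A real system would compute such
# features from pixels; these values are made up for illustration.
LEFT = [0.42, 0.55, 0.61, 0.48, 0.70, 0.52]   # panels that satisfy the rule
RIGHT = [0.08, 0.15, 0.11, 0.17, 0.05, 0.18]  # panels that violate it


def rule_program(area: float, threshold: float) -> bool:
    """Parameterized program for the hypothesized rule 'the shape is large'."""
    return area > threshold


def accuracy(threshold: float) -> float:
    """Fraction of panels that the program assigns to the correct side."""
    correct = sum(rule_program(a, threshold) for a in LEFT)
    correct += sum(not rule_program(a, threshold) for a in RIGHT)
    return correct / (len(LEFT) + len(RIGHT))


def fit_threshold(candidates: Sequence[float]) -> tuple[float, float]:
    """Grid search (a stand-in for Bayesian optimization) over the parameter.

    Returns the first candidate with the highest accuracy, plus that accuracy.
    """
    best = max(candidates, key=accuracy)
    return best, accuracy(best)


candidates = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00
best_threshold, best_accuracy = fit_threshold(candidates)
print(f"threshold={best_threshold:.2f} accuracy={best_accuracy:.2f}")
```

In this toy setting the program's structure encodes the hypothesized rule, while the free parameter decides where "large" begins for this particular problem. A Bayesian optimizer would play the same role as `fit_threshold`, but would choose which candidate values to evaluate instead of enumerating a fixed grid.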
Related papers
- MentisOculi: Revealing the Limits of Reasoning with Mental Imagery [63.285794947638614]
We develop MentisOculi, a suite of multi-step reasoning problems amenable to visual solution. Evaluating visual strategies ranging from latent tokens to explicit generated imagery, we find they generally fail to improve performance. Our findings suggest that despite their inherent appeal, visual thoughts do not yet benefit model reasoning.
arXiv Detail & Related papers (2026-02-02T18:49:06Z)
- Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs [51.93737995405164]
Large Vision-Language Models (LVLMs) are susceptible to hallucinations. We introduce a novel Conditional Pointwise Mutual Information (C-PMI) calibrated decoding strategy. We show that the proposed method significantly reduces hallucinations in LVLMs while preserving decoding efficiency.
arXiv Detail & Related papers (2025-05-26T08:36:10Z)
- A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems [64.05451567422342]
We introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks (e.g., graph coloring), as well as versions that are dressed up as problems that could arise in real life. We find that state-of-the-art LLMs, across multiple prompting strategies, solve textbook problems more accurately than their real-life and inverted counterparts.
arXiv Detail & Related papers (2025-02-19T14:39:59Z)
- Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild [35.91285472401222]
We devise an innovative training and reasoning framework suitable for lightweight Multimodal Large Language Models (MLLMs). Our self-questioning approach organically guides MLLMs to focus on visual clues relevant to the target problem, reducing hallucinations and enhancing the model's ability to describe fine-grained image details. Our experiments on various benchmarks demonstrate SQ's remarkable capabilities in self-questioning, zero-shot visual reasoning and hallucination mitigation.
arXiv Detail & Related papers (2025-01-06T12:16:56Z)
- Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling [80.64715784334936]
We study tradeoffs in a classic grounded question-asking task based on the board game Battleship.
Our model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain.
We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance.
arXiv Detail & Related papers (2024-02-29T18:58:15Z)
- Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World [57.832261258993526]
Bongard-OpenWorld is a new benchmark for evaluating real-world few-shot reasoning for machine vision. It already imposes a significant challenge to current few-shot reasoning algorithms.
arXiv Detail & Related papers (2023-10-16T09:19:18Z)
- Support-Set Context Matters for Bongard Problems [7.996325307599679]
Bongard problems are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images. Current machine learning methods struggle to solve them. We show substantial gains over prior works, leading to new state-of-the-art accuracy on Bongard-LOGO and Bongard-HOI.
arXiv Detail & Related papers (2023-09-07T03:33:49Z)
- Using Program Synthesis and Inductive Logic Programming to solve Bongard Problems [20.864990877667296]
We present a preliminary examination of whether programs constructed by Dreamcoder can be used for analogical reasoning to solve certain Bongard problems.
We decorate the states using positional information in an automated manner and then encode the resulting sequence into logical facts in Prolog.
Experiments on synthetically created Bongard problems for concepts such as 'above/below' and 'clockwise/counterclockwise' demonstrate that our end-to-end system can solve such problems.
arXiv Detail & Related papers (2021-10-19T13:13:06Z)
- A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv Detail & Related papers (2020-06-26T08:34:54Z)