Related papers: COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization

COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization

URL: http://arxiv.org/abs/2509.05249v1
Date: Fri, 05 Sep 2025 17:01:05 GMT
Title: COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization
Authors: Yassine Taoudi-Benchekroun, Klim Troyan, Pascal Sager, Stefan Gerber, Lukas Tuggener, Benjamin Grewe,
Abstract summary: COGITAO is a framework to study composition and generalization in visual domains.<n>It constructs rule-based tasks which apply a set of transformations to objects in grid-like environments.<n>It supports composition, at adjustable depth, over a set of 28 transformations, along with extensive control over grid parametrization and object properties.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ability to compose learned concepts and apply them in novel settings is key to human intelligence, but remains a persistent limitation in state-of-the-art machine learning models. To address this issue, we introduce COGITAO, a modular and extensible data generation framework and benchmark designed to systematically study compositionality and generalization in visual domains. Drawing inspiration from ARC-AGI's problem-setting, COGITAO constructs rule-based tasks which apply a set of transformations to objects in grid-like environments. It supports composition, at adjustable depth, over a set of 28 interoperable transformations, along with extensive control over grid parametrization and object properties. This flexibility enables the creation of millions of unique task rules -- surpassing concurrent datasets by several orders of magnitude -- across a wide range of difficulties, while allowing virtually unlimited sample generation per rule. We provide baseline experiments using state-of-the-art vision models, highlighting their consistent failures to generalize to novel combinations of familiar elements, despite strong in-domain performance. COGITAO is fully open-sourced, including all code and datasets, to support continued research in this field.

Related papers

OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems.<n>Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm.<n>We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z)
Domain-Specific Data Generation Framework for RAG Adaptation [58.20906914537952]
Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning power of large language models with external retrieval to enable domain-grounded responses.<n>We propose RAGen, a framework for generating domain-grounded question-answer-context (QAC) triples tailored to diverse RAG adaptation approaches.
arXiv Detail & Related papers (2025-10-13T09:59:49Z)
RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering [50.42577862494645]
We present RAG-IGBench, a benchmark designed to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering.<n>RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content.
arXiv Detail & Related papers (2025-10-11T03:06:39Z)
SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems [6.798217798359233]
We propose a novel framework for synthesizing code problems that emulate authentic real-world scenarios.<n>This framework systematically integrates domain knowledge, domain skills, and coding skills, all of which are extracted from real-world programming-related datasets.<n>The proposed method consistently achieves superior performance over state-of-the-art open-source large language models.
arXiv Detail & Related papers (2025-09-16T12:52:48Z)
Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation [72.44384066166147]
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains.<n>Existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures.<n>We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch.
arXiv Detail & Related papers (2025-07-24T09:17:41Z)
Aggregated Structural Representation with Large Language Models for Human-Centric Layout Generation [7.980497203230983]
We propose an Aggregation Structural Representation (ASR) module that integrates graph networks with large language models (LLMs) to preserve structural information while enhancing generative capability.<n>A comprehensive evaluation on the RICO dataset demonstrates the strong performance of ASR, both quantitatively using mean Intersection over Union (mIoU) and qualitatively through a crowdsourced user study.
arXiv Detail & Related papers (2025-05-26T06:17:21Z)
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving [50.50405233978406]
We propose a fully dynamic multimodal evaluation framework, named Open-ended Visual Puzzle Generation (OVPG)<n>OVPG aims to generate fresh, diverse, and verifiable evaluation data automatically in puzzle-solving tasks.<n>Built upon OVPG, we construct PuzzleBench, a dynamic and scalable benchmark comprising 11,840 VQA samples.
arXiv Detail & Related papers (2025-04-15T05:29:31Z)
Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs [23.357843519762483]
Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation framework, combined with Knowledge Graphs, robustly enhances the reasoning capabilities of Large language models.<n>We introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities.<n>Our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments.
arXiv Detail & Related papers (2024-12-10T15:56:03Z)
Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.<n>Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.<n>We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.<n>Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.<n>We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities. We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.