ARC-TGI: Human-Validated Task Generators with Reasoning Chain Templates for ARC-AGI
- URL: http://arxiv.org/abs/2603.05099v1
- Date: Thu, 05 Mar 2026 12:10:51 GMT
- Title: ARC-TGI: Human-Validated Task Generators with Reasoning Chain Templates for ARC-AGI
- Authors: Jens Lehmann, Syeda Khushbakht, Nikoo Salehfard, Nur A Zarin Nishat, Dhananjay Bhandiwad, Andrei Aioanei, Sahar Vahdati
- Abstract summary: ARC-TGI is an open-source framework for task-family generators that sample diverse ARC-AGI tasks. Each generated task is paired with natural-language input and transformation reasoning chains. All generators undergo human refinement and local verification to keep both grids and reasoning traces natural and consistent under variation.
- Score: 5.539241859666831
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Abstraction and Reasoning Corpus (ARC-AGI) probes few-shot abstraction and rule induction on small visual grids, but progress is difficult to measure on static collections of hand-authored puzzles due to overfitting, dataset leakage, and memorisation. We introduce ARC-TGI (ARC Task Generators Inventory), an open-source framework for task-family generators: compact Python programs that sample diverse ARC-AGI tasks while preserving a latent rule. ARC-TGI is built around a solver-facing representation: each generated task is paired with natural-language input and transformation reasoning chains and partially evaluated Python code implementing sampling, transformation, and episode construction. Crucially, ARC-TGI supports task-level constraints so that training examples collectively expose the variations needed to infer the underlying rule, a requirement for human-solvable ARC tasks that independent per-example sampling often fails to guarantee. All generators undergo human refinement and local verification to keep both grids and reasoning traces natural and consistent under variation. We release 461 generators covering 180 ARC-Mini tasks, 215 ARC-AGI-1 tasks (200 train, 15 test), and 66 ARC-AGI-2 tasks (55 train, 11 test), enabling scalable dataset sampling and controlled benchmarking.
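The abstract describes task-family generators as compact Python programs that sample tasks while preserving a latent rule, with task-level constraints ensuring the training examples collectively expose the variation needed to infer that rule. A minimal sketch of the idea follows; the recolouring rule, function names, and the specific constraint are illustrative assumptions, not ARC-TGI's actual code:

```python
import random

def make_grid(h, w, fill=0):
    """Create an h x w grid filled with a background colour."""
    return [[fill] * w for _ in range(h)]

def recolor(grid, src, dst):
    """Latent rule: repaint every cell of colour src as colour dst."""
    return [[dst if c == src else c for c in row] for row in grid]

def sample_task(rng, n_train=3):
    """Sample one episode from a hypothetical recolouring task family.

    Task-level constraint (illustrative): training inputs must
    collectively use at least two distinct grid sizes, so the rule
    cannot be confused with a property of one fixed layout.
    """
    src, dst = rng.sample(range(1, 10), 2)     # latent rule parameters
    examples = []
    for _ in range(n_train):
        h, w = rng.randint(3, 6), rng.randint(3, 6)
        grid = make_grid(h, w)
        for _ in range(rng.randint(2, 5)):     # scatter src-coloured cells
            grid[rng.randrange(h)][rng.randrange(w)] = src
        examples.append((grid, recolor(grid, src, dst)))
    sizes = {(len(g), len(g[0])) for g, _ in examples}
    if len(sizes) < 2:                         # constraint violated:
        return sample_task(rng, n_train)       # resample the whole episode
    reasoning = f"Every cell of colour {src} is repainted as colour {dst}."
    return {"train": examples, "reasoning": reasoning}

task = sample_task(random.Random(0))
```

Here the episode-level resampling plays the role of a task-level constraint: rather than sampling each example independently, the generator rejects episodes whose training grids do not collectively vary, mirroring the paper's point that independent per-example sampling often fails to guarantee human-solvable tasks.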
Related papers
- Tiny Recursive Models on ARC-AGI-1: Inductive Biases, Identity Conditioning, and Test-Time Compute [0.0]
We empirically analyze the ARC Prize TRM checkpoint on ARC-AGI-1. We show that test-time augmentation and majority-vote ensembling account for a substantial fraction of reported performance. We also compare TRM with a naive QLoRA fine-tune of Llama 3 8B on canonical ARC-AGI-1.
arXiv Detail & Related papers (2025-12-04T06:20:44Z)
- ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus [3.553493344868413]
This paper introduces ARC-GEN, an open-source procedural generator aimed at extending the original ARC-AGI training dataset. Unlike prior efforts, our generator is both exhaustive (covering all four-hundred tasks) and mimetic. We also discuss the use of this generator in establishing a static benchmark suite to verify the correctness of programs submitted to the 2025 Google Code Golf Championship.
arXiv Detail & Related papers (2025-10-31T18:10:05Z)
- GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning [7.09254962218677]
State-of-the-art models still achieve accuracy rates of merely 40-55% on the 2024 ARC Competition. We introduce GIFARC, an analogy-inspired ARC dataset. GIFARC guides AI agents to evaluate the task analogically before engaging in brute-force pattern search.
arXiv Detail & Related papers (2025-05-27T03:42:51Z)
- Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization [64.33914369424494]
RoleRAG is a unified RAG framework that achieves efficient multi-task processing through role-specific token optimization. RoleRAG comprises six modules, each handling a specific sub-task within the RAG process. We introduce a query graph to represent the decomposition of the query, which can be dynamically resolved according to the decomposing state.
arXiv Detail & Related papers (2025-05-21T12:25:12Z)
- Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses these limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses Type-Specialized knowledge bases to achieve multi-agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verification.
arXiv Detail & Related papers (2025-05-20T06:44:34Z)
- In-Context LoRA for Diffusion Transformers [49.288489286276146]
We show that text-to-image DiTs can effectively perform in-context generation without any tuning.
We name our models In-Context LoRA (IC-LoRA).
Our pipeline generates high-fidelity image sets that better adhere to prompts.
arXiv Detail & Related papers (2024-10-31T09:45:00Z)
- Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects [31.926206783846144]
We show that a Vision Transformer (ViT) fails dramatically on most ARC tasks even when trained on one million examples per task. We propose ViTARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC. Our task-specific ViTARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks.
arXiv Detail & Related papers (2024-10-08T22:25:34Z)
- LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations [50.431003245201644]
We show that GPT-4 is unable to "reason" perfectly within non-language domains such as the 1D-ARC or a simple ARC subset.
We propose an object-based representation that is obtained through an external tool, resulting in nearly doubling the performance on solved ARC tasks and near-perfect scores on the easier 1D-ARC.
arXiv Detail & Related papers (2023-05-26T16:32:17Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
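The RAG recipe the last entry describes, a non-parametric memory feeding a parametric generator, reduces to a retrieve-then-condition pipeline. A toy sketch with a bag-of-words retriever and a stub generator; the corpus, scoring function, and names are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter
import math

CORPUS = [  # toy non-parametric memory (the document index)
    "The Eiffel Tower is in Paris.",
    "The Colosseum is in Rome.",
    "Mount Fuji is in Japan.",
]

def embed(text):
    """Bag-of-words vector; a real RAG system uses a dense encoder."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Non-parametric step: rank every stored document against the query."""
    q = embed(query)
    return sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query, docs):
    """Parametric step (stub): a real system conditions a pre-trained
    seq2seq model on the query plus the retrieved evidence."""
    return f"Q: {query} | Evidence: {' '.join(docs)}"

answer = generate("Where is the Eiffel Tower?",
                  retrieve("Where is the Eiffel Tower?"))
```

Swapping the generator stub for a conditioned language model, and the count vectors for learned dense embeddings, recovers the retrieve-then-generate structure these RAG papers build on.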
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.