Related papers: ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus

URL: http://arxiv.org/abs/2511.00162v2
Date: Tue, 04 Nov 2025 03:46:39 GMT
Title: ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus
Authors: Michael D. Moffitt
Abstract summary: This paper introduces ARC-GEN, an open-source procedural generator aimed at extending the original ARC-AGI training dataset. Unlike prior efforts, our generator is both exhaustive (covering all four-hundred tasks) and mimetic. We also discuss the use of this generator in establishing a static benchmark suite to verify the correctness of programs submitted to the 2025 Google Code Golf Championship.
Score: 3.553493344868413
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Abstraction and Reasoning Corpus remains one of the most compelling and challenging benchmarks for tracking progress toward achieving Artificial General Intelligence. In contrast to other evaluation datasets designed to assess an agent's task-specific skills or accumulated knowledge, the ARC-AGI suite is specifically targeted at measuring skill acquisition efficiency, a trait that has (so far) been lacking in even the most sophisticated machine learning systems. For algorithms that require extensive intra-task exemplars, a significant constraint imposed by ARC-AGI is the modest cardinality of its demonstration set, comprising a small number of $\langle$ input, output $\rangle$ grids per task specifying the corresponding transformation. To embellish the space of viable sample pairs, this paper introduces ARC-GEN, an open-source procedural generator aimed at extending the original ARC-AGI training dataset as faithfully as possible. Unlike prior efforts, our generator is both exhaustive (covering all four-hundred tasks) and mimetic (more closely honoring the distributional properties and characteristics embodied in the initial ARC-AGI-1 release). We also discuss the use of this generator in establishing a static benchmark suite to verify the correctness of programs submitted to the 2025 Google Code Golf Championship.

Related papers

ARC-TGI: Human-Validated Task Generators with Reasoning Chain Templates for ARC-AGI [5.539241859666831]
ARC-TGI is an open-source framework for task-family generators to sample diverse ARC-AGI tasks. Each generated task is paired with natural-language input and transformation reasoning chains. All generators undergo human refinement and local verification to keep both grids and reasoning traces natural and consistent under variation.
arXiv Detail & Related papers (2026-03-05T12:10:51Z)
GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction [51.83437071408662]
We propose GLOW, a unified framework for agentic workflow (AW) performance prediction. GLOW combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.
arXiv Detail & Related papers (2025-12-11T13:30:46Z)
The Geometry of Benchmarks: A New Path Toward AGI [0.0]
We introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance. Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning.
arXiv Detail & Related papers (2025-12-03T21:34:09Z)
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z)
Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert. In this framework, a generalist agent executes a curated suite of target-domain tasks. It attains strong gains while reducing computation costs.
arXiv Detail & Related papers (2025-10-27T17:59:14Z)
First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection [14.070196423996045]
Existing approaches often rely on heavy training and large computational resources. We propose RAG-SEG, a training-free paradigm that decouples COD into two stages: Retrieval-Augmented Generation (RAG) for generating coarse masks as prompts, followed by SAM-based segmentation (SEG) for refinement. RAG-SEG constructs a compact retrieval database via unsupervised clustering, enabling fast and effective feature retrieval. Experiments on benchmark COD datasets demonstrate that RAG-SEG performs on par with or surpasses state-of-the-art methods.
arXiv Detail & Related papers (2025-08-21T07:14:18Z)
Arce: Augmented Roberta with Contextualized Elucidations for Ner in Automated Rule Checking [5.58730646214246]
ARCE (augmented RoBERTa with contextualized elucidations) is a novel approach that systematically explores and optimizes this generation process. ARCE establishes a new state-of-the-art on a benchmark AEC dataset, achieving a Macro-F1 score of 77.20%. This result also reveals a key finding: simple, explanation-based knowledge proves surprisingly more effective than complex, role-based rationales for this task.
arXiv Detail & Related papers (2025-08-10T10:49:48Z)
Towards Learning Abductive Reasoning using VSA Distributed Representations [56.31867341825068]
We introduce the Abductive Rule Learner with Context-awareness (ARLC) model. ARLC features a novel and more broadly applicable training objective for abductive reasoning. We show ARLC's robustness to post-programming training by incrementally learning from examples on top of programmed knowledge.
arXiv Detail & Related papers (2024-06-27T12:05:55Z)
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay [12.499776923362461]
We introduce a novel and scalable method for language model self-improvement called Code It (CodeIt). CodeIt iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization.
arXiv Detail & Related papers (2024-02-07T13:55:27Z)
ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design [52.57999109204569]
ArchGym is an open-source framework that connects diverse search algorithms to architecture simulators. We evaluate ArchGym across multiple vanilla and domain-specific search algorithms in designing a custom memory controller, deep neural network accelerators, and a custom SoC for AR/VR workloads.
arXiv Detail & Related papers (2023-06-15T06:41:23Z)
Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus [19.27379168184259]
The Abstraction and Reasoning Corpus (ARC) aims at benchmarking the performance of general artificial intelligence algorithms. The ARC's focus on broad generalization and few-shot learning has made it impossible to solve using pure machine learning. We propose Abstract Reasoning with Graph Abstractions (ARGA), a new object-centric framework that first represents images using graphs and then performs a search for a correct program.
arXiv Detail & Related papers (2022-10-18T14:13:43Z)
Anchor-free Oriented Proposal Generator for Object Detection [59.54125119453818]
Oriented object detection is a practical and challenging task in remote sensing image interpretation. Nowadays, oriented detectors mostly use horizontal boxes as an intermediate step from which to derive oriented boxes. We propose a novel Anchor-free Oriented Proposal Generator (AOPG) that abandons horizontal-box-related operations in the network architecture.
arXiv Detail & Related papers (2021-10-05T10:45:51Z)
