ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
- URL: http://arxiv.org/abs/2511.11079v2
- Date: Mon, 17 Nov 2025 03:11:14 GMT
- Title: ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
- Authors: Sejin Kim, Hayan Choi, Seokki Lee, Sundong Kim,
- Abstract summary: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks.<n>ARCTraj addresses a gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs.<n>It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning.
- Score: 5.7688835899861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired extensive research on abstract reasoning, most existing approaches rely on static input--output supervision, which limits insight into how reasoning unfolds over time. ARCTraj addresses this gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs, revealing intermediate reasoning steps that conventional datasets overlook. Collected via the O2ARC web interface, it contains around 10,000 trajectories annotated with task identifiers, timestamps, and success labels across 400 training tasks from the ARC-AGI-1 benchmark. It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning, enabling integration with reinforcement learning, generative modeling, and sequence modeling methods such as PPO, World Models, GFlowNets, Diffusion agents, and Decision Transformers. Analyses of spatial selection, color attribution, and strategic convergence highlight the structure and diversity of human reasoning. Together, these contributions position ARCTraj as a structured and interpretable foundation for studying human-like reasoning, advancing explainability, alignment, and generalizable intelligence.
Related papers
- OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents [68.85365034738534]
We introduce a unified framework for developing tool-augmented geospatial agents trained on satellite imagery, natural-language queries, and detailed reasoning traces.<n>The training pipeline relies on supervised fine-tuning over structured reasoning trajectories, aligning the model with verified multistep tool interactions.<n>The accompanying corpus comprises 14,538 training and 1,169 evaluation instances, with more than 100K reasoning steps in the training split and over 7K reasoning steps in the evaluation split.
arXiv Detail & Related papers (2026-02-19T18:59:54Z) - Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection [59.04089915447622]
ForenAgent is an interactive IFD framework that enables MLLMs to autonomously generate, execute, and refine Python-based low-level tools around the detection objective.<n>Inspired by human reasoning, we design a dynamic reasoning loop comprising global perception, local focusing, iterative probing, and holistic adjudication.<n>Experiments show that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging IFD tasks.
arXiv Detail & Related papers (2025-12-18T08:38:44Z) - GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning [7.09254962218677]
State-of-the-art models still achieve accuracy rates of merely 40-55% on 2024 ARC Competition.<n>We introduce an analogy-inspired ARC dataset, GIFARC.<n>GIFARC guides AI agents to evaluate the task analogically before engaging in brute-force pattern search.
arXiv Detail & Related papers (2025-05-27T03:42:51Z) - Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects [31.926206783846144]
We show that a Vision Transformer (ViT) fails dramatically on most ARC tasks even when trained on one million examples per task.<n>We propose ViTARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC.<n>Our task-specific ViTARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks.
arXiv Detail & Related papers (2024-10-08T22:25:34Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Neural Constraint Satisfaction: Hierarchical Abstraction for
Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z) - Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly out-performs existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Representation-Centric Survey of Skeletal Action Recognition and the ANUBIS Benchmark [43.00059447663327]
3D skeleton-based human action recognition has emerged as a powerful alternative to traditional RGB and depth-based approaches.<n>Despite remarkable progress, current research remains fragmented across diverse input representations.<n>ANUBIS is a large-scale, challenging skeleton action dataset designed to address critical gaps in existing benchmarks.
arXiv Detail & Related papers (2022-05-04T14:03:43Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Cross-Supervised Joint-Event-Extraction with Heterogeneous Information
Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.