FuguReport

The Last Human-Written Paper: Agent-Native Research Artifacts

Authors Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Yuchen You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang
Affiliations Stanford University / University of Michigan / Massachusetts Institute of Technology / Arizona State University / Carnegie Mellon University / University of Washington / University of Illinois Urbana-Champaign / NVIDIA / Orchestra Research / LinkedIn / The Ohio State University / Harvard University / National University of Singapore / Meta / Cornell University / Nanyang Technological University / Yale University / The University of Chicago / Portland State University / The University of Hong Kong / Boston College / Stony Brook University / University of Toronto / New York University
Categories Method / Research Artifact Management / Agent-native research artifact ecosystem, Application / Research Automation / Automated decision capture and review, Tooling / Research Compilation / PDF and repository conversion to artifacts
License CC0 1.0

Abstract Overview

This paper introduces Ara (Agent-Native Research Artifact), a protocol that replaces narrative research papers with a machine-executable package organized into four layers: scientific logic (/logic), executable code (/src), an exploration graph preserving failed and successful research trajectories (/trace), and grounded evidence (/evidence). The authors argue that conventional papers impose a "Storytelling Tax" (discarding failed experiments and branching research processes) and an "Engineering Tax" (omitting execution-critical details such as hyperparameters and configurations). Three supporting mechanisms are presented: a Live Research Manager that captures decisions during researcher–agent coding sessions, an Ara Compiler that converts legacy PDFs and repositories into Ara format, and a three-level ARA Seal review system for machine-verifiable structural, rigor, and reproducibility checks. The protocol is evaluated on knowledge extraction, reproduction, and extension tasks using PaperBench and RE-Bench sources, restricted to the machine learning domain.
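The four-layer package can be pictured as a directory tree whose top-level folders match the paths named above. The sketch below assumes each layer is a required top-level directory. The helper `check_layers` is hypothetical and only approximates the kind of check a structural review level might run. It is not the ARA Seal's actual implementation.

```python
import tempfile
from pathlib import Path

# Layer directories named in the abstract. Treating each one as a required
# top-level directory is an assumption made for this sketch.
REQUIRED_LAYERS = ("logic", "src", "trace", "evidence")


def check_layers(root: Path) -> list[str]:
    """Return the names of required layers missing from an artifact root.

    Hypothetical structural check: a layer counts as present only if it
    exists as a directory directly under `root`.
    """
    return [name for name in REQUIRED_LAYERS if not (root / name).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("logic", "src", "evidence"):
        (root / name).mkdir()
    # The exploration-graph layer was never created, so it is reported.
    print(check_layers(root))  # prints ['trace']
```

A real structural level would presumably also validate file contents and bindings. This sketch checks only that the layers exist.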

Novelty

The main novelty is reframing the primary research output as an agent-operable, four-layer filesystem artifact rather than a human-oriented narrative. Explicit cross-layer bindings in this artifact link claims, code, evidence, and research trajectories, including dead ends. The work also couples this protocol with a live capture mechanism for researcher–agent sessions, a compiler for backward-compatible conversion of legacy papers, and a staged, machine-verifiable review pipeline (the ARA Seal).
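Cross-layer bindings can be thought of as explicit references from each claim to the code and evidence that support it. Once these references exist, some consistency checks become mechanical. The sketch below uses hypothetical `Claim` records and a hypothetical `audit_bindings` helper. It applies two rules: a claim must cite evidence that exists, and every evidence item should be cited by some claim. These rules are illustrative and loosely mirror the fabricated-claim and orphan-experiment categories mentioned in the results. They are not the ARA Seal's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A logic-layer statement with hypothetical links to other layers."""
    claim_id: str
    code_refs: list[str] = field(default_factory=list)      # paths under /src (not checked here)
    evidence_refs: list[str] = field(default_factory=list)  # result IDs under /evidence


def audit_bindings(claims: list[Claim], evidence_ids: set[str]) -> dict[str, list[str]]:
    """Flag two kinds of broken bindings (illustrative rules only).

    - unsupported: claims that cite no evidence, or cite evidence that does not exist
    - orphan: evidence items that no claim cites
    """
    cited: set[str] = set()
    unsupported: list[str] = []
    for claim in claims:
        cited.update(claim.evidence_refs)
        if not claim.evidence_refs or any(e not in evidence_ids for e in claim.evidence_refs):
            unsupported.append(claim.claim_id)
    orphan = sorted(evidence_ids - cited)
    return {"unsupported": unsupported, "orphan": orphan}


claims = [
    Claim("C1", code_refs=["src/train.py"], evidence_refs=["E1"]),
    Claim("C2", evidence_refs=["E9"]),  # cites evidence that does not exist
]
print(audit_bindings(claims, {"E1", "E2"}))
# prints {'unsupported': ['C2'], 'orphan': ['E2']}
```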

Results

In knowledge extraction (450 questions across 30 targets), agents using Ara achieved 93.7% accuracy versus 72.4% for the PDF-plus-repository baseline. The largest gains came on failure-knowledge questions (+65.7 pp) and configuration-detail recovery (+24.8 pp). In reproduction across 15 papers (150 subtasks, 1,743 rubric requirements), Ara reached a difficulty-weighted success rate of 64.4% versus 57.4% for the baseline, and the advantage widened on harder subtasks (+8.5 pp on hard subtasks). In extension on five RE-Bench tasks under Sonnet 4.6, Ara led to earlier useful progress on all five tasks and to better final scores on three of five. In the review mutation benchmark, detection reached 100% for fabricated claims, rebutted-branch leaks, and over-claims, but only 22% for orphan experiments.
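A difficulty-weighted success rate averages per-subtask scores, giving harder subtasks larger weights. As a result, gains on hard subtasks move the overall figure more than gains on easy ones. The sketch below assumes three difficulty tiers with weights 1, 2, and 3 and per-subtask scores in [0, 1]. Both are placeholders, since this summary gives neither the tier weights nor the exact scoring rule.

```python
# Hypothetical difficulty weights. The weighting actually used in the
# reproduction evaluation is not stated in this summary.
WEIGHTS = {"easy": 1.0, "medium": 2.0, "hard": 3.0}


def weighted_success_rate(results: list[tuple[str, float]]) -> float:
    """Weight-averaged score over (difficulty, score in [0, 1]) pairs."""
    total = sum(WEIGHTS[d] for d, _ in results)
    if total == 0:
        raise ValueError("no subtasks to score")
    achieved = sum(WEIGHTS[d] * score for d, score in results)
    return achieved / total


results = [("easy", 1.0), ("medium", 1.0), ("hard", 0.5)]
print(weighted_success_rate(results))  # prints 0.75
```

With these placeholder weights, a half-credit hard subtask contributes 1.5 of the 6 available weight units. This is why a method that helps more on hard subtasks can widen its lead under this metric.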

Key Points

  1. Ara structures research into four linked layers (scientific logic, executable code, an exploration graph that includes dead ends, and grounded evidence) connected by cross-layer bindings. This preserves information that narrative papers typically flatten or omit.
  2. The ecosystem includes a Live Research Manager for zero-overhead capture during AI-native development, a Compiler for converting legacy PDFs and repositories, and a three-level ARA Seal review pipeline that automates structural, rigor, and reproducibility verification before human review.
  3. Empirical evaluation on ML papers shows Ara improves agent accuracy on knowledge extraction (+21.3 pp overall), increases difficulty-weighted reproduction success rates (+7.0 pp, growing with task difficulty), and accelerates early-stage extension work, though late-phase reversals on two of five extension tasks suggest the trace's value depends on the gap between documented strategies and the agent's own discovery capacity.
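The exploration graph can be viewed as a tree or DAG of attempted research steps in which failed branches are kept rather than pruned. The sketch below uses a hypothetical encoding: an adjacency map, plus an "adopted" or "rebutted" status for each node. It lists the rebutted leaves, which are the dead ends a narrative paper would typically leave out. It is illustrative only, because this summary does not specify Ara's trace format.

```python
from collections import deque


def dead_ends(edges: dict[str, list[str]], status: dict[str, str], root: str = "root") -> list[str]:
    """List rebutted leaf nodes reachable from `root`, in breadth-first order.

    `edges` maps a node to the branches tried after it. `status` marks each
    node as "adopted" or "rebutted". Both encodings are assumptions.
    """
    found: list[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        children = edges.get(node, [])
        if not children and status.get(node) == "rebutted":
            found.append(node)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return found


edges = {"root": ["lr_sweep", "new_loss"], "new_loss": ["loss_v2"]}
status = {"lr_sweep": "rebutted", "new_loss": "adopted", "loss_v2": "adopted"}
print(dead_ends(edges, status))  # prints ['lr_sweep']
```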


This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.