FuguReport

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Authors Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu
Affiliations Synvo AI / Nanyang Technological University
Categories Method / Agent Personalization / Personalization using file system behavior, Application / File System Agent / Memory-centric file system agents, Evaluation / Open Source Framework / Support for future research
License CC BY 4.0

Abstract Overview

FileGram is a unified framework for personalizing file-system agents using behavioral traces (atomic actions and content deltas) rather than dialogue histories. It comprises three components: FileGramEngine, a persona-conditioned data engine that generates multimodal file-system trajectories across 640 profile–task combinations; FileGramBench, a diagnostic benchmark with 4.6K QA pairs evaluating memory-centric personalization across profile reconstruction, behavioral inference, trace disentanglement, anomaly detection, shift analysis, and multimodal grounding; and FileGramOS, a bottom-up memory architecture that encodes traces into procedural, semantic, and episodic channels while deferring abstraction until query time. The authors argue that existing memory systems are interaction-centric and miss fine-grained operational patterns embedded in file-system operations, and they open-source the framework to support future research.
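To make "atomic actions and content deltas" concrete, the sketch below shows one way a single trace event could be represented in Python. The `TraceEvent` class, its field names, and the `make_edit_event` helper are hypothetical illustrations and not FileGram's actual schema, which this summary does not specify. The content delta is computed with the standard library's `difflib.unified_diff`.

```python
from dataclasses import dataclass, field
from typing import List
import difflib


@dataclass
class TraceEvent:
    """One atomic file-system action (hypothetical schema, not FileGram's)."""
    step: int      # position of the action within the trajectory
    action: str    # e.g. "create", "edit", "rename", "delete"
    path: str      # file the action touched
    delta: List[str] = field(default_factory=list)  # unified-diff lines


def make_edit_event(step: int, path: str, before: str, after: str) -> TraceEvent:
    """Build an "edit" event whose delta is a unified diff of the file content."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=path,
        tofile=path,
        lineterm="",
    )
    return TraceEvent(step=step, action="edit", path=path, delta=list(diff))


event = make_edit_event(
    3,
    "notes/todo.md",
    "- draft report\n",
    "- draft report\n- send invoice\n",
)

# Keep only the added content lines, skipping the "+++" file header.
added = [
    line for line in event.delta
    if line.startswith("+") and not line.startswith("+++")
]
```

Storing the diff rather than a prose summary of the edit keeps the fine-grained evidence (what was added, where, and in what order) available for later analysis.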

Novelty

The paper's main novelty is grounding agent personalization in file-system behavioral traces and content deltas rather than conversational summaries. It introduces what the authors describe as the first benchmark focused on memory-centric personalization from longitudinal file-system operations, alongside a bottom-up memory architecture (FileGramOS) that preserves atomic behavioral evidence through procedural, semantic, and episodic channels and defers abstraction until query time.
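The idea of preserving atomic evidence in separate channels and deferring abstraction until query time can be illustrated with a toy store. This is a minimal sketch under assumptions: the `DeferredMemory` class, its method names, and the way events are split across channels are all hypothetical and are not taken from the FileGramOS implementation.

```python
from collections import Counter, defaultdict


class DeferredMemory:
    """Toy three-channel store; hypothetical, not the FileGramOS implementation."""

    def __init__(self):
        self.procedural = Counter()        # counts of atomic action types
        self.semantic = defaultdict(list)  # path -> content snippets written there
        self.episodic = []                 # ordered log of raw (action, path) events

    def ingest(self, action, path, content=""):
        """Store the event as-is; no summary is written at ingestion time."""
        self.procedural[action] += 1
        if content:
            self.semantic[path].append(content)
        self.episodic.append((action, path))

    def query_action_share(self, action):
        """Abstraction happens here, at query time, from the retained counts."""
        total = sum(self.procedural.values())
        return self.procedural[action] / total if total else 0.0

    def query_sequence(self, path):
        """Reconstruct the ordered actions that touched one file."""
        return [a for a, p in self.episodic if p == path]


mem = DeferredMemory()
mem.ingest("create", "report.md", "# Q3 report")
mem.ingest("edit", "report.md", "Added revenue table")
mem.ingest("rename", "report.md")
mem.ingest("edit", "budget.csv", "Updated totals")

edit_share = mem.query_action_share("edit")
report_history = mem.query_sequence("report.md")
```

Because nothing is summarized at ingestion, a later question about action frequencies or per-file ordering can still be answered from the raw evidence, which a narrative-first summary might already have discarded.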

Results

On FileGramBench, context-based and narrative-first baselines reach 48–50% average accuracy, and multimodal memory methods reach at most 44.7%. FileGramOS attains 59.6%, a 9.7-point margin over the strongest baseline, EverMemOS (49.9%), with the gains coming particularly from retaining procedural statistics and action-level structure. Anomaly detection proves more tractable than shift attribution, and accuracy on the real-world screen-recording evaluation drops to single digits for all methods, exposing a substantial sim-to-real gap.

Key Points

  1. FileGram combines a persona-conditioned data engine (FileGramEngine, 640 trajectories with 20,028 atomic actions), a diagnostic benchmark (FileGramBench, 4.6K QA items across four tracks), and a bottom-up memory system (FileGramOS) to study personalization from file-system behavioral traces.
  2. FileGramBench evaluates profile reconstruction, behavioral inference, trace disentanglement, anomaly detection, shift analysis, and multimodal grounding, with ground truth derived from predefined user profiles across procedural, semantic, and episodic memory channels.
  3. Empirical results show that preserving operational micro-structure is critical for personalization: FileGramOS outperforms all baselines by deferring narrative abstraction. Shift attribution and real-world video grounding remain major open challenges, with the real-world evaluation falling to single-digit accuracy across all methods.
