FileGram: Grounding Agent Personalization in File-System Behavioral Traces
Abstract Overview
FileGram is a unified framework for personalizing file-system agents using behavioral traces (atomic actions and content deltas) rather than dialogue histories. It comprises three components: FileGramEngine, a persona-conditioned data engine that generates multimodal file-system trajectories across 640 profile–task combinations; FileGramBench, a diagnostic benchmark with 4.6K QA pairs evaluating memory-centric personalization across profile reconstruction, behavioral inference, trace disentanglement, anomaly detection, shift analysis, and multimodal grounding; and FileGramOS, a bottom-up memory architecture that encodes traces into procedural, semantic, and episodic channels while deferring abstraction until query time. The authors argue that existing memory systems are interaction-centric and miss fine-grained operational patterns embedded in file-system operations, and they open-source the framework to support future research.
Novelty
The paper's main novelty is grounding agent personalization in file-system behavioral traces and content deltas rather than conversational summaries. It introduces what the authors describe as the first benchmark focused on memory-centric personalization from longitudinal file-system operations, alongside a bottom-up memory architecture (FileGramOS) that preserves atomic behavioral evidence through procedural, semantic, and episodic channels and defers abstraction until query time.
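To make the bottom-up idea concrete, the following is a minimal sketch of a three-channel store that keeps raw evidence and abstracts only when queried. It is not FileGramOS itself: the class names (`AtomicAction`, `TraceMemory`), the fields, and the `summarize` method are all hypothetical, chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AtomicAction:
    """One file-system event. All field names are illustrative assumptions."""
    step: int
    op: str            # e.g. "create", "edit", "rename"
    path: str
    delta: str = ""    # content delta for edits, empty otherwise


@dataclass
class TraceMemory:
    """Hypothetical three-channel store that preserves atomic evidence."""
    procedural: Counter = field(default_factory=Counter)  # operation statistics
    semantic: dict = field(default_factory=dict)          # path -> content deltas
    episodic: list = field(default_factory=list)          # ordered raw actions

    def ingest(self, action: AtomicAction) -> None:
        # Store the action as-is; no narrative summary is produced here.
        self.procedural[action.op] += 1
        if action.delta:
            self.semantic.setdefault(action.path, []).append(action.delta)
        self.episodic.append(action)

    def summarize(self, top_k: int = 1) -> dict:
        # Abstraction is deferred to this point, i.e. query time.
        return {
            "frequent_ops": [op for op, _ in self.procedural.most_common(top_k)],
            "edited_paths": sorted(self.semantic),
            "n_actions": len(self.episodic),
        }


memory = TraceMemory()
for action in [
    AtomicAction(0, "create", "notes/plan.md"),
    AtomicAction(1, "edit", "notes/plan.md", "+ add milestones"),
    AtomicAction(2, "edit", "notes/plan.md", "+ add owners"),
    AtomicAction(3, "rename", "notes/plan.md"),
]:
    memory.ingest(action)

profile = memory.summarize()
print(profile)
```

Because the raw actions stay in the episodic list, a later query can ask a different question (for example, about ordering or renames) without being limited by an earlier summary.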
Results
On FileGramBench, context-based and narrative-first baselines reach 48–50% average accuracy, and multimodal memory methods reach at most 44.7%. FileGramOS attains 59.6%, outperforming the strongest baseline, EverMemOS (49.9%); the gains come mainly from retaining procedural statistics and action-level structure. Anomaly detection proves more tractable than shift attribution. On the real-world screen-recording evaluation, accuracy drops to single digits for all methods, exposing a substantial sim-to-real gap.
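The margins implied by these figures can be worked out directly. The short sketch below uses only the average accuracies quoted in this summary; the dictionary labels are descriptive and not names taken from the paper.

```python
# Average accuracies (%) as quoted in this summary.
reported = {
    "FileGramOS": 59.6,
    "EverMemOS (strongest baseline)": 49.9,
    "best multimodal memory method": 44.7,
}

ours = reported["FileGramOS"]

# Absolute gap in percentage points between FileGramOS and each other method.
margins = {
    name: round(ours - acc, 1)
    for name, acc in reported.items()
    if name != "FileGramOS"
}

for name, gap in margins.items():
    print(f"FileGramOS leads {name} by {gap} points")
```

These are gaps in average accuracy only; the summary does not give per-task numbers, so nothing finer-grained can be derived from it.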
Key Points
- FileGram combines a persona-conditioned data engine (FileGramEngine, 640 trajectories with 20,028 atomic actions), a diagnostic benchmark (FileGramBench, 4.6K QA items across four tracks), and a bottom-up memory system (FileGramOS) to study personalization from file-system behavioral traces.
- FileGramBench evaluates profile reconstruction, behavioral inference, trace disentanglement, anomaly detection, shift analysis, and multimodal grounding, with ground truth derived from predefined user profiles across procedural, semantic, and episodic memory channels.
- Empirical results show that preserving operational micro-structure is critical for personalization: FileGramOS outperforms all baselines by deferring narrative abstraction. Shift attribution remains harder than anomaly detection, and real-world video grounding is a major open challenge, with single-digit accuracy across all methods.
References
- arXiv: https://arxiv.org/abs/2604.04901v1
- Fugu-MT: https://fugumt.com/fugumt/paper_check/2604.04901v1
- Hugging Face Papers: https://huggingface.co/papers/2604.04901
- GitHub: https://github.com/Synvo-ai/FileGram
- Project: https://filegram.choiszt.com