Fugu-MT Paper Translation (Abstract): Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks

Paper overview: Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks

arxiv url: http://arxiv.org/abs/2603.09513v2
Date: Wed, 18 Mar 2026 07:21:40 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:42.154614
Title: Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks
Title (reference translation): Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks
Authors: Honghui Wang, Zhi Jing, Jicong Ao, Shiji Song, Xuelong Li, Gao Huang, Chenjia Bai
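Abstract summary: RuleSafe is a new articulated manipulation benchmark built on a scalable LLM-aided simulation framework. VQ-Memory is a compact, structured temporal representation that uses vector-quantized variational autoencoders.
Reference score (independently computed attention score): 96.60530830276281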
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The high cost of collecting real-robot data has made robotic simulation a scalable platform for both evaluation and data generation. Yet most existing benchmarks concentrate on simple manipulation tasks such as pick-and-place, failing to capture the non-Markovian characteristics of real-world tasks and the complexity of articulated object interactions. To address this limitation, we present RuleSafe, a new articulated manipulation benchmark built upon a scalable LLM-aided simulation framework. RuleSafe features safes with diverse unlocking mechanisms, such as key locks, password locks, and logic locks, which require different multi-stage reasoning and manipulation strategies. These LLM-generated rules produce non-Markovian and long-horizon tasks that require temporal modeling and memory-based reasoning. We further propose VQ-Memory, a compact and structured temporal representation that uses vector-quantized variational autoencoders (VQ-VAEs) to encode past proprioceptive states into discrete latent tokens. This representation filters low-level noise while preserving high-level task-phase context, providing lightweight yet robust temporal cues that are compatible with existing Vision-Language-Action (VLA) models. Extensive experiments on state-of-the-art VLA models and diffusion policies show that VQ-Memory consistently improves long-horizon planning, enhances generalization to unseen configurations, and enables more efficient manipulation with reduced computational cost. Project page: vqmemory.github.io
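Abstract (reference translation): Because collecting real-robot data is costly, robotic simulation has become a scalable platform for both evaluation and data generation. However, most existing benchmarks concentrate on simple manipulation tasks such as pick-and-place and do not capture the non-Markovian characteristics of real-world tasks. To address this limitation, we present RuleSafe, a new articulated manipulation benchmark built on a scalable LLM-aided simulation framework. RuleSafe features safes with diverse unlocking mechanisms, such as key locks, password locks, and logic locks, which require different multi-stage reasoning and manipulation strategies. These LLM-generated rules produce non-Markovian, long-horizon tasks that require temporal modeling and memory-based reasoning. We further propose VQ-Memory, a compact and structured temporal representation that uses vector-quantized variational autoencoders (VQ-VAEs) to encode past proprioceptive states into discrete latent tokens. This representation filters low-level noise while preserving high-level task-phase context, providing lightweight yet robust temporal cues compatible with existing Vision-Language-Action (VLA) models. Extensive experiments on state-of-the-art VLA models and diffusion policies show that VQ-Memory consistently improves long-horizon planning, enhances generalization to unseen configurations, and enables more efficient manipulation at reduced computational cost. Project page: vqmemory.github.io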
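Illustration (not from the paper): the abstract describes encoding past proprioceptive states into discrete latent tokens. The core quantization step of a VQ-VAE, mapping a vector to the index of its nearest codebook entry, might be sketched as below. This is a minimal sketch under stated assumptions: the codebook is hand-picked rather than learned, the learned encoder is replaced by the identity, and the names `quantize`, `encode_history`, `codebook`, and `history` are hypothetical and do not come from the authors' code.

```python
import math
from typing import List, Sequence


def quantize(state: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector nearest to `state`
    (squared Euclidean distance). Ties go to the lowest index."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((s - c) ** 2 for s, c in zip(state, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def encode_history(
    states: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]
) -> List[int]:
    """Map each past state to one discrete token (a codebook index)."""
    return [quantize(s, codebook) for s in states]


# Assumption: a toy 2-D codebook with three entries. A real VQ-VAE would
# learn these vectors, together with an encoder, from data.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Assumption: toy "proprioceptive" readings with small noise around
# three distinct configurations.
history = [[0.05, -0.02], [0.1, 0.03], [0.9, 1.1], [1.02, 0.97], [0.1, 0.95]]

tokens = encode_history(history, codebook)
print(tokens)
```

In this toy setting, nearby noisy readings collapse onto the same token, which gives a rough picture of how a discrete code could filter low-level noise while still marking coarse changes of phase. How the paper actually trains the codebook and feeds the tokens to a VLA model is not specified in this abstract.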


Related papers