Fugu-MT Paper Translation (Abstract): Why Latent Actions Fail, and How to Prevent It

Paper overview: Why Latent Actions Fail, and How to Prevent It

arxiv url: http://arxiv.org/abs/2605.20223v1
Date: Wed, 13 May 2026 09:54:35 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.216577
Title: Why Latent Actions Fail, and How to Prevent It
Authors: Jung Min Lee, Taehyun Cho, Li Zhao, Jungwoo Lee
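Abstract summary: Latent action models (LAMs) aim to learn action-like representations from unlabeled videos by compressing frame-to-frame changes. Frames of in-the-wild videos contain not only the agent's own state but also exogenous state such as background clutter.
Reference score (independently computed attention score): 11.214606199787239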
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Latent action models (LAMs) aim to learn action-like representations from unlabeled videos by compressing frame-to-frame changes. The frames of in-the-wild videos, however, contain not only the agent's own state but exogenous state such as background clutter. Since the exogenous state introduces changes unrelated to actions, it hinders reliable latent action learning. This paper investigates this problem analytically by extending a linear LAM framework to explicitly model exogenous state. Our analysis reveals two insights: (1) minimizing the standard reconstruction objective produces latent actions that encode exogenous information from future observation; and (2) learning in a representation space that focuses on endogenous components is a key to mitigating the interference of noise. We further show that previously proposed auxiliary objectives, such as action-supervision, provably encourage latent actions to be consistent across exogenous states. These findings are validated through experiments on both linear and nonlinear LAMs, providing a unified theoretical analysis of how exogenous state hinders latent action learning and why common remedies work.
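Insight (1) can be illustrated with a toy sketch. This is not the paper's model or experiment. It assumes the linear LAM reduces to a linear autoencoder on the frame difference, whose reconstruction-optimal encoder is the top principal subspace (PCA). It also assumes the exogenous change has larger variance than the action-driven change. The matrix `B`, the dimensions, and the helper names are hypothetical.

```python
import numpy as np

def linear_lam_latents(delta, latent_dim):
    """Latents of the reconstruction-optimal linear bottleneck (PCA) on frame deltas."""
    centered = delta - delta.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:latent_dim].T

def action_r2(latents, actions):
    """Fraction of action variance that is linearly decodable from the latents."""
    z = latents - latents.mean(axis=0)
    a = actions - actions.mean(axis=0)
    coef, *_ = np.linalg.lstsq(z, a, rcond=None)
    resid = a - z @ coef
    return 1.0 - (resid ** 2).sum() / (a ** 2).sum()

rng = np.random.default_rng(0)
n, act_dim, exo_dim = 5000, 2, 4

# Hypothetical action-effect matrix: maps a 2-D action to a 4-D endogenous change.
B = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 1.0, 0.5, 0.0]])

actions = rng.normal(size=(n, act_dim))
endo_delta = actions @ B                          # change caused by the action
exo_delta = 3.0 * rng.normal(size=(n, exo_dim))   # action-independent clutter (assumed dominant)

# Observed frame difference mixes both sources of change.
full_delta = np.hstack([endo_delta, exo_delta])

# Latent actions learned on the full observation vs. on the endogenous part only.
r2_full = action_r2(linear_lam_latents(full_delta, act_dim), actions)
r2_endo = action_r2(linear_lam_latents(endo_delta, act_dim), actions)

print(f"action R^2, full observation space: {r2_full:.3f}")
print(f"action R^2, endogenous space only:  {r2_endo:.3f}")
```

Under these assumptions the bottleneck spends its capacity on the high-variance exogenous directions, so the latents carry almost no action information. Restricting learning to the endogenous components recovers the actions up to a linear map, in line with insight (2).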
