Fugu-MT Paper Translation (Abstract): How to Correctly Make Mistakes: A Framework for Constructing and Benchmarking Mistake Aware Egocentric Procedural Videos

Paper overview: How to Correctly Make Mistakes: A Framework for Constructing and Benchmarking Mistake Aware Egocentric Procedural Videos

arxiv url: http://arxiv.org/abs/2604.15134v1
Date: Thu, 16 Apr 2026 15:14:29 GMT
Status: Translation complete
Last updated in system: 2026-04-17 21:29:31.97705
Title: How to Correctly Make Mistakes: A Framework for Constructing and Benchmarking Mistake Aware Egocentric Procedural Videos
Authors: Olga Loginova, Frank Keller
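Abstract summary: This paper presents PIE-V, a framework for constructing and benchmarking egocentric procedural videos. For benchmarking, it introduces a unified taxonomy and a human rubric with nine metrics that cover step-level and procedure-level quality.
Reference score (independently computed attention measure): 18.88275830423089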
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable procedural monitoring in video requires exposure to naturally occurring human errors and the recoveries that follow. In egocentric recordings, mistakes are often partially occluded by hands and revealed through subtle object state changes, while existing procedural datasets provide limited and inconsistent mistake and correction traces. We present PIE-V (Psychologically Inspired Error injection for Videos), a framework for constructing and benchmarking mistake-aware egocentric procedural videos by augmenting clean keystep procedures with controlled, human-plausible deviations. PIE-V combines a psychology-informed error planner conditioned on procedure phase and semantic step load, a correction planner that models recovery behavior, an LLM writer that performs cascade-consistent rewrites, and an LLM judge that validates procedural coherence and repairs failures. For video segment edits, PIE-V synthesizes replacement clips with text-guided video generation and stitches them into the episode to preserve visual plausibility. Applied to 17 tasks and 50 Ego-Exo4D scenarios, PIE-V injects 102 mistakes and generates 27 recovery corrections. For benchmarking, we introduce a unified taxonomy and a human rubric with nine metrics that cover step-level and procedure-level quality, including plausibility, procedure logic with annotator confidence, state change coherence, and grounding between text and video. Using this protocol, we audit several existing resources and compare PIE-V against a freeform LLM generation baseline under the same criteria. Together, the framework and rubric support post-completion verification for egocentric procedural mistake detection and correction.
