Fugu-MT Paper Translation (Summary): IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory

Paper summary: IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory

arxiv url: http://arxiv.org/abs/2604.20136v1
Date: Wed, 22 Apr 2026 03:03:33 GMT
Status: Translation complete
System update date: 2026-04-23 15:36:10.940132
Title: IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory
Authors: Weitong Kong, Di Wen, Kunyu Peng, David Schneider, Zeyun Zhong, Alexander Jaus, Zdravko Marinov, Jiale Wei, Ruiping Liu, Junwei Zheng, Yufan Chen, Lei Qi, Rainer Stiefelhagen,
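Abstract summary: Existing pipelines produce opaque, end-to-end outputs that expose no intermediate state for inspection. IMPACT-CYCLE is a multi-agent system that reformulates long-video understanding as iterative claim-level maintenance of a shared semantic memory.
Reference score (independently computed attention score): 73.22944697933603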
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Correcting errors in long-video understanding is disproportionately costly: existing multimodal pipelines produce opaque, end-to-end outputs that expose no intermediate state for inspection, forcing annotators to revisit raw video and reconstruct temporal logic from scratch. The core bottleneck is not generation quality alone, but the absence of a supervisory interface through which human effort can be proportional to the scope of each error. We present IMPACT-CYCLE, a supervisory multi-agent system that reformulates long-video understanding as iterative claim-level maintenance of a shared semantic memory -- a structured, versioned state encoding typed claims, a claim dependency graph, and a provenance log. Role-specialized agents operating under explicit authority contracts decompose verification into local object-relation correctness, cross-temporal consistency, and global semantic coherence, with corrections confined to structurally dependent claims. When automated evidence is insufficient, the system escalates to human arbitration as the supervisory authority with final override rights; dependency-closure re-verification then ensures correction cost remains proportional to error scope. Experiments on VidOR show substantially improved downstream reasoning (VQA: 0.71 to 0.79) and a 4.8x reduction in human arbitration cost, with workload significantly lower than manual annotation. Code will be released at https://github.com/MKong17/IMPACT_CYCLE.
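The abstract describes a shared semantic memory (typed claims, a claim dependency graph, a provenance log) and dependency-closure re-verification after a correction. The following is a minimal Python sketch of that idea, not the authors' implementation: the class names `Claim` and `SemanticMemory`, the claim type labels, the example claims, and the method names are all hypothetical, chosen only to illustrate how a correction can be confined to structurally dependent claims.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Claim:
    claim_id: str
    claim_type: str  # hypothetical labels: "object_relation", "temporal", "global"
    text: str
    version: int = 1


@dataclass
class SemanticMemory:
    claims: dict = field(default_factory=dict)      # claim_id -> Claim
    dependents: dict = field(default_factory=dict)  # claim_id -> ids of claims that depend on it
    provenance: list = field(default_factory=list)  # append-only log of (claim_id, version, author)

    def add_claim(self, claim, depends_on=(), author="agent"):
        """Register a claim and record which earlier claims it depends on."""
        self.claims[claim.claim_id] = claim
        self.dependents.setdefault(claim.claim_id, set())
        for parent in depends_on:
            self.dependents.setdefault(parent, set()).add(claim.claim_id)
        self.provenance.append((claim.claim_id, claim.version, author))

    def dependency_closure(self, claim_id):
        """Return the claim itself plus every claim that transitively depends on it."""
        seen = {claim_id}
        queue = deque([claim_id])
        while queue:
            current = queue.popleft()
            for child in self.dependents.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

    def correct(self, claim_id, new_text, author="human"):
        """Apply a correction, log it, and return the ids that need re-verification."""
        claim = self.claims[claim_id]
        claim.text = new_text
        claim.version += 1
        self.provenance.append((claim_id, claim.version, author))
        return self.dependency_closure(claim_id)


# Hypothetical example: c2 depends on c1, c3 depends on c2, c4 is unrelated.
memory = SemanticMemory()
memory.add_claim(Claim("c1", "object_relation", "person holds cup"))
memory.add_claim(Claim("c2", "temporal", "person drinks after holding cup"), depends_on=["c1"])
memory.add_claim(Claim("c3", "global", "the video shows a breakfast scene"), depends_on=["c2"])
memory.add_claim(Claim("c4", "object_relation", "dog sits on sofa"))

to_recheck = memory.correct("c1", "person holds bottle")
print(sorted(to_recheck))  # ['c1', 'c2', 'c3']
```

In this sketch, correcting `c1` marks only `c1`, `c2`, and `c3` for re-verification, while the unrelated claim `c4` is left untouched. That is the sense in which the cost of a correction stays proportional to the scope of the error.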

Related papers