Fugu-MT Paper Translation (Summary): Open-World Video Segmentation

Paper Summary: Open-World Video Segmentation

arxiv url: http://arxiv.org/abs/2606.15632v2
Date: Wed, 17 Jun 2026 08:07:31 GMT
Status: Translation complete
Last updated in system: 2026-06-18 13:57:35.188168
Title: Open-World Video Segmentation
Title (reference translation): Open-World Video Segmentation
Authors: Qing Su, Kaiyang Li, Yuan Zhuang, Fei Miao, Shihao Ji,
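Abstract summary: This paper introduces Savvy, a practical and strong system for open-world video segmentation. Savvy supports persistent object discovery, safe track promotion, and stable long-range identity maintenance. The authors also propose OGA, a granularity-aware evaluation suite for open-world video segmentation.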
Reference score (attention score computed by Fugu-MT): 21.65294890698273
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While video segmentation has advanced rapidly on short clips and closed-set benchmarks, open-world video segmentation remains largely unexplored. The challenge is twofold: (1) existing methods are not designed to support object discovery and identity maintenance in long videos with dynamic ego-motion, and (2) existing evaluation protocols rely on a rigid 1:1 matching that unfairly penalizes semantically valid predictions with mismatched granularity. To address both gaps, we introduce Savvy, a practical and strong system for zero-shot open-world long-horizon video segmentation. Savvy combines hierarchical mask discovery, deferred admission, and track consolidation to support persistent object discovery, safe track promotion, and stable long-range identity maintenance. We further propose OGA, a granularity-aware evaluation suite for open-world video segmentation. Built on a Granularity-Agnostic (GA) matching protocol, OGA relaxes conventional 1:1 matching to an n:1 mapping, but still enforces temporal rigor by detecting support discontinuities through sever points and scoring each reference object through its dominant coherent fragment. This prevents fragmented or flickering support from being over-rewarded while enabling GA-adapted metrics and two structural diagnostics: identity persistence (IP) and identity concentration (IC). On VIPSeg, we show that standard 1:1 evaluation substantially underestimates open-world methods, whereas GA evaluation recovers much of their suppressed performance. On the more realistic long-horizon benchmarks ScanNet and HM3D, Savvy consistently outperforms strong baselines across both classical and proposed metrics, including STQ, VPQ$_\infty$, IP and IC. Together, these results establish a practical benchmark and a strong baseline for open-world long-horizon video segmentation.
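Abstract (reference translation): Video segmentation has progressed rapidly on short clips and closed-set benchmarks, but open-world video segmentation remains largely unexplored. There are two challenges: (1) existing methods are not designed to support object discovery and identity maintenance in long videos with dynamic ego-motion; (2) existing evaluation protocols rely on a strict 1:1 matching that unfairly penalizes semantically valid predictions whose granularity does not match the reference. To address both gaps, we introduce Savvy, a practical and strong system for zero-shot, open-world, long-horizon video segmentation. Savvy combines hierarchical mask discovery, deferred admission, and track consolidation to support persistent object discovery, safe track promotion, and stable long-range identity maintenance. We further propose OGA, a granularity-aware evaluation suite for open-world video segmentation. Built on a Granularity-Agnostic (GA) matching protocol, OGA relaxes conventional 1:1 matching to an n:1 mapping, yet maintains temporal rigor by detecting support discontinuities at sever points and scoring each reference object through its dominant coherent fragment. This prevents fragmented or flickering support from being over-rewarded and enables GA-adapted metrics and structural diagnostics, namely identity persistence (IP) and identity concentration (IC). On VIPSeg, standard 1:1 evaluation substantially underestimates open-world methods, whereas GA evaluation recovers much of their suppressed performance. On the more realistic long-horizon benchmarks ScanNet and HM3D, Savvy consistently outperforms strong baselines on both classical and proposed metrics, including STQ, VPQ$_\infty$, IP and IC. These results establish a practical benchmark and a strong baseline for open-world long-horizon video segmentation.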
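The abstract describes GA matching only at a high level: an n:1 mapping from predictions to reference objects, cuts at "sever points" where support becomes discontinuous, and scoring through a dominant coherent fragment. The toy sketch below illustrates that idea under loudly stated assumptions; it is not the authors' OGA implementation. Everything here is hypothetical: the function names `assign_n_to_1` and `dominant_fragment_score`, the frame format (dicts of pixel-index sets), assigning each prediction to its maximum-overlap reference, the sever rule (no supporting track, or a complete hand-off to disjoint track ids), picking the fragment with the largest summed IoU, and normalising by all visible frames.

```python
from collections import defaultdict

# Toy data format (an assumption for illustration, not the paper's):
# a video is a list of frames; each frame maps a track/object id to the
# frozenset of pixel indices it covers.
Frame = dict[int, frozenset[int]]


def assign_n_to_1(preds: list[Frame], refs: list[Frame]) -> dict[int, int]:
    """Hypothetical n:1 mapping: each predicted track goes to the reference
    object it overlaps most over the whole video. Several predictions may
    share one reference, so a part-level split is not a false positive."""
    overlap: dict[tuple[int, int], int] = defaultdict(int)
    for p_frame, r_frame in zip(preds, refs):
        for pid, pmask in p_frame.items():
            for rid, rmask in r_frame.items():
                overlap[(pid, rid)] += len(pmask & rmask)
    best: dict[int, tuple[int, int]] = {}
    for (pid, rid), inter in overlap.items():
        # Only positive overlaps create an assignment.
        if inter > best.get(pid, (0, -1))[0]:
            best[pid] = (inter, rid)
    return {pid: rid for pid, (_, rid) in best.items()}


def dominant_fragment_score(preds: list[Frame], refs: list[Frame],
                            mapping: dict[int, int], rid: int) -> float:
    """Score one reference through its dominant coherent fragment.

    Assumed sever rule: support is cut at a frame where no mapped track is
    present, or where the supporting track ids share nothing with the
    previous frame (an identity hand-off)."""
    visible = [t for t, r in enumerate(refs) if rid in r]
    if not visible:
        return 0.0
    fragments: list[list[float]] = []
    current: list[float] = []
    prev_ids = None
    for t in visible:
        ids = {pid for pid, r in mapping.items() if r == rid and pid in preds[t]}
        # Union of all mapped predictions in this frame (the n:1 part).
        union = frozenset().union(*(preds[t][pid] for pid in ids))
        ref = refs[t][rid]
        denom = len(union | ref)
        iou = len(union & ref) / denom if denom else 0.0
        severed = not ids or (prev_ids is not None and prev_ids.isdisjoint(ids))
        if severed and current:
            fragments.append(current)
            current = []
        if ids:
            current.append(iou)
        prev_ids = ids or None
    if current:
        fragments.append(current)
    if not fragments:
        return 0.0
    dominant = max(fragments, key=sum)
    # Normalise by every visible frame, so frames outside the dominant
    # fragment count as misses.
    return sum(dominant) / len(visible)


if __name__ == "__main__":
    ref = frozenset({0, 1, 2, 3})
    refs = [{1: ref} for _ in range(4)]
    # Part-level prediction: two tracks that each cover half the object.
    parts = [{10: frozenset({0, 1}), 11: frozenset({2, 3})} for _ in range(4)]
    print(dominant_fragment_score(parts, refs, assign_n_to_1(parts, refs), 1))  # 1.0
    # Identity hand-off halfway through: the support is severed at frame 2.
    handoff = parts[:2] + [{12: ref} for _ in range(2)]
    print(dominant_fragment_score(handoff, refs, assign_n_to_1(handoff, refs), 1))  # 0.5
```

In this toy setup, the part-level prediction gets full credit even though a strict 1:1 match would reach only IoU 0.5 per frame with either half. The prediction that hands the object off to a fresh track id is capped at its longest coherent fragment. This mirrors the two behaviours the abstract claims for OGA, under the assumptions listed above.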


Related Papers