Fugu-MT Paper Translation (Abstract): GMOS: Grounding Moving Object Segmentation in 3D Space and Time

Paper overview: GMOS: Grounding Moving Object Segmentation in 3D Space and Time

arxiv url: http://arxiv.org/abs/2605.30352v1
Date: Thu, 28 May 2026 17:59:58 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.764418
Title: GMOS: Grounding Moving Object Segmentation in 3D Space and Time
Title (reference translation): GMOS: Moving Object Segmentation in 3D Space and Time
Authors: Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman
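Abstract summary: Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. This paper proposes GMOS, a framework that operates directly on RGB video to produce 3D-aware, temporally fine-grained segmentation of multiple moving objects. To support training and evaluation in this regime, the authors curate GMOS-2K, a dataset of 2,210 real-world videos with per-object temporal motion annotations.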
Reference score (independently computed attention score): 95.3020315930043
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two fundamental limitations: they rely on pre-computed 2D auxiliary modalities such as optical flow or point trajectories that lack 3D geometric information, and they treat motion as a sequence-level attribute, overlooking the instantaneous motion state of each object. We address both by grounding MOS in 3D space and time, and propose GMOS, a framework that operates directly on RGB video to produce 3D-aware, temporally fine-grained segmentation of multiple moving objects, alongside a foreground-background variant GMOS-S for faster deployment. To support training and evaluation in this regime, we curate GMOS-2K, a dataset of 2,210 real-world videos with per-object temporal motion annotations drawn from five established Video Object Segmentation (VOS) benchmarks, and formalise MOS-I ("I" for instantaneous), a temporally fine-grained evaluation protocol with three complementary metrics. GMOS achieves state-of-the-art results across MOS, MOS-I, and Unsupervised VOS benchmarks, while running significantly faster than prior multi-object MOS methods and supporting online inference for streaming deployment.
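Abstract (reference translation): Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. However, current MOS methods rely on pre-computed 2D auxiliary modalities, such as optical flow or point trajectories, that lack 3D geometric information, and they treat motion as a sequence-level attribute, overlooking the instantaneous motion state of each object. We address both limitations by grounding MOS in 3D space and time, and propose GMOS, which operates directly on RGB video to produce temporally fine-grained segmentation of multiple moving objects. To support training and evaluation in this regime, we curate GMOS-2K, a dataset of 2,210 real-world videos with per-object temporal motion annotations drawn from five existing Video Object Segmentation (VOS) benchmarks, and formalise MOS-I ("I" for instantaneous), a temporally fine-grained evaluation protocol with three complementary metrics. GMOS achieves state-of-the-art results across MOS, MOS-I, and Unsupervised VOS benchmarks, while running significantly faster than prior multi-object MOS methods and supporting online inference for streaming deployment.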
