Fugu-MT Paper Translation (Abstract): MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies

Paper summary: MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies

arxiv url: http://arxiv.org/abs/2603.06846v1
Date: Fri, 06 Mar 2026 20:11:55 GMT
Status: Translation complete
System update date: 2026-03-10 15:13:13.196255
Title: MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies
Title (reference translation): MotionBits: Video Segmentation via Motion-Level Analysis of Rigid Bodies
Authors: Howard H. Qian, Kejia Ren, Yu Xiang, Vicente Ordonez, Kaiyu Hang
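Abstract summary: We introduce MotionBit, a concept that defines the smallest unit of motion-based segmentation through spatial twist equivalence. We contribute (1) the MotionBit concept and definition, (2) a hand-labeled benchmark, called MoRiBo, for evaluating moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos, and (3) a learning-free graph-based MotionBits segmentation method that outperforms state-of-the-art embodied perception methods by 37.3% in macro-averaged mIoU.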
Reference score (independently computed attention score): 14.755300270962337
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Rigid bodies constitute the smallest manipulable elements in the real world, and understanding how they physically interact is fundamental to embodied reasoning and robotic manipulation. Thus, accurate detection, segmentation, and tracking of moving rigid bodies is essential for enabling reasoning modules to interpret and act in diverse environments. However, current segmentation models trained on semantic grouping are limited in their ability to provide meaningful interaction-level cues for completing embodied tasks. To address this gap, we introduce MotionBit, a novel concept that, unlike prior formulations, defines the smallest unit in motion-based segmentation through kinematic spatial twist equivalence, independent of semantics. In this paper, we contribute (1) the MotionBit concept and definition, (2) a hand-labeled benchmark, called MoRiBo, for evaluating moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos, and (3) a learning-free graph-based MotionBits segmentation method that outperforms state-of-the-art embodied perception methods by 37.3% in macro-averaged mIoU on the MoRiBo benchmark. Finally, we demonstrate the effectiveness of MotionBits segmentation for downstream embodied reasoning and manipulation tasks, highlighting its importance as a fundamental primitive for understanding physical interactions.
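Abstract (reference translation): Rigid bodies are the smallest manipulable elements in the real world, and understanding how they interact physically is essential to embodied reasoning and robotic manipulation. Accurate detection, segmentation, and tracking of moving rigid bodies is therefore essential for letting reasoning modules interpret and act in diverse environments. However, current segmentation models trained on semantic grouping are limited in their ability to provide meaningful interaction-level cues for completing embodied tasks. To address this gap, we introduce MotionBit, a new concept that, unlike prior formulations, defines the smallest unit of motion-based segmentation through kinematic spatial twist equivalence, independent of semantics. In this paper, we propose (1) the MotionBit concept and definition, (2) a hand-labeled benchmark, called MoRiBo, for evaluating moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos, and (3) a learning-free graph-based MotionBits segmentation method that outperforms state-of-the-art embodied perception methods by 37.3% in macro-averaged mIoU on the MoRiBo benchmark. Finally, we demonstrate the effectiveness of MotionBits segmentation for downstream embodied reasoning and manipulation tasks, highlighting its importance as a fundamental primitive for understanding physical interactions.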
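
Note on the metric: the abstract reports results in macro-averaged mIoU but does not define it on this page. The following is a minimal sketch of one common reading, assuming IoU is computed per mask, averaged within each group (for example, per benchmark subset), and the group means are then averaged with equal weight. The grouping, the function names `iou` and `macro_miou`, and the toy masks are illustrative assumptions, not the paper's evaluation code.

```python
from statistics import mean


def iou(pred, gt):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    if not union:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return len(pred & gt) / len(union)


def macro_miou(groups):
    """Return (macro average, per-group mean IoU).

    groups maps a hypothetical group name to a list of
    (predicted_mask, ground_truth_mask) pairs.
    """
    per_group = {
        name: mean(iou(p, g) for p, g in pairs)
        for name, pairs in groups.items()
    }
    # Each group contributes equally, regardless of how many masks it holds.
    return mean(per_group.values()), per_group


# Toy data: two masks in one group, one mask in the other.
example = {
    "robot": [
        ({(0, 0), (0, 1)}, {(0, 0), (0, 1)}),  # IoU 1.0
        ({(0, 0)}, {(0, 0), (0, 1)}),          # IoU 0.5
    ],
    "human": [
        ({(1, 1)}, {(2, 2)}),                  # IoU 0.0
    ],
}
score, per_group = macro_miou(example)
print(score, per_group)
```

Under this reading the larger group does not dominate the score: pooling all three toy masks would give a mean IoU of 0.5, while the macro average over the two groups is 0.375.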

Related papers