Fugu-MT Paper Translation (Abstract): Jointly Learning Structured Representations and Stabilized Affinity for Human Motion Segmentation

Paper summary: Jointly Learning Structured Representations and Stabilized Affinity for Human Motion Segmentation

arxiv url: http://arxiv.org/abs/2605.05753v1
Date: Thu, 07 May 2026 06:48:53 GMT
Status: Translation complete
System update date: 2026-05-08 22:27:11.57369
Title: Jointly Learning Structured Representations and Stabilized Affinity for Human Motion Segmentation
Title (reference translation): Joint Learning of Structured Representations and Stabilized Affinity for Human Motion Segmentation
Authors: Xianghan Meng, Zhiyuan Huang, Zhengyu Tong, Chun-Guang Li,
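Abstract summary: Human Motion Segmentation (HMS) aims to partition a video into non-overlapping segments corresponding to different human motions. For real-world videos, raw frame-level features often violate the Union-of-Subspaces (UoS) assumption and yield unsatisfactory segmentation performance. This paper proposes Temporal Deep Self-expressive subspace Clustering (TDSC), which jointly learns temporally consistent structured representations and stabilized affinity for accurate and robust HMS.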
Reference score (independently computed attention level): 7.350724521347576
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human Motion Segmentation (HMS), which aims to partition a video into non-overlapping segments corresponding to different human motions, has recently attracted increasing research attention. Existing HMS approaches are predominantly based on subspace clustering, which is grounded on the assumption that the distribution of high-dimensional temporal features aligns well with a Union-of-Subspaces (UoS). For videos in the real world, however, the raw frame-level features often violate the UoS assumption and yield unsatisfactory segmentation performance. To address this issue, we propose an efficient and effective approach for HMS, named Temporal Deep Self-expressive subspace Clustering (TDSC), which jointly learns temporally consistent structured representations and stabilized affinity for accurate and robust HMS. Specifically, in TDSC, we alternately learn structured representations of the input frame features and self-expressive coefficients via a properly regularized self-expressive model, in which a coding-rate maximization regularizer is incorporated to avoid representation collapse and make the learned representations conform to a desired UoS distribution, while temporal constraints are incorporated to encourage temporally adjacent frames to be partitioned into the same group. Moreover, we develop a temporal momentum averaging mechanism to stabilize affinity evolution and design a reparameterization strategy to enable efficient optimization. We conduct extensive experiments on five benchmark HMS datasets using both conventional features (HoG) and up-to-date deep features (i.e., CLIP, DINOv2) to validate the effectiveness of our approach.
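Abstract (reference translation): Human Motion Segmentation (HMS) aims to partition a video into non-overlapping segments corresponding to different human motions, and has recently attracted research attention. Existing HMS approaches are mainly based on subspace clustering, which rests on the assumption that the distribution of high-dimensional temporal features aligns well with a Union-of-Subspaces (UoS). For real-world videos, however, raw frame-level features often violate the UoS assumption and yield unsatisfactory segmentation performance. To address this problem, we propose an efficient and effective approach for HMS, called Temporal Deep Self-expressive subspace Clustering (TDSC), which jointly learns temporally consistent structured representations and stabilized affinity for accurate and robust HMS. Specifically, TDSC alternately learns structured representations of the input frame features and self-expressive coefficients using a properly regularized self-expressive model, which incorporates a coding-rate maximization regularizer to avoid representation collapse and fit the learned representations to a desired UoS distribution, together with temporal constraints that encourage temporally adjacent frames to be partitioned into the same group. In addition, we develop a temporal momentum averaging mechanism to stabilize the evolution of the affinity and design a reparameterization strategy to enable efficient optimization. Extensive experiments on five benchmark HMS datasets, using both conventional (HoG) and up-to-date deep (CLIP, DINOv2) features, validate the effectiveness of the proposed approach.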
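The abstract mentions a momentum averaging mechanism that stabilizes how the affinity evolves, but does not give its exact form. The following is a minimal sketch of one plausible reading, assuming an exponential moving average over training iterations of an affinity built by symmetrizing the absolute self-expressive coefficients. The function names `symmetrize` and `momentum_update`, the `momentum` value of 0.9, and the toy coefficient matrices are all hypothetical and are not taken from the paper.

```python
def symmetrize(coeffs):
    """Build a symmetric, non-negative affinity from self-expressive coefficients.

    Assumption: affinity = (|C| + |C|^T) / 2, a common choice in subspace
    clustering; the paper may construct its affinity differently.
    """
    n = len(coeffs)
    return [[0.5 * (abs(coeffs[i][j]) + abs(coeffs[j][i])) for j in range(n)]
            for i in range(n)]


def momentum_update(prev_affinity, coeffs, momentum=0.9):
    """Blend the previous affinity with the one from the current coefficients.

    Assumption: a plain exponential moving average across training iterations.
    """
    current = symmetrize(coeffs)
    if prev_affinity is None:
        # First iteration: nothing to average with yet.
        return current
    n = len(current)
    return [[momentum * prev_affinity[i][j] + (1.0 - momentum) * current[i][j]
             for j in range(n)]
            for i in range(n)]


# Toy coefficients for 3 frames at two consecutive training iterations.
coeffs_step1 = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
coeffs_step2 = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

affinity = momentum_update(None, coeffs_step1)
affinity = momentum_update(affinity, coeffs_step2)
```

In this toy run the coefficients flip abruptly between iterations, yet the averaged affinity keeps most of the weight on the link between frames 0 and 1 and gives only a small weight to the new link between frames 0 and 2. That damping is the stabilizing effect the sketch is meant to illustrate.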


Related papers