Fugu-MT paper translation (abstract): Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors

Paper overview: Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors

arxiv url: http://arxiv.org/abs/2603.29655v1
Date: Tue, 31 Mar 2026 12:18:57 GMT
Status: translation complete
Last updated in system: 2026-04-01 15:25:03.634344
Title: Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors
Title (reference translation): Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors
Authors: Pengfei Zhou, Xiangyue Zhang, Xukun Shen, Yong Hu
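Abstract summary: Masked generative models have become a strong paradigm for text-to-motion synthesis, but they still treat motion frames too uniformly. This work shows that current masked motion generators degrade disproportionately on dynamically complex motions. Motivated by this mismatch, the authors introduce the Motion Spectral Descriptor (MSD), a simple and parameter-free measure of local dynamic complexity.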
Reference score (independently computed attention measure): 10.685712212496753
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked generative models have become a strong paradigm for text-to-motion synthesis, but they still treat motion frames too uniformly during masking, attention, and decoding. This is a poor match for motion, where local dynamic complexity varies sharply over time. We show that current masked motion generators degrade disproportionately on dynamically complex motions, and that frame-wise generation error is strongly correlated with motion dynamics. Motivated by this mismatch, we introduce the Motion Spectral Descriptor (MSD), a simple and parameter-free measure of local dynamic complexity computed from the short-time spectrum of motion velocity. Unlike learned difficulty predictors, MSD is deterministic, interpretable, and derived directly from the motion signal itself. We use MSD to make masked motion generation complexity-aware. In particular, MSD guides content-focused masking during training, provides a spectral similarity prior for self-attention, and can additionally modulate token-level sampling during iterative decoding. Built on top of masked motion generators, our method, DynMask, improves motion generation most clearly on dynamically complex motions while also yielding stronger overall FID on HumanML3D and KIT-ML. These results suggest that respecting local motion complexity is a useful design principle for masked motion generation. Project page: https://xiangyue-zhang.github.io/DynMask
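Abstract (reference translation): Masked generative models have become a strong paradigm for text-to-motion synthesis, but they treat motion frames too uniformly during masking, attention, and decoding. This does not match motion, where local dynamic complexity changes sharply over time. This work shows that current masked motion generators degrade disproportionately on dynamically complex motions, and that frame-wise generation error is strongly correlated with motion dynamics. Motivated by this mismatch, the Motion Spectral Descriptor (MSD) is a simple, parameter-free measure of local dynamic complexity computed from the short-time spectrum of motion velocity. Unlike learned difficulty predictors, MSD is deterministic, interpretable, and derived directly from the motion signal itself. MSD is used to make masked motion generation complexity-aware. In particular, MSD guides content-focused masking during training, provides a spectral similarity prior for self-attention, and can additionally modulate token-level sampling during iterative decoding. Built on top of masked motion generators, DynMask improves motion generation most clearly on dynamically complex motions while also improving overall FID on HumanML3D and KIT-ML. These results suggest that respecting local motion complexity is a useful design principle for masked motion generation. Project page: https://xiangyue-zhang.github.io/DynMask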

Related papers