Fugu-MT Paper Translation (Abstract): TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba

Paper overview: TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba

arxiv url: http://arxiv.org/abs/2603.27314v1
Date: Sat, 28 Mar 2026 15:38:14 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.89863
Title: TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba
Authors: Ziyue Yang, Kaixing Yang, Xulong Tang
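Abstract summary: Music-to-dance generation has broad applications in virtual reality, dance education, and digital character animation. TokenDance is a two-stage music-to-dance generation framework that addresses the limited coverage of existing 3D dance datasets through dual-modality tokenization and efficient token-level generation. TokenDance achieves overall state-of-the-art (SOTA) performance in both generation quality and inference speed, highlighting its effectiveness and practical value for real-world music-to-dance applications.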
Reference score (independently calculated attention score): 5.119197329627647
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Music-to-dance generation has broad applications in virtual reality, dance education, and digital character animation. However, the limited coverage of existing 3D dance datasets confines current models to a narrow subset of music styles and choreographic patterns, resulting in poor generalization to real-world music. Consequently, generated dances often become overly simplistic and repetitive, substantially degrading expressiveness and realism. To tackle this problem, we present TokenDance, a two-stage music-to-dance generation framework that explicitly addresses this limitation through dual-modality tokenization and efficient token-level generation. In the first stage, we discretize both dance and music using Finite Scalar Quantization, where dance motions are factorized into upper and lower-body components with kinematic-dynamic constraints, and music is decomposed into semantic and acoustic features with dedicated codebooks to capture choreography-specific structures. In the second stage, we introduce a Local-Global-Local token-to-token generator built on a Bidirectional Mamba backbone, enabling coherent motion synthesis, strong music-dance alignment, and efficient non-autoregressive inference. Extensive experiments demonstrate that TokenDance achieves overall state-of-the-art (SOTA) performance in both generation quality and inference speed, highlighting its effectiveness and practical value for real-world music-to-dance applications.
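As a rough illustration of the Finite Scalar Quantization step named in the abstract, the sketch below bounds each latent dimension, rounds it to a small fixed number of levels, and packs the per-dimension codes into one token index. It is not the authors' implementation: the function names `fsq_quantize` and `fsq_decode`, the level counts, and the restriction to odd level counts are assumptions made for this example, and the straight-through gradient used during training is omitted.

```python
import math


def fsq_quantize(z, levels):
    """Quantize one latent vector with a simplified FSQ scheme.

    z      : list of floats, one per latent dimension (hypothetical input).
    levels : list of odd ints, number of quantization levels per dimension.
    Returns (codes, token): codes[i] is in [0, levels[i] - 1] and token is
    the mixed-radix index in [0, prod(levels) - 1].
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n in zip(z, levels):
        if n < 1 or n % 2 == 0:
            raise ValueError("this sketch only supports odd level counts")
        half = (n - 1) // 2
        bounded = math.tanh(value) * half    # squash into (-half, half)
        codes.append(round(bounded) + half)  # shift to 0 .. n - 1
    token = 0
    for code, n in zip(codes, levels):
        token = token * n + code             # pack codes as mixed-radix digits
    return codes, token


def fsq_decode(token, levels):
    """Recover the per-dimension codes from a token index."""
    codes = []
    for n in reversed(levels):
        codes.append(token % n)
        token //= n
    return list(reversed(codes))


if __name__ == "__main__":
    codes, token = fsq_quantize([0.3, -2.0, 1.2], [5, 3, 5])
    print(codes, token, fsq_decode(token, [5, 3, 5]))
```

The implicit codebook size is the product of the level counts (75 for the levels used above), so no learned codebook lookup is needed. How TokenDance assigns levels to the upper-body, lower-body, semantic, and acoustic streams is not stated in the abstract.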

Related papers