Fugu-MT paper translation (abstract): GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment

Paper overview: GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment

arxiv url: http://arxiv.org/abs/2510.26818v1
Date: Tue, 28 Oct 2025 09:26:59 GMT
Status: translation complete
Last updated in system: 2025-11-03 17:52:15.84737
Title: GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment
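Title (reference translation): GACA-DiT: Diffusion-based dance-to-music generation using genre-adaptive rhythm and context-aware alignment
Authors: Jinting Wang, Chenxing Li, Li Liu
Abstract summary: Dance-to-music (D2M) generation aims to automatically compose music that is rhythmically and temporally aligned with dance movements. We propose GACA-DiT, a diffusion transformer-based framework with two novel modules for rhythmically consistent and temporally aligned music generation. Experiments on the AIST++ and TikTok datasets show that GACA-DiT outperforms state-of-the-art methods in both objective metrics and human evaluation.
Reference score (independently computed attention score): 16.93446224499017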
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Dance-to-music (D2M) generation aims to automatically compose music that is rhythmically and temporally aligned with dance movements. Existing methods typically rely on coarse rhythm embeddings, such as global motion features or binarized joint-based rhythm values, which discard fine-grained motion cues and result in weak rhythmic alignment. Moreover, temporal mismatches introduced by feature downsampling further hinder precise synchronization between dance and music. To address these problems, we propose GACA-DiT, a diffusion transformer-based framework with two novel modules for rhythmically consistent and temporally aligned music generation. First, a genre-adaptive rhythm extraction module combines multi-scale temporal wavelet analysis and spatial phase histograms with adaptive joint weighting to capture fine-grained, genre-specific rhythm patterns. Second, a context-aware temporal alignment module resolves temporal mismatches using learnable context queries to align music latents with relevant dance rhythm features. Extensive experiments on the AIST++ and TikTok datasets demonstrate that GACA-DiT outperforms state-of-the-art methods in both objective metrics and human evaluation. Project page: https://beria-moon.github.io/GACA-DiT/.
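As a rough illustration of what "multi-scale temporal wavelet analysis with joint weighting" can mean, the sketch below computes Haar-wavelet detail energy of per-joint speed at several window sizes and averages it across joints with fixed weights. This is not the paper's module: the Haar wavelet, the fixed weights (the paper describes adaptive weighting), the omission of spatial phase histograms, and all function names (`joint_speed`, `haar_detail_energy`, `rhythm_features`) are assumptions made for illustration.

```python
import math


def joint_speed(positions):
    """Frame-to-frame speed of one joint from a list of (x, y, z) positions."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]


def haar_detail_energy(signal, scale):
    """Mean squared Haar detail coefficient over non-overlapping windows.

    `scale` is the window length in frames (assumed even). Each coefficient
    is the difference between the means of the two window halves.
    """
    half = scale // 2
    coeffs = []
    for start in range(0, len(signal) - scale + 1, scale):
        first = sum(signal[start:start + half]) / half
        second = sum(signal[start + half:start + scale]) / half
        coeffs.append((first - second) / math.sqrt(2))
    if not coeffs:
        return 0.0
    return sum(c * c for c in coeffs) / len(coeffs)


def rhythm_features(joints, weights, scales=(2, 4, 8)):
    """Weighted multi-scale rhythm descriptor: one energy value per scale.

    `joints` is a list of per-joint position sequences and `weights` holds
    one non-negative weight per joint (fixed here, hypothetical stand-ins
    for learned, genre-dependent weights).
    """
    total = sum(weights)
    features = []
    for scale in scales:
        energy = sum(
            w * haar_detail_energy(joint_speed(p), scale)
            for p, w in zip(joints, weights)
        )
        features.append(energy / total)
    return features
```

A joint that alternates between moving and pausing every frame concentrates its energy at the smallest scale, while a stationary joint contributes nothing, so raising a joint's weight raises its influence on the descriptor.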
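Abstract (reference translation): Dance-to-music (D2M) generation aims to automatically compose music that matches dance movements rhythmically and temporally. Existing methods rely on coarse rhythm embeddings, such as global motion features or binarized joint-based rhythm values, which discard fine-grained motion cues and lead to weak rhythmic alignment. In addition, temporal mismatches introduced by feature downsampling further hinder precise synchronization between dance and music. To address these problems, we propose GACA-DiT, a diffusion transformer-based framework with two novel modules for generating rhythmically consistent and temporally aligned music. First, multi-scale temporal wavelet analysis and spatial phase histograms are combined with adaptive joint weighting to capture fine-grained, genre-specific rhythm patterns. Second, the context-aware temporal alignment module resolves temporal mismatches using learnable context queries, aligning music latents with the relevant dance rhythm features. Extensive experiments on the AIST++ and TikTok datasets show that GACA-DiT outperforms state-of-the-art methods in both objective metrics and human evaluation. Project page: https://beria-moon.github.io/GACA-DiT/.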
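The alignment idea in the abstract, query vectors attending over dance rhythm features that sit on a different time grid, can be sketched with plain scaled dot-product cross-attention. This is a generic attention sketch, not the paper's architecture: in a real model the queries would be learned parameters, whereas here they are passed in as fixed lists, and the names `softmax`, `dot`, and `cross_attend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys, values):
    """Scaled dot-product cross-attention.

    Returns one output vector per query, so the output length follows the
    queries (e.g. music latent frames) rather than the keys and values
    (e.g. dance rhythm frames), which may run at a different frame rate.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs
```

Because each output is a weighted average of the value vectors, a query that matches one key strongly returns roughly that key's value, and an uninformative query returns the mean of all values.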
