Fugu-MT Paper Translation (Summary): SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing

Paper summary: SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing

arXiv URL: http://arxiv.org/abs/2603.21073v1
Date: Sun, 22 Mar 2026 06:00:41 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.226093
Title: SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing
Authors: Jianyi Chen, Rongxiu Zhong, Shilei Zhang, Kun Qian, Jinglei Liu, Yike Guo, Wei Xue
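Abstract summary: We assume that AI models can understand and generate time-accelerated (sped-up) audio at 2x, 4x, or 8x speed. By first generating a sped-up version of the music, we greatly reduce its temporal length and resource requirements. We instantiate this idea in SqueezeComposer, which uses diffusion models for generation in the accelerated domain and refinement in the restored domain.
Reference score (independently calculated attention score): 35.732692220471606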
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Composing coherent long-form music remains a significant challenge due to the complexity of modeling long-range dependencies and the prohibitive memory and computational requirements associated with lengthy audio representations. In this work, we propose a simple yet powerful trick: we assume that AI models can understand and generate time-accelerated (sped-up) audio at rates such as 2x, 4x, or even 8x. By first generating a high-speed version of the music, we greatly reduce the temporal length and resource requirements, making it feasible to handle long-form music that would otherwise exceed memory or computational limits. The generated audio is then restored to its original speed, recovering the full temporal structure. This temporal speed-up and slow-down strategy naturally follows the principle of hierarchical generation from abstract to detailed content, and can be conveniently applied to existing music generation models to enable long-form music generation. We instantiate this idea in SqueezeComposer, a framework that employs diffusion models for generation in the accelerated domain and refinement in the restored domain. We validate the effectiveness of this approach on two tasks: long-form music generation, which evaluates temporal-wise control (including continuation, completion, and generation from scratch), and whole-song singing accompaniment generation, which evaluates track-wise control. Experimental results demonstrate that our simple temporal speed-up trick enables efficient, scalable, and high-quality long-form music generation. Audio samples are available at https://SqueezeComposer.github.io/.
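The claim that speeding audio up "greatly reduce[s] the temporal length and resource requirements" can be made concrete with a little arithmetic. This is a back-of-envelope illustration, not the paper's own accounting: the 50 Hz latent frame rate and the assumption that the generator's cost is dominated by full self-attention (quadratic in sequence length) are both hypothetical.

```python
"""Back-of-envelope sequence-length arithmetic for the speed-up trick.

Hypothetical assumptions (not from the paper): audio is encoded at a fixed
latent frame rate, and the generator uses full self-attention whose cost
grows with the square of the sequence length.
"""

LATENT_FRAME_RATE_HZ = 50  # hypothetical latent frames per second of audio


def latent_frames(duration_s: float, speedup: int) -> int:
    """Number of latent frames after playing the audio `speedup` times faster."""
    # Speeding up by k divides the playback duration by k.
    return round(duration_s / speedup * LATENT_FRAME_RATE_HZ)


def attention_cost_ratio(speedup: int) -> int:
    """How many times cheaper full self-attention is than at 1x."""
    # Length shrinks by k, so a quadratic cost shrinks by k**2.
    return speedup ** 2


if __name__ == "__main__":
    song_s = 240.0  # a 4-minute song
    for k in (1, 2, 4, 8):
        print(f"{k}x: {latent_frames(song_s, k)} frames, "
              f"attention cost 1/{attention_cost_ratio(k)} of 1x")
```

Under these assumptions a 4-minute song shrinks from 12000 frames at 1x to 1500 frames at 8x, and a quadratic attention cost falls by a factor of 64.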
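The speed-up, generate, slow-down loop can be sketched for the continuation task roughly as follows. This is a hypothetical toy, not SqueezeComposer's implementation: `speed_change` uses naive linear-interpolation resampling, which also shifts pitch (the abstract does not say how the speed change is done, and a pitch-preserving time-stretch could be swapped in), while `generate_fast` and `refine` are placeholder callables standing in for the accelerated-domain and restored-domain diffusion models.

```python
"""Toy speed-up / generate / slow-down pipeline (hypothetical sketch)."""
from typing import Callable, List

Signal = List[float]


def speed_change(x: Signal, factor: float) -> Signal:
    """Resample x so it plays `factor` times faster; factor < 1 slows it down.

    Naive linear interpolation at a fixed sample rate, so pitch shifts too.
    """
    if not x:
        return []
    n_out = max(1, round(len(x) / factor))
    out: Signal = []
    for i in range(n_out):
        # Output sample i corresponds to input position i * factor,
        # clamped to the last input sample.
        pos = min(i * factor, len(x) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out


def continue_long_form(
    prompt: Signal,
    factor: int,
    generate_fast: Callable[[Signal], Signal],  # hypothetical accelerated-domain generator
    refine: Callable[[Signal], Signal],         # hypothetical restored-domain refiner
) -> Signal:
    # 1. Squeeze the existing music into the accelerated domain.
    fast_prompt = speed_change(prompt, factor)
    # 2. Continue it there, where the sequence is `factor` times shorter.
    fast_full = fast_prompt + generate_fast(fast_prompt)
    # 3. Slow back down to the original speed to recover the full timeline.
    slow_full = speed_change(fast_full, 1 / factor)
    # 4. Refine details in the restored domain.
    return refine(slow_full)


if __name__ == "__main__":
    prompt = [float(i) for i in range(16)]
    result = continue_long_form(
        prompt,
        factor=4,
        generate_fast=lambda p: [0.0] * len(p),  # stand-in: appends silence
        refine=lambda s: s,                      # stand-in: identity
    )
    print(len(prompt), len(result))  # 16 32
```

With these stand-ins, the 16-sample prompt becomes 4 samples at 4x, the placeholder generator appends 4 more, and slowing down by 4x yields 32 samples: the continuation is produced at a quarter of the sequence length and only expanded afterwards.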

List of related papers