Fugu-MT Paper Translation (Abstract): JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions

Paper summary: JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions

arxiv url: http://arxiv.org/abs/2606.01703v1
Date: Mon, 01 Jun 2026 05:12:20 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:31.385405
Title: JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions
Authors: Jiashuo Yu, Yao Yao, Boyu Chen, Alex Wang
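Abstract summary: JenBridge is a framework for adaptive long-form video soundtracking that ensures both high-fidelity audio generation and transition naturalness. To achieve long-form coherence across diverse scene changes, JenBridge incorporates a novel adaptive transition mechanism. To rigorously assess this task, we propose the LVS Benchmark, a new benchmark that includes a curated dataset and novel evaluation metrics.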
Reference score (independently computed attention score): 23.0545450404763
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We address the challenge of generating high-fidelity, long-form soundtracks that remain coherent across scene transitions. Existing AI music systems are mainly designed for short, isolated clips and lack mechanisms to ensure narrative continuity. We present JenBridge, a modular and interpretable framework for adaptive long-form video soundtracking that ensures both high-fidelity audio generation and transition naturalness. The core architecture is a Transformer-based generative model trained with a flow-matching objective, following a two-stage paradigm: pretraining on large-scale text-audio corpora to establish robust musical priors, then adapting to the video domain with dual text-visual conditioning for precise cross-modal alignment. Crucially, to achieve long-form coherence across diverse scene changes, JenBridge incorporates a novel adaptive transition mechanism. This system features a versatile toolkit of transition styles, including a generative transition method, and uniquely employs a Large Language Model (LLM) Agent that acts as a director to select the most appropriate transition for each narrative shift intelligently. To rigorously assess this task, we propose the LVS Benchmark, a new benchmark that includes a curated dataset and novel evaluation metrics focusing on holistic and transition-aware assessment. Extensive experiments on the proposed benchmark demonstrate that JenBridge significantly outperforms existing methods in both objective and subjective metrics, particularly in terms of transition naturalness and overall narrative coherence. JenBridge represents a significant step towards fully automated, professional-quality video soundtracking.
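The abstract names a flow-matching training objective but gives no formula. As a rough illustration only, the sketch below assumes the commonly used straight-line probability path between a Gaussian noise sample and a data sample, where the regression target is the constant velocity along that line. The function names, the `model` callable, and the use of plain Python lists in place of audio latents are all hypothetical; none of this is taken from JenBridge itself, and conditioning on text or video is omitted.

```python
import random

def linear_path(x0, x1, t):
    """Point at time t in [0, 1] on the straight line from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(model, x1, rng):
    """One-sample estimate of the flow-matching loss for a data vector x1.

    `model` is a hypothetical callable (x_t, t) -> predicted velocity;
    it stands in for whatever network is actually being trained.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                         # time drawn uniformly from [0, 1)
    x_t = linear_path(x0, x1, t)
    v_pred = model(x_t, t)
    v_true = target_velocity(x0, x1)
    # Mean squared error between predicted and target velocity
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)

if __name__ == "__main__":
    rng = random.Random(0)
    zero_model = lambda x_t, t: [0.0] * len(x_t)  # untrained placeholder model
    print(flow_matching_loss(zero_model, [1.0, -1.0, 0.5], rng))
```

In a real training loop this loss would be averaged over a batch and minimized with respect to the model parameters; other probability paths and time-sampling schemes are equally compatible with the phrase "flow-matching objective".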

Related papers