Fugu-MT Paper Translation (Abstract): EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

Paper summary: EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

arxiv url: http://arxiv.org/abs/2603.06014v1
Date: Fri, 06 Mar 2026 08:09:14 GMT
Status: Translation complete
System update date: 2026-03-09 13:17:45.30519
Title: EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
Title (reference translation): EffectMaker: Integrating Reasoning and Generation for Customized Visual Effect Generation
Authors: Shiyuan Yang, Ruihuang Li, Jiale Tao, Shuai Shao, Qinglin Lu, Jing Liao
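Abstract summary: EffectMaker is a unified reasoning-generation framework that enables reference-based VFX customization. The authors construct EffectData, the largest high-quality synthetic dataset, containing 130k videos across 3k VFX categories. Experiments show that EffectMaker achieves better visual quality and effect consistency than state-of-the-art baselines.
Reference score (independently computed attention metric): 27.31323449481923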
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual effects (VFX) are essential for enhancing the expressiveness and creativity of video content, yet producing high-quality effects typically requires expert knowledge and costly production pipelines. Existing AIGC systems face significant challenges in VFX generation due to the scarcity of effect-specific data and the inherent difficulty of modeling supernatural or stylized effects. Moreover, these approaches often require per-effect fine-tuning, which severely limits their scalability and generalization to novel VFX. In this work, we present EffectMaker, a unified reasoning-generation framework that enables reference-based VFX customization. EffectMaker employs a multimodal large language model to interpret high-level effect semantics and reason about how they should adapt to a target subject, while a diffusion transformer leverages in-context learning to capture fine-grained visual cues from reference videos. These two components form a semantic-visual dual-path guidance mechanism that enables accurate, controllable, and effect-consistent synthesis without per-effect fine-tuning. Furthermore, we construct EffectData, the largest high-quality synthetic dataset containing 130k videos across 3k VFX categories, to improve generalization and scalability. Experiments show that EffectMaker achieves superior visual quality and effect consistency over state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation. Project page: https://effectmaker.github.io
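Abstract (reference translation): Visual effects (VFX) are essential for enhancing the expressiveness and creativity of video content, but producing high-quality effects generally requires expert knowledge and costly production pipelines. Existing AIGC systems face significant challenges in VFX generation because effect-specific data is scarce and supernatural or stylized effects are inherently difficult to model. In addition, these approaches often require per-effect fine-tuning, which severely limits their scalability and their generalization to new VFX. This paper introduces EffectMaker, a unified reasoning-generation framework that enables reference-based VFX customization. EffectMaker employs a multimodal large language model to interpret high-level effect semantics and reason about how they should adapt to the target subject, while a diffusion transformer uses in-context learning to capture fine-grained visual cues from reference videos. These two components form a semantic-visual dual-path guidance mechanism that enables accurate, controllable, and effect-consistent synthesis without per-effect fine-tuning. Furthermore, the authors construct EffectData, the largest high-quality synthetic dataset, containing 130k videos across 3k VFX categories, to improve generalization and scalability. Experiments show that EffectMaker achieves better visual quality and effect consistency than state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation. Project page: https://effectmaker.github.io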

Related papers