Fugu-MT Paper Translation (Summary): TransText: Transparency Aware Image-to-Video Typography Animation

Paper summary: TransText: Transparency Aware Image-to-Video Typography Animation

arXiv URL: http://arxiv.org/abs/2603.17944v1
Date: Wed, 18 Mar 2026 17:16:40 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.84522
Title: TransText: Transparency Aware Image-to-Video Typography Animation
Title (reference translation): TransText: Transparency-aware typography animation from images to video
Authors: Fei Zhang, Zijian Zhou, Bohao Tang, Sen He, Hang Li, Zhe Wang, Soubhik Sanyal, Pengfei Liu, Viktar Atliha, Tao Xiang, Frost Xu, Semih Gunel,
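Abstract summary: The authors propose the first method for adapting image-to-video models to layer-aware text (glyph) animation. The framework, TransText, is based on a novel Alpha-as-RGB paradigm that jointly models appearance and transparency. Experiments show that TransText significantly outperforms baselines and generates coherent, high-fidelity animations.
Reference score (independently computed attention score): 35.1650602838868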
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce the first method, to the best of our knowledge, for adapting image-to-video models to layer-aware text (glyph) animation, a capability critical for practical dynamic visual design. Existing approaches predominantly handle the transparency-encoding (alpha channel) as an extra latent dimension appended to the RGB space, necessitating the reconstruction of the underlying RGB-centric variational autoencoder (VAE). However, given the scarcity of high-quality transparent glyph data, retraining the VAE is computationally expensive and may erode the robust semantic priors learned from massive RGB corpora, potentially leading to latent pattern mixing. To mitigate these limitations, we propose TransText, a framework based on a novel Alpha-as-RGB paradigm to jointly model appearance and transparency without modifying the pre-trained generative manifold. TransText embeds the alpha channel as an RGB-compatible visual signal through latent spatial concatenation, explicitly ensuring strict cross-modal (RGB-and-Alpha) consistency while preventing feature entanglement. Our experiments demonstrate that TransText significantly outperforms baselines, generating coherent, high-fidelity transparent animations with diverse, fine-grained effects.
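Abstract (reference translation): This paper introduces a method for adapting image-to-video models to layer-aware text (glyph) animation. Existing approaches mainly treat the transparency encoding (alpha channel) as an additional latent dimension appended to the RGB space, which requires rebuilding the underlying RGB-centric variational autoencoder (VAE). However, given the scarcity of high-quality transparent glyph data, retraining the VAE is computationally expensive and may degrade the robust semantic priors learned from large RGB corpora, potentially leading to latent pattern mixing. To mitigate these limitations, we propose TransText, a framework based on a newly introduced Alpha-as-RGB paradigm that jointly models appearance and transparency without modifying the pre-trained generative manifold. TransText embeds the alpha channel as an RGB-compatible visual signal through latent spatial concatenation, explicitly ensuring strict cross-modal (RGB-and-Alpha) consistency while preventing feature entanglement. Experiments show that TransText significantly outperforms baselines, generating coherent, high-fidelity transparent animations with diverse, fine-grained effects.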
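The abstract describes the Alpha-as-RGB idea only at a high level: the alpha channel is turned into an RGB-compatible signal, encoded by the unmodified RGB VAE, and joined to the RGB latent by spatial concatenation instead of being appended as an extra channel. The following is a minimal Python sketch of that contrast, under stated assumptions. The encoder `toy_vae_encode` (8x average pooling plus a fixed random channel projection) is a hypothetical stand-in for a frozen pre-trained VAE, not the one used in the paper. Replicating the alpha matte across three channels (`alpha_as_rgb`) and concatenating along the width axis (`joint_latent`) are illustrative choices, since the abstract specifies neither the exact mapping nor the axis. All function names are hypothetical.

```python
import numpy as np

# HYPOTHETICAL stand-in for a frozen, pre-trained RGB VAE encoder (not the paper's VAE).
# It accepts only 3-channel input, mirroring the RGB-centric constraint in the abstract.
_PROJ = np.random.default_rng(0).standard_normal((3, 4))  # fixed 3 -> 4 latent-channel map


def toy_vae_encode(frames: np.ndarray, factor: int = 8) -> np.ndarray:
    """Map (T, H, W, 3) frames to a (T, H/factor, W/factor, 4) latent.

    H and W must be divisible by `factor`.
    """
    t, h, w, c = frames.shape
    if c != 3:
        raise ValueError(f"frozen RGB VAE expects 3 channels, got {c}")
    # Average-pool non-overlapping factor x factor patches, then project channels.
    pooled = frames.reshape(t, h // factor, factor, w // factor, factor, 3).mean(axis=(2, 4))
    return pooled @ _PROJ


def alpha_as_rgb(alpha: np.ndarray) -> np.ndarray:
    """Assumed mapping: replicate a (T, H, W) alpha matte into (T, H, W, 3)."""
    return np.repeat(alpha[..., None], 3, axis=-1)


def joint_latent(rgb: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Encode RGB and alpha with the SAME frozen encoder, then concatenate spatially.

    Concatenating along the width axis is an assumption; the result is (T, h, 2w, 4).
    """
    z_rgb = toy_vae_encode(rgb)
    z_alpha = toy_vae_encode(alpha_as_rgb(alpha))
    return np.concatenate([z_rgb, z_alpha], axis=2)


def split_joint_latent(z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Undo the spatial concatenation: left half is RGB, right half is alpha."""
    half = z.shape[2] // 2
    return z[:, :, :half], z[:, :, half:]


# Usage: a 2-frame, 64x64 clip with a per-pixel alpha matte.
rng = np.random.default_rng(1)
rgb = rng.random((2, 64, 64, 3))
alpha = rng.random((2, 64, 64))

z = joint_latent(rgb, alpha)
print(z.shape)  # (2, 8, 16, 4)

# Contrast: appending alpha as a 4th channel breaks this frozen 3-channel encoder,
# which is why channel-append approaches need a rebuilt VAE.
rgba = np.concatenate([rgb, alpha[..., None]], axis=-1)
try:
    toy_vae_encode(rgba)
except ValueError as err:
    print("channel-append rejected:", err)
```

In this toy setup both halves pass through the same unmodified encoder, so nothing has to be retrained, while the channel-append variant fails the encoder's 3-channel input contract. How TransText maps the decoded alpha half back to a single channel is not stated in the abstract; averaging the three decoded channels would be one plausible choice.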


Related papers