Fugu-MT Paper Translation (Abstract): VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

Paper overview: VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

arxiv url: http://arxiv.org/abs/2605.01517v1
Date: Sat, 02 May 2026 16:10:55 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.815177
Title: VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
Title (reference translation): VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
Authors: Guotao Liang, Zhangcheng Wang, Chuang Wang, Juncheng Hu, Haitao Zhou, Junhua Liu, Jing Zhang, Dong Xu, Qian Yu
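Abstract summary: VAnim is the first LLM-based framework for open-domain text-to-SVG animation. It reconceptualizes animation as Sparse State Updates (SSU) on a persistent SVG DOM tree. The paper also introduces SVGAnim-134k, the first benchmark for vector animation.
Reference score (independently computed attention score): 32.226052229379086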
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scalable Vector Graphics (SVG) animation generation is pivotal for professional design due to their structural editability and resolution independence. However, this task remains challenging as it requires bridging discrete code representations with continuous visual dynamics. Existing optimization-based methods often destroy topological consistency, while general-purpose LLMs rely on rigid CSS/SMIL transformations, failing to model geometry-level non-rigid deformations. To address these limitations, we present VAnim, the first LLM-based framework for open-domain text-to-SVG animation. We reconceptualize animation not as sequence generation, but as Sparse State Updates (SSU) on a persistent SVG DOM tree. This paradigm compresses sequence length by over 9.8x while preserving the SVG DOM structure and non-participating elements by construction. To enable precise control, we propose an Identification-First Motion Planning mechanism that grounds textual instructions in explicit visual entities. Furthermore, to overcome the non-differentiable nature of SVG rendering, we employ Rendering-Aware Reinforcement Learning via Group Relative Policy Optimization (GRPO). By leveraging a hybrid reward from a state-of-the-art video perception encoder, we align discrete code updates with high-fidelity visual feedback. We also introduce SVGAnim-134k, the first benchmark for vector animation. Extensive experiments demonstrate that VAnim significantly outperforms state-of-the-art baselines in semantic alignment and structural validity, with additional appendix metrics further validating motion quality and identity preservation.
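Abstract (reference translation): Scalable Vector Graphics (SVG) animation generation is important in professional design because of its structural editability and resolution independence. However, the task remains difficult because it requires bridging discrete code representations with continuous visual dynamics. Existing optimization-based methods often break topological consistency, while general-purpose LLMs rely on rigid CSS/SMIL transformations and cannot model geometry-level non-rigid deformations. To address these limitations, we present VAnim, the first LLM-based framework for open-domain text-to-SVG animation. We reconceptualize animation not as sequence generation but as Sparse State Updates (SSU) on a persistent SVG DOM tree. This paradigm compresses sequence length by more than 9.8x while preserving the SVG DOM structure and non-participating elements by construction. For precise control, we propose an Identification-First Motion Planning mechanism that grounds textual instructions in explicit visual entities. Furthermore, to overcome the non-differentiability of SVG rendering, we adopt Rendering-Aware Reinforcement Learning via Group Relative Policy Optimization (GRPO). By leveraging a hybrid reward from a state-of-the-art video perception encoder, we align discrete code updates with high-fidelity visual feedback. We also introduce SVGAnim-134k, the first benchmark for vector animation. Extensive experiments show that VAnim significantly outperforms state-of-the-art baselines in semantic alignment and structural validity, and additional appendix metrics further validate motion quality and identity preservation.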
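
The abstract describes animation as sparse updates applied to a persistent SVG DOM tree, without giving the update format. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the update schema (one dict per frame mapping an element `id` to its changed attributes), the sample SVG, and the function name `apply_sparse_updates` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix

# Hypothetical initial SVG state (not taken from the paper).
SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <rect id="sky" width="100" height="100" fill="#cfe8ff"/>
  <circle id="ball" cx="20" cy="80" r="8" fill="tomato"/>
</svg>"""

# Hypothetical sparse updates: one dict per frame, mapping
# element id -> only the attributes that change in that frame.
UPDATES = [
    {"ball": {"cx": "40", "cy": "50"}},
    {"ball": {"cx": "60", "cy": "30"}},
    {"ball": {"cx": "80", "cy": "80"}},
]


def apply_sparse_updates(svg_text, updates):
    """Return one SVG string per frame: the initial state, then one per update.

    The tree is parsed once and mutated in place, so elements that no
    update mentions are carried through every frame unchanged.
    """
    root = ET.fromstring(svg_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    frames = [ET.tostring(root, encoding="unicode")]
    for update in updates:
        for element_id, attrs in update.items():
            target = by_id[element_id]  # KeyError if the id is unknown
            for name, value in attrs.items():
                target.set(name, value)
        frames.append(ET.tostring(root, encoding="unicode"))
    return frames


frames = apply_sparse_updates(SVG, UPDATES)
```

In this sketch the updates only set attributes on existing elements, so the element tree and the untouched `sky` rectangle are identical in every frame. That mirrors, in a simplified way, the abstract's claim that the DOM structure and non-participating elements are preserved by construction.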

Related papers