Fugu-MT Paper Translation (Abstract): LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

Paper abstract: LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

arxiv url: http://arxiv.org/abs/2604.11792v1
Date: Mon, 13 Apr 2026 17:55:40 GMT
Status: Translation complete
System update date: 2026-04-14 20:13:16.738774
Title: LottieGPT: Tokenizing Vector Animation for Autoregressive Generation
Title (reference translation): LottieGPT: Tokenizing Vector Animation for Autoregressive Generation
Authors: Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Ruqi Huang, Hao Zhao
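Abstract summary: Vector animations offer resolution independence, compactness, semantic structure, and editable parametric motion representations. Current generative models operate exclusively in raster space and therefore cannot synthesize them. We fine-tune Qwen-VL to create LottieGPT, a native multimodal model capable of generating coherent, editable vector animations.
Reference score (independently computed attention score): 63.27046904946992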
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution independence, compactness, semantic structure, and editable parametric motion representations, yet current generative models operate exclusively in raster space and thus cannot synthesize them. Meanwhile, recent advances in large multimodal models demonstrate strong capabilities in generating structured data such as slides, 3D meshes, LEGO sequences, and indoor layouts, suggesting that native vector animation generation may be achievable. In this work, we present the first framework for tokenizing and autoregressively generating vector animations. We adopt Lottie, a widely deployed JSON-based animation standard, and design a tailored Lottie Tokenizer that encodes layered geometric primitives, transforms, and keyframe-based motion into a compact and semantically aligned token sequence. To support large-scale training, we also construct LottieAnimation-660K, the largest and most diverse vector animation dataset to date, consisting of 660k real-world Lottie animations and 15M static Lottie image files curated from broad Internet sources. Building upon these components, we finetune Qwen-VL to create LottieGPT, a native multimodal model capable of generating coherent, editable vector animations directly from natural language or visual prompts. Experiments show that our tokenizer dramatically reduces sequence length while preserving structural fidelity, enabling effective autoregressive learning of dynamic vector content. LottieGPT exhibits strong generalization across diverse animation styles and outperforms previous state-of-the-art models on SVG generation (a special case of single-frame vector animation).
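Abstract (reference translation): Despite rapid progress in video generation, existing models cannot produce vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution independence, compactness, semantic structure, and editable parametric motion representations, but current generative models operate only in raster space and therefore cannot synthesize them. Meanwhile, recent advances in large multimodal models show strong capabilities in generating structured data such as slides, 3D meshes, LEGO sequences, and indoor layouts, suggesting that native vector animation generation is feasible. In this work, we propose the first framework for tokenizing and autoregressively generating vector animations. We adopt Lottie, a widely deployed JSON-based animation standard, and design a tailored Lottie Tokenizer that encodes layered geometric primitives, transforms, and keyframe-based motion into a compact, semantically aligned token sequence. To support large-scale training, we construct LottieAnimation-660K, the largest and most diverse vector animation dataset to date. Building on these components, we fine-tune Qwen-VL to create LottieGPT, a native multimodal model that can generate coherent, editable vector animations directly from natural language or visual prompts. Experiments show that the tokenizer dramatically reduces sequence length while preserving structural fidelity, enabling effective autoregressive learning of dynamic vector content. LottieGPT shows strong generalization across diverse animation styles and outperforms previous state-of-the-art models on SVG generation (a special case of single-frame vector animation).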
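
The abstract does not describe the Lottie Tokenizer's actual scheme, so the following is only a toy illustration of the general idea of turning a JSON animation into structural tokens. The function `tokenize`, the token spellings such as `<key:...>`, and the tiny `animation` document are all hypothetical; the keys (`fr`, `layers`, `ty`, `ks`, `p`, `a`, `k`, `t`, `s`) follow common Lottie naming but the document is heavily simplified. Character count is used as a crude stand-in for the length of an untokenized JSON string.

```python
import json


def tokenize(node, out=None):
    """Flatten a JSON-like value into a flat list of string tokens.

    Toy scheme (an assumption, not the paper's tokenizer): one token per
    object/array boundary, one per key, and one per scalar value.
    """
    if out is None:
        out = []
    if isinstance(node, dict):
        out.append("<obj>")
        for key, value in node.items():
            out.append(f"<key:{key}>")
            tokenize(value, out)
        out.append("</obj>")
    elif isinstance(node, list):
        out.append("<arr>")
        for item in node:
            tokenize(item, out)
        out.append("</arr>")
    elif isinstance(node, bool):
        # bool is a subclass of int, so it must be checked before numbers.
        out.append(f"<bool:{node}>")
    elif isinstance(node, (int, float)):
        # Round to an integer so each number maps to a single token.
        out.append(f"<num:{round(node)}>")
    else:
        out.append(f"<str:{node}>")
    return out


# Hypothetical, simplified Lottie-like document: one shape layer whose
# position is animated between two keyframes.
animation = {
    "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
    "layers": [{
        "ty": 4,
        "ks": {"p": {"a": 1, "k": [
            {"t": 0, "s": [100, 256]},
            {"t": 60, "s": [400, 256]},
        ]}},
    }],
}

tokens = tokenize(animation)
raw = json.dumps(animation, separators=(",", ":"))
print(f"structural tokens: {len(tokens)}, raw JSON characters: {len(raw)}")
```

Even this naive scheme yields far fewer structural tokens than raw characters, which hints at why a format-aware tokenizer can shorten sequences; the reduction reported in the paper comes from the authors' own design, which this sketch does not reproduce.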

Related papers