Fugu-MT Paper Translation (Abstract): From Zero to Hero: Training-Free Custom Concept Spawning in World Models

Paper abstract: From Zero to Hero: Training-Free Custom Concept Spawning in World Models

arxiv url: http://arxiv.org/abs/2606.02575v1
Date: Mon, 01 Jun 2026 17:59:05 GMT
Status: Translation complete
System update date: 2026-06-02 21:34:32.567282
Title: From Zero to Hero: Training-Free Custom Concept Spawning in World Models
Title (reference translation): From Zero to Hero: Training-Free Custom Concept Spawning in World Models
Authors: Kiymet Akdemir, Pinar Yanardag
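Abstract summary: SPAWN (Swapping Pinned Anchor with Windowed iNjection) is a training-free method for concept spawning. We show that SPAWN integrates concepts with consistent lighting, scale, and perspective while preserving identity and temporal coherence.
Reference score (independently computed attention score): 11.223537813710996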
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive world models have emerged as a powerful paradigm for interactive video generation, allowing users to navigate dynamically generated environments through actions. These models are typically conditioned on a text prompt and/or a single reference frame, from which the entire world is generated. Yet the moment the user navigates beyond what is visible in that frame, the unseen regions are populated by the base model's priors, with no mechanism for the user to specify what should appear and where. This is a fundamental limitation for applications such as gaming, interactive storytelling, and simulation, where controllable scene composition is essential. We refer to this missing capability as concept spawning; introducing a user-specified visual concept into a world model, analogous to spawning in a game engine. We introduce SPAWN (Swapping Pinned Anchor with Windowed iNjection), a training-free method for concept spawning. SPAWN exploits a structural property of image-to-video backbones: the first slot of the context memory is pinned to the reference frame and acts as a foundational anchor for every generated chunk. By swapping this anchor with an external concept latent over a short injection window and letting the original anchor return, we cause the concept to propagate naturally through the rollout via the model's own memory. SPAWN supports concepts from fine-grained entities such as characters and props to large-scale elements such as buildings and landmarks, and accepts either a concept image or a text description as input. Experiments show that SPAWN integrates concepts with consistent lighting, scale, and perspective while preserving identity and temporal coherence, demonstrating that controllable concept spawning is achievable in existing autoregressive world models without any training.
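Abstract (reference translation): Autoregressive world models have emerged as a powerful paradigm for interactive video generation, letting users navigate dynamically generated environments through actions. These models are typically conditioned on a text prompt and/or a single reference frame, from which the entire world is generated. However, the moment the user navigates beyond what is visible in that frame, the unseen regions are populated by the base model's priors. This is a fundamental limitation for applications such as gaming, interactive storytelling, and simulation, where controllable scene composition is essential. We call this missing capability concept spawning: introducing a user-specified visual concept into a world model, analogous to spawning in a game engine. SPAWN (Swapping Pinned Anchor with Windowed iNjection) is a training-free method for concept spawning. SPAWN exploits a structural property of image-to-video backbones: the first slot of the context memory is pinned to the reference frame and acts as the foundational anchor for every generated chunk. By replacing this anchor with an external concept latent over a short injection window and then letting the original anchor return, the concept propagates naturally through the rollout via the model's own memory. SPAWN supports concepts ranging from fine-grained entities such as characters and props to large-scale elements such as buildings and landmarks, and accepts either a concept image or a text description as input. Experiments show that SPAWN integrates concepts with consistent lighting, scale, and perspective while preserving identity and temporal coherence, demonstrating that controllable concept spawning is achievable in existing autoregressive world models without any training.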
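
The anchor-swap idea described in the abstract can be illustrated with a toy rollout loop. This is a minimal sketch under stated assumptions, not the authors' implementation: latents are plain lists of floats, `toy_generate_chunk` is a hypothetical stand-in for the world model (it simply averages its context), and the names `rollout_with_anchor_swap`, `window_start`, `window_len`, and `memory_size` are invented for illustration. It only shows the data flow: slot 0 of the context holds the reference latent, is replaced by the concept latent during the injection window, and then reverts.

```python
from typing import Callable, List, Tuple

Latent = List[float]  # toy stand-in for a latent tensor (assumption)


def toy_generate_chunk(context: List[Latent]) -> Latent:
    """Hypothetical generator: element-wise mean of the context latents."""
    dim = len(context[0])
    return [sum(lat[i] for lat in context) / len(context) for i in range(dim)]


def rollout_with_anchor_swap(
    reference: Latent,
    concept: Latent,
    num_chunks: int,
    window_start: int,
    window_len: int,
    generate_chunk: Callable[[List[Latent]], Latent] = toy_generate_chunk,
    memory_size: int = 3,
) -> Tuple[List[Latent], List[str]]:
    """Autoregressive rollout whose first context slot is a pinned anchor.

    For chunk indices in [window_start, window_start + window_len) the
    anchor slot holds the concept latent; otherwise it holds the reference.
    """
    chunks: List[Latent] = []
    anchor_log: List[str] = []
    for t in range(num_chunks):
        in_window = window_start <= t < window_start + window_len
        anchor = concept if in_window else reference
        # Slot 0 is the anchor; the remaining slots hold the most recent chunks.
        recent = chunks[-(memory_size - 1):] if memory_size > 1 else []
        context = [anchor] + recent
        chunks.append(generate_chunk(context))
        anchor_log.append("concept" if in_window else "reference")
    return chunks, anchor_log


chunks, anchor_log = rollout_with_anchor_swap(
    reference=[0.0], concept=[1.0], num_chunks=6, window_start=2, window_len=2
)
print(anchor_log)
print(chunks)
```

In this toy setup the chunks generated after the window still carry a nonzero trace of the concept, because earlier chunks remain in the context memory even though the anchor slot has reverted to the reference. The averaging generator makes that trace decay; how a real backbone behaves is a matter for the paper's experiments, not this sketch.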


Related papers