Fugu-MT Paper Translation (Summary): Scaling Sequence-to-Sequence Generative Neural Rendering

Paper overview: Scaling Sequence-to-Sequence Generative Neural Rendering

arXiv URL: http://arxiv.org/abs/2510.04236v1
Date: Sun, 05 Oct 2025 15:03:31 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.536132
Title: Scaling Sequence-to-Sequence Generative Neural Rendering
Authors: Shikun Liu, Kam Woh Ng, Wonbong Jang, Jiadong Guo, Junlin Han, Haozhe Liu, Yiannis Douratsos, Juan C. Pérez, Zijian Zhou, Chi Phung, Tao Xiang, Juan-Manuel Pérez-Rúa,
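Abstract summary: Kaleido is a family of generative models designed for photorealistic, unified object- and scene-level neural rendering. It introduces key architectural innovations that enable the model to perform generative view synthesis without explicit 3D representations, to generate 6-DoF target views conditioned on any number of reference views, and to seamlessly unify 3D and video modelling within a single decoder-only rectified flow transformer.
Reference score (attention score computed by the site): 37.23230422802279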
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Kaleido, a family of generative models designed for photorealistic, unified object- and scene-level neural rendering. Kaleido operates on the principle that 3D can be regarded as a specialised sub-domain of video, expressed purely as a sequence-to-sequence image synthesis task. Through a systematic study of scaling sequence-to-sequence generative neural rendering, we introduce key architectural innovations that enable our model to: i) perform generative view synthesis without explicit 3D representations; ii) generate any number of 6-DoF target views conditioned on any number of reference views via a masked autoregressive framework; and iii) seamlessly unify 3D and video modelling within a single decoder-only rectified flow transformer. Within this unified framework, Kaleido leverages large-scale video data for pre-training, which significantly improves spatial consistency and reduces reliance on scarce, camera-labelled 3D datasets -- all without any architectural modifications. Kaleido sets a new state-of-the-art on a range of view synthesis benchmarks. Its zero-shot performance substantially outperforms other generative methods in few-view settings, and, for the first time, matches the quality of per-scene optimisation methods in many-view settings.
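The abstract combines a rectified flow objective with conditioning any number of target views on any number of reference views. The sketch below is a minimal, hypothetical illustration of how those two ideas can fit together in a generic sequence model; it is not Kaleido's implementation, which the abstract does not specify. It assumes each view is already encoded as a flat latent vector, that reference views are passed in clean while target views are noised, and that the loss covers target views only. The function names (`make_rectified_flow_batch`, `masked_flow_loss`, `euler_sample`) and the time convention (t = 0 is noise, t = 1 is data) are illustrative choices.

```python
import numpy as np

# Hypothetical sketch, not Kaleido's code: each view is assumed to be a flat
# latent vector, and time follows the convention t = 0 (noise) -> t = 1 (data).


def make_rectified_flow_batch(views, is_target, rng):
    """Build one rectified-flow training example for a sequence of views.

    views:     (N, D) array, one latent vector per view.
    is_target: (N,) boolean mask, True for views the model must generate.
    Returns (x_t, t, v_target).
    """
    n, d = views.shape
    noise = rng.standard_normal((n, d))
    t = rng.uniform(0.0, 1.0, size=n)
    # Reference views are passed in clean (t = 1); only targets are noised.
    t = np.where(is_target, t, 1.0)
    # Straight-line path between noise (t = 0) and data (t = 1).
    x_t = t[:, None] * views + (1.0 - t[:, None]) * noise
    # Along a straight path the velocity is constant: data minus noise.
    v_target = views - noise
    return x_t, t, v_target


def masked_flow_loss(v_pred, v_target, is_target):
    """Mean squared velocity error over target views only.

    Assumes is_target contains at least one True entry.
    """
    per_view = ((v_pred - v_target) ** 2).mean(axis=1)
    return float(per_view[is_target].mean())


def euler_sample(velocity_fn, refs, num_targets, steps, rng):
    """Integrate dx/dt = v(x, t) from noise (t = 0) towards data (t = 1).

    velocity_fn(x, t, refs) stands in for a trained network; the clean
    reference views are passed unchanged as conditioning at every step.
    """
    x = rng.standard_normal((num_targets, refs.shape[1]))
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, refs)
    return x
```

As a quick check, an oracle velocity `(x1 - x) / (1 - t)` that always points at a known target `x1` makes the loop land exactly on `x1`, because the last Euler step covers the full remaining distance. In a full system a learned network (per the abstract, a decoder-only transformer in Kaleido's case) would take the oracle's place.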


List of related papers