Fugu-MT Paper Translation (Summary): SceneCrafter: Controllable Multi-View Driving Scene Editing

Paper overview: SceneCrafter: Controllable Multi-View Driving Scene Editing

arxiv url: http://arxiv.org/abs/2506.19488v1
Date: Tue, 24 Jun 2025 10:23:47 GMT
Status: Translation complete
Last updated in system: 2025-06-25 19:48:23.591564
Title: SceneCrafter: Controllable Multi-View Driving Scene Editing
Authors: Zehao Zhu, Yuliang Zou, Chiyu Max Jiang, Bo Sun, Vincent Casser, Xiukun Huang, Jiahao Wang, Zhenpei Yang, Ruiqi Gao, Leonidas Guibas, Mingxing Tan, Dragomir Anguelov
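Abstract summary: SceneCrafter is a versatile editor for realistic, 3D-consistent manipulation of driving scenes captured from multiple cameras. Compared to existing baselines, SceneCrafter achieves state-of-the-art realism, controllability, 3D consistency, and scene-editing quality.
Reference score (attention score computed by the site): 44.91248700043744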
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Simulation is crucial for developing and evaluating autonomous vehicle (AV) systems. Recent literature builds on a new generation of generative models to synthesize highly realistic images for full-stack simulation. However, purely synthetically generated scenes are not grounded in reality and have difficulty inspiring confidence in the relevance of their outcomes. Editing models, on the other hand, leverage source scenes from real driving logs, and enable the simulation of different traffic layouts, behaviors, and operating conditions such as weather and time of day. While image editing is an established topic in computer vision, it presents fresh sets of challenges in driving simulation: (1) the need for cross-camera 3D consistency, (2) learning "empty street" priors from driving data with foreground occlusions, and (3) obtaining paired image tuples of varied editing conditions while preserving consistent layout and geometry. To address these challenges, we propose SceneCrafter, a versatile editor for realistic 3D-consistent manipulation of driving scenes captured from multiple cameras. We build on recent advancements in multi-view diffusion models, using a fully controllable framework that scales seamlessly to multi-modality conditions like weather, time of day, agent boxes and high-definition maps. To generate paired data for supervising the editing model, we propose a novel framework on top of Prompt-to-Prompt to generate geometrically consistent synthetic paired data with global edits. We also introduce an alpha-blending framework to synthesize data with local edits, leveraging a model trained on empty street priors through novel masked training and multi-view repaint paradigm. SceneCrafter demonstrates powerful editing capabilities and achieves state-of-the-art realism, controllability, 3D consistency, and scene editing quality compared to existing baselines.
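The abstract names an alpha-blending framework for synthesizing locally edited data but gives no details. As a hedged illustration only, the sketch below shows standard per-pixel alpha compositing applied to a stack of camera views, `alpha * edited + (1 - alpha) * source`. The function name `alpha_blend`, the `(V, H, W, C)` array layout, and the use of NumPy are assumptions made for illustration. This is not the paper's implementation, and how SceneCrafter obtains the edited images and masks (per the abstract, via a model trained on empty-street priors) is not modeled here.

```python
import numpy as np


def alpha_blend(source: np.ndarray, edited: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Hypothetical helper: generic per-pixel alpha compositing across camera views.

    source, edited: (V, H, W, C) float images in [0, 1], V = number of cameras.
    alpha: (V, H, W) soft mask in [0, 1]; 1 keeps the edited pixel, 0 keeps the source.
    """
    if source.shape != edited.shape:
        raise ValueError("source and edited must have the same shape")
    if alpha.shape != source.shape[:-1]:
        raise ValueError("alpha must match source without the channel axis")
    # Clamp the mask and add a channel axis so it broadcasts over C.
    a = np.clip(alpha, 0.0, 1.0)[..., None]
    return a * edited + (1.0 - a) * source


# Usage: two 2x2 RGB views; edit one pixel fully and another at half strength.
src = np.zeros((2, 2, 2, 3))
edt = np.ones((2, 2, 2, 3))
mask = np.zeros((2, 2, 2))
mask[0, 0, 0] = 1.0
mask[1, 1, 1] = 0.5
blended = alpha_blend(src, edt, mask)
```

Applying one mask per view keeps the composite local: pixels outside the mask stay exactly equal to the source views, which is the property a local edit needs.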

