Fugu-MT Paper Translation (Abstract): DriveCtrl: Conditioned Sim-to-Real Driving Video Generation

Paper abstract: DriveCtrl: Conditioned Sim-to-Real Driving Video Generation

arxiv url: http://arxiv.org/abs/2605.15116v1
Date: Thu, 14 May 2026 17:29:35 GMT
Status: Translation complete
System update date: 2026-05-15 21:45:34.983616
Title: DriveCtrl: Conditioned Sim-to-Real Driving Video Generation
Title (reference translation): DriveCtrl: Conditioned Sim-to-Real Driving Video Generation
Authors: Haonan Zhao, Yiting Wang, Jingkun Chen, Valentina Donzella, Thomas Bashford-Rogers, Kurt Debattista
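Abstract summary: DriveCtrl is a controllable sim-to-real video generation framework for realistic driving video synthesis. The paper proposes a scalable data generation pipeline that transforms simulator videos into realistic driving footage matching the visual style of a target real-world dataset.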
Reference score (independently computed attention score): 16.424889754682727
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale labelled driving video data is essential for training autonomous driving systems. Although simulation offers scalable and fully annotated data, the domain gap between synthetic and real-world driving videos significantly limits its utility for downstream deployment. Existing video generation methods are not well-suited for this task, as they fail to simultaneously preserve scene structure, object dynamics, temporal consistency, and visual realism, all of which are critical for maintaining annotation validity in generated data. In this paper, we present DriveCtrl, a depth-conditioned controllable sim-to-real video generation framework for realistic driving video synthesis. Built upon a pretrained video foundation model, DriveCtrl introduces a structure-aware adapter that enables depth-guided generation while preserving the scene layout and motion patterns of the source simulation, producing temporally coherent driving videos that remain aligned with the original simulated sequences. We further introduce a scalable data generation pipeline that transforms simulator videos into realistic driving footage matching the visual style of a target real-world dataset. The pipeline supports three conditioning signals: structural depth, reference-dataset style, and text prompts, while preserving frame-level annotations for downstream perception tasks. To better assess this task, we propose a driving-domain-specific knowledge-informed evaluation metric called Driving Video Realism Score (DVRS) that assesses the realism of generated videos. Experiments demonstrate that DriveCtrl consistently outperforms the base model and competing alternatives in realism, temporal quality, and perception task performance, substantially narrowing the sim-to-real gap for driving video generation.
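Abstract (reference translation): Large-scale labelled driving video data is essential for training autonomous driving systems. Simulation provides scalable, fully annotated data, but the domain gap between synthetic and real-world driving videos significantly limits its usefulness for downstream deployment. Existing video generation methods are poorly suited to this task because they cannot simultaneously preserve scene structure, object dynamics, temporal consistency, and visual realism. This paper proposes DriveCtrl, a depth-conditioned, controllable sim-to-real video generation framework for realistic driving video synthesis. Built on a pretrained video foundation model, DriveCtrl introduces a structure-aware adapter that enables depth-guided generation while preserving the scene layout and motion patterns of the source simulation, generating temporally coherent driving videos aligned with the original simulated sequences. It further introduces a scalable data generation pipeline that converts simulator videos into realistic driving footage matching the visual style of a target real-world dataset. The pipeline supports three conditioning signals (structural depth, reference-dataset style, and text prompts) and preserves frame-level annotations for downstream perception tasks. To better evaluate this task, the authors propose the Driving Video Realism Score (DVRS), a driving-domain-specific, knowledge-informed evaluation metric that assesses the realism of generated videos. Experiments show that DriveCtrl consistently outperforms the base model and competing alternatives in realism, temporal quality, and perception task performance, substantially narrowing the sim-to-real gap in driving video generation.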

Related papers