Fugu-MT Paper Translation (Summary): Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

Paper Summary: Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

arXiv URL: http://arxiv.org/abs/2604.14302v1
Date: Wed, 15 Apr 2026 18:00:45 GMT
Status: Translation complete
Last updated (in system): 2026-04-17 21:29:29.974445
Title: Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
Authors: Ahmed Bourouis, Savas Ozkan, Andrea Maracani, Yi-Zhe Song, Mete Ozay,
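Abstract summary: Freehand sketches are the most geometrically impoverished input one could offer a multi-view generator. The work addresses three compounding challenges: absent training data, the need for geometric reasoning from distorted 2D input, and cross-view consistency. The framework synthesizes all views in a single denoising process, without reference images, iterative refinement, or per-scene optimization.
Reference score (in-house attention score): 58.98194920417429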
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We tackle a new problem: generating geometrically consistent multi-view scenes from a single freehand sketch. Freehand sketches are the most geometrically impoverished input one could offer a multi-view generator. They convey scene intent through abstract strokes while introducing spatial distortions that actively conflict with any consistent 3D interpretation. No prior method attempts this; existing multi-view approaches require photographs or text, while sketch-to-3D methods need multiple views or costly per-scene optimization. We address three compounding challenges (absent training data, the need for geometric reasoning from distorted 2D input, and cross-view consistency) through three mutually reinforcing contributions: (i) a curated dataset of ~9k sketch-to-multiview samples, constructed via an automated generation and filtering pipeline; (ii) Parallel Camera-Aware Attention Adapters (CA3) that inject geometric inductive biases into the video transformer; and (iii) a Sparse Correspondence Supervision Loss (CSL) derived from Structure-from-Motion reconstructions. Our framework synthesizes all views in a single denoising process without requiring reference images, iterative refinement, or per-scene optimization. Our approach significantly outperforms state-of-the-art two-stage baselines, improving realism (FID) by over 60% and geometric consistency (Corr-Acc) by 23%, while providing up to a 3.7× inference speedup.
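The abstract names a Sparse Correspondence Supervision Loss (CSL) built from Structure-from-Motion reconstructions but does not state its form. The Python sketch here shows one generic way a correspondence loss of this kind could work, under assumptions that are ours and not the authors': each generated view yields a per-pixel feature map, SfM supplies sparse pixel matches between pairs of views, and the loss penalizes dissimilarity (1 minus cosine similarity) between features sampled at matched pixels. The names `sparse_correspondence_loss`, `cosine_similarity`, `FeatureMap`, and `Match`, as well as the toy data, are hypothetical illustrations.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout (an assumption, not the paper's):
# a feature map is indexed as [row][col][channel].
FeatureMap = List[List[List[float]]]
# One hypothetical SfM match: (view_a, (row_a, col_a), view_b, (row_b, col_b)).
Match = Tuple[int, Tuple[int, int], int, Tuple[int, int]]


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def sparse_correspondence_loss(features: Dict[int, FeatureMap], matches: List[Match]) -> float:
    """Hypothetical loss: mean (1 - cosine similarity) over matched pixel pairs.

    Returns 0.0 when there are no matches. Each pair contributes roughly
    0 (same direction) up to 2 (opposite directions).
    """
    if not matches:
        return 0.0
    total = 0.0
    for view_a, (row_a, col_a), view_b, (row_b, col_b) in matches:
        feat_a = features[view_a][row_a][col_a]
        feat_b = features[view_b][row_b][col_b]
        total += 1.0 - cosine_similarity(feat_a, feat_b)
    return total / len(matches)


# Toy 2x2 feature maps with 2 channels for two generated views.
features = {
    0: [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]],
    1: [[[0.0, 1.0], [1.0, 0.0]], [[2.0, 2.0], [-1.0, 1.0]]],
}
# Matches whose features point the same way: loss close to 0.
aligned = sparse_correspondence_loss(features, [(0, (0, 0), 1, (0, 1)), (0, (1, 0), 1, (1, 0))])
# Matches whose features are orthogonal or opposite: loss close to 1.5.
misaligned = sparse_correspondence_loss(features, [(0, (0, 1), 1, (0, 1)), (0, (1, 1), 1, (1, 1))])
print(f"aligned={aligned:.4f} misaligned={misaligned:.4f}")  # prints aligned=0.0000 misaligned=1.5000
```

In an actual training setup such a term would presumably be computed on differentiable tensors and added to the denoising objective with some weight; both the feature choice and that weighting are assumptions of this sketch, not details given in the abstract.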

Related Papers