Fugu-MT Paper Translation (Summary): Feed-Forward Gaussian Splatting from Sparse Aerial Views

Paper Overview: Feed-Forward Gaussian Splatting from Sparse Aerial Views

arxiv url: http://arxiv.org/abs/2605.19949v1
Date: Tue, 19 May 2026 15:04:34 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:09.460226
Title: Feed-Forward Gaussian Splatting from Sparse Aerial Views
Authors: Dongli Wu, Zhuoxiao Li, Tongyan Hua, Yinrui Ren, Xiaobao Wei, Rongjun Qin, Wufan Zhao,
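Abstract summary: We propose AnyCity, an observation-grounded generative reconstruction framework for sparse aerial urban scenes. During training, dense-to-sparse distillation transfers structural cues from dense-view reconstruction, while an aerial-adapted video diffusion prior provides fine-grained urban appearance cues. Experiments on synthetic, aerial-domain, UAV-textured, and real-world scenes show consistent improvements over feed-forward baselines.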
Reference score (attention score computed in-house): 14.51615314064375
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reconstructing large-scale urban scenes from sparse aerial views is a crucial yet challenging task. Due to biased top-down and shallow-oblique camera poses, sparse aerial captures exhibit strong evidence imbalance: roofs and open regions are repeatedly observed, while facades, distant buildings, and occluded structures receive little multi-view support. Existing feed-forward 3D Gaussian Splatting methods directly regress a deterministic representation from sparse inputs, but this often leads to ghosting, melted facades, and stretched textures. Recent pseudo-view and video-based generative reconstruction methods use additional supervision or generative priors. However, they often lack a clear separation between observed geometry and prior-driven content, which can lead to plausible but inconsistent structures. We propose AnyCity, an observation-grounded generative reconstruction framework for sparse aerial urban scenes. AnyCity first predicts an observation-supported geometry latent to anchor reliable structures, and then uses scaffold-conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding. During training, dense-to-sparse distillation transfers structural cues from dense-view reconstruction, while an aerial-adapted video diffusion prior provides fine-grained urban appearance cues through gated token conditioning. Observation-preserving objectives keep the refined representation consistent with input-supported geometry. At inference time, AnyCity reconstructs the final 3D Gaussian scene from sparse aerial views in a single feed-forward pass, achieving coherent urban novel-view synthesis with inference times on the order of seconds. Experiments on synthetic, aerial-domain, UAV-textured, and real-world scenes show consistent improvements over feed-forward baselines.
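The abstract says AnyCity refines an observation-supported geometry latent with a gated residual update before Gaussian decoding, but it does not give the formula. A common generic form of such an update is z' = z + g * dz, with one gate g in [0, 1] per token. The Python sketch below illustrates only that generic pattern under stated assumptions: the names gated_residual_update, gate_logits, and support are hypothetical, and scaling the gate by (1 - support) so that well-observed tokens keep their geometry latent is an illustrative assumption, not the authors' method.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_residual_update(
    latent: List[List[float]],
    residual: List[List[float]],
    gate_logits: List[float],
    support: List[float],
) -> List[List[float]]:
    """Hypothetical per-token gated residual update: z' = z + g * dz.

    latent:      observation-supported geometry latent, one vector per token
    residual:    proposed update (e.g. from completion tokens), same shape
    gate_logits: one scalar gate logit per token
    support:     ASSUMED observation-support score per token in [0, 1];
                 1.0 means well observed, which closes the gate entirely
    """
    if not (len(latent) == len(residual) == len(gate_logits) == len(support)):
        raise ValueError("all inputs must have one entry per token")
    out = []
    for z, dz, a, s in zip(latent, residual, gate_logits, support):
        if len(z) != len(dz):
            raise ValueError("latent and residual vectors must match in size")
        if not 0.0 <= s <= 1.0:
            raise ValueError("support scores must lie in [0, 1]")
        # Gate in [0, 1): learned openness, shrunk where observations are strong.
        g = sigmoid(a) * (1.0 - s)
        out.append([zi + g * di for zi, di in zip(z, dz)])
    return out


if __name__ == "__main__":
    refined = gated_residual_update(
        latent=[[1.0, 2.0], [0.0, 0.0]],
        residual=[[5.0, 5.0], [4.0, -2.0]],
        gate_logits=[0.0, 0.0],
        support=[1.0, 0.0],
    )
    print(refined)  # [[1.0, 2.0], [2.0, -1.0]]
```

With zero gate logits (sigmoid = 0.5), the fully supported token is returned unchanged, while the unsupported token receives half of its proposed residual. Keeping the update additive and gated is one way to express the abstract's separation between observed geometry and prior-driven content: the prior can only move a token's latent where the gate opens.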


List of related papers