Fugu-MT Paper Translation (Abstract): SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation

Paper abstract: SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation

arxiv url: http://arxiv.org/abs/2602.22785v1
Date: Thu, 26 Feb 2026 09:19:59 GMT
Status: Translation complete
Last updated in system: 2026-02-27 18:41:22.617427
Title: SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
Title (reference translation): SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
Authors: Ling Wang, Hao-Xiang Guo, Xinzhou Wang, Fuchun Sun, Kai Sun, Pengkun Liu, Hang Xiao, Zhong Wang, Guangyuan Fu, Eric Li, Yang Liu, Yikai Wang
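Abstract summary: SceneTransporter is an end-to-end framework for generating structured 3D scenes from a single image. Experiments show that SceneTransporter outperforms existing methods on open-world scene generation.
Reference score (independently computed attention score): 30.006450280178466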
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce SceneTransporter, an end-to-end framework for structured 3D scene generation from a single image. While existing methods generate part-level 3D objects, they often fail to organize these parts into distinct instances in open-world scenes. Through a debiased clustering probe, we reveal a critical insight: this failure stems from the lack of structural constraints within the model's internal assignment mechanism. Based on this finding, we reframe the task of structured 3D scene generation as a global correlation assignment problem. To solve this, SceneTransporter formulates and solves an entropic Optimal Transport (OT) objective within the denoising loop of the compositional DiT model. This formulation imposes two powerful structural constraints. First, the resulting transport plan gates cross-attention to enforce an exclusive, one-to-one routing of image patches to part-level 3D latents, preventing entanglement. Second, the competitive nature of the transport encourages the grouping of similar patches, a process that is further regularized by an edge-based cost, to form coherent objects and prevent fragmentation. Extensive experiments show that SceneTransporter outperforms existing methods on open-world scene generation, significantly improving instance-level coherence and geometric fidelity. Code and models will be publicly available at https://2019epwl.github.io/SceneTransporter/.
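Abstract (reference translation): SceneTransporter is an end-to-end framework for generating structured 3D scenes from a single image. Existing methods generate part-level 3D objects, but in open-world scenes they often fail to organize these parts into distinct instances. This failure stems from a lack of structural constraints in the model's internal assignment mechanism. We therefore reframe structured 3D scene generation as a global correlation assignment problem. To solve it, SceneTransporter formulates and solves an entropic Optimal Transport (OT) objective inside the denoising loop of the compositional DiT model. This formulation imposes two powerful structural constraints. First, the resulting transport plan gates cross-attention to enforce an exclusive, one-to-one routing of image patches to part-level 3D latents. Second, the competitive nature of the transport encourages the grouping of similar patches, a process further regularized by an edge-based cost, which forms coherent objects and prevents fragmentation. Extensive experiments show that SceneTransporter outperforms existing methods on open-world scene generation, significantly improving instance-level coherence and geometric fidelity. Code and models will be released at https://2019epwl.github.io/SceneTransporter/.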
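The entropic OT step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the cost function, the marginals, or how the plan gates attention, so everything below is an assumption made for illustration: the function names `sinkhorn` and `ot_gated_cross_attention`, the cosine-distance cost, the uniform marginals, the multiplicative gate, and the values of `eps` and `n_iters`. The edge-based cost term is omitted.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic optimal transport via plain Sinkhorn iterations.

    cost: (n, m) cost matrix; a: (n,) source marginal; b: (m,) target marginal.
    Returns a transport plan P of shape (n, m) whose column sums equal b and
    whose row sums approach a as the iterations converge.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ot_gated_cross_attention(q, k, v, eps=0.1):
    """Cross-attention from part latents (queries) to image patches (keys/values),
    gated by an OT plan. Hypothetical gating rule, assumed for illustration.

    q: (m, d) part-latent queries; k: (n, d) patch keys; v: (n, dv) patch values.
    """
    n, m, d = k.shape[0], q.shape[0], q.shape[1]
    # Assumed cost: cosine distance between each patch and each part latent.
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=1, keepdims=True)
    cost = 1.0 - kn @ qn.T                               # (n, m)
    # Assumed uniform marginals: equal mass per patch and per part.
    plan = sinkhorn(cost, np.full(n, 1.0 / n), np.full(m, 1.0 / m), eps)
    # Per-patch share of mass sent to each part; parts compete for patches.
    gate = plan / plan.sum(axis=1, keepdims=True)        # (n, m)
    attn = softmax(q @ k.T / np.sqrt(d), axis=1) * gate.T  # (m, n)
    attn = attn / attn.sum(axis=1, keepdims=True)        # renormalize over patches
    return attn @ v, attn, plan
```

A smaller `eps` makes the plan sharper, so each patch sends most of its mass to a single part, which is one way to approximate the exclusive routing the abstract describes. Very small values can underflow in this plain (non-log-domain) formulation.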
