Fugu-MT Paper Translation (Summary): 3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image

Paper overview: 3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image

arxiv url: http://arxiv.org/abs/2604.04406v1
Date: Mon, 06 Apr 2026 04:11:09 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:19.086014
Title: 3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image
Title (reference translation): 3D-Fixer: In-place completion of 3D scenes from a single image
Authors: Ze-Xin Yin, Liu Liu, Xinjie Wang, Wei Sui, Zhizhong Su, Jian Yang, Jin Xie
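Abstract summary: We introduce 3D-Fixer, a novel in-place completion paradigm for compositional 3D scene generation. Unlike prior works that require explicit pose alignment, 3D-Fixer uses fragmented geometry as a spatial anchor to preserve layout fidelity. We also present ARSG-110K, the largest scene-level dataset to date.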
Reference score (attention score computed by the site): 26.04490259188974
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Compositional 3D scene generation from a single view requires the simultaneous recovery of scene layout and 3D assets. Existing approaches mainly fall into two categories: feed-forward generation methods and per-instance generation methods. The former directly predict 3D assets with explicit 6DoF poses through efficient network inference, but they generalize poorly to complex scenes. The latter improve generalization through a divide-and-conquer strategy, but suffer from time-consuming pose optimization. To bridge this gap, we introduce 3D-Fixer, a novel in-place completion paradigm. Specifically, 3D-Fixer extends 3D object generative priors to generate complete 3D assets conditioned on the partially visible point cloud at the original locations, which are cropped from the fragmented geometry obtained from the geometry estimation methods. Unlike prior works that require explicit pose alignment, 3D-Fixer uses fragmented geometry as a spatial anchor to preserve layout fidelity. At its core, we propose a coarse-to-fine generation scheme to resolve boundary ambiguity under occlusion, supported by a dual-branch conditioning network and an Occlusion-Robust Feature Alignment (ORFA) strategy for stable training. Furthermore, to address the data scarcity bottleneck, we present ARSG-110K, the largest scene-level dataset to date, comprising over 110K diverse scenes and 3M annotated images with high-fidelity 3D ground truth. Extensive experiments show that 3D-Fixer achieves state-of-the-art geometric accuracy, which significantly outperforms baselines such as MIDI and Gen3DSR, while maintaining the efficiency of the diffusion process. Code and data will be publicly available at https://zx-yin.github.io/3dfixer.
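Abstract (reference translation): Generating a compositional 3D scene from a single view requires recovering the scene layout and the 3D assets simultaneously. Existing approaches fall mainly into two categories: feed-forward generation methods and per-instance generation methods. The former directly predict 3D assets with explicit 6DoF poses through efficient network inference, but do not generalize well to complex scenes. The latter improve generalization through a divide-and-conquer strategy, but suffer from time-consuming pose optimization. To bridge this gap, we introduce 3D-Fixer, a novel in-place completion paradigm. Specifically, 3D-Fixer extends 3D object generative priors to generate complete 3D assets conditioned on the partially visible point cloud at the original locations, cropped from the fragmented geometry obtained from geometry estimation methods. Unlike prior works that require explicit pose alignment, 3D-Fixer uses the fragmented geometry as a spatial anchor to preserve layout fidelity. We propose a coarse-to-fine generation scheme that resolves boundary ambiguity under occlusion, combined with a dual-branch conditioning network and an Occlusion-Robust Feature Alignment (ORFA) strategy. Furthermore, to address the data scarcity bottleneck, we present ARSG-110K, the largest scene-level dataset to date. Extensive experiments show that 3D-Fixer achieves state-of-the-art geometric accuracy, significantly outperforming baselines such as MIDI and Gen3DSR while maintaining the efficiency of the diffusion process. Code and data will be released at https://zx-yin.github.io/3dfixer.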
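The "in-place" idea in the abstract can be illustrated with a minimal sketch, assuming the fragmented geometry is a pinhole depth map and each object comes with a boolean instance mask. The function names (`backproject_depth`, `crop_instance`, `normalize_fragment`, `complete_in_place`) and the `complete_fn` stand-in for the generative model are hypothetical; this is not the authors' code, and it omits the coarse-to-fine scheme, the dual-branch conditioning network, and ORFA. It only shows why no pose optimization is needed in this setting: the cropped fragment already sits where the object is, so a completion produced in a normalized frame is placed back by inverting the normalization.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an H x W depth map to camera-space points with a pinhole model.

    Returns an (H*W, 3) array whose rows follow row-major pixel order.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def crop_instance(points, mask):
    """Keep the points inside one instance mask that have valid (positive) depth."""
    keep = mask.reshape(-1) & (points[:, 2] > 0)
    return points[keep]


def normalize_fragment(partial):
    """Center the visible fragment at the origin and scale it into [-1, 1]^3.

    Returns the normalized points plus (center, scale) so the mapping can be inverted.
    """
    lo, hi = partial.min(axis=0), partial.max(axis=0)
    center = (lo + hi) / 2.0
    scale = float(np.max(hi - lo)) / 2.0
    if scale == 0.0:
        scale = 1.0  # degenerate fragment (e.g. a single point): avoid dividing by zero
    return (partial - center) / scale, center, scale


def complete_in_place(partial, complete_fn):
    """Run a (hypothetical) completion model in normalized space, then map back.

    `complete_fn` takes normalized fragment points and returns normalized points
    of the completed asset; the inverse transform puts them at the original location.
    """
    canonical, center, scale = normalize_fragment(partial)
    completed = complete_fn(canonical)
    return completed * scale + center


if __name__ == "__main__":
    # Toy scene: a flat "object" at depth 2 in the top-left 2 x 2 pixels of a 4 x 4 map.
    depth = np.full((4, 4), 5.0)
    depth[:2, :2] = 2.0
    mask = np.zeros((4, 4), dtype=bool)
    mask[:2, :2] = True
    points = backproject_depth(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
    fragment = crop_instance(points, mask)
    # Identity stand-in for the generator: the output lands exactly on the fragment.
    placed = complete_in_place(fragment, lambda p: p)
    print(np.allclose(placed, fragment))  # True
```

One caveat of this toy normalization: it uses only the bounding box of the visible fragment, so the completed extent of an occluded object can exceed [-1, 1]^3. Resolving that kind of boundary ambiguity under occlusion is what the paper's coarse-to-fine scheme targets; the sketch does not model it.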
