Fugu-MT Paper Translation (Abstract): JointEdit3D: Feed-Forward 3D Scene Editing in a Unified Latent Space

Paper Overview: JointEdit3D: Feed-Forward 3D Scene Editing in a Unified Latent Space

arxiv url: http://arxiv.org/abs/2606.13345v1
Date: Thu, 11 Jun 2026 13:35:23 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.820433
Title: JointEdit3D: Feed-Forward 3D Scene Editing in a Unified Latent Space
Authors: Xinnan Zhu, Ruijie Xu, Jiayu Ying, Daoguo Dong, Jiachen Xu, Yuan Xie, Xin Tan
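Abstract summary: Existing 3D scene editing methods rely on per-scene optimization over explicit 3D representations or on edit-and-reconstruct pipelines. This work builds on a unified RGB-geometry generation latent space and adapts it to feed-forward 3D scene editing. JointEdit3D performs asymmetric latent inpainting by observing only a single edited RGB reference latent.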
Reference score (independently computed attention score): 14.944378716099422
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing 3D scene editing methods typically rely on per-scene optimization over explicit 3D representations or cascaded edit-and-reconstruct pipelines, resulting in high test-time cost, limited 3D awareness, and structural inconsistencies. To couple appearance synthesis and geometry prediction during editing, we build on a unified RGB-geometry reconstruction-generation latent space and adapt it to feed-forward 3D scene editing. The resulting framework, JointEdit3D, performs asymmetric latent inpainting by observing only a single edited RGB reference latent and generating the remaining RGB views and edited geometry latent under source-scene anchoring. JointEdit3D introduces a dedicated SceneAnchor Branch to inject source-scene structure without forcing direct copying, and adopts edit/background-aware losses to balance edited-region fidelity with unedited-content preservation. To address the lack of paired resources for standardized 3D scene editing evaluation, we introduce SceneEdit3D-15K, a dataset with 15K paired editing samples and renderer-provided 3D annotations, together with SceneEdit3D-Bench, a curated 100-sample benchmark. Experiments show that JointEdit3D improves edited-region quality and 3D structural completeness over prior baselines while maintaining competitive background preservation.
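The abstract names two mechanisms without giving their exact form: an asymmetric observation pattern (one edited RGB reference latent is observed, everything else is generated) and edit/background-aware losses. The following is a minimal illustrative sketch of both ideas, not the authors' implementation. The function names, the flat-list representation of latents, the use of mean squared error, and the weights `w_edit` and `w_bg` are all assumptions made for illustration.

```python
from typing import List, Sequence


def observation_mask(num_rgb_views: int, reference_index: int = 0) -> List[bool]:
    """Hypothetical helper: flag which latents are observed.

    Exactly one RGB view (the edited reference) is observed. The final
    entry stands for the geometry latent, which is always generated.
    """
    if not 0 <= reference_index < num_rgb_views:
        raise ValueError("reference_index must select one of the RGB views")
    mask = [i == reference_index for i in range(num_rgb_views)]
    mask.append(False)  # geometry latent is never observed
    return mask


def region_weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    edit_mask: Sequence[int],
    w_edit: float = 1.0,
    w_bg: float = 1.0,
) -> float:
    """Hypothetical loss: weighted sum of per-region mean squared errors.

    edit_mask[i] is 1 where element i lies in the edited region and 0 in
    the unedited background. MSE is an assumed stand-in for whatever
    per-region terms the paper actually uses.
    """
    if not (len(pred) == len(target) == len(edit_mask)):
        raise ValueError("pred, target and edit_mask must have the same length")

    edit_sq: List[float] = []
    bg_sq: List[float] = []
    for p, t, m in zip(pred, target, edit_mask):
        (edit_sq if m else bg_sq).append((p - t) ** 2)

    # Average within each region so a small edit is not swamped by a large background.
    edit_term = sum(edit_sq) / len(edit_sq) if edit_sq else 0.0
    bg_term = sum(bg_sq) / len(bg_sq) if bg_sq else 0.0
    return w_edit * edit_term + w_bg * bg_term
```

Averaging within each region before weighting is one plausible way to trade edited-region fidelity against background preservation. Raising `w_edit` relative to `w_bg` favors the edit, and the reverse favors keeping unedited content intact.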
