Fugu-MT Paper Translation (Summary): View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity

Paper summary: View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity

arxiv url: http://arxiv.org/abs/2604.17801v2
Date: Fri, 24 Apr 2026 14:52:28 GMT
Status: Translation complete
Last updated in system: 2026-04-27 13:34:21.941814
Title: View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity
Authors: Pufan Li, Bi'an Du, Shenghe Zheng, Junyi Yao, Wei Hu
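Abstract summary: Cross-view inconsistency remains a major bottleneck in text-driven 3D scene editing. The paper recasts multi-view consistent 3D editing from a distributional perspective. It proposes a dual-path consistency mechanism consisting of projection-guided structural guidance and patch-level semantic propagation.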
Reference score (independently computed attention score): 11.663455227298122
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, where multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to optimize the underlying 3D representation. However, cross-view inconsistency remains a major bottleneck. Although recent methods introduce geometric cues, cross-view interactions, or video priors to mitigate this issue, they still largely rely on inference-time synchronization and thus remain limited in robustness and generalization. In this work, we recast multi-view consistent 3D editing from a distributional perspective: 3D scene editing essentially requires a joint distribution modeling across viewpoints. Based on this insight, we propose a view-consistent 3D editing framework that explicitly introduces cross-view dependencies into the editing process. Furthermore, motivated by the observation that structural correspondence and semantic continuity rely on different cross-view cues, we introduce a dual-path consistency mechanism consisting of projection-guided structural guidance and patch-level semantic propagation for effective cross-view editing. Further, we construct a paired multi-view editing dataset that provides reliable supervision for learning cross-view consistency in edited scenes. Extensive experiments demonstrate that our method achieves superior editing performance with precise and consistent views for complex scenes.
