Fugu-MT paper translation (abstract): PhyMix: Towards Physically Consistent Single-Image 3D Indoor Scene Generation with Implicit--Explicit Optimization

Paper overview: PhyMix: Towards Physically Consistent Single-Image 3D Indoor Scene Generation with Implicit--Explicit Optimization

arxiv url: http://arxiv.org/abs/2604.10125v1
Date: Sat, 11 Apr 2026 09:41:42 GMT
Status: translation complete
Last updated in system: 2026-04-14 20:13:15.856798
Title: PhyMix: Towards Physically Consistent Single-Image 3D Indoor Scene Generation with Implicit--Explicit Optimization
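Title (reference translation): PhyMix: Toward Physically Consistent Single-Image 3D Indoor Scene Generation via Implicit-Explicit Optimization
Authors: Dongli Wu, Jingyu Hu, Ka-Hei Hui, Xiaobao Wei, Chengwen Luo, Jianqiang Li, Zhengzhe Liu
Abstract summary: Existing single-image indoor scene generators often produce results that look visually plausible but do not obey real-world physics. We introduce a unified Physics Evaluator that measures four main aspects: geometric priors, contact, stability, and deployability. We propose a framework that integrates feedback from the Physics Evaluator into both training and inference to improve the physical plausibility of generated scenes.
Reference score (attention metric computed independently by this site): 22.748975724819264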
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Existing single-image 3D indoor scene generators often produce results that look visually plausible but fail to obey real-world physics, limiting their reliability in robotics, embodied AI, and design. To examine this gap, we introduce a unified Physics Evaluator that measures four main aspects: geometric priors, contact, stability, and deployability, which are further decomposed into nine sub-constraints, establishing the first benchmark to measure physical consistency. Based on this evaluator, our analysis shows that state-of-the-art methods remain largely physics-unaware. To overcome this limitation, we further propose a framework that integrates feedback from the Physics Evaluator into both training and inference, enhancing the physical plausibility of generated scenes. Specifically, we propose PhyMix, which is composed of two complementary components: (i) implicit alignment via Scene-GRPO, a critic-free group-relative policy optimization that leverages the Physics Evaluator as a preference signal and biases sampling towards physically feasible layouts, and (ii) explicit refinement via a plug-and-play Test-Time Optimizer (TTO) that uses differentiable evaluator signals to correct residual violations during generation. Overall, our method unifies evaluation, reward shaping, and inference-time correction, producing 3D indoor scenes that are visually faithful and physically plausible. Extensive synthetic evaluations confirm state-of-the-art performance in both visual fidelity and physical plausibility, and extensive qualitative examples in stylized and real-world images further showcase the robustness of the method. We will release codes and models upon publication.
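The abstract describes Scene-GRPO as a critic-free, group-relative policy optimization driven by Physics Evaluator scores. As a rough illustration of what "group-relative" usually means in GRPO-style methods, the sketch below normalizes each sampled layout's score against the other samples drawn for the same input, so no learned value function is needed. The function name, the example scores, and the choice of population standard deviation are assumptions made for illustration; the paper's actual reward definition and update rule are not given in the abstract.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample relative to its own sampling group.

    Assumed form: (reward - group mean) / (group std + eps).
    The group statistics stand in for a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, variants exist
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical Physics Evaluator scores for four layouts sampled
# from the same input image (values are made up for illustration).
scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(scores)

# Layouts scoring above the group mean get positive advantages,
# those below get negative ones.
for s, a in zip(scores, advantages):
    print(f"score={s:.2f} advantage={a:+.3f}")
```

In a policy-gradient update, samples with positive advantages would be made more likely and samples with negative advantages less likely, which matches the abstract's description of biasing sampling toward physically feasible layouts.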
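Abstract (reference translation): Existing single-image indoor scene generators often produce results that look visually plausible but do not obey real-world physics, which limits their reliability in robotics, embodied AI, and design. To examine this gap, we introduce a unified Physics Evaluator that measures four main aspects (geometric priors, contact, stability, and deployability), further decomposed into nine sub-constraints, establishing the first benchmark for measuring physical consistency. Based on this evaluator, we show that state-of-the-art methods remain largely physics-unaware. To overcome this limitation, we further propose a framework that integrates feedback from the Physics Evaluator into both training and inference, improving the physical plausibility of generated scenes. Specifically, we propose PhyMix, which consists of two complementary components: (i) implicit alignment via Scene-GRPO, a critic-free group-relative policy optimization that uses the Physics Evaluator as a preference signal and biases sampling toward physically feasible layouts, and (ii) explicit refinement via a plug-and-play Test-Time Optimizer (TTO) that uses differentiable evaluator signals to correct residual violations during generation. Overall, the method unifies evaluation, reward shaping, and inference-time correction, producing 3D indoor scenes that are visually faithful and physically plausible. Extensive synthetic evaluations confirm state-of-the-art performance in both visual fidelity and physical plausibility, and extensive qualitative examples on stylized and real-world images further demonstrate the robustness of the method. Code and models will be released upon publication.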
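The explicit refinement step is described only as a test-time optimizer that uses differentiable evaluator signals to correct residual violations. The toy sketch below shows the general idea under strong simplifying assumptions: two objects are reduced to 1D intervals, the only "violation" is interpenetration, and the penalty is the squared penetration depth minimized by plain gradient descent. All names, the penalty, the learning rate, and the step count are hypothetical and are not taken from the paper.

```python
def overlap_penalty_and_grad(x_a, x_b, half_a, half_b):
    """Squared 1D penetration depth and its gradient w.r.t. both centers."""
    gap = abs(x_b - x_a) - (half_a + half_b)  # negative means overlap
    depth = max(0.0, -gap)
    sign = 1.0 if x_b >= x_a else -1.0
    # depth = (half_a + half_b) - |x_b - x_a|, so d(depth)/d(x_a) = sign
    grad_a = 2.0 * depth * sign
    grad_b = -2.0 * depth * sign
    return depth ** 2, grad_a, grad_b


def refine(x_a, x_b, half_a, half_b, lr=0.1, steps=200):
    """Gradient descent on the penalty; moves the intervals apart."""
    for _ in range(steps):
        _, g_a, g_b = overlap_penalty_and_grad(x_a, x_b, half_a, half_b)
        x_a -= lr * g_a
        x_b -= lr * g_b
    return x_a, x_b


# Two unit-width objects whose centers start 0.5 apart (overlapping by 0.5).
start_penalty, _, _ = overlap_penalty_and_grad(0.0, 0.5, 0.5, 0.5)
new_a, new_b = refine(0.0, 0.5, 0.5, 0.5)
end_penalty, _, _ = overlap_penalty_and_grad(new_a, new_b, 0.5, 0.5)
print(start_penalty, end_penalty, new_a, new_b)
```

Because the penalty is zero once the objects no longer overlap, the update leaves already-valid placements untouched, which is the behavior one would want from a corrective step applied after generation.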

Related papers