RoomEditor++: A Parameter-Sharing Diffusion Architecture for High-Fidelity Furniture Synthesis
- URL: http://arxiv.org/abs/2512.17573v1
- Date: Fri, 19 Dec 2025 13:39:43 GMT
- Title: RoomEditor++: A Parameter-Sharing Diffusion Architecture for High-Fidelity Furniture Synthesis
- Authors: Qilong Wang, Xiaofan Ming, Zhenyi Lin, Jinwen Li, Dongwei Ren, Wangmeng Zuo, Qinghua Hu
- Abstract summary: Virtual furniture synthesis holds substantial promise for home design and e-commerce applications. RoomEditor++ is a versatile diffusion-based architecture featuring a parameter-sharing dual diffusion backbone. RoomEditor++ outperforms state-of-the-art approaches in terms of quantitative metrics, qualitative assessments, and human preference studies.
- Score: 89.26382925677301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual furniture synthesis, which seamlessly integrates reference objects into indoor scenes while maintaining geometric coherence and visual realism, holds substantial promise for home design and e-commerce applications. However, this field remains underexplored due to the scarcity of reproducible benchmarks and the limitations of existing image composition methods in achieving high-fidelity furniture synthesis while preserving background integrity. To overcome these challenges, we first present RoomBench++, a comprehensive and publicly available benchmark dataset tailored for this task. It consists of 112,851 training pairs and 1,832 testing pairs drawn from both real-world indoor videos and realistic home design renderings, thereby supporting robust training and evaluation under practical conditions. Then, we propose RoomEditor++, a versatile diffusion-based architecture featuring a parameter-sharing dual diffusion backbone, which is compatible with both U-Net and DiT architectures. This design unifies the feature extraction and inpainting processes for reference and background images. Our in-depth analysis reveals that the parameter-sharing mechanism enforces aligned feature representations, facilitating precise geometric transformations, texture preservation, and seamless integration. Extensive experiments validate that RoomEditor++ outperforms state-of-the-art approaches in terms of quantitative metrics, qualitative assessments, and human preference studies, while highlighting its strong generalization to unseen indoor scenes and general scenes without task-specific fine-tuning. The dataset and source code are available at \url{https://github.com/stonecutter-21/roomeditor}.
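The core idea of the parameter-sharing dual backbone described above is that the reference-image branch and the background-image branch are processed with identical weights, so both streams are forced into an aligned feature space. A minimal toy sketch of that weight-tying pattern (this is an illustration of the general concept, not the paper's actual architecture; the layer shapes, `encode` helper, and linear-plus-ReLU form are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight matrix standing in for the shared backbone parameters.
# Both branches below use this SAME object: any update to W would
# change both streams, which is what keeps their features aligned.
W = rng.standard_normal((8, 8))

def encode(x, weights):
    """Toy feature extractor: a single linear layer followed by ReLU."""
    return np.maximum(weights @ x, 0.0)

reference_image = rng.standard_normal(8)   # stand-in for reference features
background_image = rng.standard_normal(8)  # stand-in for background features

reference_feat = encode(reference_image, W)
background_feat = encode(background_image, W)

# Identical inputs through the shared weights yield identical features,
# unlike a dual backbone with two independent weight sets.
same_input = np.ones(8)
print(np.allclose(encode(same_input, W), encode(same_input, W)))  # True
```

In a real dual-diffusion setup the shared module would be a full U-Net or DiT block rather than one linear layer, but the weight-tying mechanism is the same: one parameter set, two input streams.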
Related papers
- SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding [18.889530477440793]
SemGS is a feed-forward framework for reconstructing generalizable semantic fields from image inputs. We introduce a camera-aware attention mechanism into the feature extractor to explicitly model geometric relationships between camera viewpoints. Experiments show that SemGS achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2026-03-03T03:06:37Z) - A Modular Framework for Single-View 3D Reconstruction of Indoor Environments [1.979245586749314]
We propose a modular framework for single-view indoor scene 3D reconstruction. Several core modules are powered by diffusion techniques. The framework holds promising potential for applications in interior design, real estate, and augmented reality.
arXiv Detail & Related papers (2025-12-17T22:49:43Z) - DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis [76.7196710324494]
3D indoor layout synthesis is crucial for creating virtual environments. DisCo is a novel framework that disentangles and coordinates physical and semantic refinement.
arXiv Detail & Related papers (2025-10-02T16:30:37Z) - Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity [78.7107376451476]
Hi3DEval is a hierarchical evaluation framework tailored for 3D generative content. We extend texture evaluation beyond aesthetic appearance by explicitly assessing material realism. We propose a 3D-aware automated scoring system based on hybrid 3D representations.
arXiv Detail & Related papers (2025-08-07T17:50:13Z) - HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform [3.864321514889099]
Inpainting indoor environments from a single image plays a crucial role in modeling the internal structure of interior spaces.
We propose an innovative approach based on a U-Former architecture and a new Windowed-FourierMixer block.
This new architecture proves advantageous for tasks involving indoor scenes where symmetry is prevalent.
arXiv Detail & Related papers (2024-02-28T12:27:28Z) - Unveiling Spaces: Architecturally meaningful semantic descriptions from images of interior spaces [0.0]
This project aims to tackle the problem of extracting architecturally meaningful semantic descriptions from two-dimensional scenes of populated interior spaces.
A Generative Adversarial Network (GAN) for image-to-image translation (Pix2Pix) is trained on synthetically generated rendered images of these enclosures, along with corresponding image abstractions representing high-level architectural structure.
A similar model evaluation is also carried out on photographs of existing indoor enclosures, to measure its performance in real-world settings.
arXiv Detail & Related papers (2023-12-19T16:03:04Z) - MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis [26.710960922302124]
We propose a real-world Multi-Sensor Hybrid Room dataset (MuSHRoom). Our dataset presents exciting challenges and requires state-of-the-art methods to be cost-effective and robust to noisy data and devices. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis.
arXiv Detail & Related papers (2023-11-05T21:46:12Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.