DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition
- URL: http://arxiv.org/abs/2602.11875v1
- Date: Thu, 12 Feb 2026 12:26:09 GMT
- Title: DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition
- Authors: Ji Li, Zhiwei Li, Shihao Li, Zhenjiang Yu, Boyang Wang, Haiou Liu
- Abstract summary: DiffPlace is a novel framework that introduces a place-ID controller to enable place-controllable multi-view image generation. Our results highlight the potential of generative models in enhancing scene-level and place-aware synthesis.
- Score: 13.947159599420955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models have advanced significantly in realistic image synthesis, with diffusion models excelling in quality and stability. Recent multi-view diffusion models improve 3D-aware street view generation, but they struggle to produce place-aware and background-consistent urban scenes from text, BEV maps, and object bounding boxes. This limits their effectiveness in generating realistic samples for place recognition tasks. To address these challenges, we propose DiffPlace, a novel framework that introduces a place-ID controller to enable place-controllable multi-view image generation. The place-ID controller employs a linear projection, a perceiver transformer, and contrastive learning to map place-ID embeddings into a fixed CLIP space, allowing the model to synthesize images with consistent background buildings while flexibly modifying foreground objects and weather conditions. Extensive experiments, including quantitative comparisons and augmented training evaluations, demonstrate that DiffPlace outperforms existing methods in both generation quality and training support for visual place recognition. Our results highlight the potential of generative models in enhancing scene-level and place-aware synthesis, providing a valuable approach for improving place recognition in autonomous driving.
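As a rough illustration of the controller described in the abstract, here is a minimal PyTorch sketch combining a linear projection, a perceiver-style resampler, and an InfoNCE-style contrastive loss against CLIP embeddings. All module names, dimensions, and the pooling/loss choices are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlaceIDController(nn.Module):
    """Sketch: map a place-ID embedding into CLIP token space via a
    linear projection plus a small perceiver-style resampler."""
    def __init__(self, place_dim=512, clip_dim=768, num_tokens=8, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(place_dim, clip_dim)           # linear projection
        self.latents = nn.Parameter(torch.randn(num_tokens, clip_dim) * 0.02)
        self.attn = nn.MultiheadAttention(clip_dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(clip_dim),
                                nn.Linear(clip_dim, clip_dim * 4),
                                nn.GELU(),
                                nn.Linear(clip_dim * 4, clip_dim))

    def forward(self, place_emb):                            # (B, place_dim)
        kv = self.proj(place_emb).unsqueeze(1)               # (B, 1, clip_dim)
        q = self.latents.unsqueeze(0).expand(kv.size(0), -1, -1)
        tokens, _ = self.attn(q, kv, kv)                     # cross-attention
        return tokens + self.ff(tokens)                      # (B, num_tokens, clip_dim)

def clip_contrastive_loss(place_tokens, clip_emb, temperature=0.07):
    """InfoNCE-style objective pulling pooled place tokens toward the
    matching CLIP embedding of the same place (assumed formulation)."""
    z = F.normalize(place_tokens.mean(dim=1), dim=-1)        # (B, clip_dim)
    c = F.normalize(clip_emb, dim=-1)
    logits = z @ c.t() / temperature                         # (B, B) similarities
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)
```

The resulting tokens would then be injected through the diffusion model's cross-attention layers in place of (or alongside) CLIP text tokens, which is the usual way a "fixed CLIP space" condition is wired in.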
Related papers
- ViewMorpher3D: A 3D-aware Diffusion Framework for Multi-Camera Novel View Synthesis in Autonomous Driving [20.935790354765604]
We introduce ViewMorpher3D, a multi-view image enhancement framework based on image diffusion models. Unlike single-view approaches, ViewMorpher3D jointly processes a set of rendered views conditioned on camera poses, 3D geometric priors, and temporally adjacent or spatially overlapping reference views. Our framework accommodates variable numbers of cameras and flexible reference/target view configurations, making it adaptable to diverse sensor setups.
arXiv Detail & Related papers (2026-01-12T13:44:14Z)
- Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
- ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments [22.371417505012566]
This work focuses on modeling dynamic urban environments for autonomous driving simulation. We propose a new approach named ArmGS that exploits composite driving Gaussian splatting with multi-granularity appearance refinement. This not only models global scene appearance variations between frames and camera viewpoints, but also models local fine-grained photorealistic changes of background and objects.
arXiv Detail & Related papers (2025-07-05T03:54:40Z)
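The ArmGS summary above does not specify how the appearance refinement is parameterized. A common way to model per-frame and per-camera appearance variation in splatting pipelines is a learned embedding that predicts an affine color correction; the sketch below shows that generic technique under assumed dimensions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class AppearanceRefiner(nn.Module):
    """Sketch: learned per-frame and per-camera embeddings predict a
    global affine color transform applied to the rendered image."""
    def __init__(self, num_frames, num_cams, emb_dim=32):
        super().__init__()
        self.frame_emb = nn.Embedding(num_frames, emb_dim)
        self.cam_emb = nn.Embedding(num_cams, emb_dim)
        self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 12))   # 3x3 matrix + 3 offsets

    def forward(self, rendered, frame_id, cam_id):    # rendered: (B, 3, H, W)
        e = torch.cat([self.frame_emb(frame_id), self.cam_emb(cam_id)], dim=-1)
        params = self.mlp(e)                          # (B, 12)
        M = params[:, :9].view(-1, 3, 3) + torch.eye(3, device=params.device)
        b = params[:, 9:].view(-1, 3, 1, 1)
        out = torch.einsum('bij,bjhw->bihw', M, rendered) + b
        return out.clamp(0, 1)
```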
- Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting [4.89907242398523]
We propose renderability field-guided Gaussian splatting (RF-GS) for scene view synthesis. RF-GS quantifies input inhomogeneity through a renderability field, guiding pseudo-view sampling to enhance visual consistency. Our experiments on simulated and real-world data show that our method outperforms existing approaches in rendering stability.
arXiv Detail & Related papers (2025-04-27T14:41:01Z)
- Unpaired Deblurring via Decoupled Diffusion Model [55.21345354747609]
We propose UID-Diff, a generative-diffusion-based model designed to enhance deblurring performance on unknown domains. We employ two Q-Formers that separately extract structural features and blur patterns. The extracted features are used for the supervised deblurring task on synthetic data and the unsupervised blur-transfer task. Experiments on real-world datasets demonstrate that UID-Diff outperforms existing state-of-the-art methods in blur removal and structural preservation.
arXiv Detail & Related papers (2025-02-03T17:00:40Z)
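A Q-Former, in the BLIP-2 sense, is a small transformer whose learned queries cross-attend to image features. A minimal sketch of the two-extractor setup the UID-Diff summary describes; query count, width, and depth are assumptions.

```python
import torch
import torch.nn as nn

class MiniQFormer(nn.Module):
    """Learned queries cross-attend to image features (BLIP-2 style)."""
    def __init__(self, num_queries=32, dim=256, num_heads=8, depth=2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)

    def forward(self, feats):                  # feats: (B, N, dim) image tokens
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        return self.decoder(q, feats)          # (B, num_queries, dim)

# Two separate extractors, as the summary describes: one for scene
# structure, one for the blur pattern of the input image.
structure_qformer = MiniQFormer()
blur_qformer = MiniQFormer()
```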
- StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models [76.62929629864034]
We introduce StreetCrafter, a controllable video diffusion model that utilizes LiDAR point cloud renderings as pixel-level conditions. In addition, the utilization of pixel-level LiDAR conditions allows us to make accurate pixel-level edits to target scenes. Our model enables flexible control over viewpoint changes, enlarging the regions where satisfying renderings can be synthesized.
arXiv Detail & Related papers (2024-12-17T18:58:55Z)
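The summary does not state how the LiDAR renderings enter the model; a standard way to inject a pixel-aligned condition into a diffusion denoiser is to encode it and concatenate it with the noisy latent along the channel axis. The sketch below assumes that wiring; StreetCrafter's actual conditioning pathway may differ.

```python
import torch
import torch.nn as nn

class LidarConditionedDenoiser(nn.Module):
    """Sketch: channel-concatenate an encoded LiDAR rendering with the
    noisy latent before the denoising backbone (assumed wiring)."""
    def __init__(self, latent_ch=4, cond_ch=4, hidden=64):
        super().__init__()
        self.cond_encoder = nn.Sequential(      # encodes the LiDAR rendering
            nn.Conv2d(3, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, cond_ch, 3, padding=1))
        # stand-in for the real (video) diffusion UNet
        self.backbone = nn.Conv2d(latent_ch + cond_ch, latent_ch, 3, padding=1)

    def forward(self, noisy_latent, lidar_render):
        # lidar_render assumed resized to the latent's spatial resolution
        cond = self.cond_encoder(lidar_render)             # pixel-aligned condition
        x = torch.cat([noisy_latent, cond], dim=1)         # channel concat
        return self.backbone(x)                            # predicted noise
```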
- DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward the target texture domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z)
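For context, score distillation objectives are usually variants of the SDS gradient from DreamFusion, shown below; the exact form of DreamPolish's domain score distillation is not given in this summary. Here \(\epsilon_\phi\) is the diffusion model's noise prediction, \(y\) the text prompt, \(w(t)\) a timestep weighting, and \(\theta\) the parameters of the 3D representation rendering image \(x\).

```latex
% Standard SDS gradient (DreamFusion); DSD modifies the guidance term.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon .
```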
- From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model [16.716345249091408]
We explore street-view generation from Bird's-Eye View (BEV) maps, converting a BEV map into its corresponding multi-view street images.
Our approach comprises two main components: the Neural View Transformation and the Street Image Generation.
arXiv Detail & Related papers (2024-09-02T07:47:16Z)
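The Neural View Transformation is not detailed in the summary above; a simple geometric baseline for moving features from a BEV grid into a camera view is ground-plane inverse perspective mapping. The sketch below assumes a flat ground at z = 0, a camera above it, and a square, axis-aligned BEV grid; all of these are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def ipm_sample(bev_feats, K, cam_to_world, H, W, bev_range=50.0):
    """Sample BEV features for each pixel of an (H, W) image by
    intersecting the pixel's camera ray with the ground plane z = 0.
    bev_feats: (C, N, N) grid covering [-bev_range, bev_range] meters;
    the camera center is assumed above the ground (t[2] > 0)."""
    device = bev_feats.device
    v, u = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)       # (H, W, 3)
    rays_cam = pix @ torch.inverse(K).T                         # camera-frame rays
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    rays_w = rays_cam @ R.T                                     # world-frame rays
    # rays that never hit the ground get a huge parameter s and
    # land outside the grid, where padding_mode="zeros" applies
    s = -t[2] / rays_w[..., 2].clamp(max=-1e-6)                 # length to z = 0
    ground = t + s.unsqueeze(-1) * rays_w                       # (H, W, 3)
    grid = ground[..., :2] / bev_range                          # normalize to [-1, 1]
    return F.grid_sample(bev_feats.unsqueeze(0), grid.unsqueeze(0),
                         padding_mode="zeros", align_corners=False)[0]  # (C, H, W)
```

The grid's x/y axis convention must match the BEV layout; that mapping is left assumed here.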
- PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models [55.080748327139176]
We introduce PerLDiff, a novel method for effective street view image generation that fully leverages perspective 3D geometric information. PerLDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process. Empirical results show that PerLDiff markedly enhances the precision of controllable generation on the NuScenes and KITTI datasets.
arXiv Detail & Related papers (2024-07-08T16:46:47Z)
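The perspective layout in PerLDiff comes from 3D geometric priors projected into each camera. A minimal sketch of the underlying projection step, taking a 3D bounding box to image-plane corners; the box parameterization and frame conventions are assumptions.

```python
import numpy as np

def project_box_corners(center, size, yaw, K, world_to_cam):
    """Project a 3D bounding box's 8 corners into the image plane.
    center: (3,) xyz; size: (l, w, h); yaw about the world z-axis;
    world_to_cam: (4, 4) extrinsic matrix; K: (3, 3) intrinsics."""
    l, w, h = size
    # corner offsets in the box frame (all 8 sign combinations)
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2
    y = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2
    z = np.array([ h,  h, -h, -h,  h,  h, -h, -h]) / 2
    corners = np.stack([x, y, z])                       # (3, 8)
    c, s = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    pts_w = Rz @ corners + np.asarray(center)[:, None]  # world frame
    pts_w_h = np.vstack([pts_w, np.ones((1, 8))])       # homogeneous coords
    pts_c = (world_to_cam @ pts_w_h)[:3]                # camera frame
    uvw = K @ pts_c
    uv = uvw[:2] / np.clip(uvw[2:], 1e-6, None)         # perspective divide
    return uv.T, pts_c[2] > 0                           # pixels, in-front mask
```

Rasterizing such corner sets into per-view masks yields the object-level layout that can guide the diffusion model, in the spirit of the abstract above.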
- Learning 3D-Aware GANs from Unposed Images with Template Feature Field [33.32761749864555]
This work targets learning 3D-aware GANs from unposed images. We propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF).
arXiv Detail & Related papers (2024-04-08T17:42:08Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)