WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation
- URL: http://arxiv.org/abs/2410.12075v2
- Date: Mon, 30 Dec 2024 13:34:23 GMT
- Title: WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation
- Authors: Chenghao Qian, Yuhu Guo, Yuhong Mo, Wenjing Li
- Abstract summary: We propose a novel approach, namely WeatherDG, that can generate realistic, weather-diverse, driving-scene images. We first fine-tune the SD with source data, aligning the content and layout of generated samples with real-world driving scenarios. We introduce a balanced generation strategy, which encourages the SD to generate high-quality objects of tail classes under various weather conditions.
- Score: 4.141230571282547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a novel approach, namely WeatherDG, that can generate realistic, weather-diverse, driving-scene images based on the cooperation of two foundation models, i.e., Stable Diffusion (SD) and a Large Language Model (LLM). Specifically, we first fine-tune the SD with source data, aligning the content and layout of generated samples with real-world driving scenarios. Then, we propose a procedural prompt generation method based on the LLM, which can enrich scenario descriptions and help the SD automatically generate more diverse, detailed images. In addition, we introduce a balanced generation strategy, which encourages the SD to generate high-quality objects of tail classes, such as riders and motorcycles, under various weather conditions. This segmentation-model-agnostic method can improve the generalization ability of existing models by additionally adapting them with the generated synthetic data. Experiments on three challenging datasets show that our method can significantly improve the segmentation performance of different state-of-the-art models on target domains. Notably, in the "Cityscapes to ACDC" setting, our method improves the baseline HRDA by 13.9% in mIoU.
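The balanced generation strategy described above can be sketched as a prompt sampler that up-weights tail classes by inverse frequency before handing the prompt to the diffusion model. This is a minimal, self-contained illustration: the vocabularies, class counts, and prompt template below are invented for the example, and the paper's actual descriptions are richer free-form text produced by prompting an LLM.

```python
import random

# Hypothetical vocabularies and class counts; the real pipeline derives
# these from the source dataset and an LLM's procedural descriptions.
WEATHERS = ["rain", "snow", "fog", "night"]
CLASS_COUNTS = {"rider": 120, "motorcycle": 90, "car": 5000, "road": 12000}

def tail_class_weights(counts):
    """Inverse-frequency weights so rare (tail) classes are sampled more often."""
    inv = {c: 1.0 / n for c, n in counts.items()}
    total = sum(inv.values())
    return {c: w / total for c, w in inv.items()}

def build_prompt(rng):
    """Sample a weather condition and a class, favoring tail classes."""
    weights = tail_class_weights(CLASS_COUNTS)
    cls = rng.choices(list(weights), weights=list(weights.values()))[0]
    weather = rng.choice(WEATHERS)
    return (f"a photo of a driving scene in heavy {weather}, "
            f"featuring a {cls} in the foreground, dashcam view")

rng = random.Random(0)
prompts = [build_prompt(rng) for _ in range(4)]
```

Each sampled prompt would then be passed to the fine-tuned Stable Diffusion model; because "rider" and "motorcycle" receive much larger sampling weights than "car" and "road", the synthetic set covers tail classes far more densely than the source distribution does.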
Related papers
- MARS: Mesh AutoRegressive Model for 3D Shape Detailization [85.95365919236212]
We introduce MARS, a novel approach for 3D shape detailization.
We propose a mesh autoregressive model capable of generating such latent representations through next-LOD token prediction.
Experiments conducted on the challenging 3D Shape Detailization benchmark demonstrate that our proposed MARS model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-02-17T03:12:16Z) - [MASK] is All You Need [28.90875822599164]
We propose using discrete-state models to connect Masked Generative and Non-autoregressive Diffusion models.
By leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models.
arXiv Detail & Related papers (2024-12-09T18:59:56Z) - OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved into one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in the 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z) - Few-shot Image Generation via Information Transfer from the Built Geodesic Surface [2.617962830559083]
We propose a method called Information Transfer from the Built Geodesic Surface (ITBGS).
With the FAGS module, a pseudo-source domain is created by projecting image features from the training dataset into the Pre-Shape Space.
We demonstrate that the proposed method consistently achieves optimal or comparable results across a diverse range of semantically distinct datasets.
arXiv Detail & Related papers (2024-01-03T13:57:09Z) - DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
arXiv Detail & Related papers (2023-12-05T18:34:12Z) - Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs.
We introduce a novel generative modeling framework grounded in phase space dynamics.
Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z) - Don't be so negative! Score-based Generative Modeling with Oracle-assisted Guidance [12.039478020062608]
We develop a new denoising diffusion probabilistic modeling (DDPM) methodology, Gen-neG.
Our approach builds on generative adversarial networks (GANs) and discriminator guidance in diffusion models to guide the generation process.
We empirically establish the utility of Gen-neG in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation.
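The discriminator-guidance idea that Gen-neG builds on can be illustrated with a toy 1-D sampler: the discriminator's log-gradient is added to the model's score at each sampling step, steering samples toward the region the discriminator favors. Everything below, including the toy densities and the "safe region" at x = 2, is invented for illustration and is not the paper's actual method.

```python
import numpy as np

def score_model(x):
    # Score of a standard normal "generator" distribution: d/dx log N(0, 1).
    return -x

def log_disc_grad(x):
    # Gradient of log D(x) for a hypothetical discriminator preferring
    # samples near x = 2 (a "safe" region): log D(x) proportional to -(x-2)^2/2.
    return -(x - 2.0)

def guided_langevin(x0, steps=500, step=0.01, w=1.0, seed=0):
    """Langevin sampling with the discriminator gradient added to the score."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        g = score_model(x) + w * log_disc_grad(x)
        x = x + step * g + np.sqrt(2 * step) * rng.standard_normal()
    return x

guided = np.array([guided_langevin(0.0, seed=s) for s in range(200)])
```

With w = 1 the stationary density is proportional to exp(-x^2/2 - (x - 2)^2/2), i.e. a normal centered at 1, so guided samples concentrate between the generator's mode (0) and the discriminator's preferred region (2); with w = 0 the sampler reduces to plain unguided Langevin dynamics on the generator score.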
arXiv Detail & Related papers (2023-07-31T07:52:00Z) - T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z) - MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
We propose the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z) - Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation.
This is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
arXiv Detail & Related papers (2023-01-10T13:15:42Z) - Few-Shot Diffusion Models [15.828257653106537]
We present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs.
FSDM is trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information.
We empirically show that FSDM can perform few-shot generation and transfer to new datasets.
arXiv Detail & Related papers (2022-05-30T23:20:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.