Related papers: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

URL: http://arxiv.org/abs/2312.03048v2
Date: Mon, 8 Apr 2024 08:59:24 GMT
Title: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Authors: Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc Van Gool, Konrad Schindler, Anton Obukhov,
Abstract summary: Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this question in the context of autonomous driving, and answer it with a resounding "yes"
Score: 68.14798033899955
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this question in the context of autonomous driving, and answer it with a resounding "yes". We propose an efficient data generation pipeline termed DGInStyle. First, we examine the problem of specializing a pretrained LDM to semantically-controlled generation within a narrow domain. Second, we propose a Style Swap technique to endow the rich generative prior with the learned semantic control. Third, we design a Multi-resolution Latent Fusion technique to overcome the bias of LDMs towards dominant objects. Using DGInStyle, we generate a diverse dataset of street scenes, train a domain-agnostic semantic segmentation model on it, and evaluate the model on multiple popular autonomous driving datasets. Our approach consistently increases the performance of several domain generalization methods compared to the previous state-of-the-art methods. Source code and dataset are available at https://dginstyle.github.io.

Related papers

DG-TTA: Out-of-domain medical image segmentation through Domain Generalization and Test-Time Adaptation [43.842694540544194]
We propose to combine domain generalization and test-time adaptation to create a highly effective approach for reusing pre-trained models in unseen target domains. We demonstrate that our method, combined with pre-trained whole-body CT models, can effectively segment MR images with high accuracy.
arXiv Detail & Related papers (2023-12-11T10:26:21Z)
Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation [21.016364582994846]
We present a new diffusion-based domain extension (DIDEX) method. We employ a diffusion model to generate a pseudo-target domain with diverse text prompts. In a second step, we train a generalizing model by adapting towards this pseudo-target domain.
arXiv Detail & Related papers (2023-12-04T12:31:45Z)
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations. Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
Semantic-Fused Multi-Granularity Cross-City Traffic Prediction [17.020546413647708]
We propose a Semantic-Fused Multi-Granularity Transfer Learning model to achieve knowledge transfer across cities with fused semantics at different granularities. In detail, we design a semantic fusion module to fuse various semantics while conserving static spatial dependencies. We conduct extensive experiments on six real-world datasets to verify the effectiveness of our STL model.
arXiv Detail & Related papers (2023-02-23T04:26:34Z)
Style Interleaved Learning for Generalizable Person Re-identification [69.03539634477637]
We propose a novel style interleaved learning (IL) framework for DG ReID training. Unlike conventional learning strategies, IL incorporates two forward propagations and one backward propagation for each iteration. We show that our model consistently outperforms state-of-the-art methods on large-scale benchmarks for DG ReID.
arXiv Detail & Related papers (2022-07-07T07:41:32Z)
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. In this paper, we first study how biases in the dataset affect existing methods. We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifold, aimed at the evaluation of deep generative models. Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence) We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision [73.76277367528657]
Convolutional neural network-based approaches have achieved remarkable progress in semantic segmentation. To cope with this limitation, automatically annotated data generated from graphic engines are used to train segmentation models. We propose a two-step self-supervised domain adaptation approach to minimize the inter-domain and intra-domain gap together.
arXiv Detail & Related papers (2020-04-16T15:24:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.