Dual Stage Stylization Modulation for Domain Generalized Semantic
Segmentation
- URL: http://arxiv.org/abs/2304.09347v4
- Date: Thu, 3 Aug 2023 09:21:47 GMT
- Title: Dual Stage Stylization Modulation for Domain Generalized Semantic
Segmentation
- Authors: Gabriel Tjio, Ping Liu, Chee-Keong Kwoh, Joey Tianyi Zhou
- Abstract summary: We introduce a dual-stage Feature Transform (dFT) layer within the Adversarial Semantic Hallucination+ framework.
By leveraging semantic information for each pixel, our approach adaptively adjusts the pixel-wise hallucination strength.
We validate the effectiveness of our proposed method through comprehensive experiments on publicly available semantic segmentation benchmark datasets.
- Score: 39.35385886870209
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Obtaining sufficient labeled data for training deep models is often
challenging in real-life applications. To address this issue, we propose a
novel solution for single-source domain generalized semantic segmentation.
Recent approaches have explored data diversity enhancement using hallucination
techniques. However, excessive hallucination can degrade performance,
particularly for imbalanced datasets. As shown in our experiments, minority
classes are more susceptible to performance reduction due to hallucination
compared to majority classes. To tackle this challenge, we introduce a
dual-stage Feature Transform (dFT) layer within the Adversarial Semantic
Hallucination+ (ASH+) framework. The ASH+ framework performs a dual-stage
manipulation of hallucination strength. By leveraging semantic information for
each pixel, our approach adaptively adjusts the pixel-wise hallucination
strength, thus providing fine-grained control over hallucination. We validate
the effectiveness of our proposed method through comprehensive experiments on
publicly available semantic segmentation benchmark datasets (Cityscapes and
SYNTHIA). Quantitative and qualitative comparisons demonstrate that our
approach is competitive with state-of-the-art methods for the Cityscapes
dataset and surpasses existing solutions for the SYNTHIA dataset. Code for our
framework will be made readily available to the research community.
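The abstract gives no implementation details, but the core idea of adaptively scaling hallucination strength per pixel using semantic labels can be illustrated with a minimal sketch. Everything below (the function name `modulated_hallucination`, the per-class strength table, the linear blending) is a hypothetical stand-in, not the paper's actual dFT layer:

```python
import numpy as np

def modulated_hallucination(features, hallucinated, class_map, class_strength):
    """Blend original and hallucinated features pixel by pixel.

    features, hallucinated: (C, H, W) feature maps.
    class_map: (H, W) integer semantic label per pixel.
    class_strength: (num_classes,) hallucination strength in [0, 1],
        e.g. lower for minority classes to protect them.
    """
    alpha = class_strength[class_map]               # (H, W) pixel-wise strength
    return (1.0 - alpha) * features + alpha * hallucinated

# Toy example: class 1 (minority) receives much weaker hallucination.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 4))
hall = rng.standard_normal((3, 4, 4))
labels = np.zeros((4, 4), dtype=int)
labels[0, 0] = 1
out = modulated_hallucination(feats, hall, labels, np.array([0.8, 0.2]))
```

A per-class strength table like this would let minority-class pixels keep most of their original features, which matches the failure mode the abstract describes: minority classes degrade more under excessive hallucination.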
Related papers
- Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization [123.54980913741828]
Large Vision-Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data.
They invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images.
Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information.
However, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations.
arXiv Detail & Related papers (2024-05-24T08:46:31Z)
- IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding [37.16880672402059]
Over-reliance on linguistic priors has been identified as a key factor leading to hallucinations.
We propose to alleviate this problem by introducing a novel image-biased decoding technique.
Our method derives the next-token probability distribution by contrasting predictions from a conventional LVLM with those of an image-biased LVLM.
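The contrasting step described above can be sketched with a generic contrastive-decoding formula. This is a common formulation from the contrastive-decoding literature, used here as an assumption; the paper's exact weighting of the two distributions may differ, and all names are illustrative:

```python
import numpy as np

def contrastive_next_token(logits_img_biased, logits_standard, alpha=1.0):
    """Derive a next-token distribution by contrasting two passes.

    Amplifies tokens favoured by the image-biased pass relative to the
    standard pass, suppressing tokens driven mainly by linguistic priors.
    alpha controls how strongly the standard pass is subtracted.
    """
    contrast = (1.0 + alpha) * logits_img_biased - alpha * logits_standard
    exp = np.exp(contrast - contrast.max())         # numerically stable softmax
    return exp / exp.sum()

# Token 0 is preferred by the image-biased pass, token 2 by the text prior.
biased = np.array([2.0, 1.0, 0.0])
standard = np.array([0.0, 1.0, 2.0])
p = contrastive_next_token(biased, standard, alpha=1.0)
```

In this toy run the contrast pushes probability mass toward the token the image-biased pass prefers and away from the prior-driven one.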
arXiv Detail & Related papers (2024-02-28T16:57:22Z)
- Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition [49.26065739704278]
We propose a framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition.
An instance-view data hallucination module hallucinates each sample of a novel class to generate new data.
A prototype-view data hallucination module exploits semantic-aware measure to estimate the prototype of a novel class.
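As a hedged illustration of a prototype-view estimate, the sketch below combines the support-set mean with a semantic-similarity-weighted mix of base-class prototypes. The function name, the equal 0.5/0.5 blend, and the similarity weights are all assumptions, not the paper's semantic-aware measure:

```python
import numpy as np

def semantic_aware_prototype(support_feats, base_prototypes, base_sims):
    """Estimate a novel-class prototype from few samples plus base classes.

    support_feats: (N, D) features of the novel class's few support samples.
    base_prototypes: (K, D) prototypes of the base (known) classes.
    base_sims: (K,) semantic similarity of the novel class to each base class.
    """
    support_mean = support_feats.mean(axis=0)       # data-driven estimate
    w = base_sims / base_sims.sum()                 # normalised similarity weights
    semantic_prior = w @ base_prototypes            # similarity-weighted prior
    return 0.5 * support_mean + 0.5 * semantic_prior

# Toy run: 3-D features, two support samples, three base classes.
protos = np.eye(3)
sims = np.array([1.0, 1.0, 2.0])
support = np.ones((2, 3))
proto = semantic_aware_prototype(support, protos, sims)
```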
arXiv Detail & Related papers (2024-01-13T12:32:29Z)
- S$^2$ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation [21.208071679259604]
We develop a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning.
We produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning.
Our strategy efficiently mitigates the deleterious effects of uncertainty and noise present in pseudo labels.
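A generic entropy-guided ensemble of two branch predictions might look like the sketch below. The exponential weighting scheme and all names are assumptions rather than the paper's exact formulation:

```python
import numpy as np

def entropy_weighted_ensemble(prob_a, prob_b, eps=1e-8):
    """Mix two branch predictions per pixel, trusting the lower-entropy one.

    prob_a, prob_b: (C, H, W) softmax outputs of the two branches.
    Returns per-pixel mixed pseudo-label probabilities: the branch with
    lower predictive entropy (higher confidence) gets the larger weight.
    """
    ent_a = -(prob_a * np.log(prob_a + eps)).sum(axis=0)    # (H, W)
    ent_b = -(prob_b * np.log(prob_b + eps)).sum(axis=0)
    w_a = np.exp(-ent_a) / (np.exp(-ent_a) + np.exp(-ent_b))
    return w_a * prob_a + (1.0 - w_a) * prob_b

# Toy run: branch A is confident, branch B is uncertain at the same pixel.
prob_a = np.array([[[0.9]], [[0.1]]])
prob_b = np.array([[[0.5]], [[0.5]]])
mixed = entropy_weighted_ensemble(prob_a, prob_b)
```

Because the mix is a per-pixel convex combination of two distributions, the output remains a valid distribution, and the confident branch dominates.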
arXiv Detail & Related papers (2023-06-01T08:47:58Z)
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality.
To this end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders.
arXiv Detail & Related papers (2020-12-28T02:37:03Z) - Synthetic Convolutional Features for Improved Semantic Segmentation [139.5772851285601]
We suggest to generate intermediate convolutional features and propose the first synthesis approach that is catered to such intermediate convolutional features.
This allows us to generate new features from label masks and include them successfully into the training procedure.
Experimental results and analysis on two challenging datasets, Cityscapes and ADE20K, show that our generated features improve performance on segmentation tasks.
arXiv Detail & Related papers (2020-09-18T14:12:50Z) - Semi-Supervised StyleGAN for Disentanglement Learning [79.01988132442064]
Current disentanglement methods face several inherent limitations.
We design new architectures and loss functions based on StyleGAN for semi-supervised high-resolution disentanglement learning.
arXiv Detail & Related papers (2020-03-06T22:54:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.