DifFSS: Diffusion Model for Few-Shot Semantic Segmentation
- URL: http://arxiv.org/abs/2307.00773v3
- Date: Wed, 11 Oct 2023 10:29:59 GMT
- Title: DifFSS: Diffusion Model for Few-Shot Semantic Segmentation
- Authors: Weimin Tan, Siyuan Chen, Bo Yan
- Abstract summary: This paper presents the first work to leverage the diffusion model for the FSS task, called DifFSS.
DifFSS, a novel FSS paradigm, can further improve the performance of state-of-the-art FSS models by a large margin without modifying their network structure.
- Score: 24.497112957831195
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models have demonstrated excellent performance in image generation.
Although various few-shot semantic segmentation (FSS) models with different
network structures have been proposed, performance improvement has reached a
bottleneck. This paper presents the first work to leverage the diffusion model
for the FSS task, called DifFSS. DifFSS, a novel FSS paradigm, can further
improve the performance of state-of-the-art FSS models by a large margin without
modifying their network structure. Specifically, we utilize the powerful
generation ability of diffusion models to generate diverse auxiliary support
images by using the semantic mask, scribble or soft HED boundary of the support
image as control conditions. This generation process simulates the intra-class
variety of the query image, such as variation in color, texture, and lighting.
As a result, FSS models can refer to more diverse support
images, yielding more robust representations, thereby achieving a consistent
improvement in segmentation performance. Extensive experiments on three
publicly available datasets based on existing advanced FSS models demonstrate
the effectiveness of the diffusion model for the FSS task. Furthermore, we explore
in detail the impact of different input settings of the diffusion model on
segmentation performance. Hopefully, this completely new paradigm will bring
inspiration to the study of the FSS task integrated with AI-generated content. Code
is available at https://github.com/TrinitialChan/DifFSS
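The generation step described in the abstract closely resembles ControlNet-conditioned synthesis. Below is a minimal sketch using the Hugging Face diffusers ControlNet pipeline with an HED boundary condition; the model IDs, prompt, and sampling parameters are illustrative assumptions, not the authors' exact configuration:

```python
# Sketch: generate auxiliary support images that preserve the support
# image's structure (via its HED boundary) while varying appearance.
# Model IDs and parameters are illustrative, not the paper's exact setup.
import torch
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

support = Image.open("support.jpg")   # one annotated support image
boundary = hed(support)               # soft HED boundary as control signal

# Each variant keeps the object layout fixed by the boundary map while
# color, texture, and lighting vary, simulating intra-class diversity.
variants = pipe(
    prompt=["a photo of a dog"] * 4,  # class name of the support mask
    image=[boundary] * 4,
    num_inference_steps=30,
).images
```

The generated variants would then be fed to the downstream FSS model as extra support examples, with the FSS network itself left unchanged.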
Related papers
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
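BI-DiffSR's specific modules are not detailed here, but the core idea behind weight binarization is sign quantization with a per-channel scaling factor and a straight-through estimator for gradients. A generic sketch, not the paper's actual design:

```python
import torch
import torch.nn as nn

class BinarizedConv2d(nn.Conv2d):
    """Conv layer with 1-bit weights: sign(W) scaled by mean |W| per
    output channel. A straight-through estimator routes gradients to
    the latent full-precision weights."""

    def forward(self, x):
        w = self.weight
        # Per-output-channel scaling factor alpha = mean(|W|).
        alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)
        w_bin = alpha * torch.sign(w)
        # Forward uses binarized weights; backward sees the latent weights.
        w_ste = w + (w_bin - w).detach()
        return self._conv_forward(x, w_ste, self.bias)

layer = BinarizedConv2d(64, 64, kernel_size=3, padding=1)
out = layer(torch.randn(1, 64, 32, 32))
```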
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models [96.97910688908956]
We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models.
We propose a framework tailored for VSS based on pre-trained image and video diffusion models.
Experiments show that our proposed approach outperforms existing zero-shot image semantic segmentation approaches.
arXiv Detail & Related papers (2024-05-27T08:39:38Z)
- MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration [7.087475633143941]
MM-Diff is a tuning-free image personalization framework capable of generating high-fidelity images of both single and multiple subjects in seconds.
MM-Diff employs a vision encoder to transform the input image into CLS and patch embeddings.
CLS embeddings serve two purposes: they augment the text embeddings, and, together with the patch embeddings, they are used to derive a small number of detail-rich subject embeddings.
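As a rough illustration of the CLS/patch split, a CLIP vision encoder exposes both kinds of embeddings; the projection into a few subject embeddings is shown with a hypothetical cross-attention layer, since MM-Diff's exact module is not given here:

```python
import torch
import torch.nn as nn
from transformers import CLIPImageProcessor, CLIPVisionModel
from PIL import Image

encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

inputs = processor(images=Image.open("subject.jpg"), return_tensors="pt")
hidden = encoder(**inputs).last_hidden_state   # (1, 1 + num_patches, 1024)
cls_emb = hidden[:, 0]                         # CLS token embedding
patch_emb = hidden[:, 1:]                      # per-patch embeddings

# Hypothetical projection of patch features into a small set of
# detail-rich subject embeddings (8 here); not MM-Diff's actual module.
num_subject_tokens, dim = 8, hidden.shape[-1]
queries = nn.Parameter(torch.randn(1, num_subject_tokens, dim))
attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
subject_emb, _ = attn(queries, patch_emb, patch_emb)  # (1, 8, 1024)
```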
arXiv Detail & Related papers (2024-03-22T09:32:31Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify cross-modal generative and discriminative pretraining into a single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
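As a rough illustration of how such objectives can be combined in one training loss, the ITC term is written below as a standard symmetric InfoNCE loss, with the IS and RSC terms left as hypothetical placeholders since UniDiff's exact formulations are not given here:

```python
import torch
import torch.nn.functional as F

def itc_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE image-text contrastive loss (standard form)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(logits.shape[0], device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Hypothetical joint objective: contrastive (ITC) + image synthesis (IS)
# + reciprocal semantic consistency (RSC), weighted by tunable scalars:
# loss = itc_loss(img_emb, txt_emb) + w_is * is_loss + w_rsc * rsc_loss
```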
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion [22.237426507711362]
Model-Agnostic Zero-Shot Classification (MA-ZSC) refers to training non-specific classification architectures to classify real images without using any real images during training.
Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC.
We propose modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity.
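The paper's exact modifications are not listed here; a generic sketch of diversifying synthetic training images by varying prompt wording, guidance scale, and random seed with Stable Diffusion (all choices below are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

class_name = "dog"
# Vary prompt style, guidance scale, and seed to increase the visual
# diversity of the synthetic training set for a zero-shot classifier.
styles = ["a photo of a {}", "an oil painting of a {}", "a sketch of a {}"]
images = []
for i, style in enumerate(styles):
    for guidance in (5.0, 9.0):
        gen = torch.Generator("cuda").manual_seed(1000 * i + int(guidance))
        images += pipe(
            style.format(class_name),
            guidance_scale=guidance,
            generator=gen,
        ).images
```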
arXiv Detail & Related papers (2023-02-07T07:13:53Z)
- Versatile Diffusion: Text, Images and Variations All in One Diffusion Model [76.89932822375208]
Versatile Diffusion handles multiple flows of text-to-image, image-to-text, and variations in one unified model.
Our code and models are open-sourced at https://github.com/SHI-Labs/Versatile-Diffusion.
arXiv Detail & Related papers (2022-11-15T17:44:05Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
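For context, the DDPM recipe conditioned on a semantic layout reduces to noise prediction. A generic single training step might look like the sketch below; the model signature is a hypothetical stand-in, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, semantic_map, alphas_cumprod):
    """One DDPM training step conditioned on a semantic layout.
    model(x_t, t, semantic_map) predicts the added noise (hypothetical API)."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t, semantic_map), noise)
```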
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- A Dataset and Benchmark Towards Multi-Modal Face Anti-Spoofing Under Surveillance Scenarios [15.296568518106763]
We propose an Attention-based Face Anti-spoofing network with Feature Augment (AFA) to address FAS on low-quality face images.
Our model can achieve state-of-the-art performance on the CASIA-SURF dataset and our proposed GREAT-FASD-S dataset.
arXiv Detail & Related papers (2021-03-29T08:14:14Z)