DiffusionInst: Diffusion Model for Instance Segmentation
- URL: http://arxiv.org/abs/2212.02773v2
- Date: Wed, 7 Dec 2022 10:28:00 GMT
- Title: DiffusionInst: Diffusion Model for Instance Segmentation
- Authors: Zhangxuan Gu and Haoxing Chen and Zhuoer Xu and Jun Lan and Changhua
Meng and Weiqiang Wang
- Abstract summary: DiffusionInst is a novel framework that represents instances as instance-aware filters.
It is trained to reverse the noisy ground truth without any inductive bias from an RPN.
It achieves competitive performance compared to existing instance segmentation models.
- Score: 15.438504077368936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, diffusion frameworks have achieved performance comparable
with previous state-of-the-art image generation models. Researchers are curious
about their variants in discriminative tasks because of their powerful
noise-to-image denoising pipeline. This paper proposes DiffusionInst, a novel
framework that represents instances as instance-aware filters and formulates
instance segmentation as a noise-to-filter denoising process. The model is
trained to denoise noisy ground-truth filters without any inductive bias from an
RPN. During inference, it takes a randomly generated filter as input and outputs
masks via one-step or multi-step denoising. Extensive experimental results on
COCO and LVIS show that DiffusionInst achieves competitive performance compared
to existing instance segmentation models. We hope our work can serve as a
simple yet effective baseline and inspire the design of more efficient
diffusion frameworks for challenging discriminative tasks. Our code is
available at https://github.com/chenhaoxing/DiffusionInst.
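The noise-to-filter inference described in the abstract can be sketched as a standard DDIM-style sampling loop over a filter vector. Everything here is a hypothetical stand-in, not the authors' implementation: `predict_x0` plays the role of the learned network (conditioned on image features in the real model), the cosine schedule, the 8-dimensional filter, and the fixed target are all illustrative assumptions.

```python
import numpy as np

# Hypothetical "clean" instance-aware filter the toy denoiser regresses to.
# In DiffusionInst this would come from a network conditioned on image
# features; a fixed vector just keeps the sketch runnable.
TARGET_FILTER = np.full(8, 0.5)

def alpha_bar(t, T=1000):
    """Cosine cumulative noise schedule: fraction of signal kept at step t."""
    return np.cos((t / T) * np.pi / 2) ** 2

def predict_x0(x_t, t):
    """Toy stand-in for the learned denoiser: estimate the clean filter."""
    return TARGET_FILTER

def noise_to_filter(steps=4, dim=8, seed=0):
    """Multi-step noise-to-filter denoising (DDIM-style, eta = 0)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)                  # random filter = pure noise
    times = np.linspace(1000.0, 0.0, steps + 1)   # denoising timesteps
    for t, t_next in zip(times[:-1], times[1:]):
        a, a_next = alpha_bar(t), alpha_bar(t_next)
        x0 = predict_x0(x, t)                     # one-step clean estimate
        eps = (x - np.sqrt(a) * x0) / np.sqrt(1.0 - a)      # implied noise
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps
    return x                                      # denoised instance filter

print(noise_to_filter())
```

With `steps=1` the loop reduces to the one-step denoising mode mentioned in the abstract; in the full model, the resulting filter would then be convolved with mask features to produce the instance mask.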
Related papers
- Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
Suboptimal noise-data mapping leads to slow training of diffusion models.
Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion.
Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z)
- Diffusion Models With Learned Adaptive Noise [12.530583016267768]
In this paper, we explore whether the diffusion process can be learned from data.
A widely held assumption is that the ELBO is invariant to the noise process.
We propose MULAN, a learned diffusion process that applies noise at different rates across an image.
arXiv Detail & Related papers (2023-12-20T18:00:16Z)
- Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models [76.46246743508651]
We show that current diffusion models actually have an expressive bottleneck in backward denoising.
We introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising.
arXiv Detail & Related papers (2023-09-25T12:03:32Z)
- An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA).
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
- Are Diffusion Models Vision-And-Language Reasoners? [30.579483430697803]
We transform diffusion-based models for any image-text matching (ITM) task using a novel method called DiffusionITM.
We introduce the Generative-Discriminative Evaluation Benchmark (GDBench), comprising 7 complex vision-and-language tasks, bias evaluation, and detailed analysis.
We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground.
arXiv Detail & Related papers (2023-05-25T18:02:22Z)
- UDPM: Upsampling Diffusion Probabilistic Models [33.51145642279836]
Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention.
DDPMs generate high-quality samples from complex data distributions by defining an inverse process.
Unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable.
In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM)
arXiv Detail & Related papers (2023-05-25T17:25:14Z)
- Denoising Diffusion Models for Plug-and-Play Image Restoration [135.6359475784627]
This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework.
Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models.
arXiv Detail & Related papers (2023-05-15T20:24:38Z)
- DiffusionRet: Generative Text-Video Retrieval with Diffusion Model [56.03464169048182]
Existing text-video retrieval solutions focus on maximizing the conditional likelihood, i.e., p(candidates|query).
We creatively tackle this task from a generative viewpoint and model the correlation between the text and the video as their joint probability p(candidates, query).
This is accomplished through a diffusion-based text-video retrieval framework (DiffusionRet), which models the retrieval task as a process of gradually generating joint distribution from noise.
arXiv Detail & Related papers (2023-03-17T10:07:19Z)
- DiffusionDet: Diffusion Model for Object Detection [56.48884911082612]
DiffusionDet is a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes.
Our work possesses an appealing property of flexibility, enabling a dynamic number of boxes and iterative evaluation.
arXiv Detail & Related papers (2022-11-17T18:56:19Z)
- Subspace Diffusion Generative Models [4.310834990284412]
Score-based models generate samples by mapping noise to data (and vice versa) via a high-dimensional diffusion process.
We restrict the diffusion via projections onto subspaces as the data distribution evolves toward noise.
Our framework is fully compatible with continuous-time diffusion and retains its flexible capabilities.
arXiv Detail & Related papers (2022-05-03T13:43:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.