Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
- URL: http://arxiv.org/abs/2508.04122v1
- Date: Wed, 06 Aug 2025 06:38:46 GMT
- Title: Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
- Authors: Maximilian Ulmer, Wout Boerdijk, Rudolph Triebel, Maximilian Durner,
- Abstract summary: OC-DiT is a class of diffusion models designed for object-centric prediction.<n>We propose a conditional latent diffusion framework that generates instance masks.<n>We train these models on a newly created, large-scale synthetic dataset.
- Score: 16.225638630932675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents OC-DiT, a novel class of diffusion models designed for object-centric prediction, and applies it to zero-shot instance segmentation. We propose a conditional latent diffusion framework that generates instance masks by conditioning the generative process on object templates and image features within the diffusion model's latent space. This allows our model to effectively disentangle object instances through the diffusion process, which is guided by visual object descriptors and localized image cues. Specifically, we introduce two model variants: a coarse model for generating initial object instance proposals, and a refinement model that refines all proposals in parallel. We train these models on a newly created, large-scale synthetic dataset comprising thousands of high-quality object meshes. Remarkably, our model achieves state-of-the-art performance on multiple challenging real-world benchmarks, without requiring any retraining on target data. Through comprehensive ablation studies, we demonstrate the potential of diffusion models for instance segmentation tasks.
Related papers
- Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model [0.3749861135832073]
In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples.<n>We propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects.<n>By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes under various environments.
arXiv Detail & Related papers (2025-03-05T02:08:51Z) - Accelerated Diffusion Models via Speculative Sampling [89.43940130493233]
Speculative sampling is a popular technique for accelerating inference in Large Language Models.<n>We extend speculative sampling to diffusion models, which generate samples via continuous, vector-valued Markov chains.<n>We propose various drafting strategies, including a simple and effective approach that does not require training a draft model.
arXiv Detail & Related papers (2025-01-09T16:50:16Z) - [MASK] is All You Need [28.90875822599164]
We propose using discrete-state models to connect Masked Generative and Non-autoregressive Diffusion models.<n>By leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models.
arXiv Detail & Related papers (2024-12-09T18:59:56Z) - Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic has evolved into In-context tasks, morphing into a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC [102.64648158034568]
diffusion models have quickly become the prevailing approach to generative modeling in many domains.
We propose an energy-based parameterization of diffusion models which enables the use of new compositional operators.
We find these samplers lead to notable improvements in compositional generation across a wide set of problems.
arXiv Detail & Related papers (2023-02-22T18:48:46Z) - Open-vocabulary Object Segmentation with Diffusion Models [47.36233857830832]
The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of segmentation map.
We adopt the augmented diffusion model to build a synthetic semantic segmentation dataset, and show that, training a standard segmentation model on such dataset demonstrates competitive performance on the zero-shot segmentation(ZS3) benchmark.
arXiv Detail & Related papers (2023-01-12T18:59:08Z) - A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z) - Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.