Towards Agnostic and Holistic Universal Image Segmentation with Bit Diffusion

URL: http://arxiv.org/abs/2601.02881v1
Date: Tue, 06 Jan 2026 10:07:14 GMT
Authors: Jakob Lønborg Christensen, Morten Rieger Hannemose, Anders Bjorholm Dahl, Vedrana Andersen Dahl
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces a diffusion-based framework for universal image segmentation, making agnostic segmentation possible without depending on mask-based frameworks and instead predicting the full segmentation in a holistic manner. We present several key adaptations to diffusion models, which are important in this discrete setting. Notably, we show that a location-aware palette with our 2D Gray code ordering improves performance. Adding a final tanh activation function is crucial for discrete data. When optimizing diffusion parameters, the sigmoid loss weighting consistently outperforms alternatives, regardless of the prediction type used, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and introduces unique capabilities, such as principled ambiguity modeling, that these models lack. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could lead to competitive models.
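The discrete-to-continuous encoding that bit diffusion relies on, together with the tanh output and sigmoid weighting mentioned in the abstract, can be illustrated with a minimal stdlib-only sketch. It rests on several assumptions that are not taken from the paper: segment labels are integer palette indices encoded as "analog bits" in {-1, +1} and decoded by thresholding at zero (as in Bit Diffusion); the standard reflected binary Gray code is applied per grid axis and the row and column codes are concatenated, which is only one plausible reading of a "2D Gray code ordering" (the paper's location-aware palette is not reproduced); and the sigmoid weighting is taken as sigmoid(b - λ) on the epsilon-prediction scale, with λ the log-SNR. All function names and the bias `b` are hypothetical.

```python
import math


def gray_encode(n: int) -> int:
    """Reflected binary Gray code: consecutive integers differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def to_analog_bits(index: int, num_bits: int) -> list[float]:
    """Gray-code an integer and map its bits (MSB first) to {-1.0, +1.0}."""
    g = gray_encode(index)
    return [1.0 if (g >> k) & 1 else -1.0 for k in reversed(range(num_bits))]


def from_analog_bits(bits: list[float]) -> int:
    """Threshold each channel at 0, rebuild the Gray code, and decode it."""
    g = 0
    for b in bits:
        g = (g << 1) | (1 if b > 0 else 0)
    return gray_decode(g)


def gray_2d_bits(row: int, col: int, bits_per_axis: int) -> list[float]:
    """Hypothetical 2D ordering: Gray-code each grid axis and concatenate.

    Palette cells that are neighbours on the (row, col) grid differ in one bit.
    """
    return to_analog_bits(row, bits_per_axis) + to_analog_bits(col, bits_per_axis)


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weighted_x_loss(x: list[float], raw_pred: list[float], logsnr: float,
                    bias: float = 0.0) -> float:
    """x-prediction MSE with a final tanh and an assumed sigmoid weighting.

    The weight sigmoid(bias - logsnr) is defined on the epsilon-prediction scale.
    Since ||eps - eps_hat||^2 = exp(logsnr) * ||x - x_hat||^2, the x-space MSE is
    multiplied by exp(logsnr) to express the same objective.
    """
    x_hat = [math.tanh(r) for r in raw_pred]  # final tanh keeps outputs in [-1, 1]
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return stable_sigmoid(bias - logsnr) * math.exp(logsnr) * mse


if __name__ == "__main__":
    print(to_analog_bits(5, num_bits=4))               # [-1.0, 1.0, 1.0, 1.0]
    print(from_analog_bits([-0.3, 0.8, 0.2, 0.9]))     # 5
    print(weighted_x_loss([1.0, -1.0], [0.0, 0.0], 0.0))  # 0.5
```

With Gray coding, adjacent palette indices (and, in the 2D variant, neighbouring grid cells) differ in exactly one bit, so small moves in index space correspond to small changes in the bit targets. The tanh matters because the clean targets lie exactly at ±1, so an unbounded x-prediction head can overshoot the valid range.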

Related papers

VFMF: World Modeling by Forecasting Vision Foundation Model Features
We introduce a generative forecaster that performs autoregressive flow matching in the feature space of vision foundation models. We show that this latent representation preserves information more effectively than previously used PCA-based alternatives, both for forecasting and for other applications. With matched architecture and compute, our method produces sharper and more accurate predictions than regression across all modalities.
arXiv, 2025-12-12
LEAF: Latent Diffusion with Efficient Encoder Distillation for Aligned Features in Medical Image Segmentation
We propose LEAF, a medical image segmentation model grounded in latent diffusion models. During fine-tuning, we replace the original noise-prediction target with direct prediction of the segmentation map. We also employ a feature distillation method to align the hidden states of the convolutional layers with the features from a transformer-based vision encoder.
arXiv, 2025-07-24
Generalized Interpolating Discrete Diffusion
Masked diffusion is a popular choice due to its simplicity and effectiveness. We introduce a new family of general interpolating discrete diffusion (GIDD) processes, which offers greater flexibility in the design of the noising process. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality.
arXiv, 2025-03-06
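The hybrid of masking and uniform noise mentioned in the GIDD summary can be illustrated with a simplified per-token corruption step. This is a minimal sketch under stated assumptions, not GIDD's actual noising process (whose mixing distribution is time-dependent): each token is kept with probability `alpha_t`, and otherwise replaced either by a uniformly random token (with probability `p_uniform`) or by a mask token. The mask id `MASK`, the function name, and its parameters are hypothetical.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state


def hybrid_corrupt(tokens: list[int], alpha_t: float, p_uniform: float,
                   vocab_size: int, rng: random.Random) -> list[int]:
    """Corrupt a token sequence with a mixture of masking and uniform noise.

    alpha_t:   probability of keeping each clean token (1.0 = no noise).
    p_uniform: share of corrupted tokens that receive a uniform random token;
               the remaining share is replaced by MASK.
    """
    noisy = []
    for tok in tokens:
        if rng.random() < alpha_t:
            noisy.append(tok)                        # token survives
        elif rng.random() < p_uniform:
            noisy.append(rng.randrange(vocab_size))  # uniform-noise branch
        else:
            noisy.append(MASK)                       # masking branch
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    print(hybrid_corrupt([3, 1, 4, 1, 5], alpha_t=0.5, p_uniform=0.2,
                         vocab_size=10, rng=rng))
```

Setting `p_uniform=0.0` reduces this sketch to plain masked (absorbing) corruption, while `p_uniform=1.0` gives purely uniform noise; intermediate values interpolate between the two.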
[MASK] is All You Need
We propose using discrete-state models, and in particular the [MASK] token within them, to bridge Masked Generative and Non-autoregressive Diffusion models.
arXiv, 2024-12-09
Model Inversion Attacks Through Target-Specific Conditional Diffusion Models
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to the inherent flaws of GANs and biased optimization within the latent space. We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv, 2024-07-16
Simplified and Generalized Masked Diffusion for Discrete Data
Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models.
arXiv, 2024-06-06
Denoising Diffusion Semantic Segmentation with Mask Prior Modeling
We propose to improve the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a denoising diffusion generative model. We evaluate the proposed prior modeling with several off-the-shelf segmentors, and our experimental results on ADE20K and Cityscapes show that our approach can achieve competitive quantitative performance.
arXiv, 2023-06-02
SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision
We propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference. We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance.
arXiv, 2022-07-13
A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems. We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions. This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv, 2020-06-26
