NERVE: Neighbourhood & Entropy-guided Random-walk for training free open-Vocabulary sEgmentation
- URL: http://arxiv.org/abs/2511.08248v1
- Date: Wed, 12 Nov 2025 01:48:31 GMT
- Title: NERVE: Neighbourhood & Entropy-guided Random-walk for training free open-Vocabulary sEgmentation
- Authors: Kunal Mahatha, Jose Dolz, Christian Desrosiers
- Abstract summary: We propose a training-free method for Open-Vocabulary Semantic Segmentation (OVSS) called NERVE. NERVE integrates global and fine-grained local information, exploiting the neighbourhood structure from the self-attention layer of a stable diffusion model. Our method does not require any conventional post-processing techniques such as Conditional Random Fields (CRF) or Pixel-Adaptive Mask Refinement (PAMR).
- Score: 18.627047608492795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advances in Open-Vocabulary Semantic Segmentation (OVSS), existing training-free methods face several limitations: computationally expensive affinity refinement strategies; ineffective fusion of transformer attention maps due to equal weighting; and reliance on fixed-size Gaussian kernels to reinforce local spatial smoothness, which enforces isotropic neighborhoods. We propose a strong baseline for training-free OVSS termed NERVE (Neighbourhood & Entropy-guided Random-walk for open-Vocabulary sEgmentation), which uniquely integrates global and fine-grained local information, exploiting the neighbourhood structure from the self-attention layer of a stable diffusion model. We also introduce a stochastic random walk for refining the affinity rather than relying on fixed-size Gaussian kernels for local context. This spatial diffusion process encourages propagation across connected and semantically related areas, enabling the method to delineate objects of arbitrary shape effectively. Whereas most existing approaches treat self-attention maps from different transformer heads or layers equally, our method uses entropy-based uncertainty to select the most relevant maps. Notably, our method does not require any conventional post-processing techniques such as Conditional Random Fields (CRF) or Pixel-Adaptive Mask Refinement (PAMR). Experiments on 7 popular semantic segmentation benchmarks yield overall state-of-the-art zero-shot segmentation performance, providing an effective approach to open-vocabulary semantic segmentation.
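The abstract names two mechanisms without giving formulas: entropy-based selection of self-attention maps and random-walk refinement of the affinity. The NumPy sketch below illustrates both under stated assumptions, not as NERVE's implementation: a lower mean row entropy is taken to mark a more relevant map, the `k` lowest-entropy maps are simply averaged, and the random walk is a standard random walk with restart whose weight `alpha` and step count are hypothetical choices. In the limit, the iteration equals multiplying the initial scores by the refined affinity (1 - alpha)(I - alpha P)^-1. All function names are illustrative.

```python
import numpy as np


def mean_row_entropy(attn: np.ndarray, eps: float = 1e-12) -> float:
    """Mean Shannon entropy (nats) over the rows of a row-stochastic N x N attention map."""
    return float(-(attn * np.log(attn + eps)).sum(axis=1).mean())


def fuse_low_entropy_maps(maps: list, k: int) -> np.ndarray:
    """Keep the k maps with the lowest mean row entropy and average them.

    Assumption: lower entropy is treated as a more certain, more relevant map.
    """
    entropies = np.array([mean_row_entropy(m) for m in maps])
    keep = np.argsort(entropies)[:k]
    fused = np.mean([maps[i] for i in keep], axis=0)
    return fused / fused.sum(axis=1, keepdims=True)  # re-normalise rows


def random_walk_refine(scores: np.ndarray, transition: np.ndarray,
                       alpha: float = 0.9, steps: int = 10) -> np.ndarray:
    """Diffuse per-pixel class scores (N x C) with a random walk with restart.

    Each step replaces a pixel's scores by the transition-weighted average of
    its neighbours' scores, then mixes back a fraction (1 - alpha) of the
    initial scores. alpha and steps are hypothetical defaults.
    """
    refined = scores.copy()
    for _ in range(steps):
        refined = alpha * transition @ refined + (1.0 - alpha) * scores
    return refined


# Toy example: 4 pixels, 2 candidate attention maps, 2 classes.
uniform = np.full((4, 4), 0.25)                    # high entropy, uninformative
blocks = np.kron(np.eye(2), np.full((2, 2), 0.5))  # pixels {0,1} and {2,3} attend within group
P = fuse_low_entropy_maps([uniform, blocks], k=1)  # keeps `blocks`

initial = np.array([[0.9, 0.1],
                    [0.4, 0.6],  # noisy pixel: wrong argmax before refinement
                    [0.2, 0.8],
                    [0.3, 0.7]])
labels = random_walk_refine(initial, P).argmax(axis=1)
print(labels)  # [0 0 1 1]
```

The uniform map is discarded because its rows carry no neighbourhood information, and the walk over the remaining block-structured map pulls the noisy pixel toward the label of the pixel it attends to.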
Related papers
- SPARK: Stochastic Propagation via Affinity-guided Random walK for training-free unsupervised segmentation [18.627047608492795]
Training-free segmentation methods rely on an implicit and limiting assumption: that segmentation is a spectral graph partitioning problem over diffusion-derived affinities. We introduce a Markov propagation scheme that performs random-walk-based diffusion with an adaptive label pruning strategy. Our method achieves state-of-the-art zero-shot performance, producing sharper boundaries, more coherent regions, and significantly more stable masks compared to prior spectral-clustering-based approaches.
arXiv Detail & Related papers (2026-01-31T05:12:17Z)
- Boundless Across Domains: A New Paradigm of Adaptive Feature and Cross-Attention for Domain Generalization in Medical Image Segmentation [1.93061220186624]
Domain-invariant representation learning is a powerful method for domain generalization.
Previous approaches face challenges such as high computational demands, training instability, and limited effectiveness with high-dimensional data.
We propose an Adaptive Feature Blending (AFB) method that generates out-of-distribution samples while exploring the in-distribution space.
arXiv Detail & Related papers (2024-11-22T12:06:24Z)
- Strongly Isomorphic Neural Optimal Transport Across Incomparable Spaces [7.535219325248997]
We present a novel neural formulation of the Gromov-Monge problem rooted in one of its fundamental properties.
We operationalize this property by decomposing the learnable OT map into two components.
Our framework provides a promising approach to learn OT maps across diverse spaces.
arXiv Detail & Related papers (2024-07-20T18:27:11Z)
- Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis [28.18800845199871]
We present a novel non-rigid point set registration method inspired by unsupervised clustering analysis.
Our method achieves high accuracy across various scenarios and surpasses competitors by a significant margin.
arXiv Detail & Related papers (2024-06-27T01:16:44Z)
- Efficient Trajectory Inference in Wasserstein Space Using Consecutive Averaging [3.8623569699070353]
Trajectory inference deals with reconstructing continuous processes from such observations. We propose methods for B-spline approximation and interpolation of point clouds through consecutive averaging that is intrinsic to the Wasserstein space. We prove linear convergence rates and rigorously evaluate our method on cell data characterized by bifurcations, merges, and trajectory splitting scenarios.
arXiv Detail & Related papers (2024-05-30T04:19:20Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information in a content-conditioned range, which helps the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Randomized Adversarial Style Perturbations for Domain Generalization [49.888364462991234]
We propose a novel domain generalization technique, referred to as Randomized Adversarial Style Perturbation (RASP).
The proposed algorithm perturbs the style of a feature in an adversarial direction towards a randomly selected class, and makes the model learn against being misled by the unexpected styles observed in unseen target domains.
We evaluate the proposed algorithm via extensive experiments on various benchmarks and show that our approach improves domain generalization performance, especially in large-scale benchmarks.
arXiv Detail & Related papers (2023-04-04T17:07:06Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- Region-Based Semantic Factorization in GANs [67.90498535507106]
We present a highly efficient algorithm to factorize the latent semantics learned by Generative Adversarial Networks (GANs) concerning an arbitrary image region.
Through an appropriately defined generalized Rayleigh quotient, we solve such a problem without any annotations or training.
Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-02-19T17:46:02Z)
- Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy [77.34280933613226]
We propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that conducts a non-local representation in the computation of networks.
Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy.
arXiv Detail & Related papers (2019-11-21T06:31:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
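The SPARK entry in the list above summarizes a Markov propagation scheme that performs random-walk-based diffusion with adaptive label pruning, but the summary does not state the pruning rule. The NumPy sketch below is a hypothetical illustration only: soft labels are pushed through a row-stochastic transition matrix, and a label is pruned once its share of the total probability mass falls below a threshold `tau`. The function name, the pruning criterion, and `tau` are assumptions, not SPARK's published algorithm. Keeping `tau` at most 1/K (K candidate labels) guarantees that at least one label always survives.

```python
import numpy as np


def propagate_with_pruning(labels0: np.ndarray, transition: np.ndarray,
                           steps: int = 20, tau: float = 0.15):
    """Markov propagation of soft labels (N x K) with a simple pruning rule.

    Assumption (hypothetical rule): a label is pruned permanently once its share
    of the total probability mass drops below `tau`; rows are then re-normalised.
    Returns the final soft labels and the indices of the labels still active.
    """
    y = np.array(labels0, dtype=float)
    active = np.ones(y.shape[1], dtype=bool)
    for _ in range(steps):
        y = transition @ y                      # one random-walk step
        share = y.sum(axis=0) / y.sum()         # fraction of total mass per label
        active &= share >= tau                  # pruned labels never come back
        y[:, ~active] = 0.0
        row_sums = y.sum(axis=1, keepdims=True)
        # Rows that lost all their mass fall back to a uniform spread over active labels.
        uniform = active / active.sum()
        y = np.where(row_sums > 0, y / np.where(row_sums > 0, row_sums, 1.0), uniform)
    return y, np.flatnonzero(active)


# Toy example: two groups of mutually similar pixels, three candidate labels.
P = np.kron(np.eye(2), np.full((2, 2), 0.5))
Y0 = np.array([[0.8, 0.1, 0.1],
               [0.7, 0.1, 0.2],
               [0.1, 0.8, 0.1],
               [0.1, 0.85, 0.05]])
Y, kept = propagate_with_pruning(Y0, P)
print(Y.argmax(axis=1), kept)  # [0 0 1 1] [0 1]
```

Label 2 holds only 11.25% of the mass after the first step, so it is pruned and the remaining mass is redistributed between the two dominant labels.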