Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2305.03112v2
- Date: Thu, 29 May 2025 12:31:57 GMT
- Title: Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation
- Authors: Lechao Cheng, Zerun Liu, Jingxuan He, Chaowei Fang, Dingwen Zhang, Meng Wang
- Abstract summary: Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention and mitigate undisciplined over-smoothing. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
- Score: 51.14107156747967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly supervised semantic segmentation (WSSS) has recently attracted considerable attention because it requires fewer annotations than fully supervised approaches, making it especially promising for large-scale image segmentation tasks. Although many vision transformer-based methods leverage self-attention affinity matrices to refine Class Activation Maps (CAMs), they often treat each layer's affinity equally and thus introduce considerable background noise at deeper layers, where attention tends to converge excessively on certain tokens (i.e., over-smoothing). We observe that this deep-level attention naturally converges on a subset of tokens, yet unregulated query-key affinity can generate unpredictable activation patterns (undisciplined over-smoothing), adversely affecting CAM accuracy. To address these limitations, we propose an Adaptive Re-Activation Mechanism (AReAM), which exploits shallow-level affinity to guide deeper-layer convergence in an entropy-aware manner, thereby suppressing background noise and re-activating crucial semantic regions in the CAMs. Experiments on two commonly used datasets demonstrate that AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions. Overall, this work underscores the importance of controlling deep-level attention to mitigate undisciplined over-smoothing, introduces an entropy-aware mechanism that harmonizes shallow and deep-level affinities, and provides a refined approach to enhance transformer-based WSSS accuracy by re-activating CAMs.
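To make the proposal concrete, the core re-activation step might look like the following minimal sketch. The tensor shapes, the entropy-to-weight mapping, and every function name here are illustrative assumptions derived from the abstract, not the authors' released code:

```python
import torch

def aream_refine_cam(cam, attn_shallow, attn_deep, eps=1e-8):
    """Entropy-aware re-activation sketch (assumed interface, not the
    official AReAM implementation).

    cam:          (N, C) class activations per patch token
    attn_shallow: (N, N) row-stochastic affinity from a shallow layer
    attn_deep:    (N, N) row-stochastic affinity from a deep layer
    """
    # Row-wise entropy of the deep affinity: near-zero entropy means a row
    # has collapsed onto a few tokens, i.e. over-smoothing.
    ent = -(attn_deep * (attn_deep + eps).log()).sum(dim=-1)      # (N,)
    w = (ent / ent.max().clamp_min(eps)).unsqueeze(-1)            # (N, 1) in [0, 1]

    # Collapsed (low-entropy) rows fall back on the shallow affinity;
    # well-spread rows keep their deep affinity.
    affinity = w * attn_deep + (1.0 - w) * attn_shallow
    affinity = affinity / affinity.sum(dim=-1, keepdim=True).clamp_min(eps)

    # Propagate class activations along the blended affinity.
    return affinity @ cam

# Toy usage with random tensors (196 tokens, 20 classes).
cam = torch.rand(196, 20)
a_shallow = torch.softmax(torch.randn(196, 196), dim=-1)
a_deep = torch.softmax(5.0 * torch.randn(196, 196), dim=-1)  # peakier rows
print(aream_refine_cam(cam, a_shallow, a_deep).shape)        # (196, 20)
```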
Related papers
- ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction [3.7365850182404845]
Any-to-Any Self-Distillation (ATAS) is a novel approach that simultaneously enhances semantic coherence and fine-grained alignment.
ATAS achieves substantial performance gains on open-vocabulary object detection and semantic segmentation benchmarks.
arXiv Detail & Related papers (2025-06-10T10:40:10Z)
- A TRPCA-Inspired Deep Unfolding Network for Hyperspectral Image Denoising via Thresholded t-SVD and Top-K Sparse Transformer [20.17660504535571]
We propose a novel deep unfolding network (DU-TRPCA) that enforces stage-wise alternation between two tightly integrated modules: low-rank and sparse.
Experiments on synthetic and real-world HSIs demonstrate that DU-TRPCA surpasses state-of-the-art methods under severe mixed noise.
arXiv Detail & Related papers (2025-06-03T02:01:39Z)
- Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation [51.645152962504056]
In semi-supervised semantic segmentation, data augmentation plays a crucial role in the weak-to-strong consistency regularization framework.
We show that spatial augmentation can contribute to model training in SSSS despite generating inconsistent masks between the weak and strong augmentations.
We propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy, as sketched below.
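A minimal sketch of such entropy-driven scheduling follows; the normalization, the direction of the mapping (stronger augmentation for more confident instances), and the function name are assumptions, not the paper's implementation:

```python
import math
import torch

def augmentation_strength(logits, max_magnitude=0.5, eps=1e-8):
    """Map per-instance prediction entropy to an augmentation magnitude.

    Illustrative assumption: confident (low-entropy) instances receive
    stronger spatial augmentation. logits: (B, C, H, W) for the weak view.
    """
    probs = torch.softmax(logits, dim=1)
    ent = -(probs * (probs + eps).log()).sum(dim=1)  # (B, H, W) pixel entropy
    ent = ent.mean(dim=(1, 2))                       # (B,) mean entropy per image
    ent = ent / math.log(logits.shape[1])            # normalize to [0, 1]
    return max_magnitude * (1.0 - ent)

# Toy usage: four images, 21 classes.
logits = torch.randn(4, 21, 65, 65)
print(augmentation_strength(logits))                 # four magnitudes in [0, 0.5]
```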
arXiv Detail & Related papers (2025-05-29T13:35:48Z)
- Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference.
It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps.
Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z)
- Hallucination Detection in LLMs via Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models.
We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z)
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
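The robust-PCA-inspired split can be illustrated with a generic alternating update of singular-value and elementwise soft thresholding; this is a textbook RPCA sketch, not DSTL's actual solver:

```python
import torch

def soft_threshold(x, tau):
    """Elementwise shrinkage used for the sparse (semantic-unrelated) part."""
    return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)

def rpca_step(X, S, tau_l=1.0, tau_s=0.1):
    """One alternating update of X ≈ L (low-rank, semantic-related)
    + S (sparse, semantic-unrelated)."""
    # Low-rank update: singular-value thresholding of the residual.
    U, sig, Vh = torch.linalg.svd(X - S, full_matrices=False)
    L = U @ torch.diag(soft_threshold(sig, tau_l)) @ Vh
    # Sparse update: elementwise shrinkage of what the low-rank part misses.
    S = soft_threshold(X - L, tau_s)
    return L, S

X = torch.randn(50, 30)                    # one view's latent representation
S = torch.zeros_like(X)
for _ in range(10):
    L, S = rpca_step(X, S)
print(torch.linalg.matrix_rank(L).item(), (S != 0).float().mean().item())
```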
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods [22.49176231245093]
We present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection.
By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging.
The Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks.
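One plausible reading of sub-adjacent attention is a band mask that lets each position attend to a neighborhood around, but excluding, its immediate vicinity; a minimal sketch under that assumption (band widths are illustrative):

```python
import torch

def sub_adjacent_mask(seq_len, inner=2, outer=10):
    """Boolean mask, True where attention is allowed: each position i may
    attend to j with inner < |i - j| <= outer, skipping itself and its
    immediate neighbors. Band widths are illustrative hyperparameters."""
    idx = torch.arange(seq_len)
    dist = (idx[None, :] - idx[:, None]).abs()
    return (dist > inner) & (dist <= outer)

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the sub-adjacent band."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

T, d = 64, 32
q = k = v = torch.randn(T, d)
out = masked_attention(q, k, v, sub_adjacent_mask(T))
print(out.shape)  # torch.Size([64, 32])
```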
arXiv Detail & Related papers (2024-04-27T08:08:17Z)
- Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation [12.308473939796945]
Weakly supervised semantic segmentation (WSSS) with image-level labels aims to achieve dense prediction tasks without laborious annotations.
The performance of WSSS, especially at the stages of generating Class Activation Maps (CAMs) and refining pseudo masks, suffers widely from ambiguity.
We propose UniA, a unified single-staged WSSS framework, to tackle this issue from the perspective of uncertainty inference and affinity diversification.
arXiv Detail & Related papers (2024-04-12T01:54:59Z)
- Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness.
SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
arXiv Detail & Related papers (2024-04-11T22:43:44Z)
- Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition [8.743480762121937]
We propose a Wavelet-Attention Decoupling (WAD) module to disentangle salient and subtle motion features in the time-frequency domain.
We also propose a Fine-grained Contrastive Enhancement (FCE) module to enhance attention towards trajectory features by contrastive learning.
Our methods perform competitively compared to state-of-the-art methods and can discriminate confusing fine-grained actions well.
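The time-frequency disentanglement can be illustrated with a single-level Haar transform along the temporal axis; this generic sketch stands in for the paper's actual wavelet-attention design:

```python
import torch

def haar_split(x):
    """Single-level Haar wavelet transform along the time axis.

    x: (B, T, C) joint features with even T. Returns the low-frequency
    (slow, salient motion) and high-frequency (fast, subtle motion)
    halves, each of shape (B, T // 2, C).
    """
    even, odd = x[:, 0::2], x[:, 1::2]
    low = (even + odd) / 2 ** 0.5    # approximation coefficients
    high = (even - odd) / 2 ** 0.5   # detail coefficients
    return low, high

x = torch.randn(8, 64, 256)          # batch of 64-frame skeleton features
low, high = haar_split(x)
print(low.shape, high.shape)         # (8, 32, 256) twice
```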
arXiv Detail & Related papers (2024-02-03T16:51:04Z)
- Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
We observe that the attention mechanism is vulnerable to patch-based adversarial attacks.
In this paper, we propose a Robust Attention Mechanism (RAM) to improve the robustness of the semantic segmentation model.
arXiv Detail & Related papers (2024-01-03T13:58:35Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges of the vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation [42.66269050864235]
We propose a new transformer-based regularization to better localize objects for weakly supervised semantic segmentation (WSSS).
We adopt vision transformers, as their self-attention mechanism naturally embeds pair-wise affinity.
Our method produces noticeably better class localization maps (67.3% mIoU on PASCAL VOC train).
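The pair-wise-affinity idea is commonly used to diffuse CAMs over tokens; a generic sketch of that propagation (not this paper's exact consistency objective) follows:

```python
import torch

def propagate_cam(cam, attn, iters=2):
    """Diffuse token-level CAMs along self-attention affinity.

    cam:  (N, C) raw class activations per patch token
    attn: (N, N) self-attention map, symmetrized and row-normalized below
    """
    aff = (attn + attn.transpose(-2, -1)) / 2      # symmetrize the affinity
    aff = aff / aff.sum(dim=-1, keepdim=True)      # make rows stochastic
    for _ in range(iters):                         # multi-hop diffusion
        cam = aff @ cam
    return cam

cam = torch.rand(196, 20)
attn = torch.softmax(torch.randn(196, 196), dim=-1)
print(propagate_cam(cam, attn).shape)              # torch.Size([196, 20])
```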
arXiv Detail & Related papers (2023-08-08T15:14:23Z)
- Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization [37.307498788813035]
We conduct a thorough causal analysis to investigate the origins of biased activation.
We introduce a pioneering paradigm known as Counterfactual Co-occurring Learning (CCL).
We propose an innovative network architecture known as Counterfactual-CAM.
arXiv Detail & Related papers (2023-05-24T17:07:30Z)
- Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
- Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation [24.08326440298189]
We propose a novel activation modulation and recalibration scheme for weakly supervised semantic segmentation.
We show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset.
Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance.
arXiv Detail & Related papers (2021-12-16T16:26:14Z)
- Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)
- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
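The consistency regularization can be sketched as an equivariance loss between the CAMs of an image and its rescaled copy; `cam_fn` below is a hypothetical stand-in for the real network, not SEAM's actual code:

```python
import torch
import torch.nn.functional as F

def equivariance_loss(cam_fn, img, scale=0.5):
    """Equivariant-CAM consistency sketch: the CAM of a rescaled image
    should match the rescaled CAM of the original image.

    cam_fn: any callable mapping (B, 3, H, W) images to (B, C, h, w) CAMs.
    """
    cam = cam_fn(img)
    img_s = F.interpolate(img, scale_factor=scale, mode="bilinear",
                          align_corners=False)
    cam_s = cam_fn(img_s)                          # CAM of the transformed image
    cam_ref = F.interpolate(cam, size=cam_s.shape[-2:], mode="bilinear",
                            align_corners=False)   # transformed original CAM
    return F.l1_loss(cam_s, cam_ref)

# Toy usage with a random 1x1 conv head standing in for the CAM network.
net = torch.nn.Conv2d(3, 21, kernel_size=1)
img = torch.randn(2, 3, 64, 64)
print(equivariance_loss(net, img))
```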
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.