Mitigating Undisciplined Over-Smoothing in Transformer for Weakly
Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2305.03112v1
- Date: Thu, 4 May 2023 19:11:33 GMT
- Title: Mitigating Undisciplined Over-Smoothing in Transformer for Weakly
Supervised Semantic Segmentation
- Authors: Jingxuan He, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Zhangye Wang,
Wei Chen
- Abstract summary: We propose an adaptive re-activation mechanism (AReAM) that alleviates the issue of incomplete attention within the object and the unbounded background noise.
AReAM accomplishes this by supervising high-level attention with shallow affinity matrices, yielding promising results.
- Score: 41.826919704238556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A surge of interest has emerged in weakly supervised semantic segmentation
due to its remarkable efficiency in recent years. Existing approaches based on
transformers mainly focus on exploring the affinity matrix to boost CAMs with
global relationships. In this work, we first perform a careful examination of the
impact of successive affinity matrices and discover that they tend toward
sparsification as the network approaches convergence, revealing a manifestation
of over-smoothing. Moreover, we observe that the enhanced attention maps exhibit
a substantial amount of extraneous background noise in deeper layers. Drawing on
this, we conjecture that this undisciplined over-smoothing introduces a
significant amount of semantically irrelevant background noise, causing
performance degradation. To alleviate this issue, we propose a novel perspective
that highlights the objects of interest by investigating characteristic regions,
thereby fostering a comprehensive understanding of the successive affinity
matrices. Consequently, we propose an adaptive re-activation mechanism (AReAM)
that alleviates both incomplete attention within the object and unbounded
background noise. AReAM accomplishes this by supervising high-level attention
with shallow affinity matrices, yielding promising results. Extensive
experiments on commonly used benchmarks show that segmentation results are
greatly improved by AReAM, which restricts each affinity matrix in the deep
layers to attend to semantic regions.
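The listing carries no reference code, so the following is a minimal sketch of the core AReAM idea as the abstract states it: attention affinities from shallow transformer blocks, which are less over-smoothed, supervise the affinities of deep blocks. The tensor shapes, the head/layer averaging, and the KL objective are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def aream_loss(shallow_attns, deep_attns, eps=1e-6):
    """Illustrative AReAM-style loss: supervise deep affinities with shallow ones.

    shallow_attns / deep_attns: lists of self-attention maps of shape
    (B, heads, N, N) taken from shallow and deep transformer blocks.
    """
    # Head- and layer-averaged shallow affinity is the (detached) target:
    # per the abstract, deep affinities sparsify and pick up background
    # noise as training converges, while shallow ones do not.
    target = torch.stack([a.mean(dim=1) for a in shallow_attns]).mean(dim=0)
    target = (target / (target.sum(-1, keepdim=True) + eps)).detach()

    loss = 0.0
    for attn in deep_attns:
        pred = attn.mean(dim=1)
        pred = pred / (pred.sum(-1, keepdim=True) + eps)
        # Pull each row of the deep affinity toward the shallow target,
        # restricting deep layers to semantically relevant regions.
        loss = loss + F.kl_div(pred.clamp_min(eps).log(), target,
                               reduction="batchmean")
    return loss / len(deep_attns)
```

In practice this term would be weighted against the classification loss; the actual layer split and loss form in AReAM may differ.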
Related papers
- ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction [3.7365850182404845]
Any-to-Any Self-Distillation (ATAS) is a novel approach that simultaneously enhances semantic coherence and fine-grained alignment.
ATAS achieves substantial performance gains on open-vocabulary object detection and semantic segmentation benchmarks.
arXiv Detail & Related papers (2025-06-10T10:40:10Z)
- A TRPCA-Inspired Deep Unfolding Network for Hyperspectral Image Denoising via Thresholded t-SVD and Top-K Sparse Transformer [20.17660504535571]
We propose a novel deep unfolding network (DU-TRPCA) that enforces stage-wise alternation between two tightly integrated modules: low-rank and sparse.
Experiments on synthetic and real-world HSIs demonstrate that DU-TRPCA surpasses state-of-the-art methods under severe mixed noise.
arXiv Detail & Related papers (2025-06-03T02:01:39Z)
- Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation [51.645152962504056]
In semi-supervised semantic segmentation, data augmentation plays a crucial role in the weak-to-strong consistency regularization framework.
We show that spatial augmentation can contribute to model training in SSSS, despite generating inconsistent masks between the weak and strong augmentations.
We propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy.
arXiv Detail & Related papers (2025-05-29T13:35:48Z)
- Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference.
It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps.
Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z)
- Hallucination Detection in LLMs via Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models.
We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z)
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods [22.49176231245093]
We present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection.
By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging (a toy attention mask is sketched after this list).
The Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks.
arXiv Detail & Related papers (2024-04-27T08:08:17Z)
- Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation [12.308473939796945]
Weakly supervised semantic segmentation (WSSS) with image-level labels aims to achieve dense prediction without laborious annotations.
The performance of WSSS, especially the stages of generating Class Activation Maps (CAMs) and refining pseudo masks, widely suffers from ambiguity.
We propose UniA, a unified single-staged WSSS framework, to tackle this issue from the perspective of uncertainty inference and affinity diversification.
arXiv Detail & Related papers (2024-04-12T01:54:59Z)
- Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness (a minimal single-step training step is sketched after this list).
SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
arXiv Detail & Related papers (2024-04-11T22:43:44Z)
- Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition [8.743480762121937]
We propose a Wavelet-Attention Decoupling (WAD) module to disentangle salient and subtle motion features in the time-frequency domain.
We also propose a Fine-grained Contrastive Enhancement (FCE) module to enhance attention towards trajectory features by contrastive learning.
Our methods perform competitively compared to state-of-the-art methods and can discriminate confusing fine-grained actions well.
arXiv Detail & Related papers (2024-02-03T16:51:04Z)
- Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
We observe that the attention mechanism is vulnerable to patch-based adversarial attacks.
In this paper, we propose a Robust Attention Mechanism (RAM) to improve the robustness of the semantic segmentation model.
arXiv Detail & Related papers (2024-01-03T13:58:35Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges of the vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation [42.66269050864235]
We propose a new transformer-based regularization to better localize objects for weakly supervised semantic segmentation (WSSS).
We adopt vision transformers, as the self-attention mechanism naturally embeds pair-wise affinity (an affinity-based CAM refinement is sketched after this list).
Our method produces noticeably better class localization maps (67.3% mIoU on PASCAL VOC train).
arXiv Detail & Related papers (2023-08-08T15:14:23Z)
- Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization [37.307498788813035]
We conduct a thorough causal analysis to investigate the origins of biased activation.
We introduce a pioneering paradigm known as Counterfactual Co-occurring Learning (CCL).
We propose an innovative network architecture known as Counterfactual-CAM.
arXiv Detail & Related papers (2023-05-24T17:07:30Z)
- Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
- Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation [24.08326440298189]
We propose a novel activation modulation and recalibration scheme for weakly supervised semantic segmentation.
We show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset.
Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance.
arXiv Detail & Related papers (2021-12-16T16:26:14Z)
- Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)
- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning (a minimal equivariance loss is sketched after this list).
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
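For the Sub-Adjacent Transformer entry above, a toy version of the masking idea: a band mask added to attention logits so each point attends only to nearby but not immediately adjacent positions. The window sizes and exact band shape are illustrative assumptions.

```python
import torch

def sub_adjacent_mask(seq_len: int, inner: int = 2, outer: int = 16) -> torch.Tensor:
    """Illustrative mask: position i may attend to j only when
    inner < |i - j| <= outer, i.e. to a nearby band that excludes
    the immediately adjacent points."""
    idx = torch.arange(seq_len)
    dist = (idx[:, None] - idx[None, :]).abs()
    allowed = (dist > inner) & (dist <= outer)
    mask = torch.full((seq_len, seq_len), float("-inf"))
    mask[allowed] = 0.0
    return mask  # add to attention logits before the softmax

# An anomaly must then be reconstructed from non-adjacent context, which
# inflates its reconstruction error relative to normal points.
```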
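For the catastrophic-overfitting entry, a minimal sketch of the baseline single-step (FGSM-style) adversarial training step that SSAT refers to; this is the standard recipe, not the paper's abnormal-example regularizer.

```python
import torch
import torch.nn.functional as F

def ssat_step(model, x, y, eps=8 / 255):
    """One FGSM-style single-step adversarial training step (baseline SSAT).

    Cheap compared to multi-step PGD, but known to risk catastrophic
    overfitting, the failure mode the paper analyzes."""
    x = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    # Single gradient-sign step, clamped back to the valid image range.
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()
    return F.cross_entropy(model(x_adv), y)  # backpropagate this in the train loop
```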
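For the all-pairs consistency entry (and transformer-based WSSS generally, including AReAM above), a hedged sketch of how pair-wise affinity read off ViT self-attention can refine a CAM; the layer/head averaging, symmetrization, and propagation power are illustrative choices rather than that paper's exact procedure.

```python
import torch

def refine_cam_with_affinity(cam, attns, power=2, eps=1e-6):
    """Refine a CAM with pair-wise affinity distilled from ViT self-attention.

    cam:   (B, C, N) class activation scores over N patch tokens.
    attns: list of (B, heads, N, N) self-attention maps.
    """
    # Head- and layer-averaged attention, symmetrized so it behaves like an
    # undirected affinity between patches, then row-normalized.
    aff = torch.stack([a.mean(dim=1) for a in attns]).mean(dim=0)
    aff = (aff + aff.transpose(-1, -2)) / 2
    aff = aff / (aff.sum(-1, keepdim=True) + eps)
    # Iterated propagation spreads activation along strong affinities,
    # completing objects the raw CAM only partially highlights.
    aff = torch.linalg.matrix_power(aff, power)
    return torch.einsum("bnm,bcm->bcn", aff, cam)
```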
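For the SEAM entry, a minimal sketch of equivariance-style consistency regularization, assuming rescaling as the transform and a model that returns dense CAMs; the full method is more involved than this single loss term.

```python
import torch.nn.functional as F

def equivariance_loss(model, x, scale=0.5):
    """Consistency between the CAM of a rescaled image and the rescaled CAM.

    model(x) is assumed to return dense CAMs of shape (B, C, H, W)."""
    cam = model(x)
    size = [int(s * scale) for s in x.shape[-2:]]
    x_small = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    cam_of_t = model(x_small)  # CAM(T(x))
    t_of_cam = F.interpolate(cam, size=size, mode="bilinear", align_corners=False)
    # Equivariance holds implicitly under full supervision; enforcing it here
    # gives classifier-driven CAMs an extra self-supervision signal.
    return F.l1_loss(cam_of_t, t_of_cam)
```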
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.