Related papers: Lightweight Transformer Framework for Weakly Supervised Semantic Segmentation

Lightweight Transformer Framework for Weakly Supervised Semantic Segmentation

URL: http://arxiv.org/abs/2511.19765v1
Date: Mon, 24 Nov 2025 22:40:57 GMT
Title: Lightweight Transformer Framework for Weakly Supervised Semantic Segmentation
Authors: Ali Torabi, Sanjog Gaihre, Yaqoob Majeed,
Abstract summary: Weakly supervised semantic segmentation (WSSS) must learn dense masks from noisy, under-specified cues.<n>We show that three small, synergistic changes make weak supervision markedly more effective without altering the MiT backbone.<n>Our method, CrispFormer, augments the decoder with: (1) a boundary branch that supervises thin object contours using a lightweight edge head and a boundary-aware loss; (2) an uncertainty-guided refiner that predicts per-pixel aleatoric uncertainty and uses it to weight losses and gate a residual correction of the segmentation logits; and (3) a dynamic multi-scale fusion layer that replaces static concate
Score: 0.45880283710344055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Weakly supervised semantic segmentation (WSSS) must learn dense masks from noisy, under-specified cues. We revisit the SegFormer decoder and show that three small, synergistic changes make weak supervision markedly more effective-without altering the MiT backbone or relying on heavy post-processing. Our method, CrispFormer, augments the decoder with: (1) a boundary branch that supervises thin object contours using a lightweight edge head and a boundary-aware loss; (2) an uncertainty-guided refiner that predicts per-pixel aleatoric uncertainty and uses it to weight losses and gate a residual correction of the segmentation logits; and (3) a dynamic multi-scale fusion layer that replaces static concatenation with spatial softmax gating over multi-resolution features, optionally modulated by uncertainty. The result is a single-pass model that preserves crisp boundaries, selects appropriate scales per location, and resists label noise from weak cues. Integrated into a standard WSSS pipeline (seed, student, and EMA relabeling), CrispFormer consistently improves boundary F-score, small-object recall, and mIoU over SegFormer baselines trained on the same seeds, while adding minimal compute. Our decoder-centric formulation is simple to implement, broadly compatible with existing SegFormer variants, and offers a reproducible path to higher-fidelity masks from image-level supervision.

Related papers

Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation [83.90109373769614]
3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks like segmentation.<n>We propose a coarse-to-fine binary encoding scheme for per-Gaussian category representation, which compresses each feature into a single integer via the binary-to-decimal mapping.<n>We further design a progressive training strategy that decomposes panoptic segmentation into a series of independent sub-tasks, reducing inter-class conflicts and thereby enhancing fine-grained segmentation capability.
arXiv Detail & Related papers (2025-11-30T15:51:30Z)
Extremal Contours: Gradient-driven contours for compact visual attribution [5.6220652636435915]
We show how a star-supervised framework can achieve higher complexity with dense masks.<n>On ImageNets, it matches the extremal contours of dense masks while producing compact regions with improved run-to-run vision.
arXiv Detail & Related papers (2025-11-03T10:02:21Z)
Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification [20.29420915336209]
Multi-label classification (MLC) offers a more comprehensive semantic understanding of Remote Sensing (RS) imagery.<n>Single-positive multi-label learning (SPML) has emerged, where each image is annotated with only one relevant label, and the model is expected to recover the full set of labels.<n>We propose Adaptive Gradient (AdaGC) as a novel and generalizable SPML framework tailored to RS imagery.
arXiv Detail & Related papers (2025-10-09T14:26:09Z)
BoundMatch: Boundary detection applied to semi-supervised segmentation [12.8995997687175]
Semi-supervised semantic segmentation (SS-SS) aims to mitigate the heavy annotation burden of dense pixel labeling by leveraging abundant unlabeled images.<n>We propose BoundMatch, a novel multi-task SS-SS framework that explicitly integrates semantic boundary detection into a teacher-student consistency regularization pipeline.<n>Our core mechanism, Boundary Consistency Regularized Multi-Task Learning, enforces prediction agreement between teacher and student models on both segmentation masks and detailed semantic boundaries.
arXiv Detail & Related papers (2025-03-30T17:02:26Z)
MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo [8.303396507129266]
MSP-MVS is a method introducing multi-granularity segmentation prior to edge-confined patch deformation.<n>We implement equidistribution and disassemble-clustering of correlative reliable pixels.<n>We also introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs.
arXiv Detail & Related papers (2024-07-27T19:00:44Z)
Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach [7.012760526318993]
Weakly-Supervised Semantic (WSSS) offers a cost-efficient workaround to extensive labeling. Existing WSSS methods have difficulties in learning the boundaries of objects leading to poor segmentation results. We propose a novel and effective framework that addresses these issues by leveraging visual foundation models inside the bounding box.
arXiv Detail & Related papers (2024-05-10T16:42:25Z)
Mesh Denoising Transformer [104.5404564075393]
Mesh denoising is aimed at removing noise from input meshes while preserving their feature structures. SurfaceFormer is a pioneering Transformer-based mesh denoising framework. New representation known as Local Surface Descriptor captures local geometric intricacies. Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation.
arXiv Detail & Related papers (2024-05-10T15:27:43Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches.<n>We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention to undisciplined over-smoothing.<n>AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z)
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds. We propose a textbfGenerative textbfDecoder for MAE (GD-MAE) to automatically merges the surrounding context. We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI, and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z)
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation [51.59290734837372]
We propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality. The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on Cityscapes benchmark. By applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard.
arXiv Detail & Related papers (2021-04-12T07:10:48Z)
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation [85.153426159438]
We propose Basis based Instance(B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods. Our B2Inst leads to consistent improvements and accurately parses out the instance boundaries in a scene.
arXiv Detail & Related papers (2020-11-26T11:26:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.