VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation
- URL: http://arxiv.org/abs/2601.10124v1
- Date: Thu, 15 Jan 2026 07:09:00 GMT
- Title: VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation
- Authors: Sicheng Yang, Zhaohu Xing, Lei Zhu,
- Abstract summary: We propose VQ-Seg, the first approach to employ vector quantization (VQ) to discretize the feature space and introduce a novel Quantized Perturbation Module (QPM) that replaces dropout. Our QPM perturbs discrete representations by shuffling the spatial locations of codebook indices, enabling effective and controllable regularization. We collect a large-scale Lung Cancer dataset comprising 828 CT scans annotated for central-type lung carcinoma.
- Score: 19.35191098558586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistency learning with feature perturbation is a widely used strategy in semi-supervised medical image segmentation. However, many existing perturbation methods rely on dropout and thus require careful manual tuning of the dropout rate, a sensitive hyperparameter that is often difficult to optimize and may lead to suboptimal regularization. To overcome this limitation, we propose VQ-Seg, the first approach to employ vector quantization (VQ) to discretize the feature space and introduce a novel and controllable Quantized Perturbation Module (QPM) that replaces dropout. Our QPM perturbs discrete representations by shuffling the spatial locations of codebook indices, enabling effective and controllable regularization. To mitigate potential information loss caused by quantization, we design a dual-branch architecture where the post-quantization feature space is shared by both image reconstruction and segmentation tasks. Moreover, we introduce a Post-VQ Feature Adapter (PFA) to incorporate guidance from a foundation model (FM), supplementing the high-level semantic information lost during quantization. Furthermore, we collect a large-scale Lung Cancer (LC) dataset comprising 828 CT scans annotated for central-type lung carcinoma. Extensive experiments on the LC dataset and other public benchmarks demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches. Code is available at: https://github.com/script-Yang/VQ-Seg.
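The perturbation described in the abstract (quantize features to codebook indices, then shuffle where those indices sit spatially) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the nearest-neighbor lookup by Euclidean distance, the function names `quantize` and `shuffle_indices`, and the `ratio` parameter controlling how many positions are shuffled are all assumptions made for illustration.

```python
import numpy as np

def quantize(features, codebook):
    """Map each spatial feature vector to the index of its nearest code.

    features: (H, W, D) array; codebook: (K, D) array.
    Returns an (H, W) integer array of codebook indices.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    # Squared Euclidean distance from every feature vector to every code.
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).reshape(h, w)

def shuffle_indices(indices, ratio, rng):
    """Permute the indices found at a random subset of spatial positions.

    `ratio` is a hypothetical knob: the fraction of positions that take part
    in the shuffle. Positions outside the subset keep their index.
    """
    flat = indices.reshape(-1).copy()
    n = int(round(ratio * flat.size))
    chosen = rng.choice(flat.size, size=n, replace=False)
    # Reassign the chosen positions' indices among themselves.
    flat[chosen] = flat[rng.permutation(chosen)]
    return flat.reshape(indices.shape)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))       # K=8 codes of dimension D=4
features = rng.normal(size=(6, 6, 4))    # toy 6x6 feature map
idx = quantize(features, codebook)
perturbed_idx = shuffle_indices(idx, ratio=0.5, rng=rng)
perturbed_features = codebook[perturbed_idx]  # look codes back up: (6, 6, 4)
```

Because the shuffle only moves existing indices around, the set of codes used in the feature map is unchanged; only their spatial arrangement differs. In this sketch the perturbation strength is set by a single fraction, which is one plausible reading of "controllable" in the abstract.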
Related papers
- Saccadic Vision for Fine-Grained Visual Classification [10.681604440788854]
Fine-grained visual classification (FGVC) requires distinguishing between visually similar categories through subtle, localized features.
Existing part-based methods rely on complex localization networks that learn mappings from pixel to sample space.
We propose a two-stage process that first extracts peripheral features and generates a sample map.
We employ contextualized selective attention to weigh the impact of each fixation patch before fusing peripheral and focus representations.
arXiv Detail & Related papers (2025-09-19T07:03:37Z)
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years.
Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z)
- FMaMIL: Frequency-Driven Mamba Multi-Instance Learning for Weakly Supervised Lesion Segmentation in Medical Images [24.941922708432212]
We propose FMaMIL, a two-stage framework for weakly supervised lesion segmentation based solely on image-level labels.
In the first stage, a lightweight Mamba-based encoder is introduced to capture long-range dependencies across image patches under the MIL paradigm.
To enhance spatial sensitivity and structural awareness, we design a learnable frequency-domain encoding module that supplements spatial-domain features with spectrum-based information.
In the second stage, we refine the initial pseudo labels via a CAM-guided soft-label supervision and a self-correction mechanism, enabling robust training even under label noise.
arXiv Detail & Related papers (2025-06-09T11:18:02Z)
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework for large vision-language models (LVLMs) that enables efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into an optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- SpaRG: Sparsely Reconstructed Graphs for Generalizable fMRI Analysis [8.489318619991534]
Deep learning can help uncover patterns in resting-state functional Magnetic Resonance Imaging (rsfMRI) associated with psychiatric disorders and personal traits.
Yet the problem of interpreting deep learning findings is rarely more evident than in fMRI analyses.
We propose a simple approach to mitigate these challenges grounded on sparsification and self-supervision.
arXiv Detail & Related papers (2024-09-24T18:35:57Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for the optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Learning Representations for CSI Adaptive Quantization and Feedback [51.14360605938647]
We propose an efficient method for adaptive quantization and feedback in frequency division duplexing systems.
Existing works mainly focus on the implementation of autoencoder (AE) neural networks for CSI compression.
We recommend two different methods: one based on post-training quantization, and a second in which the codebook is learned during the training of the AE.
arXiv Detail & Related papers (2022-07-13T08:52:13Z)
- Mixed-UNet: Refined Class Activation Mapping for Weakly-Supervised Semantic Segmentation with Multi-scale Inference [28.409679398886304]
We develop a novel model named Mixed-UNet, which has two parallel branches in the decoding phase.
We evaluate the designed Mixed-UNet against several prevalent deep learning-based segmentation approaches on our dataset collected from the local hospital and public datasets.
arXiv Detail & Related papers (2022-05-06T08:37:02Z)
- Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques such as magnetoencephalography (MEG) and electroencephalography (EEG) hold promise for imaging brain activity.
The problem of source localization, or source imaging, however, poses a high-dimensional statistical inference challenge.
We propose an ensemble of clustered desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)
- DSU-net: Dense SegU-net for automatic head-and-neck tumor segmentation in MR images [30.747375849126925]
We propose a Dense SegU-net (DSU-net) framework for automatic nasopharyngeal carcinoma (NPC) segmentation in MRI.
To combat the potential vanishing-gradient problem, we introduce dense blocks which can facilitate feature propagation and reuse.
Our proposed architecture outperforms the existing state-of-the-art segmentation networks.
arXiv Detail & Related papers (2020-06-11T09:33:41Z)