Vision-Language Controlled Deep Unfolding for Joint Medical Image Restoration and Segmentation
- URL: http://arxiv.org/abs/2601.23103v1
- Date: Fri, 30 Jan 2026 15:48:35 GMT
- Title: Vision-Language Controlled Deep Unfolding for Joint Medical Image Restoration and Segmentation
- Authors: Ping Chen, Zicheng Huang, Xiangming Wang, Yungeng Liu, Bingyu Liang, Haijin Zeng, Yongyong Chen,
- Abstract summary: We propose a principled framework for joint All-in-One Medical Image Restoration and Segmentation (AiOMIRS). We introduce a frequency-aware Mamba mechanism to capture long-range dependencies for global segmentation while preserving the high-frequency textures necessary for restoration. As a pioneering work in the AiOMIRS task, VL-DUN establishes a new state-of-the-art across multi-modal benchmarks, improving PSNR by 0.92 dB and the Dice coefficient by 9.76%.
- Score: 34.04441838578788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose VL-DUN, a principled framework for joint All-in-One Medical Image Restoration and Segmentation (AiOMIRS) that bridges the gap between low-level signal recovery and high-level semantic understanding. While standard pipelines treat these tasks in isolation, our core insight is that they are fundamentally synergistic: restoration provides clean anatomical structures to improve segmentation, while semantic priors regularize the restoration process. VL-DUN resolves the sub-optimality of sequential processing through two primary innovations. (1) We formulate AiOMIRS as a unified optimization problem, deriving an interpretable joint unfolding mechanism where restoration and segmentation are mathematically coupled for mutual refinement. (2) We introduce a frequency-aware Mamba mechanism to capture long-range dependencies for global segmentation while preserving the high-frequency textures necessary for restoration. This allows for efficient global context modeling with linear complexity, effectively mitigating the spectral bias of standard architectures. As a pioneering work in the AiOMIRS task, VL-DUN establishes a new state-of-the-art across multi-modal benchmarks, improving PSNR by 0.92 dB and the Dice coefficient by 9.76%. Our results demonstrate that joint collaborative learning offers a superior, more robust solution for complex clinical workflows compared to isolated task processing. The code is available at https://github.com/cipi666/VLDUN.
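To make the deep-unfolding idea in the abstract concrete: each unfolded iteration alternates a data-fidelity gradient step with a segmentation-derived prior that pulls the reconstruction toward clean anatomical structure. The sketch below is a minimal toy illustration of that coupling, not the authors' VL-DUN architecture; `unfold_step`, `toy_segment`, and the 1-D deblurring operator are all hypothetical stand-ins for the paper's learned modules.

```python
import numpy as np

def unfold_step(x, y, A, At, seg_prior, eta=0.1, lam=0.05):
    """One unfolded iteration: data-fidelity gradient step + semantic prior."""
    # Gradient step on the data-fidelity term ||A x - y||^2
    x = x - eta * At(A(x) - y)
    # Semantic regularization: shrink x toward a prior derived from the
    # current segmentation (a learned network in the actual framework)
    x = x - lam * (x - seg_prior)
    return x

def toy_segment(x, thresh=0.5):
    # Placeholder segmenter: simple thresholding of the current estimate
    return (x > thresh).astype(float)

# Toy 1-D deblurring problem: recover a step edge from a blurred, noisy signal
rng = np.random.default_rng(0)
n = 64
kernel = np.ones(5) / 5.0
A = lambda v: np.convolve(v, kernel, mode="same")
At = A  # symmetric blur kernel, so A^T == A
x_true = (np.arange(n) > n // 2).astype(float)
y = A(x_true) + 0.01 * rng.standard_normal(n)

x = y.copy()
for _ in range(50):
    seg = toy_segment(x)                      # segmentation from current estimate
    x = unfold_step(x, y, A, At, seg_prior=seg)  # restoration refined by that mask

print(float(np.mean((x - x_true) ** 2)) < float(np.mean((y - x_true) ** 2)))
```

Even in this toy setting, the binary mask acts as the "semantic prior" the abstract describes: it sharpens the blurred edge, so the final estimate has lower error than the observed signal.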
Related papers
- UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation [98.93314262366681]
We present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance and a 24.2% gain in generation quality.
arXiv Detail & Related papers (2026-01-16T18:59:58Z) - Frequency Error-Guided Under-sampling Optimization for Multi-Contrast MRI Reconstruction [24.246450246745905]
Multi-contrast MRI reconstruction has emerged as a promising direction by leveraging complementary information from fully-sampled reference scans. Existing approaches suffer from three major limitations: (1) superficial reference fusion strategies, (2) insufficient utilization of the complementary information provided by the reference contrast, and (3) fixed under-sampling patterns. We propose an efficient and interpretable frequency error-guided reconstruction framework to tackle these issues.
arXiv Detail & Related papers (2026-01-14T09:40:34Z) - ResDynUNet++: A nested U-Net with residual dynamic convolution blocks for dual-spectral CT [5.812239137446292]
We propose a hybrid reconstruction framework for dual-spectral CT (DSCT) that integrates iterative methods with deep learning models. In the knowledge-driven phase, we employ the oblique projection modification technique (OPMT) to reconstruct an intermediate solution of the basis material images from the projection data. In the data-driven phase, we introduce a novel neural network, ResDynUNet++, to refine this intermediate solution.
arXiv Detail & Related papers (2025-12-18T03:52:18Z) - Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z) - UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation [104.59740403500132]
Multi-modal image segmentation faces real-world deployment challenges, as incomplete or corrupted modalities degrade performance. We propose a unified modality-relax segmentation network (UniMRSeg) based on hierarchical self-supervised compensation (HSSC). Our approach hierarchically bridges representation gaps between complete and incomplete modalities across the input, feature, and output levels.
arXiv Detail & Related papers (2025-09-19T17:29:25Z) - MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation [55.37355146924576]
MedSeqFT is a sequential fine-tuning framework for medical image analysis. It adapts pre-trained models to new tasks while refining their representational capacity. It consistently outperforms state-of-the-art fine-tuning strategies.
arXiv Detail & Related papers (2025-09-07T15:22:53Z) - GENRE-CMR: Generalizable Deep Learning for Diverse Multi-Domain Cardiac MRI Reconstruction [0.8749675983608171]
We propose GENRE-CMR, a generative adversarial network (GAN)-based architecture to enhance reconstruction fidelity and generalization. Experiments confirm that GENRE-CMR surpasses state-of-the-art methods on training and unseen data, achieving 0.9552 SSIM and 38.90 dB PSNR on unseen distributions. Our framework presents a unified and robust solution for high-quality CMR reconstruction, paving the way for clinically adaptable deployment across heterogeneous acquisition protocols.
arXiv Detail & Related papers (2025-08-28T09:43:59Z) - Decoupling Multi-Contrast Super-Resolution: Pairing Unpaired Synthesis with Implicit Representations [6.255537948555454]
Multi-contrast super-resolution (MCSR) techniques can boost the quality of low-resolution scans. Existing MCSR methods often assume fixed resolution settings and all require large, perfectly paired training datasets. We propose a novel modular MCSR framework that eliminates the need for paired training data and supports arbitrary upscaling.
arXiv Detail & Related papers (2025-05-09T07:48:52Z) - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z) - Deep Unfolding Network with Spatial Alignment for multi-modal MRI reconstruction [16.23059795712369]
Multi-modal Magnetic Resonance Imaging (MRI) offers complementary diagnostic information, but some modalities are limited by the long scanning time. To accelerate the whole acquisition process, MRI reconstruction of one modality from highly undersampled k-space data with another fully-sampled reference modality is an efficient solution. Existing deep learning-based methods that account for inter-modality misalignment perform better, but still share two main common limitations.
arXiv Detail & Related papers (2023-12-28T13:02:16Z) - InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which the CT imaging process is finely embedded.
We analyze the CT values among different tissues and merge these prior observations into a prior network for InDuDoNet+, which significantly improves its generalization performance.
arXiv Detail & Related papers (2021-12-23T15:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.