Segment Any-Quality Images with Generative Latent Space Enhancement
- URL: http://arxiv.org/abs/2503.12507v2
- Date: Fri, 04 Apr 2025 04:47:08 GMT
- Title: Segment Any-Quality Images with Generative Latent Space Enhancement
- Authors: Guangqian Guo, Yong Guo, Xuehui Yu, Wenbo Li, Yaoxing Wang, Shan Gao,
- Abstract summary: We propose GleSAM to boost robustness on low-quality images. We adapt the concept of latent diffusion to SAM-based segmentation frameworks. We also introduce two techniques to improve compatibility between the pre-trained diffusion model and the segmentation framework.
- Score: 23.05638803781018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their success, Segment Anything Models (SAMs) experience significant performance drops on severely degraded, low-quality images, limiting their effectiveness in real-world scenarios. To address this, we propose GleSAM, which utilizes Generative Latent space Enhancement to boost robustness on low-quality images, thus enabling generalization across various image qualities. Specifically, we adapt the concept of latent diffusion to SAM-based segmentation frameworks and perform the generative diffusion process in the latent space of SAM to reconstruct high-quality representation, thereby improving segmentation. Additionally, we introduce two techniques to improve compatibility between the pre-trained diffusion model and the segmentation framework. Our method can be applied to pre-trained SAM and SAM2 with only minimal additional learnable parameters, allowing for efficient optimization. We also construct the LQSeg dataset with a greater diversity of degradation types and levels for training and evaluating the model. Extensive experiments demonstrate that GleSAM significantly improves segmentation robustness on complex degradations while maintaining generalization to clear images. Furthermore, GleSAM also performs well on unseen degradations, underscoring the versatility of our approach and dataset.
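The core idea of the abstract can be sketched compactly. Below is a minimal, hypothetical PyTorch sketch of latent-space enhancement in the spirit of the description: a frozen SAM-style encoder produces a degraded latent feature map, a small denoiser iteratively refines it, and the refined latent is then passed to the mask decoder. The `LatentDenoiser` module, `enhance_latent` function, tensor shapes, and update rule are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch (not the authors' code): refine a degraded SAM-style latent
# feature map with a small learned denoiser before mask decoding. The denoiser
# architecture, number of steps, and update rule are assumptions for illustration.
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    """Tiny stand-in for a diffusion denoiser operating in the encoder's latent space."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, z: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Predict the noise/degradation residual at timestep t
        # (timestep embedding omitted for brevity).
        return self.net(z)

@torch.no_grad()
def enhance_latent(z_lq: torch.Tensor, denoiser: LatentDenoiser, steps: int = 4) -> torch.Tensor:
    """Run a few reverse-diffusion-style refinement steps on a low-quality latent."""
    z = z_lq
    for i in reversed(range(steps)):
        t = torch.full((z.shape[0],), i, device=z.device, dtype=torch.long)
        eps = denoiser(z, t)      # predicted degradation component
        z = z - eps / steps       # simple iterative correction (illustrative only)
    return z

# Usage: z_lq would come from a frozen SAM/SAM2 image encoder applied to a
# degraded image; the enhanced latent would then feed the mask decoder.
z_lq = torch.randn(1, 256, 64, 64)  # B x C x H x W latent (assumed shape)
z_hq = enhance_latent(z_lq, LatentDenoiser())
print(z_hq.shape)
```

In this reading, only the small denoiser would need training, which is consistent with the abstract's claim that GleSAM adds minimal learnable parameters on top of a pre-trained SAM or SAM2.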
Related papers
- Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment.
This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain.
Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z) - Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
Multi-modality image fusion plays a crucial role in integrating diverse modalities to enhance scene understanding. Recent approaches have shifted towards task-specific design, but struggle to achieve the "Best of Both Worlds" due to inconsistent optimization goals. We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to Grow the quality of fusion results and Establish downstream task adaptability, namely SAGE.
arXiv Detail & Related papers (2025-03-03T06:16:31Z) - Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance.
We show that the fundamental DiT and SiT models trained on ReaLS achieve a 15% improvement in the FID metric.
The enhanced semantic latent space enables more perceptual downstream tasks, such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z) - FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [66.61201445650323]
Existing methods suffer from a generalization bottleneck in real-world scenarios. We contribute a million-scale dataset with two notable advantages over existing training data. We propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios.
arXiv Detail & Related papers (2024-12-02T12:08:40Z) - Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning [63.55145330447408]
We propose a novel Self-Perception Tuning (SPT) method for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process.
arXiv Detail & Related papers (2024-11-26T08:33:25Z) - RobustSAM: Segment Anything Robustly on Degraded Images [19.767828436963317]
Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation.
We propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality images.
Our method has been shown to effectively improve the performance of SAM-based downstream tasks such as single image dehazing and deblurring.
arXiv Detail & Related papers (2024-06-13T23:33:59Z) - ASAM: Boosting Segment Anything Model with Adversarial Tuning [9.566046692165884]
This paper introduces ASAM, a novel methodology that amplifies a foundation model's performance through adversarial tuning.
We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing.
Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations.
arXiv Detail & Related papers (2024-05-01T00:13:05Z) - SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution [46.85622647257876]
We propose the SAM-DiffSR model, which can utilize the fine-grained structure information from SAM in the process of sampling noise to improve the image quality without additional computational cost during inference. Experimental results demonstrate the effectiveness of our proposed method, showcasing superior performance in suppressing artifacts and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset.
arXiv Detail & Related papers (2024-02-27T01:57:02Z) - ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation [6.229326337093342]
Segment Anything Model (SAM) excels in various segmentation scenarios relying on semantic information and generalization ability.
The ClassWiseSAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for landcover classification on space-borne Synthetic Aperture Radar (SAR) images.
CWSAM showcases enhanced performance with fewer computing resources.
arXiv Detail & Related papers (2024-01-04T15:54:45Z) - Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation [43.759808066264334]
We propose a weakly supervised self-training architecture with anchor regularization and low-rank finetuning to improve the robustness and efficiency of adaptation.
We validate the effectiveness on 5 types of downstream segmentation tasks including natural clean/corrupted images, medical images, camouflaged images and robotic images.
arXiv Detail & Related papers (2023-12-06T13:59:22Z) - EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything [36.553867358541154]
Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications.
We propose EfficientSAMs, light-weight SAM models that exhibit decent performance with largely reduced complexity.
Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from SAM image encoder for effective visual representation learning.
arXiv Detail & Related papers (2023-12-01T18:31:00Z) - Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior and diverse image generation performance compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehazing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z)