DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning
- URL: http://arxiv.org/abs/2504.19127v1
- Date: Sun, 27 Apr 2025 06:56:07 GMT
- Title: DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning
- Authors: Jialang Lu, Huayu Zhao, Huiyu Zhai, Xingxing Yang, Shini Han
- Abstract summary: We propose a new deep semantic prior-guided framework (DeepSPG) based on Retinex image decomposition for low-light image enhancement (LLIE). Our proposed DeepSPG demonstrates superior performance compared to state-of-the-art methods across five benchmark datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has long been a belief that high-level semantics learning can benefit various downstream computer vision tasks. However, in the low-light image enhancement (LLIE) community, existing methods learn a brute-force mapping between the low-light and normal-light domains without considering the semantic information of different regions, especially in extremely dark regions that suffer from severe information loss. To address this issue, we propose a new deep semantic prior-guided framework (DeepSPG) based on Retinex image decomposition for LLIE, which explores informative semantic knowledge via a pre-trained semantic segmentation model and multimodal learning. Notably, we incorporate both an image-level semantic prior and a text-level semantic prior, and thus formulate a multimodal learning framework with combinatorial deep semantic prior guidance for LLIE. Specifically, we incorporate semantic knowledge to guide the enhancement process via three designs: image-level semantic prior guidance that leverages hierarchical semantic features from a pre-trained semantic segmentation model; text-level semantic prior guidance that integrates natural language semantic constraints via a pre-trained vision-language model; and a multi-scale semantic-aware structure that facilitates effective semantic feature incorporation. Our proposed DeepSPG demonstrates superior performance compared to state-of-the-art methods across five benchmark datasets. The implementation details and code are publicly available at https://github.com/Wenyuzhy/DeepSPG.
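For intuition, below is a minimal, self-contained PyTorch sketch of how Retinex decomposition could be combined with an image-level semantic prior and a text-level semantic prior. Every module name, tensor shape, and fusion choice here is an illustrative assumption for exposition, not the authors' released implementation (see the linked repository for that).

```python
# Hedged sketch: Retinex-style decomposition with semantic prior guidance.
# All names, shapes, and the FiLM-style fusion are assumptions, not DeepSPG's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposeNet(nn.Module):
    """Predicts reflectance R and illumination L such that I ~ R * L (Retinex)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 4, 3, padding=1),  # 3 channels for R, 1 for L
        )

    def forward(self, x):
        out = self.body(x)
        return torch.sigmoid(out[:, :3]), torch.sigmoid(out[:, 3:])

class SemanticGuidedEnhancer(nn.Module):
    """Brightens the illumination map, conditioned on (a) features from a frozen
    segmentation backbone and (b) a text embedding from a pre-trained
    vision-language model (e.g. a prompt like 'a well-lit photo')."""
    def __init__(self, sem_ch=64, txt_dim=512):
        super().__init__()
        self.fuse_sem = nn.Conv2d(1 + sem_ch, 32, 3, padding=1)
        self.txt_proj = nn.Linear(txt_dim, 32)  # text prior -> channel gates
        self.head = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, L, sem_feat, txt_emb):
        sem_feat = F.interpolate(sem_feat, size=L.shape[-2:],
                                 mode="bilinear", align_corners=False)
        h = F.relu(self.fuse_sem(torch.cat([L, sem_feat], dim=1)))
        h = h * self.txt_proj(txt_emb)[:, :, None, None]  # FiLM-style gating
        return torch.sigmoid(self.head(h))

# Toy usage with stand-in tensors for the frozen backbones' outputs.
img = torch.rand(1, 3, 64, 64)   # low-light input
sem = torch.rand(1, 64, 16, 16)  # assumed segmentation features
txt = torch.rand(1, 512)         # assumed vision-language text embedding
R, L = DecomposeNet()(img)
enhanced = R * SemanticGuidedEnhancer()(L, sem, txt)  # normal-light estimate
```

The FiLM-style gating merely stands in for whichever conditioning mechanism the paper actually uses; the point is that both a segmentation prior and a language prior can inject semantics into the illumination branch before recomposition.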
Related papers
- SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation [9.311853182451289]
We propose a language-guided Semi-supervised Domain Adaptation (SSDA) setting for semantic segmentation. We harness the semantic generalization capabilities inherent in vision-language models (VLMs) to establish a synergistic framework. Our approach demonstrates substantial performance improvements over contemporary state-of-the-art (SoTA) methodologies.
arXiv Detail & Related papers (2025-04-08T19:14:34Z)
- ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification [52.405499816861635]
The multiple instance learning (MIL)-based framework has become the mainstream approach for processing whole slide images (WSIs). We propose a dual-scale vision-language multiple instance learning (ViLa-MIL) framework for whole slide image classification.
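As a reference point for the MIL framing, here is a hedged sketch of attention-based MIL pooling in the style of Ilse et al. (2018); ViLa-MIL builds its dual-scale vision-language prompting on top of this kind of bag-level aggregation. Dimensions and module names are assumptions, not the paper's code.

```python
# Hedged sketch: attention-based MIL pooling over WSI patch embeddings.
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    """Aggregates a bag of N patch embeddings into one slide-level embedding."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, patches):                        # patches: (N, dim)
        w = torch.softmax(self.score(patches), dim=0)  # (N, 1) attention weights
        return (w * patches).sum(dim=0)                # (dim,) slide embedding

slide_emb = AttentionMILPool()(torch.randn(1000, 512))  # 1000 patches -> 1 vector
```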
arXiv Detail & Related papers (2025-02-12T13:28:46Z)
- Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information [10.77139542242678]
Weakly supervised semantic segmentation (WSSS) aims at learning a semantic segmentation model with only image-level tags.
Most current WSSS methods focus on limited single-image (pixel-wise) information while ignoring valuable inter-image (semantic-wise) information.
arXiv Detail & Related papers (2024-05-08T09:35:26Z)
- Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement [69.47143451986067]
Low-light image enhancement (LLIE) investigates how to improve illumination and produce normal-light images.
The majority of existing methods improve low-light images in a global and uniform manner, without taking into account the semantic information of different regions.
We propose a novel semantic-aware knowledge-guided framework that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.
arXiv Detail & Related papers (2023-04-14T10:22:28Z)
- Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification [15.842081807249416]
It is necessary to explore the effectiveness of the linguistic modality in assisting hyperspectral image classification.
Large-scale pre-trained image-text foundation models have demonstrated great performance in a variety of downstream applications.
A Language-aware Domain Generalization Network (LDGnet) is proposed to learn cross-domain invariant representation.
arXiv Detail & Related papers (2022-09-06T10:06:10Z)
- Boosting Video-Text Retrieval with Explicit High-Level Semantics [115.66219386097295]
We propose a novel visual-linguistic alignment model named HiSE for VTR.
It improves cross-modal representation by incorporating explicit high-level semantics.
Our method achieves superior performance over state-of-the-art methods on three benchmark datasets.
arXiv Detail & Related papers (2022-08-08T15:39:54Z)
- Fine-Grained Semantically Aligned Vision-Language Pre-Training [151.7372197904064]
Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.
Existing methods mainly model the cross-modal alignment by the similarity of the global representations of images and texts.
We introduce LOUPE, a fine-grained semantically aligned vision-language pre-training framework, which learns fine-grained semantic alignment from the novel perspective of game-theoretic interactions.
arXiv Detail & Related papers (2022-08-04T07:51:48Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment.
Our proposed framework significantly outperforms state-of-the-art methods without any post-processing.
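As a rough illustration of text-to-pixel alignment, the sketch below scores every pixel embedding against a sentence embedding and pulls pixels inside the referred mask toward the text while pushing the rest away. The loss form, temperature, and shapes are assumptions for exposition, not CRIS's exact formulation.

```python
# Hedged sketch: text-to-pixel contrastive alignment with a per-pixel BCE loss.
import torch
import torch.nn.functional as F

def text_to_pixel_loss(pixel_feat, text_feat, mask, tau=0.07):
    """pixel_feat: (B, C, H, W) visual features; text_feat: (B, C) sentence
    embedding; mask: (B, H, W) binary ground truth of the referred region."""
    pix = F.normalize(pixel_feat.flatten(2), dim=1)   # (B, C, H*W)
    txt = F.normalize(text_feat, dim=1).unsqueeze(2)  # (B, C, 1)
    logits = (pix * txt).sum(dim=1) / tau             # (B, H*W) similarities
    target = mask.flatten(1).float()                  # 1 inside the region, else 0
    return F.binary_cross_entropy_with_logits(logits, target)

loss = text_to_pixel_loss(torch.randn(2, 64, 32, 32), torch.randn(2, 64),
                          torch.randint(0, 2, (2, 32, 32)))
```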
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
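The cross-modality region contrastive idea can be illustrated with a standard InfoNCE objective over paired region and text features, as in the hedged sketch below; the pairing scheme, dimensions, and temperature are illustrative assumptions rather than the paper's exact objective.

```python
# Hedged sketch: InfoNCE over matched cross-modal region/text feature pairs.
import torch
import torch.nn.functional as F

def region_infonce(region_feat, text_feat, tau=0.07):
    """region_feat, text_feat: (N, D); row i of each forms the positive pair,
    all other rows in the batch act as negatives."""
    r = F.normalize(region_feat, dim=1)
    t = F.normalize(text_feat, dim=1)
    logits = r @ t.T / tau                  # (N, N) similarity matrix
    labels = torch.arange(r.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = region_infonce(torch.randn(8, 256), torch.randn(8, 256))
```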
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation [39.845944724079814]
Self-supervised depth estimation has shown great effectiveness in producing high-quality depth maps given only image sequences as input.
However, its performance usually drops when estimating on border areas or objects with thin structures due to the limited depth representation ability.
We propose a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations.
arXiv Detail & Related papers (2020-12-15T02:24:57Z)