ISALux: Illumination and Segmentation Aware Transformer Employing Mixture of Experts for Low Light Image Enhancement
- URL: http://arxiv.org/abs/2508.17885v1
- Date: Mon, 25 Aug 2025 10:47:18 GMT
- Title: ISALux: Illumination and Segmentation Aware Transformer Employing Mixture of Experts for Low Light Image Enhancement
- Authors: Raul Balmez, Alexandru Brateanu, Ciprian Orhei, Codruta Ancuti, Cosmin Ancuti
- Abstract summary: ISALux is a transformer-based approach for Low-Light Image Enhancement (LLIE). HISA-MSA integrates illumination and semantic segmentation maps for feature extraction. An MoE-based Feed-Forward Network (FFN) enhances contextual learning.
- Score: 39.24835095169737
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce ISALux, a novel transformer-based approach for Low-Light Image Enhancement (LLIE) that seamlessly integrates illumination and semantic priors. Our architecture includes an original self-attention block, Hybrid Illumination and Semantics-Aware Multi-Headed Self-Attention (HISA-MSA), which integrates illumination and semantic segmentation maps for enhanced feature extraction. ISALux employs two self-attention modules to independently process illumination and semantic features, selectively enriching each other to regulate luminance and highlight structural variations in real-world scenarios. A Mixture of Experts (MoE)-based Feed-Forward Network (FFN) enhances contextual learning, with a gating mechanism conditionally activating the top K experts for specialized processing. To address overfitting in LLIE methods caused by distinct light patterns in benchmarking datasets, we enhance the HISA-MSA module with low-rank matrix adaptations (LoRA). Extensive qualitative and quantitative evaluations across multiple specialized datasets demonstrate that ISALux is competitive with state-of-the-art (SOTA) methods. Additionally, an ablation study highlights the contribution of each component in the proposed model. Code will be released upon publication.
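Since the authors state that code will only be released upon publication, the PyTorch sketch below merely illustrates the kind of top-K expert routing the abstract describes for the MoE-based FFN. The hidden sizes, the number of experts, and the value of K are assumptions made for the example, not values from the paper.

```python
# Minimal, illustrative sketch of a top-K gated Mixture-of-Experts feed-forward
# block, loosely following the description in the ISALux abstract. Layer sizes,
# the number of experts, and K are illustrative assumptions, not paper values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Feed-forward network whose gate routes each token to its top-K experts."""

    def __init__(self, dim: int = 64, hidden_dim: int = 256,
                 num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small two-layer MLP, as is common in MoE transformers.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])
        # Gating network scores every expert for every token.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        scores = self.gate(x)                                   # (B, N, E)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)     # keep only the top-K experts
        weights = F.softmax(top_vals, dim=-1)                   # renormalise gate weights over K
        # Dense evaluation of all experts for clarity; real MoE layers dispatch sparsely.
        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)    # (B, N, E, dim)
        index = top_idx.unsqueeze(-1).expand(*top_idx.shape, x.size(-1))  # (B, N, K, dim)
        gathered = torch.gather(expert_out, dim=-2, index=index)          # (B, N, K, dim)
        return (weights.unsqueeze(-1) * gathered).sum(dim=-2)             # (B, N, dim)


if __name__ == "__main__":
    block = MoEFeedForward()
    tokens = torch.randn(2, 16, 64)   # 16 tokens of dimension 64
    print(block(tokens).shape)        # torch.Size([2, 16, 64])
```

The experts are evaluated densely here only to keep the sketch short; practical MoE layers dispatch each token exclusively to its selected experts for efficiency.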
Related papers
- Adaptive Global and Fine-Grained Perceptual Fusion for MLLM Embeddings Compatible with Hard Negative Amplification [49.109117617514066]
Multimodal embeddings serve as a bridge for aligning vision and language. We propose Adaptive Global and Fine-grained perceptual Fusion for MLLM Embeddings. AGFF-Embed comprehensively achieves state-of-the-art performance in both general and fine-grained understanding.
arXiv Detail & Related papers (2026-02-05T14:52:35Z) - Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement [41.66776033752888]
Most low-light image enhancement methods rely on pre-trained model priors, low-light inputs, or both. We propose VLM-IMI, a novel framework that leverages large vision-language models with iterative and manual instructions. VLM-IMI incorporates textual descriptions of the desired normal-light content as enhancement cues, enabling semantically informed restoration.
arXiv Detail & Related papers (2025-07-24T03:35:20Z) - SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement [58.79901582809091]
Recent Transformer-based low-light enhancement methods have made promising progress in recovering global illumination. We present a Spatially-Adaptive Illumination-Guided Transformer framework that enables accurate illumination restoration.
arXiv Detail & Related papers (2025-07-21T11:38:56Z) - Low-Light Enhancement via Encoder-Decoder Network with Illumination Guidance [0.0]
This paper introduces a novel deep learning framework for low-light image enhancement, named the Encoder-Decoder Network with Illumination Guidance (EDNIG). EDNIG integrates an illumination map, derived from the Bright Channel Prior (BCP), as a guidance input. It is optimized within a Generative Adversarial Network (GAN) framework using a composite loss function that combines adversarial loss, pixel-wise mean squared error (MSE), and perceptual loss (a generic sketch of such a composite loss appears after this list).
arXiv Detail & Related papers (2025-07-04T09:35:00Z) - Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs [56.76586846269894]
Multimodal Large Language Models (MLLMs) have achieved success across various domains. Despite its importance, the study of knowledge sharing among domain-specific MLLMs remains largely underexplored. We propose a unified parameter integration framework that enables modular composition of expert capabilities.
arXiv Detail & Related papers (2025-06-30T15:07:41Z) - ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement [10.957431540794836]
Inadequate illumination can lead to significant information loss and poor image quality, impacting various applications such as surveillance. Current enhancement techniques often use specific datasets to enhance low-light images, but still present challenges when adapting to diverse real-world conditions. The Adaptive Light Enhancement Network (ALEN) is introduced, whose main approach is the use of a classification mechanism to determine whether local or global illumination enhancement is required.
arXiv Detail & Related papers (2024-07-29T05:19:23Z) - Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation [2.491548070992611]
A novel multi-modal fusion approach called CSK-Net is proposed.
It uses a contrastive learning-based spectral knowledge distillation technique.
Experiments show that CSK-Net surpasses state-of-the-art models in multi-modal tasks and for missing modalities.
arXiv Detail & Related papers (2023-12-04T10:27:09Z) - Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z) - Learning Semantic-Aware Knowledge Guidance for Low-Light Image
Enhancement [69.47143451986067]
Low-light image enhancement (LLIE) investigates how to improve illumination and produce normal-light images.
The majority of existing methods improve low-light images in a global and uniform manner, without taking into account the semantic information of different regions.
We propose a novel semantic-aware knowledge-guided framework that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.
arXiv Detail & Related papers (2023-04-14T10:22:28Z) - Adaptive Multiscale Illumination-Invariant Feature Representation for Undersampled Face Recognition [29.002873450422083]
This paper presents an illumination-invariant feature representation approach used to eliminate the effect of varying illumination in undersampled face recognition.
A new illumination level classification technique based on Singular Value Decomposition (SVD) is proposed to judge the illumination level of the input image.
The experimental results demonstrate that the JLEF-feature and AJLEF-face outperform other related approaches for undersampled face recognition under varying illumination.
arXiv Detail & Related papers (2020-04-07T06:48:44Z)
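The EDNIG entry above mentions a composite GAN objective combining adversarial, pixel-wise MSE, and perceptual terms; the minimal sketch below shows one common way such a loss is assembled. The loss weights, the frozen feature extractor used for the perceptual term, and the BCE-with-logits adversarial formulation are illustrative assumptions, not details taken from the EDNIG paper.

```python
# Generic sketch of a composite enhancement loss: adversarial + pixel-wise MSE
# + perceptual. Weights and the perceptual backbone are assumed for illustration.
import torch
import torch.nn as nn


class CompositeEnhancementLoss(nn.Module):
    def __init__(self, perceptual_net: nn.Module,
                 w_adv: float = 1e-3, w_mse: float = 1.0, w_perc: float = 0.1):
        super().__init__()
        # Frozen feature extractor for the perceptual term (e.g. a pretrained VGG slice).
        self.perceptual_net = perceptual_net.eval()
        for p in self.perceptual_net.parameters():
            p.requires_grad_(False)
        self.w_adv, self.w_mse, self.w_perc = w_adv, w_mse, w_perc
        self.bce = nn.BCEWithLogitsLoss()
        self.mse = nn.MSELoss()

    def forward(self, enhanced: torch.Tensor, target: torch.Tensor,
                disc_logits_on_enhanced: torch.Tensor) -> torch.Tensor:
        # Adversarial term: the generator wants the discriminator to predict "real".
        adv = self.bce(disc_logits_on_enhanced,
                       torch.ones_like(disc_logits_on_enhanced))
        # Pixel-wise fidelity term between enhanced output and normal-light target.
        pix = self.mse(enhanced, target)
        # Perceptual term: distance between deep features of output and target.
        perc = self.mse(self.perceptual_net(enhanced), self.perceptual_net(target))
        return self.w_adv * adv + self.w_mse * pix + self.w_perc * perc


if __name__ == "__main__":
    # Stand-in feature extractor for a quick check; real setups use a pretrained network.
    dummy_features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
    criterion = CompositeEnhancementLoss(dummy_features)
    enhanced, target = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    disc_logits = torch.randn(1, 1)
    print(criterion(enhanced, target, disc_logits).item())
```

In practice the perceptual feature extractor is typically a pretrained VGG slice and the three weights are tuned per dataset.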