LYT-NET: Lightweight YUV Transformer-based Network for Low-light Image Enhancement
- URL: http://arxiv.org/abs/2401.15204v7
- Date: Wed, 10 Sep 2025 12:44:42 GMT
- Title: LYT-NET: Lightweight YUV Transformer-based Network for Low-light Image Enhancement
- Authors: A. Brateanu, R. Balmez, A. Avram, C. Orhei, C. Ancuti
- Abstract summary: LYT-Net is a novel lightweight transformer-based model for low-light image enhancement (LLIE). In our method, we adopt a dual-path approach, treating the chrominance channels U and V and the luminance channel Y as separate entities to help the model better handle illumination adjustment and corruption restoration. Our comprehensive evaluation on established LLIE datasets demonstrates that, despite its low complexity, our model outperforms recent LLIE methods.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This letter introduces LYT-Net, a novel lightweight transformer-based model for low-light image enhancement (LLIE). LYT-Net consists of several layers and detachable blocks, including our novel blocks, the Channel-Wise Denoiser (CWD) and the Multi-Stage Squeeze & Excite Fusion (MSEF), along with the traditional Transformer block, Multi-Headed Self-Attention (MHSA). In our method, we adopt a dual-path approach, treating the chrominance channels U and V and the luminance channel Y as separate entities to help the model better handle illumination adjustment and corruption restoration. Our comprehensive evaluation on established LLIE datasets demonstrates that, despite its low complexity, our model outperforms recent LLIE methods. The source code and pre-trained models are available at https://github.com/albrateanu/LYT-Net
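For intuition, here is a minimal PyTorch sketch of the dual-path idea, not the authors' implementation: the small convolutional branches and the 1x1 fusion are illustrative placeholders for the paper's CWD, MSEF, and MHSA blocks, while the BT.601 weights are the standard RGB-to-YUV conversion.

```python
import torch
import torch.nn as nn

def rgb_to_yuv(x: torch.Tensor) -> torch.Tensor:
    """Convert an (N, 3, H, W) RGB tensor to YUV using BT.601 weights."""
    r, g, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return torch.cat([y, u, v], dim=1)

class DualPathYUV(nn.Module):
    """Illustrative dual-path block: Y handles illumination, UV handles color."""
    def __init__(self, width: int = 32):
        super().__init__()
        # Luminance branch: illumination adjustment on the single Y channel.
        self.y_branch = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 3, padding=1),
        )
        # Chrominance branch: restoration/denoising on the U and V channels.
        self.uv_branch = nn.Sequential(
            nn.Conv2d(2, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2, 3, padding=1),
        )
        # Simple 1x1 fusion back to 3 channels (placeholder for MSEF-style fusion).
        self.fuse = nn.Conv2d(3, 3, 1)

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        yuv = rgb_to_yuv(rgb)
        y, uv = yuv[:, 0:1], yuv[:, 1:3]
        y = y + self.y_branch(y)      # residual illumination correction
        uv = uv + self.uv_branch(uv)  # residual chrominance restoration
        return self.fuse(torch.cat([y, uv], dim=1))

x = torch.rand(1, 3, 64, 64)           # a dummy low-light RGB image
print(DualPathYUV()(x).shape)          # torch.Size([1, 3, 64, 64])
```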
Related papers
- Revisiting Lightweight Low-Light Image Enhancement: From a YUV Color Space Perspective [17.507319835166406]
We propose a novel YUV-based paradigm that strategically restores channels using a Dual-Stream Global-Local Attention module for the Y channel, a Y-guided Local-Aware Frequency Attention module for the UV channels, and a Guided Interaction module for final feature fusion.
Our model establishes a new state-of-the-art on multiple benchmarks, delivering superior visual quality with a significantly lower parameter count.
arXiv Detail & Related papers (2026-01-24T07:27:54Z)
- From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion [91.35078719566472]
Vision-Language Models (VLMs) create a severe visual feature bottleneck by using a crude, asymmetric connection.
We introduce Cross-Layer Injection (CLI), a novel and lightweight framework that forges a dynamic many-to-many bridge between the two modalities.
arXiv Detail & Related papers (2026-01-15T18:59:10Z)
- Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents [55.82787697101274]
Bifrost-1 is a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models.
By seamlessly integrating pretrained MLLMs and diffusion models with patch-level CLIP latents, our framework enables high-fidelity controllable image generation.
Our experiments demonstrate that Bifrost-1 achieves comparable or better performance than previous methods in terms of visual fidelity and multimodal understanding.
arXiv Detail & Related papers (2025-08-08T02:38:47Z)
- Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design [13.587511215001115]
Current Low-light Image Enhancement (LLIE) techniques rely on either direct Low-Light (LL) to Normal-Light (NL) mappings or guidance from semantic features or illumination maps.
We present SG-LLIE, a new multi-scale CNN-Transformer hybrid framework guided by structure priors.
Our solution ranks second in the NTIRE 2025 Low-Light Enhancement Challenge.
arXiv Detail & Related papers (2025-04-18T20:57:16Z)
- LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation [52.58791563814837]
Large foundation models trained on large-scale vision-language data can boost Open-Vocabulary Object Detection (OVD).
This paper presents a systematic method to enhance visual grounding by utilizing decoder layers of large language models (LLMs).
We find that intermediate LLM layers already encode rich spatial semantics; adapting only the early layers yields most of the gain.
arXiv Detail & Related papers (2025-03-18T00:50:40Z)
- LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration [1.049712834719005]
We introduce LTCF-Net, a novel network architecture designed for enhancing low-light images.
Our approach utilizes two color spaces - LAB and YUV - to efficiently separate and process color information.
Our model incorporates the Transformer architecture to comprehensively understand image content.
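As a minimal sketch of the dual color-space idea (using OpenCV, not the LTCF-Net code; the equalization step is an illustrative assumption), both LAB and YUV let an enhancement step adjust brightness on the luminance/lightness plane without directly disturbing the color channels:

```python
import cv2
import numpy as np

# A dummy low-light BGR image; in practice, load one with cv2.imread(...).
bgr = (np.random.rand(64, 64, 3) * 50).astype(np.uint8)

# LAB: L carries lightness; A and B carry color-opponent information.
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
L, A, B = cv2.split(lab)

# YUV: Y carries luminance; U and V carry chrominance.
yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)
Y, U, V = cv2.split(yuv)

# Brighten via histogram equalization on the luminance plane only,
# leaving the chrominance channels untouched.
yuv_eq = cv2.merge([cv2.equalizeHist(Y), U, V])
enhanced = cv2.cvtColor(yuv_eq, cv2.COLOR_YUV2BGR)
print(enhanced.shape)  # (64, 64, 3)
```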
arXiv Detail & Related papers (2024-11-24T07:21:17Z)
- LumiSculpt: A Consistency Lighting Control Network for Video Generation [67.48791242688493]
Lighting plays a pivotal role in ensuring the naturalness of video generation.
It remains challenging to disentangle and model independent and coherent lighting attributes.
LumiSculpt enables precise and consistent lighting control in T2V generation models.
arXiv Detail & Related papers (2024-10-30T12:44:08Z)
- GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval [80.96706764868898]
We present a new Low-light Image Enhancement (LLIE) network via Generative LAtent feature based codebook REtrieval (GLARE).
We develop a generative Invertible Latent Normalizing Flow (I-LNF) module to align the LL feature distribution to NL latent representations, guaranteeing the correct code retrieval in the codebook.
Experiments confirm the superior performance of GLARE on various benchmark datasets and real-world data.
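For readers unfamiliar with codebook retrieval, here is a minimal sketch of nearest-neighbor lookup into a learned codebook (illustrative only; GLARE's I-LNF alignment and its actual codebook are not shown):

```python
import torch

def retrieve_codes(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Replace each feature vector with its nearest codebook entry.

    features: (N, D) feature vectors; codebook: (K, D) learned codes.
    """
    dists = torch.cdist(features, codebook) ** 2  # squared L2, shape (N, K)
    idx = dists.argmin(dim=1)                     # nearest code per feature
    return codebook[idx]                          # quantized features, (N, D)

codebook = torch.randn(512, 64)  # K=512 codes of dimension D=64
feats = torch.randn(100, 64)     # e.g. flattened spatial features
quantized = retrieve_codes(feats, codebook)
print(quantized.shape)           # torch.Size([100, 64])
```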
arXiv Detail & Related papers (2024-07-17T09:40:15Z)
- Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models [42.891427362223176]
Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities.
We propose a novel framework to fully harness the capabilities of LLMs.
We further design an LLM-Infused Diffusion Transformer (LI-DiT) based on the framework.
arXiv Detail & Related papers (2024-06-17T17:59:43Z)
- Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT [120.39362661689333]
We present an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency.
Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities.
arXiv Detail & Related papers (2024-06-05T17:53:26Z)
- Dense Connector for MLLMs [89.50595155217108]
We introduce the Dense Connector - a plug-and-play vision-language connector that significantly enhances existing MLLMs.
Building on this, we also propose the Efficient Dense Connector, which achieves performance comparable to LLaVA-v1.5 with only 25% of the visual tokens.
Our model, trained solely on images, showcases remarkable zero-shot capabilities in video understanding as well.
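As a rough illustration of a dense, multi-layer vision-language connector (a sketch under assumptions, not the paper's design: the layer count, fusion by concatenation, and the two-layer MLP projector are all placeholders):

```python
import torch
import torch.nn as nn

class DenseConnectorSketch(nn.Module):
    """Fuse visual features from several encoder layers before the LLM."""
    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096, n_layers: int = 3):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vis_dim * n_layers, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        # Each element: (N, T, vis_dim) visual tokens from one encoder layer.
        dense = torch.cat(layer_feats, dim=-1)  # (N, T, vis_dim * n_layers)
        return self.proj(dense)                 # (N, T, llm_dim) LLM inputs

feats = [torch.randn(1, 16, 1024) for _ in range(3)]  # 3 encoder layers
print(DenseConnectorSketch()(feats).shape)            # torch.Size([1, 16, 4096])
```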
arXiv Detail & Related papers (2024-05-22T16:25:03Z)
- CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion [58.15403987979496]
CREMA is a generalizable, highly efficient, and modular modality-fusion framework for video reasoning.
We propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy.
We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including VideoQA and Video-Audio/3D/Touch/Thermal QA.
arXiv Detail & Related papers (2024-02-08T18:27:22Z)
- Passive Non-Line-of-Sight Imaging with Light Transport Modulation [45.992851199035336]
We propose NLOS-LTM, a novel passive NLOS imaging method that effectively handles multiple light transport conditions with a single network.
We achieve this by inferring a latent light transport representation from the projection image and using this representation to modulate the network that reconstructs the hidden image from the projection image.
Experiments on a large-scale passive NLOS dataset demonstrate the superiority of the proposed method.
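The modulation idea can be sketched as FiLM-style conditioning (an illustration, not NLOS-LTM's architecture): a latent light-transport code predicts per-channel scales and shifts that are applied inside the reconstruction network.

```python
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    """Modulate feature maps with a conditioning code (FiLM-style sketch)."""
    def __init__(self, channels: int = 64, code_dim: int = 128):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Predict a per-channel scale (gamma) and shift (beta) from the code.
        self.to_gamma_beta = nn.Linear(code_dim, 2 * channels)

    def forward(self, feat: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(code).chunk(2, dim=-1)
        gamma = gamma[:, :, None, None]  # broadcast over H, W
        beta = beta[:, :, None, None]
        return torch.relu(self.conv(feat) * (1 + gamma) + beta)

feat = torch.randn(1, 64, 32, 32)     # features of the reconstruction net
code = torch.randn(1, 128)            # latent light-transport representation
print(FiLMBlock()(feat, code).shape)  # torch.Size([1, 64, 32, 32])
```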
arXiv Detail & Related papers (2023-12-26T11:49:23Z)
- Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
- Online Video Super-Resolution with Convolutional Kernel Bypass Graft [42.32318235565591]
We propose an extremely low-latency VSR algorithm based on a novel kernel knowledge transfer method, named convolutional kernel bypass graft (CKBG).
Experiment results show that our proposed method can process online video sequences up to 110 FPS, with very low model complexity and competitive SR performance.
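The "bypass" idea can be illustrated as a lightweight side branch grafted onto a frozen base layer (a sketch under assumptions; CKBG's kernel-transfer training procedure is not shown):

```python
import torch
import torch.nn as nn

class BypassGraft(nn.Module):
    """Add a lightweight bypass branch alongside a frozen base conv layer."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.base = nn.Conv2d(channels, channels, 3, padding=1)
        for p in self.base.parameters():
            p.requires_grad = False  # the base network stays fixed
        # Cheap 1x1 bypass kernels carrying the "grafted" knowledge.
        self.bypass = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base and bypass outputs are summed, so the bypass refines the
        # base features at little extra latency.
        return self.base(x) + self.bypass(x)

x = torch.randn(1, 32, 64, 64)
print(BypassGraft()(x).shape)  # torch.Size([1, 32, 64, 64])
```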
arXiv Detail & Related papers (2022-08-04T05:46:51Z)
- LightSAFT: Lightweight Latent Source Aware Frequency Transform for Source Separation [0.7192233658525915]
LaSAFT-Net has shown that conditioned models can achieve performance comparable to existing single-source separation models.
LightSAFT-Net provided sufficient SDR performance for comparison during the Music Demixing Challenge at ISMIR 2021.
Our enhanced LightSAFT-Net outperforms the previous one with fewer parameters.
arXiv Detail & Related papers (2021-11-24T14:25:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.