AttentionLut: Attention Fusion-based Canonical Polyadic LUT for
Real-time Image Enhancement
- URL: http://arxiv.org/abs/2401.01569v1
- Date: Wed, 3 Jan 2024 06:55:06 GMT
- Title: AttentionLut: Attention Fusion-based Canonical Polyadic LUT for
Real-time Image Enhancement
- Authors: Kang Fu, Yicong Peng, Zicheng Zhang, Qihang Xu, Xiaohong Liu, Jia
Wang, Guangtao Zhai
- Abstract summary: We propose a novel framework named AttentionLut for real-time image enhancement.
Our proposed framework consists of three lightweight modules.
Experiments on the benchmark MIT-Adobe FiveK dataset demonstrate that the proposed method achieves better enhancement performance than the state-of-the-art methods.
- Score: 38.61183657952919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, many algorithms have employed image-adaptive lookup tables (LUTs)
to achieve real-time image enhancement. Nonetheless, a prevailing trend among
existing methods has been the employment of linear combinations of basic LUTs
to formulate image-adaptive LUTs, which limits the generalization ability of
these methods. To address this limitation, we propose a novel framework named
AttentionLut for real-time image enhancement, which utilizes the attention
mechanism to generate image-adaptive LUTs. Our proposed framework consists of
three lightweight modules. We begin by employing the global image context
feature module to extract image-adaptive features. Subsequently, the attention
fusion module integrates the image feature with the priori attention feature
obtained during training to generate image-adaptive canonical polyadic tensors.
Finally, the canonical polyadic reconstruction module is deployed to
reconstruct image-adaptive residual 3DLUT, which is subsequently utilized for
enhancing input images. Experiments on the benchmark MIT-Adobe FiveK dataset
demonstrate that the proposed method achieves better enhancement performance
quantitatively and qualitatively than the state-of-the-art methods.
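The pipeline in the abstract can be sketched compactly: rank-1 factor vectors (one per LUT axis) are combined into a full 3D LUT via canonical polyadic reconstruction, and the resulting residual LUT is applied to the input image. The sketch below is a minimal NumPy illustration under assumed shapes and naming, not the paper's exact parameterization; it uses nearest-neighbor lookup rather than the trilinear interpolation a real implementation would use, and the factor matrices would come from the attention fusion module rather than being passed in directly.

```python
import numpy as np

def cp_reconstruct_lut(U_r, U_g, U_b, rank_weights):
    """Reconstruct a D x D x D x 3 residual LUT from rank-R CP factors.

    U_r, U_g, U_b: (R, D) factor matrices, one per LUT axis.
    rank_weights: (R, 3) per-output-channel weight for each rank-1 term.
    (Shapes and names are illustrative assumptions.)
    """
    # LUT[i, j, k, c] = sum_r rank_weights[r, c] * U_r[r, i] * U_g[r, j] * U_b[r, k]
    return np.einsum('rc,ri,rj,rk->ijkc', rank_weights, U_r, U_g, U_b)

def apply_residual_lut(img, lut):
    """Apply a residual 3D LUT to an image in [0, 1] (nearest-neighbor lookup)."""
    D = lut.shape[0]
    # Quantize each RGB value to its nearest LUT grid index.
    idx = np.clip((img * (D - 1)).round().astype(int), 0, D - 1)
    residual = lut[idx[..., 0], idx[..., 1], idx[..., 2]]
    return np.clip(img + residual, 0.0, 1.0)
```

An all-zero residual LUT leaves the image unchanged, which makes the residual formulation easy to sanity-check.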
Related papers
- LoR-LUT: Learning Compact 3D Lookup Tables via Low-Rank Residuals [8.420640298306237]
LoR-LUT is a unified low-rank formulation for compact and interpretable 3D lookup table (LUT) generation. LoR-LUT is trained on the MIT-Adobe FiveK dataset. An interactive visualization tool, termed LoR-LUT Viewer, transforms an input image into the LUT-adjusted output image.
arXiv Detail & Related papers (2026-02-26T04:28:35Z) - Text-Guided Channel Perturbation and Pretrained Knowledge Integration for Unified Multi-Modality Image Fusion [5.5275479200431406]
Unified models aim to share parameters across modalities for multi-modality image fusion. Large modality differences often cause gradient conflicts, limiting performance. We propose a unified multi-modality image fusion framework based on channel perturbation and pre-trained knowledge integration.
arXiv Detail & Related papers (2025-11-16T03:22:33Z) - High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network [73.19214585791268]
This paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images. LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, processing 4K-resolution images in just 13 ms on a single GPU.
arXiv Detail & Related papers (2025-10-13T16:52:32Z) - Dynamic Classifier-Free Diffusion Guidance via Online Feedback [53.54876309092376]
A "one-size-fits-all" approach fails to adapt to the diverse requirements of different prompts. We introduce a framework for dynamic CFG scheduling. We demonstrate the effectiveness of our approach on both small-scale models and the state-of-the-art Imagen 3.
arXiv Detail & Related papers (2025-09-19T16:27:19Z) - G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration [57.67450930037339]
We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions.
arXiv Detail & Related papers (2025-08-15T10:25:58Z) - Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation [66.73899356886652]
We build an image tokenizer directly atop pre-trained vision foundation models. Our proposed image tokenizer, VFMTok, achieves substantial improvements in image reconstruction and generation quality. It further boosts autoregressive (AR) generation -- achieving a gFID of 2.07 on ImageNet benchmarks.
arXiv Detail & Related papers (2025-07-11T09:32:45Z) - Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement [1.0428401220897083]
We propose a novel, unsupervised, fuzzy-inspired image enhancement framework guided by the NSGA-II algorithm. We use a GPU-accelerated NSGA-II algorithm that balances multiple objectives, namely, increasing image entropy, improving perceptual similarity, and maintaining appropriate brightness. Our model achieves excellent performance with average BRISQUE and NIQE scores of 19.82 and 3.652, respectively, across all unpaired datasets.
arXiv Detail & Related papers (2025-05-16T13:40:56Z) - Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction [13.836238771024254]
We show that the nonlinear characteristics of domain-specific image dynamics cannot be addressed by simple linear transforms.
We propose a method termed Global Deep Curve Estimation to reduce domain-specific mismatch exposure.
arXiv Detail & Related papers (2025-04-14T10:24:57Z) - SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors [22.561786156613525]
We propose SparseGS-W, a novel framework for synthesizing novel views of large-scale scenes from unconstrained in-the-wild images.
We leverage geometric priors and constrained diffusion priors to compensate for the lack of multi-view information from extremely sparse input.
SparseGS-W achieves state-of-the-art performance not only in full-reference metrics, but also in commonly used non-reference metrics such as FID, ClipIQA, and MUSIQ.
arXiv Detail & Related papers (2025-03-25T08:40:40Z) - Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment.
This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain.
Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z) - Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations? [55.99654128127689]
Cross-modal contrastive distillation has recently been explored for learning effective 3D representations.
Existing methods focus primarily on modality-shared features, neglecting the modality-specific features during the pre-training process.
We propose a new framework, namely CMCR, to address these shortcomings.
arXiv Detail & Related papers (2024-12-12T06:09:49Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing [20.189124622271446]
Painterly Image Harmonization aims at seamlessly blending disparate visual elements within a single coherent image.
Previous approaches often encounter significant limitations due to training data constraints, the need for time-consuming fine-tuning, or reliance on additional prompts.
We design a Training-and-prompt-Free General Painterly Harmonization method using image-wise attention sharing.
arXiv Detail & Related papers (2024-04-19T14:13:46Z) - Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation [32.69740459810521]
We present IREANet, which improves multi-exposure feature alignment and aggregation with a Flow-guided Feature Alignment Module (FFAM) and an Enhanced Feature Aggregation Module (EFAM).
Our experimental evaluations demonstrate that the proposed IREANet shows state-of-the-art performance compared with previous methods.
arXiv Detail & Related papers (2024-04-16T07:46:55Z) - CFAT: Unleashing TriangularWindows for Image Super-resolution [5.130320840059732]
Transformer-based models have revolutionized the field of image super-resolution (SR).
We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion.
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
arXiv Detail & Related papers (2024-03-24T13:31:31Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z) - Bridging CLIP and StyleGAN through Latent Alignment for Image Editing [33.86698044813281]
We bridge CLIP and StyleGAN to achieve inference-time optimization-free diverse manipulation direction mining.
With this mapping scheme, we can achieve GAN inversion, text-to-image generation and text-driven image manipulation.
arXiv Detail & Related papers (2022-10-10T09:17:35Z) - Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL).
RESAIL provides pixel level fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that our RESAIL performs favorably against state-of-the-arts in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z) - Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
To address this issue, we propose a novel image-specific convolutional kernel modulation (IKM) method.
We exploit the global contextual information of image or feature to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performances over state-of-the-art methods.
arXiv Detail & Related papers (2021-11-16T11:05:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.