Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables
- URL: http://arxiv.org/abs/2508.16121v1
- Date: Fri, 22 Aug 2025 06:28:24 GMT
- Title: Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables
- Authors: Wontae Kim, Keuntek Lee, Nam Ik Cho
- Abstract summary: Image enhancement methods based on 3D lookup tables (3D LUTs) efficiently reduce both model size and runtime. However, 3D LUT methods are limited by their lack of spatial information. We propose a method for generating image-adaptive LUTs by focusing on the redundant parts of the tables.
- Score: 22.15777751379876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The image enhancement methods based on 3D lookup tables (3D LUTs) efficiently reduce both model size and runtime by interpolating pre-calculated values at the vertices. However, the 3D LUT methods have a limitation due to their lack of spatial information, as they convert color values on a point-by-point basis. Although spatial-aware 3D LUT methods address this limitation, they introduce additional modules that require a substantial number of parameters, leading to increased runtime as image resolution increases. To address this issue, we propose a method for generating image-adaptive LUTs by focusing on the redundant parts of the tables. Our efficient framework decomposes a 3D LUT into a linear sum of low-dimensional LUTs and employs singular value decomposition (SVD). Furthermore, we enhance the modules for spatial feature fusion to be more cache-efficient. Extensive experimental results demonstrate that our model effectively decreases both the number of parameters and runtime while maintaining spatial awareness and performance.
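The core decomposition is easy to sanity-check. Below is a minimal numpy sketch of the idea described in the abstract, approximating a 3D LUT by a linear sum of low-rank terms obtained from the SVD of an unfolded table; the grid size, rank, single output channel, and toy table are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Smooth toy 3D LUT (one output channel for brevity): enhancement
# tables are smooth, which is what makes them nearly low-rank.
N = 17
r_ax, g_ax, b_ax = np.meshgrid(*[np.linspace(0, 1, N)] * 3, indexing="ij")
lut_3d = 0.5 * r_ax + 0.3 * g_ax**2 + 0.2 * np.sqrt(b_ax)

# Unfold the table into a matrix (mode-1 unfolding) and take its SVD.
U, S, Vt = np.linalg.svd(lut_3d.reshape(N, N * N), full_matrices=False)

# Keep the top-k singular components: the 3D LUT becomes a linear
# sum of k outer products of lower-dimensional tables (1D x 2D here).
k = 3
lut_lowrank = ((U[:, :k] * S[:k]) @ Vt[:k]).reshape(N, N, N)

print("params:", N**3, "->", k * (N + N * N))   # 4913 -> 918
print("max abs error:", np.abs(lut_3d - lut_lowrank).max())
```

For this smooth toy table the rank-3 reconstruction is essentially exact while storing under a fifth of the entries, which is the redundancy the paper exploits.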
Related papers
- LoR-LUT: Learning Compact 3D Lookup Tables via Low-Rank Residuals [8.420640298306237]
LoR-LUT is a unified low-rank formulation for compact and interpretable 3D lookup table (LUT) generation. It is trained on the MIT-Adobe FiveK dataset. An interactive visualization tool, termed LoR-LUT Viewer, transforms an input image into the LUT-adjusted output image.
arXiv Detail & Related papers (2026-02-26T04:28:35Z) - Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. However, PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
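The summary does not spell out TAPTQ's pipeline; as a hedged illustration of what tail-aware calibration generally means, the sketch below clips distribution tails at percentiles before choosing a uniform quantization range. The function name and percentile choices are hypothetical, not from the paper.

```python
import numpy as np

def quantize_tail_aware(x, bits=8, tail_pct=0.0):
    """Uniform PTQ with percentile clipping of distribution tails.

    tail_pct: percent clipped from each tail before choosing the
    quantization range; tail_pct=0 recovers naive min/max PTQ.
    """
    lo = np.percentile(x, tail_pct)
    hi = np.percentile(x, 100.0 - tail_pct)
    scale = (hi - lo) / (2 ** bits - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo  # dequantized values

# Heavy-tailed values: naive min/max PTQ wastes levels on outliers.
x = np.random.standard_t(df=2, size=100_000)
naive = quantize_tail_aware(x, tail_pct=0.0)
aware = quantize_tail_aware(x, tail_pct=0.5)
print("naive MSE:     ", np.mean((x - naive) ** 2))
print("tail-aware MSE:", np.mean((x - aware) ** 2))
```

On heavy-tailed inputs the clipped range typically yields a much lower reconstruction error, since the quantization levels concentrate where the mass of the distribution lies.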
arXiv Detail & Related papers (2026-02-02T07:21:15Z) - Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image [68.55613894952177]
We introduce Wonder3D++, a novel method for efficiently generating high-fidelity textured meshes from single-view images. We propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. Lastly, we introduce a cascaded 3D mesh extraction algorithm that derives high-quality surfaces from the multi-view 2D representations in only about 3 minutes in a coarse-to-fine manner.
arXiv Detail & Related papers (2025-11-03T17:24:18Z) - FlowLUT: Efficient Image Enhancement via Differentiable LUTs and Iterative Flow Matching [10.213645938731338]
FlowLUT is a novel end-to-end model that integrates the efficiency of LUTs, multiple priors, and the parameter-independent characteristic of flow-matched reconstructed images. A lightweight fusion prediction network runs on multiple 3D LUTs, with $\mathcal{O}(1)$ complexity for scene-adaptive color correction. The entire model is jointly optimized under a composite loss function enforcing perceptual and structural fidelity.
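As a rough sketch of why fusing several basis 3D LUTs keeps per-pixel cost at $\mathcal{O}(1)$: the tables are blended once with predicted weights, after which every pixel pays for a single lookup regardless of how many basis LUTs exist. The basis tables, weights, and nearest-vertex lookup (standing in for trilinear interpolation) below are illustrative assumptions, not FlowLUT's actual components.

```python
import numpy as np

def fuse_and_apply(img, luts, weights):
    """Blend K basis 3D LUTs with predicted weights, apply once.

    img:     H x W x 3 floats in [0, 1]
    luts:    K x n x n x n x 3 basis tables
    weights: K scene-adaptive coefficients (from a tiny network)
    """
    fused = np.tensordot(weights, luts, axes=1)  # n x n x n x 3
    n = fused.shape[0]
    # Nearest-vertex lookup for brevity; real LUTs use trilinear
    # interpolation, which is still O(1) work per pixel.
    idx = np.clip((img * (n - 1)).round().astype(int), 0, n - 1)
    return fused[idx[..., 0], idx[..., 1], idx[..., 2]]

# Identity LUT plus a warm-tint LUT, mixed 70/30.
n = 17
grid = np.stack(np.meshgrid(*[np.linspace(0, 1, n)] * 3,
                            indexing="ij"), axis=-1)
warm = np.clip(grid + np.array([0.1, 0.0, -0.1]), 0, 1)
img = np.random.rand(4, 4, 3)
out = fuse_and_apply(img, np.stack([grid, warm]), np.array([0.7, 0.3]))
print(out.shape)  # (4, 4, 3)
```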
arXiv Detail & Related papers (2025-09-28T03:22:01Z) - LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering [68.93333348474988]
We present a novel level-of-detail (LOD) method for 3D Gaussian Splatting on memory-constrained devices. Our approach iteratively selects optimal subsets of Gaussians based on camera distance. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets.
arXiv Detail & Related papers (2025-05-29T06:50:57Z) - HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z) - SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection [12.941263635455915]
Most previous 3D object detection methods utilize the Bird's Eye View (BEV) space for intermediate feature representation. This paper focuses on the sparse nature of LiDAR point cloud data. We introduce a novel sparse voxel-based transformer network for 3D object detection, dubbed SparseVoxFormer.
arXiv Detail & Related papers (2025-03-11T06:52:25Z) - OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera-orientation-conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera-orientation-conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z) - VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [59.40711222096875]
We present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting.
Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets.
arXiv Detail & Related papers (2024-02-27T11:40:50Z) - HartleyMHA: Self-Attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation [4.48473804240016]
We introduce the HartleyMHA model which is robust to training image resolution with efficient self-attention.
We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude.
When tested on the BraTS'19 dataset, it achieved greater robustness to training image resolution than other tested models, with less than 1% of their model parameters.
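The discrete Hartley transform (DHT) this model builds on is real-to-real and can be computed directly from the FFT via DHT(x) = Re(FFT(x)) - Im(FFT(x)), which is why frequency-domain weights can stay real-valued and compact. A quick numpy check of this identity follows; it illustrates the transform itself, not the model's actual layers.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform via DHT(x) = Re(FFT(x)) - Im(FFT(x)).

    Real-to-real, so spectral weights multiplying it can be real,
    roughly halving parameters versus complex FNO-style weights.
    """
    X = np.fft.fft(x)
    return X.real - X.imag

x = np.random.rand(8)
h = dht(x)
# The DHT is its own inverse up to a factor of N: dht(dht(x)) == N * x.
assert np.allclose(dht(h) / len(x), x)
print(h)
```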
arXiv Detail & Related papers (2023-10-05T18:44:41Z) - Variable Radiance Field for Real-World Category-Specific Reconstruction from Single Image [25.44715538841181]
Reconstructing category-specific objects using Neural Radiance Field (NeRF) from a single image is a promising yet challenging task. We propose Variable Radiance Field (VRF), a novel framework capable of efficiently reconstructing category-specific objects. VRF achieves state-of-the-art performance in both reconstruction quality and computational efficiency.
arXiv Detail & Related papers (2023-06-08T12:12:02Z) - SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement [21.963622337032344]
We present SepLUT (separable image-adaptive lookup table) to tackle the above limitations.
Specifically, we separate a single color transform into a cascade of component-independent and component-correlated sub-transforms instantiated as 1D and 3D LUTs.
In this way, the two sub-transforms can facilitate each other, with the 3D LUT contributing the ability to mix color components.
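A minimal sketch of such a separable cascade, a per-channel 1D curve followed by a 3D table, might look as follows; the table sizes, identity initialization, and nearest-vertex 3D lookup (trilinear interpolation in practice) are illustrative assumptions rather than SepLUT's implementation.

```python
import numpy as np

def apply_1d_luts(img, luts_1d):
    """Component-independent stage: one 1D curve per color channel."""
    xs = np.linspace(0, 1, luts_1d.shape[1])
    out = np.empty_like(img)
    for c in range(3):
        out[..., c] = np.interp(img[..., c], xs, luts_1d[c])
    return out

def apply_3d_lut(img, lut_3d):
    """Component-correlated stage: nearest-vertex 3D lookup."""
    n = lut_3d.shape[0]
    idx = np.clip((img * (n - 1)).round().astype(int), 0, n - 1)
    return lut_3d[idx[..., 0], idx[..., 1], idx[..., 2]]

# Identity instances of both stages; a real model predicts them per image.
m, n = 32, 9
luts_1d = np.tile(np.linspace(0, 1, m), (3, 1))        # 3 curves of m knots
grid = np.stack(np.meshgrid(*[np.linspace(0, 1, n)] * 3,
                            indexing="ij"), axis=-1)   # identity n^3 x 3 LUT
img = np.random.rand(4, 4, 3)
out = apply_3d_lut(apply_1d_luts(img, luts_1d), grid)
print(np.abs(out - img).max())  # small: only 3D-grid snapping error
```

The division of labor is visible in the shapes: the 1D stage stores 3m values for per-channel curves, while the n^3 table is reserved for the cross-channel mixing only a 3D LUT can express.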
arXiv Detail & Related papers (2022-07-18T02:27:19Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
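As a toy illustration of this action space, the loop below nudges one bounding-box parameter per step toward the target; the greedy oracle stands in for the learned, reward-trained policy, and the parameterization and step size are hypothetical.

```python
import numpy as np

def refine(pred, target, step=0.1, n_steps=20):
    """Iteratively change one parameter per step toward the target.

    A learned policy would pick (index, direction) from image features
    and be optimized with a delayed reward; here an oracle chooses the
    move greedily, just to show the per-step action space.
    """
    pred = pred.copy()
    for _ in range(n_steps):
        i = int(np.argmax(np.abs(target - pred)))        # which parameter
        pred[i] += step * np.sign(target[i] - pred[i])   # +step or -step
    return pred

init = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0])  # x, y, z, w, h, l, yaw
gt   = np.array([0.4, -0.3, 0.2, 1.5, 1.2, 0.9, 0.3])
print(np.abs(gt - refine(init, gt)).sum())  # residual after 20 moves
```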
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.