Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution
- URL: http://arxiv.org/abs/2312.06101v2
- Date: Wed, 8 May 2024 12:36:49 GMT
- Title: Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution
- Authors: Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong
- Abstract summary: Super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations.
This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources.
This work tackles this storage hurdle and innovates hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache.
- Score: 7.403264755337134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely elude CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and innovates hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT demonstrates an uncompromising performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.
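To make the readout idea concrete, here is a minimal NumPy sketch of the generic LUT-based upscaling scheme that SR-LUT-style methods build on. The 2x2 patch indexing, 4-bit index quantization, upscale factor of 2, and randomly filled table are illustrative assumptions, not HKLUT's actual two-branch multistage design or kernel patterns:

```python
import numpy as np

# Illustrative assumptions (not HKLUT's actual design): 2x2 patch index,
# 4-bit quantization, upscale factor 2, and a randomly filled table standing
# in for one exhaustively precomputed from a trained network.
BITS, R = 4, 2
LEVELS = 1 << BITS                               # 16 index levels per pixel
rng = np.random.default_rng(0)

# One entry per 4-pixel index combination; each entry is an RxR patch.
lut = rng.integers(0, 256, size=(LEVELS,) * 4 + (R, R), dtype=np.uint8)
print(f"LUT size: {lut.nbytes / 1024:.0f} KiB")  # 16^4 * 4 B = 256 KiB

def upscale(img):
    """Upscale a uint8 image by R using pure table readout (no MACs)."""
    h, w = img.shape
    pad = np.pad(img, ((0, 1), (0, 1)), mode="edge")  # bottom/right context
    q = pad >> (8 - BITS)                             # 4-bit pixel indices
    out = np.empty((h * R, w * R), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            patch = lut[q[y, x], q[y, x + 1], q[y + 1, x], q[y + 1, x + 1]]
            out[y * R:(y + 1) * R, x * R:(x + 1) * R] = patch
    return out

lr = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
print(upscale(lr).shape)                              # (16, 16)
```

Note how 4-bit indexing keeps a 4-pixel table at 256 KiB, the hundred-kilobyte regime the title alludes to; full 8-bit indexing would need 256^4 entries, roughly 17 GiB at 4 bytes each.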
Related papers
- Taming Lookup Tables for Efficient Image Retouching [30.48643578900116]
We propose ICELUT, which adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN).
ICELUT achieves near-state-of-the-art performance and remarkably low power consumption.
These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least an order of magnitude faster than any CNN solution.
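For context, the elementary building block behind LUT-based retouching is a 3D color table indexed by RGB values. The grid size, warm-tone table contents, and nearest-entry readout in this sketch are illustrative assumptions, not ICELUT's actual architecture:

```python
import numpy as np

# Illustrative assumptions (not ICELUT's actual pipeline): a 17^3 grid,
# a hand-made warm-tone table, and nearest-entry readout.
N = 17                                   # common 3D-LUT grid resolution
axis = np.linspace(0, 1, N)
r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
lut = np.stack([np.clip(r * 1.1, 0, 1), g, b * 0.9], axis=-1)

def apply_lut(img):
    """Map each RGB pixel through the 3D LUT by nearest-entry readout
    (production code would interpolate among the 8 surrounding entries)."""
    idx = np.clip(np.rint(img * (N - 1)).astype(int), 0, N - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

img = np.random.default_rng(0).random((4, 4, 3))      # float RGB in [0, 1]
print(apply_lut(img).shape)                           # (4, 4, 3)
```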
arXiv Detail & Related papers (2024-03-28T08:49:35Z) - DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
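A sketch of the underlying trick, assuming 2-bit operands (the exact codebooks and SIMD shuffle kernels in DeepGEMM differ): with so few distinct operand values, every possible product fits in a tiny precomputed table, so a dot product reduces to lookups and adds:

```python
import numpy as np

# Illustrative 2-bit codebooks; DeepGEMM's real kernels pack indices and
# use SIMD shuffle instructions for the readout.
W_LEVELS = np.array([-2, -1, 0, 1])       # 2-bit signed weight values
A_LEVELS = np.array([0, 1, 2, 3])         # 2-bit activation values
PROD_LUT = np.outer(W_LEVELS, A_LEVELS)   # all 16 products, precomputed

def lut_dot(w_idx, a_idx):
    """Dot product of coded vectors via table readout, no multiplies."""
    return PROD_LUT[w_idx, a_idx].sum()

rng = np.random.default_rng(0)
w_idx = rng.integers(0, 4, size=64)       # 2-bit weight codes
a_idx = rng.integers(0, 4, size=64)       # 2-bit activation codes
assert lut_dot(w_idx, a_idx) == (W_LEVELS[w_idx] * A_LEVELS[a_idx]).sum()
```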
arXiv Detail & Related papers (2023-04-18T15:13:10Z) - Toward DNN of LUTs: Learning Efficient Image Restoration with Multiple Look-Up Tables [47.15181829317732]
High-definition screens on edge devices stimulate a strong demand for efficient image restoration algorithms.
The size of a single look-up table grows exponentially with its indexing capacity.
We propose a universal method to construct multiple LUTs like a neural network, termed MuLUT.
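A quick back-of-the-envelope computation shows the exponential blow-up that motivates this, assuming SR-LUT-style 4-bit indexing and 16 output bytes per entry (MuLUT's exact configurations differ):

```python
# Assumed figures: 4-bit index per pixel, 16 output bytes per entry
# (e.g. a 4x4 uint8 patch for 4x upscaling); MuLUT's configurations differ.
bits, out_bytes = 4, 16
for n_pixels in (2, 3, 4, 5, 6):
    entries = (1 << bits) ** n_pixels
    print(f"{n_pixels} pixels -> {entries * out_bytes / 1024:>9,.0f} KiB")
# 4 indexing pixels already cost 1 MiB and 6 cost 256 MiB. Cascading
# several small-fan-in LUTs widens the receptive field while total size
# grows only linearly in the number of tables.
```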
arXiv Detail & Related papers (2023-03-25T16:00:33Z) - Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
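A generic sketch of spatially-adaptive modulation, assuming simple average pooling and a sigmoid gate; the actual SAFM block partitions channels and uses learned convolutions, so this only illustrates the multi-scale modulate-by-multiplication mechanism:

```python
import numpy as np

# Generic mechanism only: the actual SAFM block partitions channels and
# uses learned convolutions rather than plain pooling and a sigmoid.
def pool_upsample(x, s):
    """Average-pool by factor s, then nearest-neighbor upsample back."""
    h, w = x.shape
    p = x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean((1, 3))
    return np.kron(p, np.ones((s, s)))[:h, :w]

def safm_like(x):
    """Modulate features with a map fused from multi-scale views."""
    views = [pool_upsample(x, s) for s in (1, 2, 4)]
    gate = 1.0 / (1.0 + np.exp(-np.mean(views, axis=0)))   # sigmoid
    return x * gate

feat = np.random.default_rng(0).standard_normal((8, 8))
print(safm_like(feat).shape)                               # (8, 8)
```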
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - Exploiting Kernel Compression on BNNs [0.0]
In this work, we observe that the number of unique sequences representing a set of weights is typically low.
We propose a clustering scheme to identify the most common sequences of bits and replace the less common ones with some similar common sequences.
Our experimental results show that our technique can reduce memory requirement by 1.32x and improve performance by 1.35x.
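A minimal sketch of the clustering idea, with illustrative sizes and keep-count (the paper's exact clustering procedure may differ): keep the most frequent bit sequences and snap the rest to their nearest kept sequence by Hamming distance:

```python
import numpy as np
from collections import Counter

# Illustrative sizes and keep-count; the paper's clustering procedure
# may differ in how representatives are chosen.
rng = np.random.default_rng(0)
seqs = rng.integers(0, 2, size=(256, 8))       # 256 8-bit weight sequences
counts = Counter(map(tuple, seqs))
common = np.array(sorted(counts, key=counts.get, reverse=True)[:16])

def snap(row):
    """Replace a sequence with its nearest common one (Hamming distance)."""
    return common[np.argmin((common != row).sum(axis=1))]

compressed = np.array([snap(r) for r in seqs])
print("unique before/after:", len(counts), len(set(map(tuple, compressed))))
# Only 16 unique sequences remain, so each row stores a 4-bit dictionary
# index instead of the full bit pattern.
```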
arXiv Detail & Related papers (2022-12-01T16:05:10Z) - Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
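A stripped-down sketch of the encoding, using the spatial-hash primes from the paper but omitting the corner interpolation and training loop; table size, level count, and feature width are illustrative:

```python
import numpy as np

# Illustrative level count, table size, and feature width; the paper's
# encoding also interpolates among the cell's corner entries.
LEVELS, TABLE, FDIM = 4, 2**10, 2
rng = np.random.default_rng(0)
tables = rng.standard_normal((LEVELS, TABLE, FDIM)) * 1e-2  # trainable

PRIMES = np.array([1, 2_654_435_761])          # spatial-hash multipliers

def encode(xy):
    """Encode a 2D point in [0,1)^2 into LEVELS * FDIM features."""
    feats = []
    for lvl in range(LEVELS):
        res = 16 * 2**lvl                      # finer grid at each level
        cell = np.floor(xy * res).astype(np.int64)
        h = np.bitwise_xor.reduce(cell * PRIMES) % TABLE
        feats.append(tables[lvl, h])
    return np.concatenate(feats)

print(encode(np.array([0.3, 0.7])).shape)      # (8,)
```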
arXiv Detail & Related papers (2022-01-16T07:22:47Z) - CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs [1.0635248457021496]
We present CREW, a hardware accelerator that implements computation reuse and an efficient weight storage mechanism.
CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage.
On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator.
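The reuse idea can be sketched in a few lines: once weights are quantized to a small codebook, inputs sharing a weight value can be summed first so each unique weight is multiplied only once (codebook width and sizes here are illustrative, not CREW's configuration):

```python
import numpy as np

# Illustrative codebook width and sizes, not CREW's configuration.
rng = np.random.default_rng(0)
x = rng.standard_normal(512)                   # inputs to one neuron
levels = np.linspace(-1, 1, 16)                # 4-bit weight codebook
w_idx = rng.integers(0, 16, size=512)          # quantized weight codes

# Naive: one multiply per input (512). Reuse: bucket-sum inputs by weight
# code, then one multiply per unique weight (16).
naive = (levels[w_idx] * x).sum()
bucket_sums = np.bincount(w_idx, weights=x, minlength=16)
reuse = (levels * bucket_sums).sum()
assert np.isclose(naive, reuse)
print(f"multiplies: naive=512, with reuse={len(levels)}")
```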
arXiv Detail & Related papers (2021-07-20T11:10:54Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
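The algebra behind the decomposition, sketched with illustrative sizes: an M-bit weight w = sum_i 2^i b_i with b_i in {0, 1} rewrites via b_i = (s_i + 1)/2 into binary {-1, +1} branches plus a constant term:

```python
import numpy as np

# Illustrative bit width and sizes.
M = 4
rng = np.random.default_rng(0)
W = rng.integers(0, 2**M, size=(3, 5))         # M-bit quantized weights
x = rng.standard_normal(5)

bits = (W[None] >> np.arange(M)[:, None, None]) & 1  # (M, 3, 5) in {0, 1}
signs = 2 * bits - 1                                 # {-1, +1} branches

# W @ x = sum_i 2^(i-1) * (S_i @ x) + (2^M - 1)/2 * sum(x)
branch = sum(2.0 ** (i - 1) * (signs[i] @ x) for i in range(M))
full = branch + (2**M - 1) / 2 * x.sum()
assert np.allclose(full, W @ x)
```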
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average 3712$\times$ speedup with 1301.25$\times$ energy reduction on CPU, and 35.4$\times$ speedup with 17.66$\times$ energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z) - PoET-BiN: Power Efficient Tiny Binary Neurons [1.7274221736253095]
We propose PoET-BiN, a Look-Up Table based power efficient implementation on resource constrained embedded devices.
A modified Decision Tree approach forms the backbone of the proposed implementation in the binary domain.
A LUT access consumes far less power than the equivalent Multiply Accumulate operation it replaces, and the modified Decision Tree algorithm eliminates the need for memory accesses.
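This trade is easy to sketch: a binary neuron over K binary inputs realizes a truth table with 2^K entries, so it can be precomputed once into a LUT and evaluated by a single indexed read (weights and sizes here are illustrative):

```python
import numpy as np

# Illustrative weights and fan-in; PoET-BiN derives its LUTs from a
# modified decision-tree procedure rather than a plain threshold neuron.
K = 6
rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=K)                # binary weights

# Precompute the neuron's output for all 2^K binary input patterns.
patterns = ((np.arange(2**K)[:, None] >> np.arange(K)) & 1) * 2 - 1
lut = ((patterns @ w) >= 0).astype(np.uint8)   # 64 one-bit entries

def fire(x_bits):
    """Evaluate the neuron by a single LUT read; x_bits in {0, 1}^K."""
    return lut[int(np.dot(x_bits, 1 << np.arange(K)))]

x = rng.integers(0, 2, size=K)
assert fire(x) == int((2 * x - 1) @ w >= 0)
```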
arXiv Detail & Related papers (2020-02-23T00:32:21Z)