Related papers: LUT-KAN: Segment-wise LUT Quantization for Fast KAN Inference

URL: http://arxiv.org/abs/2601.03332v1
Date: Tue, 06 Jan 2026 18:00:45 GMT
Title: LUT-KAN: Segment-wise LUT Quantization for Fast KAN Inference
Authors: Oleksandr Kuznetsov
Abstract summary: This paper introduces LUT-KAN, a segment-wise lookup-table (LUT) compilation and quantization method for PyKAN-style KAN layers. LUT-KAN converts each edge function into a per-segment LUT with affine int8/uint8 quantization and linear interpolation. We report accuracy, speed, and memory metrics with mean and standard deviation across multiple seeds.
Score: 20.271194684947282
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Kolmogorov-Arnold Networks (KAN) replace scalar weights by learnable univariate functions, often implemented with B-splines. This design can be accurate and interpretable, but it makes inference expensive on CPU because each layer requires many spline evaluations. Standard quantization toolchains are also hard to apply because the main computation is not a matrix multiply but repeated spline basis evaluation. This paper introduces LUT-KAN, a segment-wise lookup-table (LUT) compilation and quantization method for PyKAN-style KAN layers. LUT-KAN converts each edge function into a per-segment LUT with affine int8/uint8 quantization and linear interpolation. The method provides an explicit and reproducible inference contract, including boundary conventions and out-of-bounds (OOB) policies. We propose an "honest baseline" methodology for speed evaluation: B-spline evaluation and LUT evaluation are compared under the same backend optimization (NumPy vs NumPy and Numba vs Numba), which separates representation gains from vectorization and JIT effects. Experiments include controlled sweeps over LUT resolution L in {16, 32, 64, 128} and two quantization schemes (symmetric int8 and asymmetric uint8). We report accuracy, speed, and memory metrics with mean and standard deviation across multiple seeds. A two-by-two OOB robustness matrix evaluates behavior under different boundary modes and OOB policies. In a case study, we compile a trained KAN model for DoS attack detection (CICIDS2017 pipeline) into LUT artifacts. The compiled model preserves classification quality (F1 drop below 0.0002) while reducing steady-state CPU inference latency by 12x under NumPy and 10x under Numba backends (honest baseline). The memory overhead is approximately 10x at L=64. All code and artifacts are publicly available with fixed release tags for reproducibility.
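The core idea in the abstract (tabulate a univariate edge function, store it with affine uint8 quantization, and read it back with linear interpolation) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the names `build_lut` and `eval_lut` are hypothetical, the sketch uses a single table over the whole input interval rather than one table per spline segment, and it assumes a clamp-to-boundary OOB policy.

```python
import numpy as np

def build_lut(f, lo, hi, L):
    """Sample f at L+1 uniform points on [lo, hi] and quantize to uint8.

    Asymmetric affine scheme (assumed here): y ~= q * scale + zero.
    """
    xs = np.linspace(lo, hi, L + 1)
    ys = f(xs)
    y_min, y_max = float(ys.min()), float(ys.max())
    scale = (y_max - y_min) / 255.0 or 1.0  # fall back to 1.0 for a constant function
    q = np.clip(np.round((ys - y_min) / scale), 0, 255).astype(np.uint8)
    return q, scale, y_min

def eval_lut(x, q, scale, zero, lo, hi):
    """Dequantize two neighbouring entries and interpolate linearly between them."""
    L = len(q) - 1
    x = np.clip(np.asarray(x, dtype=np.float64), lo, hi)  # clamp OOB inputs (assumption)
    t = (x - lo) / (hi - lo) * L          # fractional table index in [0, L]
    i = np.minimum(t.astype(np.int64), L - 1)  # left entry; keeps i + 1 in range
    frac = t - i
    y0 = q[i].astype(np.float64) * scale + zero
    y1 = q[i + 1].astype(np.float64) * scale + zero
    return y0 + frac * (y1 - y0)

# Usage: tabulate tanh as a stand-in for a trained edge function.
q, scale, zero = build_lut(np.tanh, -3.0, 3.0, 64)
xs = np.linspace(-3.0, 3.0, 1001)
max_err = np.max(np.abs(eval_lut(xs, q, scale, zero, -3.0, 3.0) - np.tanh(xs)))
print("max abs error:", max_err)
```

In this sketch the approximation error has two sources, the uint8 rounding step (at most `scale / 2` per entry) and the linear interpolation between entries, which is why sweeping the resolution L and the quantization scheme separately, as the abstract describes, is informative.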

Related papers

Layer-wise QUBO-Based Training of CNN Classifiers for Quantum Annealing [0.0]
We propose an iterative framework based on Quadratic Unconstrained Binary Optimization (QUBO) for training the head of convolutional neural networks (CNNs). A per-output decomposition splits the $C$-class problem into $C$ independent QUBOs, each with $(d+1)K$ binary variables, where $d$ is the feature dimension and $K$ is the bit precision. We evaluate the method on six image-classification benchmarks (sklearn digits, MNIST, Fashion-MNIST, CIFAR-10, EMNIST, KMNIST).
arXiv Detail & Related papers (2026-03-03T13:10:36Z)
Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices [13.483546044414581]
Large language models (LLMs) are increasingly deployed on edge devices. LUT-based inference underutilizes memory bandwidth during parallel inference. Vec-LUT outperforms baselines by up to $4.2\times$.
arXiv Detail & Related papers (2025-12-06T14:14:01Z)
PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants [10.239332579225522]
Kolmogorov-Arnold Networks (KANs) promise higher expressive capability and stronger interpretability than Multi-Layer Perceptrons. We present a GPU-accelerated operator library, named PolyKAN, which is the first general open-source implementation of KAN and its variants.
arXiv Detail & Related papers (2025-11-18T19:05:16Z)
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression [57.54335545892155]
We introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook. Our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines.
arXiv Detail & Related papers (2025-10-23T20:19:48Z)
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models [49.397861654088636]
We propose a two-step procedure to approximate SVD/QR-based gradient projections into lower-dimensional spaces. We show that our strategy achieves faster runtime and reduced memory usage by up to $25\%$ across different model sizes.
arXiv Detail & Related papers (2025-05-23T14:37:00Z)
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator [11.167930856636161]
We introduce LUT-DLA, a Look-Up Table (LUT) Deep Learning Accelerator Framework that utilizes vector quantization to convert neural network models into LUTs. We show that LUT-DLA achieves improvements in power efficiency and area efficiency with gains of $1.4$ to $7.0\times$ and $1.5$ to $146.1\times$, respectively.
arXiv Detail & Related papers (2025-01-18T05:27:25Z)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [63.118592279833656]
Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). We propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths.
arXiv Detail & Related papers (2024-05-23T16:21:48Z)
Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For Hidden Markov Models [41.99844472131922]
Decoding the original signal from the noisy observations is one of the main goals in nearly all HMM-based data analyses. We present QATS, a divide-and-conquer algorithm whose computational complexity is polylogarithmic in the length of the sequence and cubic in the size of the state space. An implementation of QATS is available in the R package QATS on GitHub.
arXiv Detail & Related papers (2023-05-29T19:37:48Z)
Tensor Slicing and Optimization for Multicore NPUs [2.670309629218727]
This paper proposes a compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO). Results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models.
arXiv Detail & Related papers (2023-04-06T12:03:03Z)
Improved techniques for deterministic l2 robustness [63.34032156196848]
Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the $l_2$ norm is useful for adversarial robustness, interpretable gradients and stable training. We introduce a procedure to certify robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer MLP. We significantly advance the state-of-the-art for standard and provable robust accuracies on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-11-15T19:10:12Z)
8-bit Optimizers via Block-wise Quantization [57.25800395197516]
Stateful optimizers maintain statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past values. This state can be used to accelerate optimization compared to plain gradient descent but uses memory that might otherwise be allocated to model parameters. In this paper, we develop the first optimizers that use 8-bit statistics while maintaining the performance levels of using 32-bit optimizer states.
arXiv Detail & Related papers (2021-10-06T15:43:20Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.