Fugu-MT Paper Translation (Abstract): KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference

Paper overview: KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference

arxiv url: http://arxiv.org/abs/2603.17230v1
Date: Wed, 18 Mar 2026 00:32:11 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.455207
Title: KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference
Authors: Sohaib Errabii, Olivier Sentieys, Marcello Traiola
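Abstract summary: Kolmogorov-Arnold Networks (KANs) have attracted attention for their potential to outperform Multi-Layer Perceptrons (MLPs). This study investigates the impact of low-bit quantization on KANs and its effect on computational complexity and hardware efficiency.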
Reference score (independently computed attention score): 1.5102260054654923
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Kolmogorov-Arnold Networks (KANs) have gained attention for their potential to outperform Multi-Layer Perceptrons (MLPs) in terms of parameter efficiency and interpretability. Unlike traditional MLPs, KANs use learnable non-linear activation functions, typically spline functions, expressed as linear combinations of basis splines (B-splines). B-spline coefficients serve as the model's learnable parameters. However, evaluating these spline functions increases computational complexity during inference. Conventional quantization reduces this complexity by lowering the numerical precision of parameters and activations. However, the impact of quantization on KANs, and especially its effectiveness in reducing computational complexity, is largely unexplored, particularly for quantization levels below 8 bits. This study investigates the impact of low-bit quantization on KANs and its effect on computational complexity and hardware efficiency. Results show that B-splines can be quantized to 2-3 bits with negligible loss in accuracy, significantly reducing computational complexity. Hence, we investigate the potential of using low-bit quantized precomputed tables as a replacement for the recursive B-spline algorithm. This approach aims to further reduce the computational complexity of KANs and enhance hardware efficiency while maintaining accuracy. For example, ResKAN18 achieves a 50x reduction in BitOps without loss of accuracy using low-bit-quantized B-spline tables. Additionally, precomputed 8-bit lookup tables speed up GPU inference by up to 2.9x, while on FPGA-based systolic-array accelerators, reducing B-spline table precision from 8 to 3 bits cuts resource usage by 36%, increases clock frequency by 50%, and enhances speedup by 1.24x. On a 28nm FD-SOI ASIC, reducing the B-spline bit-width from 16 to 3 bits achieves 72% area reduction and 50% higher maximum frequency.
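The idea in the abstract of replacing the recursive B-spline algorithm with a low-bit precomputed table can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`bspline_basis`, `build_quantized_table`, `lookup_basis`), the uniform knot vector, the 8-bit input grid, and the simple max-scaled unsigned quantizer are all assumptions made for illustration, since the abstract does not specify the table layout or the quantization scheme.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion: all degree-`degree` basis values at each x."""
    x = np.asarray(x, dtype=np.float64)[..., None]
    # Degree 0: indicator of the half-open knot interval containing x.
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(np.float64)
    for k in range(1, degree + 1):
        left = (x - knots[:-(k + 1)]) / (knots[k:-1] - knots[:-(k + 1)])
        right = (knots[k + 1:] - x) / (knots[k + 1:] - knots[1:-k])
        B = left * B[..., :-1] + right * B[..., 1:]
    return B

def build_quantized_table(knots, degree, in_bits=8, table_bits=3):
    """Precompute basis values on a uniform grid over [0, 1], then quantize.

    Assumed scheme: unsigned uniform quantization scaled by the table maximum
    (basis values are non-negative). Requires table_bits <= 8.
    """
    xs = np.linspace(0.0, 1.0, 2 ** in_bits)
    exact = bspline_basis(xs, knots, degree)
    scale = exact.max() / (2 ** table_bits - 1)
    table = np.round(exact / scale).astype(np.uint8)
    return table, scale

def lookup_basis(x, table, scale):
    """Replace the recursion by a nearest-entry table lookup plus rescaling."""
    n = table.shape[0]
    idx = np.clip(np.round(np.asarray(x) * (n - 1)), 0, n - 1).astype(np.int64)
    return table[idx] * scale

# Hypothetical setup: cubic splines on 5 grid intervals over [0, 1].
degree, grid = 3, 5
h = 1.0 / grid
knots = np.linspace(-degree * h, 1.0 + degree * h, grid + 2 * degree + 1)

rng = np.random.default_rng(0)
coeffs = rng.normal(size=grid + degree)   # stand-in for learned coefficients
x = rng.uniform(0.0, 1.0, size=1000)

table, scale = build_quantized_table(knots, degree)
y_ref = bspline_basis(x, knots, degree) @ coeffs   # recursive evaluation
y_lut = lookup_basis(x, table, scale) @ coeffs     # 3-bit table lookup
print("max abs deviation:", np.abs(y_ref - y_lut).max())
```

In this sketch the table holds 256 rows of 8 three-bit codes, so a spline activation is evaluated with one index computation and a short dot product instead of the recursion. How much accuracy such a table preserves in a trained KAN is the empirical question the paper studies; the sketch only shows the mechanism.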


Related papers