Related papers: MetaCluster: Enabling Deep Compression of Kolmogorov-Arnold Network

URL: http://arxiv.org/abs/2510.19105v1
Date: Tue, 21 Oct 2025 21:58:15 GMT
Title: MetaCluster: Enabling Deep Compression of Kolmogorov-Arnold Network
Authors: Matthew Raffel, Adwaith Renjith, Lizhong Chen
Abstract summary: Kolmogorov-Arnold Networks (KANs) replace scalar weights with per-edge vectors of basis coefficients. We propose MetaCluster, a framework that makes KANs highly compressible without sacrificing accuracy.
Score: 8.780976521229741
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Kolmogorov-Arnold Networks (KANs) replace scalar weights with per-edge vectors of basis coefficients, thereby boosting expressivity and accuracy but at the same time resulting in a multiplicative increase in parameters and memory. We propose MetaCluster, a framework that makes KANs highly compressible without sacrificing accuracy. Specifically, a lightweight meta-learner, trained jointly with the KAN, is used to map low-dimensional embeddings to coefficient vectors, shaping them to lie on a low-dimensional manifold that is amenable to clustering. We then run K-means in coefficient space and replace per-edge vectors with shared centroids. Afterwards, the meta-learner can be discarded, and a brief fine-tuning of the centroid codebook recovers any residual accuracy loss. The resulting model stores only a small codebook and per-edge indices, exploiting the vector nature of KAN parameters to amortize storage across multiple coefficients. On MNIST, CIFAR-10, and CIFAR-100, across standard KANs and ConvKANs using multiple basis functions, MetaCluster achieves a reduction of up to 80$\times$ in parameter storage, with no loss in accuracy. Code will be released upon publication.
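
The clustering and storage step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, which the abstract says is not yet released. It assumes plain Lloyd-style K-means over a (num_edges, num_basis) coefficient array, and it leaves out the meta-learner and the codebook fine-tuning entirely. All function names (kmeans, compress, decompress, storage_ratio) and the 32-bit float assumption are hypothetical.

```python
import numpy as np


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input vectors.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every vector to every centroid.
        dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)


def compress(coeffs, k):
    """Replace per-edge coefficient vectors with a codebook plus indices."""
    codebook, indices = kmeans(coeffs, k)
    return codebook, indices


def decompress(codebook, indices):
    """Rebuild a (num_edges, num_basis) array by codebook lookup."""
    return codebook[indices]


def storage_ratio(num_edges, num_basis, k, float_bits=32):
    """Dense storage divided by codebook-plus-index storage, in bits."""
    index_bits = max(1, (k - 1).bit_length())  # bits needed per edge index
    dense = num_edges * num_basis * float_bits
    compressed = k * num_basis * float_bits + num_edges * index_bits
    return dense / compressed


# Hypothetical example: 1000 edges, 8 basis coefficients each, 16 centroids.
coeffs = np.random.default_rng(1).normal(size=(1000, 8))
codebook, indices = compress(coeffs, k=16)
restored = decompress(codebook, indices)
ratio = storage_ratio(num_edges=1000, num_basis=8, k=16)
```

Because each index replaces a whole vector of coefficients rather than a single scalar, the ratio grows with the number of basis coefficients per edge. That is the amortization the abstract refers to. The ratios this toy function yields are illustrative only and say nothing about the reported 80$\times$ figure.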

Related papers

COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression [5.280540253822294]
Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). We propose COMPOT, a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines.
arXiv Detail & Related papers (2026-02-16T21:31:34Z)
Zero Sum SVD: Balancing Loss Sensitivity for Low Rank LLM Compression [11.908793753919745]
We propose Zero Sum SVD (ZS-SVD), a post-training method that performs singular component selection in whitened coordinates. ZS-SVD prunes components across the whole model with a zero sum rule that keeps the cumulative predicted loss change near zero. Experiments show consistent gains across diverse benchmarks and compression ratios.
arXiv Detail & Related papers (2026-02-02T21:51:01Z)
MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts [0.0]
Large Language Models (LLMs) are predominantly deployed as dense transformers, where every parameter in every feed-forward block is activated for every token. Recent upcycling methods such as MoEfication, CMoE, ToMoE, and MoORE reveal that much of the useful computation lives in sparse, semi-modular substructures inside dense feed-forward networks. This paper introduces MLPMoE (MLP-Experts), a training-free transformation that restructures the dense MLPs in transformer blocks into a static, high-cardinality mixture of experts.
arXiv Detail & Related papers (2025-11-26T06:14:26Z)
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression [57.54335545892155]
We introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook. Our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines.
arXiv Detail & Related papers (2025-10-23T20:19:48Z)
COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning [5.595343998068235]
CoSpaDi is a training-free compression framework that replaces low-rank decomposition with a more flexible structured sparse factorization. We evaluate CoSpaDi across multiple Llama and Qwen models under per-layer and per-group settings at 20-50% compression ratios.
arXiv Detail & Related papers (2025-09-26T08:55:09Z)
Lookup multivariate Kolmogorov-Arnold Networks [5.639419519849473]
High-dimensional linear mappings dominate both the parameter count and the computational cost of most modern deep-learning models. We introduce a general-purpose drop-in replacement, lookup multivariate Kolmogorov-Arnold Networks (lmKANs). lmKANs deliver a substantially better trade-off between capacity and inference cost.
arXiv Detail & Related papers (2025-09-08T18:00:35Z)
HAC++: Towards 100X Compression of 3D Gaussian Splatting [55.6351304553003]
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. We propose HAC++, which leverages the relationships between unorganized anchors and a structured hash grid, utilizing their mutual information for context modeling.
arXiv Detail & Related papers (2025-01-21T16:23:05Z)
SWSC: Shared Weight for Similar Channel in LLM [6.795209523806925]
Large language models (LLMs) have spurred development in multiple industries. We propose SWSC, an LLM compression method based on the concept of Shared Weight for Similar Channel.
arXiv Detail & Related papers (2025-01-15T07:36:19Z)
Expanding Sparse Tuning for Low Memory Usage [103.43560327427647]
We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage. To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices. A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
arXiv Detail & Related papers (2024-11-04T04:58:20Z)
Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression [87.5604418100301]
Key-value (KV) caching is an important technique to accelerate the inference of large language models. Existing methods often compromise precision or require extra data for calibration. We introduce DecoQuant, a novel data-free low-bit quantization technique based on tensor decomposition methods.
arXiv Detail & Related papers (2024-05-21T08:35:10Z)
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression [55.6351304553003]
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis. We propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our work is the first to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over $75\times$ compared to vanilla 3DGS.
arXiv Detail & Related papers (2024-03-21T16:28:58Z)
Factorizers for Distributed Sparse Block Codes [45.29870215671697]
We propose a fast and highly accurate method for factorizing distributed sparse block codes (SBCs). Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an $\ell_\infty$-based similarity metric. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets.
arXiv Detail & Related papers (2023-03-24T12:31:48Z)
Kernel Quantization for Efficient Network Compression [59.55192551370948]
Kernel Quantization (KQ) aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss. Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and weight levels. Experiments on the ImageNet classification task show that KQ needs 1.05 and 1.62 bits on average in VGG and ResNet18, respectively, to represent each parameter in the convolution layer.
arXiv Detail & Related papers (2020-03-11T08:00:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.