Incorporating Arbitrary Matrix Group Equivariance into KANs
- URL: http://arxiv.org/abs/2410.00435v4
- Date: Fri, 15 Aug 2025 15:17:11 GMT
- Title: Incorporating Arbitrary Matrix Group Equivariance into KANs
- Authors: Lexiang Hu, Yisen Wang, Zhouchen Lin
- Abstract summary: Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains. We propose Equivariant Kolmogorov-Arnold Networks (EKAN) to broaden their applicability to more fields.
- Score: 69.30866522377694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains thanks to spline activation functions, becoming an alternative to Multi-Layer Perceptrons (MLPs). However, spline functions may not respect task symmetries, which are crucial prior knowledge in machine learning. In this paper, we propose Equivariant Kolmogorov-Arnold Networks (EKAN), a method for incorporating arbitrary matrix group equivariance into KANs, aiming to broaden their applicability to more fields. We first construct gated spline basis functions, which form the EKAN layer together with equivariant linear weights, and then define a lift layer to align the input space of EKAN with the feature space of the dataset, thereby building the entire EKAN architecture. Compared with baseline models, EKAN achieves higher accuracy with smaller datasets or fewer parameters on symmetry-related tasks, such as particle scattering and the three-body problem, often reducing test MSE by several orders of magnitude. Even in non-symbolic formula scenarios, such as top quark tagging with three jet constituents, EKAN achieves results comparable to state-of-the-art equivariant architectures using fewer than 40% of the parameters, while KANs do not outperform MLPs as expected. Code and data are available at https://github.com/hulx2002/EKAN .
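To make the architecture description above concrete, here is a minimal, hypothetical sketch of an EKAN-style layer. It is not the authors' implementation (see the linked repository for that): the equivariant basis is assumed to be precomputed, Gaussian bumps stand in for B-splines, and the gating scheme is a guess at the "gated spline basis functions" the abstract mentions.

```python
import torch
import torch.nn as nn

class EKANLayerSketch(nn.Module):
    """Hypothetical EKAN-style layer: equivariant linear weights followed by
    gated spline-like activations. Illustrative only, not the official code."""

    def __init__(self, equiv_basis: torch.Tensor, grid_size: int = 8):
        super().__init__()
        # equiv_basis: (B, out_dim, in_dim), a precomputed basis of the linear
        # maps commuting with the chosen matrix group (assumed given, e.g.
        # obtained by solving the equivariance constraint numerically).
        self.register_buffer("equiv_basis", equiv_basis)
        out_dim = equiv_basis.shape[1]
        self.coef = nn.Parameter(torch.randn(equiv_basis.shape[0]) * 0.1)
        # Fixed Gaussian bumps stand in for the B-spline basis of real KANs.
        self.register_buffer("centers", torch.linspace(-1.0, 1.0, grid_size))
        self.spline_w = nn.Parameter(torch.randn(out_dim, grid_size) * 0.1)
        self.gate = nn.Parameter(torch.zeros(out_dim))  # gates the spline branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivariant linear map: a learned combination of basis matrices.
        W = torch.einsum("b,boi->oi", self.coef, self.equiv_basis)
        h = x @ W.T
        # Per-channel spline-like expansion in the fixed bumps.
        bumps = torch.exp(-(h.unsqueeze(-1) - self.centers) ** 2 / 0.1)
        spline = (bumps * self.spline_w).sum(-1)
        # The gate controls how much of the nonlinear branch (which can break
        # equivariance, hence the gating) is mixed into the linear path.
        return h + torch.sigmoid(self.gate) * spline
```

A separate lift layer (not sketched here) would first map raw dataset features into the group representation the layers act on.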
Related papers
- GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions [0.0]
We propose GS-KAN (Generalized Sprecher-KAN), a lightweight architecture inspired by David Sprecher's refinement of the superposition theorem.
GS-KAN constructs unique edge functions by applying learnable linear transformations to a single learnable, shared parent function per layer.
Our results demonstrate that GS-KAN outperforms both approximations and standard KAN baselines on continuous function tasks while maintaining superior parameter efficiency.
arXiv Detail & Related papers (2025-12-09T19:56:36Z)
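As a rough illustration of the GS-KAN entry above (one shared parent function per layer, individualized per edge by learnable linear transformations), here is a hypothetical sketch; the affine edge parameterization and the tiny-MLP parent function are my assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

class GSKANLayerSketch(nn.Module):
    """Hypothetical GS-KAN-style layer: every edge reuses one shared parent
    function, individualized only by a learnable affine reparameterization."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 16):
        super().__init__()
        # One shared parent function per layer (a tiny MLP stands in here).
        self.parent = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        # Per-edge affine input transforms and output mixing weights (assumed form).
        self.scale = nn.Parameter(torch.randn(out_dim, in_dim))
        self.shift = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.mix = nn.Parameter(torch.randn(out_dim, in_dim) / in_dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim). Each edge (i, j) sees its own affine copy of x_j.
        z = self.scale * x.unsqueeze(1) + self.shift      # (batch, out_dim, in_dim)
        phi = self.parent(z.unsqueeze(-1)).squeeze(-1)    # shared function, pointwise
        return (self.mix * phi).sum(-1)                   # (batch, out_dim)
```

Parameter count is dominated by the three (out_dim, in_dim) edge tables plus one small parent network, instead of a full spline table per edge as in a standard KAN.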
- Scalable and Interpretable Scientific Discovery via Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN) [0.0]
Kolmogorov-Arnold Networks (KANs) offer a promising alternative to Multi-Layer Perceptrons (MLPs).
KANs lack probabilistic outputs, limiting their utility in applications requiring uncertainty quantification.
We introduce the Sparse Variational GP-KAN, an architecture that integrates sparse variational inference with the KAN topology.
arXiv Detail & Related papers (2025-11-29T00:48:55Z)
- FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing [27.415937333981905]
Permutation Function Sharing KAN (FS-KAN) is a principled approach to constructing equivariant and invariant KA layers for arbitrary permutation symmetry groups.
We show that FS-KANs exhibit superior data efficiency compared to standard parameter-sharing layers, by a wide margin in certain cases.
arXiv Detail & Related papers (2025-09-29T08:49:09Z)
- Permutation-Invariant Transformer Neural Architectures for Set-Based Indoor Localization Using Learned RSSI Embeddings [0.0]
We propose a permutation-invariant neural architecture for indoor localization using RSSI scans from Wi-Fi access points.
We evaluate the model on a dataset collected across a campus environment consisting of six buildings.
Results show that the model accurately recovers fine-grained structure and maintains performance across physically distinct domains.
arXiv Detail & Related papers (2025-05-31T17:56:39Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to aggregate low rank experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
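A loose sketch of the parameter pattern the ALoRE entry above describes, not the paper's code: each "expert" is a Kronecker product of two small factor matrices, and the experts are aggregated into a single update that can be merged into the frozen backbone weight. The shapes, scales, and merge step below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ALoRESketch(nn.Module):
    """Hypothetical sketch: the adapter update is a sum of experts, each a
    Kronecker product of two small factors, so the full-size update matrix
    is never stored as independent dense parameters."""

    def __init__(self, d_out=768, d_in=768, experts=4, block=16):
        super().__init__()
        assert d_out % block == 0 and d_in % block == 0
        # Expert i contributes kron(A_i, B_i); both factors are small.
        self.A = nn.Parameter(torch.randn(experts, d_out // block, d_in // block) * 0.02)
        self.B = nn.Parameter(torch.randn(experts, block, block) * 0.02)

    def delta_weight(self) -> torch.Tensor:
        # Aggregate the experts into one (d_out, d_in) update matrix.
        return sum(torch.kron(self.A[i], self.B[i]) for i in range(self.A.shape[0]))

    def forward(self, x: torch.Tensor, frozen_weight: torch.Tensor) -> torch.Tensor:
        # The aggregated update merges into the frozen weight at inference,
        # so no extra latency is added after merging.
        return x @ (frozen_weight + self.delta_weight()).T
```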
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z)
- Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations [3.0222726571099665]
We show empirically that additional non-Siamese layers can improve performance in tasks like graph anomaly detection, weight space alignment, and learning Wasserstein distances.
arXiv Detail & Related papers (2024-10-09T08:19:31Z)
- Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks [5.187307904567701]
Group Matrices (GMs) are a forgotten precursor to the modern notion of regular representations of finite groups.
We show that GMs can generalize classical LDR theory to general discrete groups.
Our framework performs competitively with approximately equivariant NNs and other structured matrix-based methods.
arXiv Detail & Related papers (2024-09-18T07:52:33Z)
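For concreteness, the classical group matrix construction behind the GM entry above: entry (i, j) holds a weight indexed by g_i^{-1} g_j, so for a cyclic group the matrix is circulant, the same structure convolutions exploit. A minimal sketch (the dictionary-based encoding of the group is my choice):

```python
import numpy as np
from itertools import product

def group_matrix(elements, op, inv, w):
    """Dedekind-style group matrix: entry (i, j) is w(inv(g_i) * g_j).
    For the cyclic group Z_n this reproduces a circulant matrix."""
    n = len(elements)
    M = np.empty((n, n))
    for i, j in product(range(n), range(n)):
        M[i, j] = w[op(inv(elements[i]), elements[j])]
    return M

# Example: Z_4 under addition mod 4; w maps each group element to a weight.
n = 4
w = {g: float(g + 1) for g in range(n)}  # arbitrary per-element weights
M = group_matrix(list(range(n)),
                 lambda a, b: (a + b) % n,   # group operation
                 lambda a: (-a) % n,         # inverse
                 w)
print(M)  # entry (i, j) = w[(j - i) mod 4]: a 4x4 circulant matrix
```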
- Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks, leading to better generalisation performance.
However, symmetries provide fixed hard constraints on the functions a network can represent: they need to be specified in advance and cannot be adapted.
Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z)
- Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance [16.49488981364657]
We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries.
We use an arbitrary base model, such as an MLP or a transformer, and symmetrize it to be equivariant to the given group.
Empirical tests show competitive results against tailored equivariant architectures.
arXiv Detail & Related papers (2023-06-05T13:40:54Z)
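Schematically (my notation, not the paper's), the symmetrization in the entry above averages the base model over group elements drawn from a learned, input-conditioned distribution:

```latex
\hat{f}(x) \;=\; \mathbb{E}_{g \sim p_\theta(g \mid x)}\!\left[\rho_{\mathrm{out}}(g)\, f\big(\rho_{\mathrm{in}}(g)^{-1} x\big)\right]
```

If the distribution $p_\theta(g \mid x)$ is itself suitably equivariant, then $\hat{f}$ is equivariant for any base model $f$, which is what allows an off-the-shelf backbone to be used.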
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
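To illustrate the MPO entry above: an MPO (tensor-train) factorization splits a weight matrix into a chain of small cores, and the large central core can then be shared across layers while the small auxiliary cores stay layer-specific. A hypothetical NumPy sketch of the factorization step (the mode-splitting scheme and ranks are my choices, and rank truncation makes the reconstruction approximate):

```python
import numpy as np

def mpo_factorize(W, out_shape, in_shape, rank):
    """Sketch of an MPO/tensor-train factorization: reshape W into paired
    (output, input) modes and split it into a chain of cores via truncated
    SVDs. Illustrative only, not the paper's code."""
    k = len(out_shape)
    T = W.reshape(*out_shape, *in_shape)
    # Interleave output/input modes: (o1, i1, o2, i2, ...).
    order = [j for pair in zip(range(k), range(k, 2 * k)) for j in pair]
    mat = np.transpose(T, order).reshape(out_shape[0] * in_shape[0], -1)
    cores, r_prev = [], 1
    for j in range(k - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(rank, S.size)  # truncate to the target bond rank
        cores.append(U[:, :r].reshape(r_prev, out_shape[j], in_shape[j], r))
        mat = (S[:r, None] * Vt[:r]).reshape(r * out_shape[j + 1] * in_shape[j + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, out_shape[-1], in_shape[-1], 1))
    return cores  # the large middle core is the natural candidate for sharing

# Example: factorize a 64x64 matrix as a 3-core MPO (4*4*4 modes per side).
cores = mpo_factorize(np.random.randn(64, 64), (4, 4, 4), (4, 4, 4), rank=8)
print([c.shape for c in cores])  # [(1, 4, 4, 8), (8, 4, 4, 8), (8, 4, 4, 1)]
```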
- EDGI: Equivariant Diffusion for Planning with Embodied Agents [17.931089055248062]
Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries.
We introduce the Equivariant diffuser for Generating Interactions (EDGI), an algorithm for planning and model-based reinforcement learning.
EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
arXiv Detail & Related papers (2023-03-22T09:19:39Z)
- Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of the weights and biases of a pre-trained MLP.
We show how these layers can be implemented using three basic operations.
arXiv Detail & Related papers (2023-01-30T10:50:33Z)
- Architectural Optimization over Subgroups for Equivariant Neural Networks [0.0]
We propose the equivariance relaxation morphism and the $[G]$-mixed equivariant layer to operate with equivariance constraints on a subgroup.
We present evolutionary and differentiable neural architecture search (NAS) algorithms that utilize these mechanisms respectively for equivariance-aware architectural optimization.
arXiv Detail & Related papers (2022-10-11T14:37:29Z)
- Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models [68.9288651177564]
We present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics.
With the decomposed MPO structure, we can reduce the parameters of the original MoE architecture.
Experiments on three well-known downstream natural language datasets with GPT-2 show improved performance and efficiency when increasing model capacity.
arXiv Detail & Related papers (2022-03-02T13:44:49Z)
- Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types.
We show that FA-based models have maximal expressive power in a broad setting.
We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
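For reference, the frame averaging operator from the entry above replaces an intractable average over the whole group with an average over a small input-dependent frame F(x) ⊆ G (standard notation from this line of work; the specific symbols here are mine):

```latex
\langle f \rangle_F(x) \;=\; \frac{1}{|F(x)|} \sum_{g \in F(x)} \rho_2(g)\, f\big(\rho_1(g)^{-1} x\big)
```

When the frame is itself equivariant, i.e. $F(\rho_1(g)x) = g\,F(x)$, the averaged model is exactly equivariant while inheriting the expressive power of the backbone $f$.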
This list is automatically generated from the titles and abstracts of the papers on this site.