EKAN: Equivariant Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2410.00435v1
- Date: Tue, 1 Oct 2024 06:34:58 GMT
- Title: EKAN: Equivariant Kolmogorov-Arnold Networks
- Authors: Lexiang Hu, Yisen Wang, Zhouchen Lin
- Abstract summary: Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains.
However, spline functions may not respect the symmetries present in a task, which are crucial prior knowledge in machine learning.
We propose Equivariant Kolmogorov-Arnold Networks (EKAN) to broaden their applicability to more fields.
- Score: 69.30866522377694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains thanks to spline activation functions, becoming an alternative to Multi-Layer Perceptrons (MLPs). However, spline functions may not respect the symmetries present in a task, which are crucial prior knowledge in machine learning. Equivariant networks have previously embedded symmetry into their architectures, achieving better performance in specific applications. Among these, Equivariant Multi-Layer Perceptrons (EMLP) introduce arbitrary matrix group equivariance into MLPs, providing a general framework for constructing equivariant networks layer by layer. In this paper, we propose Equivariant Kolmogorov-Arnold Networks (EKAN), a method for incorporating matrix group equivariance into KANs, aiming to broaden their applicability to more fields. First, we construct gated spline basis functions, which form the EKAN layer together with equivariant linear weights. We then define a lift layer to align the input space of EKAN with the feature space of the dataset, thereby building the entire EKAN architecture. Compared with baseline models, EKAN achieves higher accuracy with smaller datasets or fewer parameters on symmetry-related tasks, such as particle scattering and the three-body problem, often reducing test MSE by several orders of magnitude. Even in non-symbolic formula scenarios, such as top quark tagging with three jet constituents, EKAN achieves results comparable to EMLP using only $26\%$ of the parameters, while KANs, contrary to expectation, do not outperform MLPs.
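The constraint EKAN builds into every layer is standard matrix-group equivariance: transforming the input by the group's input representation and then applying the layer must equal applying the layer and then transforming by the output representation. Below is a minimal numpy sketch of that constraint under an illustrative group ($SO(2)$) and a hand-picked linear weight; it is not the authors' gated-spline construction, whose contribution is solving this constraint for layers built from spline basis functions.

```python
# A minimal sketch (not the authors' code) of the matrix-group equivariance
# constraint that EKAN layers satisfy: for every group element g,
# layer(rho_in(g) @ x) == rho_out(g) @ layer(x).
# The group SO(2) and the linear layer below are illustrative choices.
import numpy as np

def rho(theta):
    """2-D rotation matrix: a representation of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# A linear map that commutes with all rotations: a scaled identity is one
# solution of the equivariance constraint W @ rho(g) == rho(g) @ W.
W = 0.5 * np.eye(2)

def layer(x):
    return W @ x

rng = np.random.default_rng(0)
x = rng.normal(size=2)
for theta in rng.uniform(0, 2 * np.pi, size=5):
    lhs = layer(rho(theta) @ x)   # transform input, then apply layer
    rhs = rho(theta) @ layer(x)   # apply layer, then transform output
    assert np.allclose(lhs, rhs), "equivariance violated"
print("linear layer is SO(2)-equivariant on sampled group elements")
```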
Related papers
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method that discovers symmetries from trained neural networks approximating the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
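When the symmetry forms a Lie group, invariance under a candidate generator can be tested infinitesimally: the Lie derivative of the network along the generator's vector field should vanish. The numpy sketch below illustrates that criterion on a toy function; the function and generators here are illustrative stand-ins, not LieSD's pipeline.

```python
# A hedged sketch (not LieSD itself) of the infinitesimal criterion such
# symmetry-discovery methods test: f is invariant under the one-parameter
# group exp(t*L) iff the Lie derivative grad_f(x) . (L @ x) vanishes.
import numpy as np

def f(x):
    """Stand-in for a trained network; ||x||^2 is rotation invariant."""
    return float(x @ x)

def grad_f(x, eps=1e-6):
    """Central finite-difference gradient of f."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

L_rot = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator of 2-D rotations
L_scale = np.eye(2)                          # generator of scalings

rng = np.random.default_rng(0)
for L, name in [(L_rot, "rotation"), (L_scale, "scaling")]:
    residual = max(abs(grad_f(x) @ (L @ x))
                   for x in rng.normal(size=(10, 2)))
    print(f"{name:8s} generator: max Lie-derivative residual = {residual:.2e}")
# The rotation residual is ~0 (symmetry present); the scaling residual is O(1).
```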
arXiv Detail & Related papers (2024-10-13T13:39:39Z)
- Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations [3.0222726571099665]
We show empirically that additional non-Siamese layers can improve performance in tasks like graph anomaly detection, weight space alignment, and learning Wasserstein distances.
arXiv Detail & Related papers (2024-10-09T08:19:31Z)
- Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance [16.49488981364657]
We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries.
We use an arbitrary base model, such as an MLP or a transformer, and symmetrize it to be equivariant to the given group.
Empirical tests show competitive results against tailored equivariant architectures.
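The underlying mechanism is group averaging: averaging $g^{-1} f(g \cdot x)$ over group elements yields an exactly equivariant function regardless of the base model $f$. A minimal sketch with a uniform average over the finite group $C_4$ follows; the paper's actual contribution, a learned input-conditional distribution over $g$, is not modeled here.

```python
# A minimal sketch of symmetrization by group averaging:
# phi(x) = mean_g rho(g)^{-1} f(rho(g) @ x) is exactly equivariant for a
# finite group. The base function f is an arbitrary stand-in.
import numpy as np

def rot(k):
    """Element of the cyclic group C4: rotation by k * 90 degrees."""
    theta = k * np.pi / 2
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

C4 = [rot(k) for k in range(4)]

def f(x):
    """Arbitrary non-equivariant base function R^2 -> R^2."""
    return np.array([x[0] ** 2 + x[1], np.sin(x[0])])

def symmetrized(x):
    # For rotations, g.T is g^{-1}.
    return np.mean([g.T @ f(g @ x) for g in C4], axis=0)

x = np.array([0.3, -1.2])
g = rot(1)
assert np.allclose(symmetrized(g @ x), g @ symmetrized(x))
print("symmetrized f is exactly C4-equivariant")
```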
arXiv Detail & Related papers (2023-06-05T13:40:54Z)
- EDGI: Equivariant Diffusion for Planning with Embodied Agents [17.931089055248062]
Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries.
We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for planning and model-based reinforcement learning.
EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
arXiv Detail & Related papers (2023-03-22T09:19:39Z)
- Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of the weights and biases of a pre-trained MLP.
We show how these layers can be implemented using three basic operations: pooling, broadcasting, and fully connected layers.
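A hedged sketch of the flavor of layer involved: a linear map on a weight matrix that commutes with permutations of its rows (output neurons) and columns (input neurons), expressible with pooling and broadcasting alone. The coefficients and shapes below are illustrative, not the paper's architecture.

```python
# A permutation-equivariant linear layer on a weight matrix W, built only
# from pooling and broadcasting; a..d play the role of learnable parameters.
import numpy as np

def equivariant_layer(W, a=1.0, b=0.5, c=-0.3, d=0.1):
    row_pool = W.mean(axis=1, keepdims=True)  # pool over columns, broadcast back
    col_pool = W.mean(axis=0, keepdims=True)  # pool over rows, broadcast back
    all_pool = W.mean()                       # pool over everything
    return a * W + b * row_pool + c * col_pool + d * all_pool

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
P = np.eye(4)[rng.permutation(4)]             # row (output-neuron) permutation
Q = np.eye(3)[rng.permutation(3)]             # column (input-neuron) permutation

# Permuting neurons first and then applying the layer equals applying the
# layer first and then permuting: the layer is (P, Q)-equivariant.
assert np.allclose(equivariant_layer(P @ W @ Q), P @ equivariant_layer(W) @ Q)
print("layer commutes with row/column permutations")
```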
arXiv Detail & Related papers (2023-01-30T10:50:33Z) - Architectural Optimization over Subgroups for Equivariant Neural
Networks [0.0]
We propose an equivariance relaxation morphism and a $[G]$-mixed equivariant layer that operate under equivariance constraints on a subgroup.
We present evolutionary and differentiable neural architecture search (NAS) algorithms that utilize these mechanisms respectively for equivariance-aware architectural optimization.
arXiv Detail & Related papers (2022-10-11T14:37:29Z) - Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained
Language Models [68.9288651177564]
We present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics.
With the decomposed MPO structure, we can reduce the parameters of the original MoE architecture.
Experiments with GPT-2 on three well-known downstream natural-language datasets show improved performance and efficiency when increasing model capacity.
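An MPO (tensor-train) factorization rewrites a weight matrix as a contraction of small local tensors, so truncating the ranks between tensors trades parameters for reconstruction error. The numpy sketch below performs one exact SVD split of a $16 \times 16$ matrix into two cores; the shapes and two-core depth are illustrative choices, not the paper's configuration.

```python
# A hedged sketch of the MPO idea: reindex a matrix W[(i1,i2),(j1,j2)] as
# T[(i1,j1),(i2,j2)] and split it by one SVD into two local tensors whose
# contraction reproduces W; chains of such splits give multi-core MPOs.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))

# Reindex: (i1 i2, j1 j2) -> (i1 j1, i2 j2), with i1, i2, j1, j2 in {0..3}.
T = W.reshape(4, 4, 4, 4).transpose(0, 2, 1, 3).reshape(16, 16)

# One SVD split: T = A @ B, giving cores A[i1,j1,a] and B[a,i2,j2].
U, S, Vt = np.linalg.svd(T, full_matrices=False)
r = len(S)                     # full rank here; truncate r to compress
A = (U * S).reshape(4, 4, r)   # core 1
B = Vt.reshape(r, 4, 4)        # core 2

# Contract the cores back together and undo the reindexing.
W_rec = np.einsum("ija,akl->ikjl", A, B).reshape(16, 16)
assert np.allclose(W_rec, W)
print("exact 2-core MPO reconstruction; truncating r trades parameters for error")
```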
arXiv Detail & Related papers (2022-03-02T13:44:49Z)
- Equivariant Graph Hierarchy-Based Neural Networks [53.60804845045526]
We propose Equivariant Hierarchy-based Graph Networks (EGHNs).
EGHNs consist of three key components: generalized Equivariant Matrix Message Passing (EMMP), E-Pool, and E-UpPool.
Extensive experimental evaluations verify the effectiveness of EGHN on several applications, including multi-object dynamics simulation, motion capture, and protein dynamics modeling.
arXiv Detail & Related papers (2022-02-22T03:11:47Z)
- Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types.
We show that FA-based models have maximal expressive power in a broad setting.
We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
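FA averages a backbone $f$ over a small input-dependent frame $\mathcal{F}(x) \subseteq G$ rather than over all of $G$; if the frame is itself equivariant, the average is exactly equivariant at a fraction of the cost of full group averaging. A minimal sketch with the two-element sign group and a one-element frame follows; the backbone and frame choice are illustrative, not the paper's constructions.

```python
# A minimal Frame Averaging sketch with the smallest possible group:
# G = {+1, -1} acting on R^n by sign flips. The frame F(x) = {sign(sum(x))}
# is equivariant (F(-x) = -F(x)), so averaging over it yields exact
# equivariance from |F(x)| = 1 evaluation instead of |G| = 2.
import numpy as np

def f(x):
    """Arbitrary backbone with no built-in symmetry."""
    return x ** 2 + np.tanh(x) + 1.0

def frame(x):
    return [np.sign(x.sum())]    # assume sum(x) != 0 so the frame is defined

def frame_average(x):
    # FA formula: mean over g in F(x) of g * f(g^{-1} * x); here g^{-1} = g.
    return np.mean([g * f(g * x) for g in frame(x)], axis=0)

x = np.array([0.7, -0.2, 1.5])
assert np.allclose(frame_average(-x), -frame_average(x))
print("frame-averaged backbone is exactly sign-equivariant")
```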
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.