Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness
- URL: http://arxiv.org/abs/2507.04690v1
- Date: Mon, 07 Jul 2025 06:13:32 GMT
- Title: Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness
- Authors: Hanseon Joo, Hayoung Choi, Ook Lee, Minjong Cheon
- Abstract summary: Modulation Joint KAN (MJKAN) is a novel neural network layer designed to overcome the practical limitations of KANs. MJKAN integrates a FiLM (Feature-wise Linear Modulation)-like mechanism with Radial Basis Function (RBF) activations. We empirically validated MJKAN's performance across a diverse set of benchmarks, including function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam).
- Score: 5.474797258314827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kolmogorov-Arnold Networks (KANs) have garnered attention for replacing fixed activation functions with learnable univariate functions, but they exhibit practical limitations, including high computational costs and performance deficits in general classification tasks. In this paper, we propose the Modulation Joint KAN (MJKAN), a novel neural network layer designed to overcome these challenges. MJKAN integrates a FiLM (Feature-wise Linear Modulation)-like mechanism with Radial Basis Function (RBF) activations, creating a hybrid architecture that combines the non-linear expressive power of KANs with the efficiency of Multilayer Perceptrons (MLPs). We empirically validated MJKAN's performance across a diverse set of benchmarks, including function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam). The results demonstrate that MJKAN achieves superior approximation capabilities in function regression tasks, significantly outperforming MLPs, with performance improving as the number of basis functions increases. Conversely, in image and text classification, its performance was competitive with MLPs but revealed a critical dependency on the number of basis functions. We found that a smaller basis size was crucial for better generalization, highlighting that the model's capacity must be carefully tuned to the complexity of the data to prevent overfitting. In conclusion, MJKAN offers a flexible architecture that inherits the theoretical advantages of KANs while improving computational efficiency and practical viability.
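The abstract describes MJKAN as an MLP-style linear projection modulated, FiLM-style, by RBF features of the input. The paper's code is not reproduced here; the following is a minimal NumPy sketch of one plausible reading of that design, where the grid placement, weight shapes, and the exact combination rule (scale-and-shift of the linear branch) are all assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_features(x, centers, gamma=2.0):
    """Gaussian RBF basis per scalar input: (batch, d_in) -> (batch, d_in, k)."""
    return np.exp(-gamma * (x[..., None] - centers) ** 2)

class MJKANLayerSketch:
    """Hypothetical MJKAN-style layer: RBF features of the input produce
    FiLM-like scale/shift parameters that modulate a linear (MLP) branch."""
    def __init__(self, d_in, d_out, n_basis=8):
        self.centers = np.linspace(-1.0, 1.0, n_basis)   # fixed RBF grid (assumed)
        self.W = rng.normal(0, 0.1, (d_in, d_out))       # MLP-style weights
        self.gamma_w = rng.normal(0, 0.1, (d_in * n_basis, d_out))  # scale head
        self.beta_w = rng.normal(0, 0.1, (d_in * n_basis, d_out))   # shift head

    def __call__(self, x):
        h = x @ self.W                                   # efficient linear branch
        phi = rbf_features(x, self.centers).reshape(len(x), -1)
        gamma = phi @ self.gamma_w                       # FiLM-like scale
        beta = phi @ self.beta_w                         # FiLM-like shift
        return gamma * h + beta                          # modulated output

layer = MJKANLayerSketch(d_in=4, d_out=3, n_basis=8)
out = layer(rng.normal(size=(5, 4)))
print(out.shape)  # (5, 3)
```

In this reading, increasing `n_basis` enlarges the modulation capacity, which matches the abstract's observation that regression accuracy grows with basis size while smaller bases generalize better on classification.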
Related papers
- KANMixer: Can KAN Serve as a New Modeling Core for Long-term Time Series Forecasting? [17.96421618979159]
We introduce KANMixer, a concise architecture integrating a multi-scale mixing backbone that fully leverages KAN's adaptive capabilities. We show that KANMixer achieves state-of-the-art performance in 16 out of 28 experiments across seven benchmark datasets.
arXiv Detail & Related papers (2025-08-03T04:03:13Z) - Scientific Machine Learning with Kolmogorov-Arnold Networks [0.0]
The field of scientific machine learning is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep operator learning. We highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs' advantages in capturing complex dynamics while learning more effectively.
arXiv Detail & Related papers (2025-07-30T01:26:44Z) - Q-function Decomposition with Intervention Semantics with Factored Action Spaces [51.01244229483353]
We consider Q-functions defined over a lower-dimensional projected subspace of the original action space, and study the condition for the unbiasedness of decomposed Q-functions. This leads to a general scheme, which we call action-decomposed reinforcement learning, that uses the projected Q-functions to approximate the Q-function in standard model-free reinforcement learning algorithms.
arXiv Detail & Related papers (2025-04-30T05:26:51Z) - R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
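The snippet above does not spell out R-Sparse's rank-aware criterion, so as a generic stand-in, here is what training-free activation sparsity looks like in its simplest magnitude-based form: zero out the smallest activations per token and keep the rest. This is an illustration of the sparsification idea only, not R-Sparse's actual method.

```python
import numpy as np

def sparsify_activations(h, sparsity=0.5):
    """Zero out the smallest-magnitude activations in each row.
    A generic magnitude heuristic, not R-Sparse's rank-aware criterion."""
    k = int(h.shape[-1] * sparsity)              # entries to drop per row
    idx = np.argsort(np.abs(h), axis=-1)[:, :k]  # indices of smallest |h|
    out = h.copy()
    np.put_along_axis(out, idx, 0.0, axis=-1)
    return out

h = np.array([[0.1, -2.0, 0.5, 3.0]])
print(sparsify_activations(h).tolist())  # [[0.0, -2.0, 0.0, 3.0]]
```

At 50% sparsity, half of each row's activations are skipped, which is where the inference savings come from; the art is in choosing a criterion (rank-aware, in R-Sparse's case) that preserves accuracy at that level.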
arXiv Detail & Related papers (2025-04-28T03:30:32Z) - Enhancing Physics-Informed Neural Networks with a Hybrid Parallel Kolmogorov-Arnold and MLP Architecture [0.0]
We propose a novel architecture that integrates parallelized KAN and MLP branches within a unified PINN framework. The HPKM-PINN introduces a scaling factor ξ to optimally balance the complementary strengths of KAN's interpretable function approximation and the MLP's nonlinear learning. These findings highlight the HPKM-PINN's ability to leverage KAN's interpretability and robustness, positioning it as a versatile and scalable tool for solving complex PDE-driven problems.
arXiv Detail & Related papers (2025-03-30T02:59:32Z) - LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding [4.759109475818876]
Implicit Neural Representations (INRs) are proving to be a powerful paradigm in unifying task modeling across diverse data domains. We introduce LIFT, a novel, high-performance framework that captures multiscale information through meta-learning. We also introduce ReLIFT, an enhanced variant of LIFT that incorporates residual connections and expressive frequency encodings.
arXiv Detail & Related papers (2025-03-19T17:00:58Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
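The routing mechanism in the summary above, a sigmoid gate made hard via a straight-through estimator, can be sketched as follows. The threshold, router shape, and number of expert blocks here are illustrative assumptions, not DSMoE's actual configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def straight_through_gate(logits, threshold=0.5):
    """Forward pass of a hard 0/1 gate over sigmoid scores. In an autodiff
    framework the straight-through estimator would be written as
    hard = soft + stop_gradient(hard - soft), so gradients flow through
    the sigmoid while the forward pass stays binary."""
    soft = sigmoid(logits)
    hard = (soft > threshold).astype(logits.dtype)
    return hard, soft

rng = np.random.default_rng(1)
tokens = rng.normal(size=(4, 16))        # 4 tokens, hidden size 16
router_w = rng.normal(0, 0.5, (16, 3))   # route to 3 FFN partitions ("experts")
mask, _ = straight_through_gate(tokens @ router_w)
print(mask.shape)  # (4, 3): which expert blocks each token activates
```

Because the gate is per-token and per-block, each token can activate a different subset of the partitioned FFN, which is what lets such a scheme trade computation for quality under a fixed budget.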
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - On the Statistical Efficiency of Mean-Field Reinforcement Learning with General Function Approximation [20.66437196305357]
We study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation.
We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MF-MBED), which characterizes the inherent complexity of mean-field model classes.
arXiv Detail & Related papers (2023-05-18T20:00:04Z) - Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims to optimize decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Local Function Complexity for Active Learning via Mixture of Gaussian Processes [5.382740428160009]
Inhomogeneities in real-world data, due to changes in the observation noise level or variations in the structural complexity of the source function, pose a unique set of challenges for statistical inference.
In this paper, we draw on recent theoretical results on the estimation of local function complexity (LFC), derived in the setting of local polynomial smoothing (LPS).
We derive and estimate the Gaussian process regression (GPR)-based analog of the LPS-based LFC and use it as a substitute in the above framework to make it robust and scalable.
arXiv Detail & Related papers (2019-02-27T17:55:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.