Reimagining Linear Probing: Kolmogorov-Arnold Networks in Transfer Learning
- URL: http://arxiv.org/abs/2409.07763v1
- Date: Thu, 12 Sep 2024 05:36:40 GMT
- Title: Reimagining Linear Probing: Kolmogorov-Arnold Networks in Transfer Learning
- Authors: Sheng Shen, Rabih Younes
- Abstract summary: Kolmogorov-Arnold Networks (KAN) are proposed as an enhancement to the traditional linear probing method in transfer learning.
KAN consistently outperforms traditional linear probing, achieving significant improvements in accuracy and generalization.
- Score: 18.69601183838834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces Kolmogorov-Arnold Networks (KAN) as an enhancement to the traditional linear probing method in transfer learning. Linear probing, often applied to the final layer of pre-trained models, is limited by its inability to model complex relationships in data. To address this, we propose substituting the linear probing layer with KAN, which leverages spline-based representations to approximate intricate functions. In this study, we integrate KAN with a ResNet-50 model pre-trained on ImageNet and evaluate its performance on the CIFAR-10 dataset. We perform a systematic hyperparameter search, focusing on grid size and spline degree (k), to optimize KAN's flexibility and accuracy. Our results demonstrate that KAN consistently outperforms traditional linear probing, achieving significant improvements in accuracy and generalization across a range of configurations. These findings indicate that KAN offers a more powerful and adaptable alternative to conventional linear probing techniques in transfer learning.
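As a concrete illustration of the setup the abstract describes, the sketch below freezes an ImageNet-pretrained ResNet-50 and replaces its final linear classifier with a small spline-parameterized probing head. This is a hedged reconstruction under stated assumptions, not the authors' released code: the `KANProbe` and `b_spline_basis` names, the layer design (spline branch plus a SiLU-gated linear branch), and the default grid size and spline degree are illustrative choices that merely mirror the hyperparameters named in the abstract.

```python
# Minimal sketch (assumption, not the authors' implementation) of replacing a
# linear probe on a frozen ResNet-50 with a spline-parameterized KAN-style head.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


def b_spline_basis(x, grid, k):
    """Cox-de Boor recursion: degree-k B-spline bases for each input feature.

    x:    (batch, in_features) activations
    grid: (in_features, grid_size + 2k + 1) knot positions
    returns (batch, in_features, grid_size + k) basis values
    """
    x = x.unsqueeze(-1)
    bases = ((x >= grid[..., :-1]) & (x < grid[..., 1:])).to(x.dtype)
    for p in range(1, k + 1):
        left = (x - grid[..., : -(p + 1)]) / (grid[..., p:-1] - grid[..., : -(p + 1)])
        right = (grid[..., p + 1:] - x) / (grid[..., p + 1:] - grid[..., 1:-p])
        bases = left * bases[..., :-1] + right * bases[..., 1:]
    return bases


class KANProbe(nn.Module):
    """One learnable univariate spline per (feature, class) edge, plus a
    SiLU-gated linear branch so the head can fall back to a linear probe."""

    def __init__(self, in_features, out_features, grid_size=5, k=3, grid_range=(-1.0, 1.0)):
        super().__init__()
        self.k = k
        h = (grid_range[1] - grid_range[0]) / grid_size
        knots = torch.arange(-k, grid_size + k + 1, dtype=torch.float32) * h + grid_range[0]
        self.register_buffer("grid", knots.expand(in_features, -1).contiguous())
        self.spline_weight = nn.Parameter(
            0.01 * torch.randn(out_features, in_features, grid_size + k))
        self.base_linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Features are assumed pre-scaled into grid_range (e.g. by normalization);
        # values outside the grid are handled only by the linear branch.
        bases = b_spline_basis(x, self.grid, self.k)                  # (B, in, G + k)
        spline_out = torch.einsum("big,oig->bo", bases, self.spline_weight)
        return self.base_linear(F.silu(x)) + spline_out


# Frozen ImageNet backbone; only the KAN probe on the 2048-d pooled features trains.
backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

head = KANProbe(in_features=2048, out_features=10, grid_size=5, k=3)  # CIFAR-10 classes
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
```

The hyperparameter search the abstract mentions then amounts to re-instantiating `KANProbe` over a grid of values, e.g. `grid_size in {3, 5, 10, 20}` and `k in {1, 2, 3}`, and probing each configuration; those particular values are illustrative, not the paper's reported search space.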
Related papers
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers [42.69799418639716]
Deep learning models may be used to precondition residuals during the iterations of linear solvers such as the conjugate gradient (CG) method.
Neural network models require an enormous number of parameters to approximate well in this setup.
In our work, we recall well-established preconditioners from linear algebra and use them as a starting point for training the GNN.
arXiv Detail & Related papers (2024-05-24T13:44:30Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z) - Enhancing training of physics-informed neural networks using domain-decomposition based preconditioning strategies [1.8434042562191815]
We introduce additive and multiplicative preconditioning strategies for the widely used L-BFGS.
We demonstrate that both additive and multiplicative preconditioners significantly improve the convergence of the standard L-BFGS.
arXiv Detail & Related papers (2023-06-30T13:35:09Z) - Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
arXiv Detail & Related papers (2023-06-14T11:41:42Z) - Leveraging Angular Information Between Feature and Classifier for Long-tailed Learning: A Prediction Reformulation Approach [90.77858044524544]
We reformulate the recognition probabilities through included angles without re-balancing the classifier weights.
Inspired by the performance improvement of the predictive form reformulation, we explore the different properties of this angular prediction.
Our method is able to obtain the best performance among peer methods without pretraining on CIFAR10/100-LT and ImageNet-LT.
arXiv Detail & Related papers (2022-12-03T07:52:48Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Exploiting Spline Models for the Training of Fully Connected Layers in Neural Network [0.0]
The fully connected (FC) layer, one of the most fundamental modules in artificial neural networks (ANN), is often considered difficult and inefficient to train.
We propose a spline-based approach that eases the difficulty of training FC layers.
Our approach reduces the computational cost, accelerates the convergence of FC layers, and significantly increases the interpretability of the resulting model.
arXiv Detail & Related papers (2021-02-12T14:36:55Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here (including all generated summaries) and is not responsible for any consequences arising from its use.