Enhancing Burmese News Classification with Kolmogorov-Arnold Network Head Fine-tuning
- URL: http://arxiv.org/abs/2511.21081v1
- Date: Wed, 26 Nov 2025 05:50:34 GMT
- Title: Enhancing Burmese News Classification with Kolmogorov-Arnold Network Head Fine-tuning
- Authors: Thura Aung, Eaint Kay Khaing Kyaw, Ye Kyaw Thu, Thazin Myint Oo, Thepchai Supnithi
- Abstract summary: This work explores Kolmogorov-Arnold Networks (KANs) as alternative classification heads. Experimental results show that KAN-based heads are competitive with or superior to MLPs, highlighting KANs as expressive, efficient alternatives for low-resource language classification.
- Score: 0.26097841018267615
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In low-resource languages like Burmese, classification tasks often fine-tune only the final classification layer, keeping pre-trained encoder weights frozen. While Multi-Layer Perceptrons (MLPs) are commonly used, their fixed non-linearity can limit expressiveness and increase computational cost. This work explores Kolmogorov-Arnold Networks (KANs) as alternative classification heads, evaluating Fourier-based FourierKAN, spline-based EfficientKAN, and grid-based FasterKAN across diverse embeddings including TF-IDF, fastText, and multilingual transformers (mBERT, Distil-mBERT). Experimental results show that KAN-based heads are competitive with or superior to MLPs. EfficientKAN with fastText achieved the highest F1-score (0.928), while FasterKAN offered the best trade-off between speed and accuracy. On transformer embeddings, EfficientKAN matched or slightly outperformed MLPs with mBERT (0.917 F1). These findings highlight KANs as expressive, efficient alternatives to MLPs for low-resource language classification.
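To ground the setup, the following is a minimal PyTorch sketch of head-only fine-tuning over frozen embeddings, as the abstract describes; the dimensions, class count, and training-loop details are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of head-only fine-tuning on frozen embeddings.
# EMB_DIM and NUM_CLASSES are assumptions for illustration.
EMB_DIM, NUM_CLASSES = 300, 5  # e.g., fastText vectors, 5 news categories

head = nn.Sequential(          # an MLP baseline head; a KAN head (see the
    nn.Linear(EMB_DIM, 128),   # FourierKAN sketch further below) would be
    nn.ReLU(),                 # dropped in here instead
    nn.Linear(128, NUM_CLASSES),
)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(embeddings: torch.Tensor, labels: torch.Tensor) -> float:
    # embeddings come from a frozen encoder (TF-IDF, fastText, or mBERT),
    # so only the head's parameters receive gradients.
    logits = head(embeddings)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the encoder stays frozen, comparing MLP and KAN heads in this setup isolates the head's expressiveness and computational cost.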
Related papers
- Improving Memory Efficiency for Training KANs via Meta Learning [55.24089119864207]
We propose to generate weights for KANs via a smaller meta-learner, called MetaKANs.
By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance.
arXiv Detail & Related papers (2025-06-09T08:38:26Z)
- Enhancing Federated Learning with Kolmogorov-Arnold Networks: A Comparative Study Across Diverse Aggregation Strategies [0.24578723416255752]
Kolmogorov-Arnold Networks (KAN) have shown promising capabilities in modeling complex nonlinear relationships.
KANs consistently outperform Multilayer Perceptrons in terms of accuracy, stability, and convergence efficiency.
arXiv Detail & Related papers (2025-05-12T14:56:27Z)
- PowerMLP: An Efficient Version of KAN [10.411788782126091]
The Kolmogorov-Arnold Network (KAN) is a new network architecture known for its high accuracy in several tasks such as function fitting and PDE solving.
The superior computation capability of KAN arises from the Kolmogorov-Arnold representation and learnable spline functions.
PowerMLP achieves higher accuracy and a training speed about 40 times faster than KAN in various tasks.
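For context, these architectures build on the Kolmogorov-Arnold representation theorem, which states that any continuous function of $n$ variables decomposes into sums and compositions of univariate functions:

$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)$$

KAN layers make the inner $\varphi_{q,p}$ and outer $\Phi_q$ learnable, via splines in the original KAN and Fourier series in FourierKAN.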
arXiv Detail & Related papers (2024-12-18T07:42:34Z)
- Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification [0.51795041186793]
We introduce FR-KAN, a variant of the promising alternative called Kolmogorov-Arnold Networks (KANs) as classification heads for transformer-based encoders.
Our studies reveal an average increase of 10% in accuracy and 11% in F1-score when incorporating FR-KAN heads instead of traditional MLP heads in transformer-based pre-trained models.
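As a concrete illustration of such a Fourier-based head, the sketch below implements one KAN layer whose learnable univariate functions are truncated Fourier series; the class name, coefficient shapes, and default frequency count are assumptions, not the papers' released code.

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """One KAN layer whose learnable 1-D functions are truncated Fourier
    series (a sketch of the FourierKAN idea, not the reference code)."""
    def __init__(self, in_dim: int, out_dim: int, num_freqs: int = 8):
        super().__init__()
        self.num_freqs = num_freqs
        # cos/sin coefficients per output unit, input feature, and frequency
        self.coeffs = nn.Parameter(
            torch.randn(2, out_dim, in_dim, num_freqs)
            / (in_dim * num_freqs) ** 0.5
        )
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim); frequencies k = 1..num_freqs
        k = torch.arange(1, self.num_freqs + 1, device=x.device, dtype=x.dtype)
        angles = x.unsqueeze(-1) * k  # (batch, in_dim, freqs)
        # Sum the per-feature univariate series, as in the KA representation.
        out = torch.einsum("bif,oif->bo", torch.cos(angles), self.coeffs[0])
        out = out + torch.einsum("bif,oif->bo", torch.sin(angles), self.coeffs[1])
        return out + self.bias
```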
arXiv Detail & Related papers (2024-08-16T15:28:02Z)
- F-KANs: Federated Kolmogorov-Arnold Networks [3.8277268808551512]
We present an innovative federated learning (FL) approach that utilizes Kolmogorov-Arnold Networks (KANs) for classification tasks.
The study evaluates the performance of federated KANs compared to traditional Multi-Layer Perceptrons (MLPs) on a classification task.
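For intuition, a plain FedAvg aggregation step over client-trained KAN (or MLP) heads might look like the sketch below; uniform client weighting and all-float parameters are simplifying assumptions, and the paper's actual protocol may differ.

```python
import copy
import torch

def federated_average(client_models: list[torch.nn.Module]) -> torch.nn.Module:
    # Average the parameters of identically shaped client heads (FedAvg).
    # Assumes every state-dict entry is a float tensor, as in simple
    # KAN/MLP heads, and weights all clients equally.
    global_model = copy.deepcopy(client_models[0])
    state = global_model.state_dict()
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key] for m in client_models]
        ).mean(dim=0)
    global_model.load_state_dict(state)
    return global_model
```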
arXiv Detail & Related papers (2024-07-29T15:28:26Z)
- Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node [49.08777822540483]
Fast feedforward networks (FFFs) exploit the observation that different regions of the input space activate distinct subsets of neurons in wide networks.
We propose the incorporation of load balancing and Master Leaf techniques into the FFF architecture to improve performance and simplify the training process.
arXiv Detail & Related papers (2024-05-27T05:06:24Z)
- A Simple yet Effective Self-Debiasing Framework for Transformer Models [49.09053367249642]
Current Transformer-based natural language understanding (NLU) models heavily rely on dataset biases.
We propose a simple yet effective self-debiasing framework for Transformer-based NLU models.
arXiv Detail & Related papers (2023-06-02T20:31:58Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
- Nearest Neighbor Zero-Shot Inference [68.56747574377215]
kNN-Prompt is a technique to use k-nearest neighbor (kNN) retrieval augmentation for zero-shot inference with language models (LMs).
Fuzzy verbalizers leverage the sparse kNN distribution for downstream tasks by automatically associating each classification label with a set of natural language tokens.
Experiments show that kNN-Prompt is effective for domain adaptation with no further training, and that the benefits of retrieval increase with the size of the model used for kNN retrieval.
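A minimal sketch of the fuzzy-verbalizer scoring step, assuming each label has already been mapped to a set of vocabulary ids (the ids and label names below are made up for illustration):

```python
import torch

# Hypothetical label-to-token map; real ids depend on the LM's tokenizer.
verbalizers = {
    "world": [1045, 2290, 7713],
    "sports": [3211, 884],
}

def score_labels(next_token_probs: torch.Tensor) -> dict[str, float]:
    # next_token_probs: (vocab_size,) distribution over the next token,
    # e.g., after interpolating the LM with the kNN retrieval distribution.
    # Each label scores the total probability mass of its token set.
    return {
        label: next_token_probs[ids].sum().item()
        for label, ids in verbalizers.items()
    }
```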
arXiv Detail & Related papers (2022-05-27T07:00:59Z)
- Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
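To illustrate the mechanism, here is a toy top-1 mixture-of-experts feed-forward block of the kind such sparse all-MLPs route tokens through; the expert count, sizes, and routing details are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy top-1 mixture-of-experts MLP block (sketch, not the paper's code)."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); route each token to its highest-scoring expert.
        gates = torch.softmax(self.router(x), dim=-1)  # (tokens, experts)
        weight, idx = gates.max(dim=-1)                # top-1 gate per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out
```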
arXiv Detail & Related papers (2022-03-14T04:32:19Z)