Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification
- URL: http://arxiv.org/abs/2408.08803v1
- Date: Fri, 16 Aug 2024 15:28:02 GMT
- Title: Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification
- Authors: Abdullah Al Imran, Md Farhan Ishmam
- Abstract summary: We introduce FourierKAN (FR-KAN), a variant of the promising MLP alternative called Kolmogorov-Arnold Networks (KANs), as classification heads for transformer-based encoders.
Our studies reveal an average increase of 10% in accuracy and 11% in F1-score when incorporating FR-KAN heads instead of traditional MLP heads for several transformer-based pre-trained models across multiple text classification tasks.
- Score: 0.51795041186793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For many years, transformer-based pre-trained models with Multi-layer Perceptron (MLP) heads have been the standard for text classification tasks. However, the fixed non-linear functions employed by MLPs often fall short of capturing the intricacies of the contextualized embeddings produced by pre-trained encoders. Furthermore, MLPs usually require a significant number of training parameters, which can be computationally expensive. In this work, we introduce FourierKAN (FR-KAN), a variant of the promising MLP alternative called Kolmogorov-Arnold Networks (KANs), as classification heads for transformer-based encoders. Our studies reveal an average increase of 10% in accuracy and 11% in F1-score when incorporating FR-KAN heads instead of traditional MLP heads for several transformer-based pre-trained models across multiple text classification tasks. Beyond improving model accuracy, FR-KAN heads train faster and require fewer parameters. Our research opens new grounds for broader applications of KAN across several Natural Language Processing (NLP) tasks.
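As a concrete illustration of where such a head sits, below is a minimal PyTorch sketch of a Fourier-series KAN layer replacing the usual MLP classification head on a pre-trained encoder. The per-edge truncated Fourier series formulation, the gridsize, the initialization, and all class and variable names are illustrative assumptions, not the authors' released implementation.

# Sketch only: a Fourier-series KAN head on top of a pre-trained transformer encoder.
# Layer form, gridsize, and names are assumptions for illustration.
import torch
import torch.nn as nn


class FourierKANLayer(nn.Module):
    """Maps (batch, in_dim) -> (batch, out_dim); every input-output edge applies a
    learnable truncated Fourier series instead of a fixed MLP non-linearity."""

    def __init__(self, in_dim: int, out_dim: int, gridsize: int = 5):
        super().__init__()
        self.gridsize = gridsize
        # Fourier coefficients: [2 (cos/sin), out_dim, in_dim, gridsize]
        self.coeffs = nn.Parameter(
            torch.randn(2, out_dim, in_dim, gridsize) / (in_dim * gridsize) ** 0.5
        )
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frequencies k = 1..gridsize
        k = torch.arange(1, self.gridsize + 1, device=x.device, dtype=x.dtype)
        angles = x.unsqueeze(-1) * k            # (batch, in_dim, gridsize)
        cos, sin = torch.cos(angles), torch.sin(angles)
        # Sum over input features and frequencies for every output unit.
        y = torch.einsum("big,oig->bo", cos, self.coeffs[0]) \
            + torch.einsum("big,oig->bo", sin, self.coeffs[1])
        return y + self.bias


class FRKANClassifier(nn.Module):
    """Pre-trained encoder + Fourier-KAN head in place of the usual MLP head."""

    def __init__(self, encoder, hidden_size: int, num_classes: int):
        super().__init__()
        self.encoder = encoder                  # e.g. a Hugging Face AutoModel (assumed)
        self.head = FourierKANLayer(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]       # [CLS] token embedding
        return self.head(cls)                   # class logits

The head here is a single Fourier layer on the [CLS] embedding; whether it ends up smaller or faster than a given MLP head depends on the chosen gridsize and the MLP's hidden width, so the sketch only shows the plumbing, not the reported gains.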
Related papers
- Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior [9.878774148693575]
We introduce a new magnitude-based pruning algorithm called mixture Gaussian prior pruning.
It aims to retain the model's expressive capability.
We provide a theoretical justification for the consistency of the sparse transformer.
arXiv Detail & Related papers (2024-11-01T18:39:38Z)
- Pre-trained Large Language Models Use Fourier Features to Compute Addition [37.56242478466735]
Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities.
How they compute basic arithmetic, such as addition, remains unclear.
arXiv Detail & Related papers (2024-06-05T16:40:53Z)
- Improved Implicit Neural Representation with Fourier Reparameterized Training [21.93903328906775]
Implicit Neural Representation (INR) as a mighty representation paradigm has achieved success in various computer vision tasks recently.
Existing methods have investigated advanced techniques, such as positional encoding and periodic activation function, to improve the accuracy of INR.
arXiv Detail & Related papers (2024-01-15T00:40:41Z)
- Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for vision-language pre-trained models.
We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z)
- NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning [40.994306592119266]
Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications.
Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning.
We propose to coin a lightweight PLM through NTK-approximating MLP fusion.
arXiv Detail & Related papers (2023-07-18T03:12:51Z)
- Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class.
We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features.
Our PCN outperforms the state-of-the-art alternatives by large margins.
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
- Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage the in-context learning of large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)