Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding
- URL: http://arxiv.org/abs/2201.01420v1
- Date: Wed, 5 Jan 2022 02:46:16 GMT
- Title: Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding
- Authors: Ting Hua, Yilin Shen, Changsheng Zhao, Yen-Chang Hsu, Hongxia Jin
- Abstract summary: Domain classification is the fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
- Score: 60.226644697970116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain classification is the fundamental task in natural language
understanding (NLU), which often requires fast accommodation to new emerging
domains. This constraint makes it impossible to retrain on all previous domains,
even if they are accessible to the new model. Most existing continual learning
approaches suffer from low accuracy and performance fluctuation, especially
when the distributions of old and new data are significantly different. In
fact, the key real-world problem is not the absence of old data, but the
inefficiency of retraining the model on the whole old dataset. Is it possible
to utilize some old data to yield high accuracy and maintain stable
performance while, at the same time, not introducing extra hyperparameters?
In this paper, we propose a hyperparameter-free continual learning model for
text data that can stably produce high performance under various environments.
Specifically, we utilize Fisher information to select exemplars that can
"record" key information of the original model. Also, a novel scheme called
dynamical weight consolidation is proposed to enable hyperparameter-free
learning during the retrain process. Extensive experiments demonstrate that
baselines suffer from fluctuating performance and are therefore of little use
in practice. In contrast, our proposed model CCFI significantly and consistently
outperforms the best state-of-the-art method by up to 20% in average accuracy,
and each component of CCFI contributes effectively to overall performance.
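The abstract names two mechanisms but gives no implementation details. As a rough illustration only, the sketch below shows one way the two ideas could look in PyTorch: scoring candidate exemplars by (diagonal) Fisher information, and adding an EWC-style consolidation penalty whose trade-off weight is derived from the running losses rather than a hand-tuned hyperparameter. All function names are hypothetical, and the dynamic weighting rule is an assumption, not necessarily CCFI's exact scheme.

```python
# Minimal, illustrative sketch (not the authors' code) of the two ideas named
# in the abstract: (1) picking replay exemplars by Fisher information and
# (2) an EWC-style consolidation penalty whose trade-off weight is derived
# from the running losses instead of a hand-tuned hyperparameter.
import torch
import torch.nn.functional as F


def fisher_scores(model, examples, labels):
    """Score each example by the squared-gradient norm of its log-likelihood
    (a diagonal-Fisher proxy); higher scores "record" more of the model."""
    scores = []
    for x, y in zip(examples, labels):
        model.zero_grad()
        log_prob = F.log_softmax(model(x.unsqueeze(0)), dim=-1)[0, y]
        log_prob.backward()
        score = sum((p.grad ** 2).sum() for p in model.parameters()
                    if p.grad is not None)
        scores.append(score.item())
    return torch.tensor(scores)


def select_exemplars(model, examples, labels, k):
    """Keep the k old-domain examples with the largest Fisher scores."""
    top = fisher_scores(model, examples, labels).topk(k).indices
    return [examples[i] for i in top], [labels[i] for i in top]


def consolidation_loss(model, old_params, fisher_diag, task_loss):
    """EWC-style penalty with a dynamically derived weight: the penalty is
    rescaled to the magnitude of the task loss, so no extra hyperparameter
    is tuned (one possible scheme; the paper's exact rule may differ)."""
    penalty = sum((fisher_diag[n] * (p - old_params[n]) ** 2).sum()
                  for n, p in model.named_parameters() if n in fisher_diag)
    weight = task_loss.detach() / (penalty.detach() + 1e-8)
    return task_loss + weight * penalty
```

In a continual-learning loop, `select_exemplars` would be called on the old domain's data before moving to a new domain, and `consolidation_loss` would replace the plain task loss while retraining on the new domain plus the stored exemplars.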
Related papers
- Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [22.13331870720021]
We propose an approach that goes beyond prompt learning for the RFCL task, called Continual Adapter (C-ADA).
C-ADA flexibly extends specific weights in CAL to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
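The C-ADA summary above only names the mechanism (append new trainable weights for each task and freeze the old ones). A minimal, assumed sketch of such an expandable adapter layer is given below for intuition; the class and its dimensions are illustrative, not the C-ADA implementation.

```python
# Illustrative expandable adapter: each new task appends a fresh block of
# columns that is trainable, while blocks learned for earlier tasks are
# frozen. It mirrors the idea described above and is not the C-ADA code.
import torch
import torch.nn as nn


class ExpandableAdapter(nn.Module):
    def __init__(self, hidden_dim, cols_per_task):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cols_per_task = cols_per_task
        self.down_blocks = nn.ParameterList()  # each: hidden_dim x cols_per_task
        self.up_blocks = nn.ParameterList()    # each: cols_per_task x hidden_dim

    def add_task(self):
        """Freeze all existing blocks, then append a new trainable block."""
        for p in self.parameters():
            p.requires_grad_(False)
        self.down_blocks.append(
            nn.Parameter(torch.randn(self.hidden_dim, self.cols_per_task) * 0.02))
        self.up_blocks.append(
            nn.Parameter(torch.zeros(self.cols_per_task, self.hidden_dim)))

    def forward(self, x):
        # Residual bottleneck built from all blocks accumulated so far.
        down = torch.cat(list(self.down_blocks), dim=1)  # D x (T * c)
        up = torch.cat(list(self.up_blocks), dim=0)      # (T * c) x D
        return x + torch.relu(x @ down) @ up
```

Before training each new task, `add_task()` would be called so that only the newly appended columns receive gradients while everything learned earlier stays frozen.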
arXiv Detail & Related papers (2024-07-14T17:40:40Z)
- TAIA: Large Language Models are Out-of-Distribution Data Learners [30.578724239270144]
Fine-tuning on task-specific question-answer pairs is a predominant method for enhancing the performance of instruction-tuned large language models.
We re-evaluated the Transformer architecture and discovered that not all parameter updates during fine-tuning contribute positively to downstream performance.
We propose an effective inference-time intervention method: Training All parameters but Inferring with only Attention (TAIA).
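The intervention can be pictured as a state-dict merge after fine-tuning: keep the fine-tuned values for attention parameters and revert everything else to the pretrained values. The sketch below assumes attention parameters can be identified by an "attn" substring in their names, which is only a naming assumption for illustration rather than part of the paper.

```python
# Sketch of the "train all, infer with only attention" idea: after full
# fine-tuning, build the inference weights by keeping fine-tuned values for
# attention parameters and pretrained values for everything else. The "attn"
# substring test is only a naming assumption for illustration.
def merge_for_inference(pretrained_state, finetuned_state):
    merged = {}
    for name, pretrained_w in pretrained_state.items():
        finetuned_w = finetuned_state[name]
        merged[name] = finetuned_w if "attn" in name else pretrained_w
    return merged

# Hypothetical usage:
# model.load_state_dict(merge_for_inference(base.state_dict(), tuned.state_dict()))
```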
arXiv Detail & Related papers (2024-05-30T15:57:19Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can reduce the number of parameters by more than 70%.
We also evaluate the degree of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- CILIATE: Towards Fairer Class-based Incremental Learning by Dataset and Training Refinement [20.591583747291892]
We show that CIL suffers from both dataset and algorithm bias problems.
We propose a novel framework, CILIATE, that fixes both dataset and algorithm bias in CIL.
CILIATE improves the fairness of CIL by 17.03%, 22.46%, and 31.79% compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-04-09T12:10:39Z)
- Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
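The summary describes keeping the pretrained parameters untouched while injecting small re-learned networks. A hedged sketch of that pattern (a frozen base layer plus a trainable residual patch) is shown below; the wrapper and its sizes are assumptions, not the PeterRec architecture.

```python
# Hedged sketch of the pattern described above: the pretrained layer stays
# frozen and a small re-learned "patch" network is added residually. This is
# an illustration, not the PeterRec architecture.
import torch.nn as nn


class PatchedLayer(nn.Module):
    def __init__(self, pretrained_layer, hidden_dim, patch_dim=32):
        super().__init__()
        self.base = pretrained_layer
        for p in self.base.parameters():     # pretrained weights stay unaltered
            p.requires_grad_(False)
        self.patch = nn.Sequential(          # small trainable patch network
            nn.Linear(hidden_dim, patch_dim),
            nn.ReLU(),
            nn.Linear(patch_dim, hidden_dim),
        )

    def forward(self, x):
        return self.base(x) + self.patch(x)
```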
arXiv Detail & Related papers (2020-01-13T14:09:54Z)