Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding
- URL: http://arxiv.org/abs/2201.01420v1
- Date: Wed, 5 Jan 2022 02:46:16 GMT
- Title: Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding
- Authors: Ting Hua, Yilin Shen, Changsheng Zhao, Yen-Chang Hsu, Hongxia Jin
- Abstract summary: Domain classification is the fundamental task in natural language understanding (NLU)
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyper parameter-free continual learning model for text data that can stably produce high performance under various environments.
- Score: 60.226644697970116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain classification is the fundamental task in natural language
understanding (NLU), which often requires fast accommodation to new emerging
domains. This constraint makes it impossible to retrain all previous domains,
even if they are accessible to the new model. Most existing continual learning
approaches suffer from low accuracy and performance fluctuation, especially
when the distributions of old and new data are significantly different. In
fact, the key real-world problem is not the absence of old data, but the
inefficiency to retrain the model with the whole old dataset. Is it potential
to utilize some old data to yield high accuracy and maintain stable
performance, while at the same time, without introducing extra hyperparameters?
In this paper, we proposed a hyperparameter-free continual learning model for
text data that can stably produce high performance under various environments.
Specifically, we utilize Fisher information to select exemplars that can
"record" key information of the original model. Also, a novel scheme called
dynamical weight consolidation is proposed to enable hyperparameter-free
learning during the retrain process. Extensive experiments demonstrate that
baselines suffer from fluctuated performance and therefore useless in practice.
On the contrary, our proposed model CCFI significantly and consistently
outperforms the best state-of-the-art method by up to 20% in average accuracy,
and each component of CCFI contributes effectively to overall performance.
Related papers
- A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data Segmentation [0.562479170374811]
In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift.
Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data.
We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness.
arXiv Detail & Related papers (2024-11-23T17:35:23Z) - Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA [15.542668474378633]
We propose a novel and efficient machine unlearning method on pre-trained models.
We leverage LoRA to decompose the model's intermediate features into pre-trained features and residual features.
The method aims to learn the zero residuals on the retained set and shifted residuals on the unlearning set.
arXiv Detail & Related papers (2024-11-13T08:56:35Z) - FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning [5.23984567704876]
Federated learning offers a paradigm to the challenge of preserving privacy in distributed machine learning.
Traditional approach fails to address the phenomenon of class-wise bias in global long-tailed data.
New method FedLF introduces three modifications in the local training phase: adaptive logit adjustment, continuous class centred optimization, and feature decorrelation.
arXiv Detail & Related papers (2024-09-18T16:25:29Z) - Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z) - TAIA: Large Language Models are Out-of-Distribution Data Learners [30.57872423927015]
We propose an effective inference-time intervention method: Training All parameters but Inferring with only Attention (trainallInfAttn)
trainallInfAttn achieves superior improvements compared to both the fully fine-tuned model and the base model in most scenarios.
The high tolerance of trainallInfAttn to data mismatches makes it resistant to jailbreaking tuning and enhances specialized tasks using general data.
arXiv Detail & Related papers (2024-05-30T15:57:19Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised
Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyper parameters.
This paper re-examines several common practices of setting hyper parameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.