Continual Learning for Text Classification with Information
Disentanglement Based Regularization
- URL: http://arxiv.org/abs/2104.05489v1
- Date: Mon, 12 Apr 2021 14:17:43 GMT
- Title: Continual Learning for Text Classification with Information
Disentanglement Based Regularization
- Authors: Yufan Huang, Yanzhe Zhang, Jiaao Chen, Xuezhi Wang and Diyi Yang
- Abstract summary: We propose an information disentanglement based regularization method for continual learning on text classification.
Experiments conducted on large-scale benchmarks demonstrate the effectiveness of our method.
- Score: 18.258948837964724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning has become increasingly important as it enables NLP models
to constantly learn and gain knowledge over time. Previous continual learning
methods are mainly designed to preserve knowledge from previous tasks, without
much emphasis on how to well generalize models to new tasks. In this work, we
propose an information disentanglement based regularization method for
continual learning on text classification. Our proposed method first
disentangles text hidden spaces into representations that are generic to all
tasks and representations specific to each individual task, and further
regularizes these representations differently to better constrain the knowledge
required to generalize. We also introduce two simple auxiliary tasks: next
sentence prediction and task-id prediction, for learning better generic and
specific representation spaces. Experiments conducted on large-scale benchmarks
demonstrate the effectiveness of our method in continual text classification
tasks with various sequences and lengths over state-of-the-art baselines. We
have publicly released our code at https://github.com/GT-SALT/IDBR.
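To make the abstract's description more concrete, below is a minimal PyTorch sketch of the idea: the sentence representation is split into a task-generic part and a task-specific part, the generic part feeds a next-sentence-prediction head, the specific part feeds a task-id-prediction head, and drift from a frozen snapshot of the previous model is penalized with different strengths in the two spaces. This is not the released IDBR implementation (see the repository above); the toy encoder, layer sizes, and loss coefficients are illustrative assumptions, and the larger penalty on the generic space only reflects the intuition that generic knowledge should drift less across tasks.

```python
# Minimal sketch of information-disentanglement-based regularization for
# continual text classification. Illustrative only: the embedding+mean-pooling
# encoder stands in for a pretrained BERT model, and all coefficients are
# assumptions, not the paper's values.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledClassifier(nn.Module):
    def __init__(self, vocab_size=30522, dim=128, num_classes=5, num_tasks=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # stand-in for BERT
        # Two projections split the hidden state into a task-generic part g
        # and a task-specific part s.
        self.to_generic = nn.Linear(dim, dim)
        self.to_specific = nn.Linear(dim, dim)
        # Main classifier consumes the concatenation [g; s].
        self.classifier = nn.Linear(2 * dim, num_classes)
        # Auxiliary heads: next-sentence prediction on g, task-id prediction on s.
        self.nsp_head = nn.Linear(dim, 2)
        self.task_head = nn.Linear(dim, num_tasks)

    def forward(self, token_ids):
        h = self.embed(token_ids).mean(dim=1)        # crude sentence pooling
        g, s = self.to_generic(h), self.to_specific(h)
        return g, s, self.classifier(torch.cat([g, s], dim=-1))


def idbr_style_loss(model, prev_model, batch, lam_gen=0.5, lam_spec=0.05,
                    lam_nsp=1.0, lam_task=1.0):
    """Classification loss + auxiliary losses + disentangled regularization.
    Drift in the generic space is penalized more heavily than drift in the
    task-specific space (coefficients here are illustrative)."""
    tokens, labels, nsp_labels, task_ids = batch
    g, s, logits = model(tokens)
    loss = F.cross_entropy(logits, labels)
    loss = loss + lam_nsp * F.cross_entropy(model.nsp_head(g), nsp_labels)
    loss = loss + lam_task * F.cross_entropy(model.task_head(s), task_ids)
    if prev_model is not None:  # regularize against the frozen previous model
        with torch.no_grad():
            g_prev, s_prev, _ = prev_model(tokens)
        loss = loss + lam_gen * F.mse_loss(g, g_prev)
        loss = loss + lam_spec * F.mse_loss(s, s_prev)
    return loss


if __name__ == "__main__":
    model = DisentangledClassifier()
    prev_model = copy.deepcopy(model).eval()        # snapshot after previous task
    batch = (torch.randint(0, 30522, (4, 16)),      # token ids
             torch.randint(0, 5, (4,)),             # class labels
             torch.randint(0, 2, (4,)),             # next-sentence labels
             torch.randint(0, 5, (4,)))             # task ids
    print(idbr_style_loss(model, prev_model, batch).item())
```

In the paper's setting the encoder is a pretrained BERT model; other details, such as how the regularization targets are computed, follow the released code rather than this sketch.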
Related papers
- Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning [70.64617500380287]
Continual learning allows models to learn from new data while retaining previously learned knowledge.
The semantic knowledge available in the label information of the images offers important cues that can be related to previously acquired knowledge of semantic classes.
We propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings.
arXiv Detail & Related papers (2024-08-02T07:51:44Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z) - Adaptive Multi-Modality Prompt Learning [21.86784369327551]
We propose an adaptive multi-modality prompt learning method to address the above issues.
The image prompt learning achieves in-sample and out-of-sample generalization by first masking meaningless patches and then padding them with learnable parameters and information from the text.
Experimental results on real datasets demonstrate that our method outperforms SOTA methods across different downstream tasks.
arXiv Detail & Related papers (2023-11-30T12:10:22Z) - Learning Symbolic Rules over Abstract Meaning Representations for
Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines a generic semantic generalization with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
arXiv Detail & Related papers (2023-07-05T23:21:05Z) - CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition [16.987008461171065]
We explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition.
Our method consists of adding intermediate layers called adapters for each task and efficiently distilling knowledge from the previous model while learning the current task.
We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task.
arXiv Detail & Related papers (2023-03-16T14:27:45Z) - Unveiling the Tapestry: the Interplay of Generalization and Forgetting in Continual Learning [18.61040106667249]
In AI, generalization refers to a model's ability to perform well on out-of-distribution data related to a given task, beyond the data it was trained on.
Continual learning methods often include mechanisms to mitigate catastrophic forgetting, ensuring that knowledge from earlier tasks is retained.
We introduce a simple and effective technique known as Shape-Texture Consistency Regularization (STCR), which caters to continual learning.
arXiv Detail & Related papers (2022-11-21T04:36:24Z) - Learning Downstream Task by Selectively Capturing Complementary
Knowledge from Multiple Self-supervisedly Learning Pretexts [20.764378638979704]
We propose a novel solution that leverages the attention mechanism to adaptively select suitable representations for downstream tasks.
Our scheme significantly outperforms current popular pretext-matching-based methods in gathering knowledge.
arXiv Detail & Related papers (2022-04-11T16:46:50Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
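For the text-to-text framing mentioned in the last entry above, here is a tiny, self-contained Python illustration; the task prefixes mimic the style popularized by that line of work, and the example sentences and targets are made up.

```python
# Toy illustration of the text-to-text framing: every task is phrased as
# mapping one string to another, so a single sequence-to-sequence model can
# serve all tasks. Prefixes and examples are illustrative, not from the paper.
examples = [
    # Sentiment classification becomes string-to-string prediction.
    ("sst2 sentence: the film was a quiet delight", "positive"),
    # Translation uses exactly the same interface.
    ("translate English to German: The house is small.", "Das Haus ist klein."),
    # So does summarization.
    ("summarize: <long article text>", "<short summary>"),
]

for source, target in examples:
    print(f"{source!r}  ->  {target!r}")
```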