CEM: A Data-Efficient Method for Large Language Models to Continue Evolving From Mistakes
- URL: http://arxiv.org/abs/2404.08707v7
- Date: Mon, 16 Dec 2024 11:21:30 GMT
- Title: CEM: A Data-Efficient Method for Large Language Models to Continue Evolving From Mistakes
- Authors: Haokun Zhao, Haixia Han, Jie Shi, Chengyu Du, Jiaqing Liang, Yanghua Xiao,
- Abstract summary: Continual learning is essential for keeping Large Language Models current and addressing their shortcomings.
We propose the Continue Evolving from Mistakes (CEM) method, a data-efficient approach aiming to collect CPT data.
Experiments show that CEM substantially enhances multiple models' performance on both in-domain and out-of-domain QA tasks, achieving gains of up to 29.63%.
- Score: 36.14056870453356
- License:
- Abstract: As world knowledge advances and new task schemas emerge, Continual Learning (CL) becomes essential for keeping Large Language Models (LLMs) current and addressing their shortcomings. This process typically involves continual instruction tuning (CIT) and continual pre-training (CPT) to enable these models to adapt to novel tasks and acquire critical knowledge. However, collecting sufficient CPT data and efficiently bridging knowledge gaps remain significant challenges. Inspired by the 'summarizing mistakes' strategy, we propose the Continue Evolving from Mistakes (CEM) method, a data-efficient approach aiming to collect CPT data and continually improve LLMs' performance through iterative evaluation and supplementation with mistake-relevant knowledge. To further optimize data usage and mitigate forgetting, we introduce a novel training paradigm that combines CIT and CPT. Experiments show that CEM substantially enhances multiple models' performance on both in-domain and out-of-domain QA tasks, achieving gains of up to 29.63%. Code and datasets are available on https://anonymous.4open.science/r/cem-BB25.
Related papers
- Efficient Knowledge Feeding to Language Models: A Novel Integrated Encoder-Decoder Architecture [0.0]
ICV recasts in-context learning by using latent embeddings of language models.
ICV directly integrates information into the model, enabling it to process this information more effectively.
arXiv Detail & Related papers (2025-02-07T04:24:07Z) - Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Learning to Learn without Forgetting using Attention [5.6739565497512405]
Continual learning (CL) refers to the ability to continually learn over time by accommodating new knowledge while retaining previously learned experience.
Current machine learning methods are highly prone to overwrite previously learned patterns and thus forget past experience.
Since hand-crafting effective update mechanisms is difficult, we propose meta-learning a transformer-based to enhance CL.
arXiv Detail & Related papers (2024-08-06T14:25:23Z) - Learning to Unlearn for Robust Machine Unlearning [6.488418950340473]
We introduce a novel Learning-to-Unlearn (LTU) framework to optimize the unlearning process.
LTU includes a meta-optimization scheme that facilitates models to effectively preserve generalizable knowledge.
We also introduce a Gradient Harmonization strategy to align the optimization trajectories for remembering and forgetting.
arXiv Detail & Related papers (2024-07-15T07:36:00Z) - DELTA: Decoupling Long-Tailed Online Continual Learning [7.507868991415516]
Long-Tailed Online Continual Learning (LTOCL) aims to learn new tasks from sequentially arriving class-imbalanced data streams.
We present DELTA, a decoupled learning approach designed to enhance learning representations.
We demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods.
arXiv Detail & Related papers (2024-04-06T02:33:04Z) - Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.