RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
- URL: http://arxiv.org/abs/2510.20479v1
- Date: Thu, 23 Oct 2025 12:17:37 GMT
- Title: RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
- Authors: Bowen Wang, Haiyuan Wan, Liwen Shi, Chen Yang, Peng He, Yue Ma, Haochen Han, Wenhao Li, Tiao Tan, Yongjian Li, Fangming Liu, Yifan Gong, Sheng Zhang
- Abstract summary: Internal representations in large language models (LLMs) serve as reliable proxies of learned knowledge. We propose RECALL, a representation-aware model merging framework for continual learning without access to historical data.
- Score: 33.22889542330089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We unveil that internal representations in large language models (LLMs) serve as reliable proxies of learned knowledge, and propose RECALL, a novel representation-aware model merging framework for continual learning without access to historical data. RECALL computes inter-model similarity from layer-wise hidden representations over clustered typical samples, and performs adaptive, hierarchical parameter fusion to align knowledge across models. This design enables the preservation of domain-general features in shallow layers while allowing task-specific adaptation in deeper layers. Unlike prior methods that require task labels or incur performance trade-offs, RECALL achieves seamless multi-domain integration and strong resistance to catastrophic forgetting. Extensive experiments across five NLP tasks and multiple continual learning scenarios show that RECALL outperforms baselines in both knowledge retention and generalization, providing a scalable and data-free solution for evolving LLMs.
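The abstract describes the mechanism only at a high level. Below is a minimal, assumption-based sketch (not the authors' implementation) of representation-similarity-guided, layer-wise merging in the spirit of RECALL: a toy layer stack stands in for an LLM, per-layer hidden-state similarity is computed over a batch of stand-in "typical samples", and parameters are interpolated layer by layer with a weight derived from that similarity. The architecture, the similarity-to-weight mapping, and the sample selection are all illustrative choices.

```python
# Minimal sketch (not the authors' code) of representation-similarity-guided,
# layer-wise model merging in the spirit of RECALL. The toy architecture, the
# similarity-to-weight mapping, and the sample set are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyLM(nn.Module):
    """Stand-in for an LLM: a stack of layers exposing per-layer hidden states."""

    def __init__(self, dim=64, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))

    def hidden_states(self, x):
        states = []
        for layer in self.layers:
            x = torch.tanh(layer(x))
            states.append(x)
        return states  # one tensor of shape (batch, dim) per layer


@torch.no_grad()
def layer_similarity(model_a, model_b, samples):
    """Cosine similarity between the two models' mean hidden states, per layer."""
    return [
        F.cosine_similarity(ha.mean(dim=0), hb.mean(dim=0), dim=0).item()
        for ha, hb in zip(model_a.hidden_states(samples), model_b.hidden_states(samples))
    ]


@torch.no_grad()
def merge(model_a, model_b, samples):
    """Per-layer interpolation: layers whose representations agree (typically the
    shallow, domain-general ones) are averaged more evenly, while dissimilar
    (task-specific) layers stay closer to model_a. The mapping is an assumption."""
    merged = ToyLM(dim=model_a.layers[0].in_features, n_layers=len(model_a.layers))
    sims = layer_similarity(model_a, model_b, samples)
    for sim, la, lb, lm in zip(sims, model_a.layers, model_b.layers, merged.layers):
        alpha = 0.5 * max(sim, 0.0)  # weight on model_b, in [0, 0.5]
        lm.weight.copy_((1 - alpha) * la.weight + alpha * lb.weight)
        lm.bias.copy_((1 - alpha) * la.bias + alpha * lb.bias)
    return merged


if __name__ == "__main__":
    torch.manual_seed(0)
    model_a, model_b = ToyLM(), ToyLM()
    typical_samples = torch.randn(32, 64)  # stand-in for clustered typical samples
    merged = merge(model_a, model_b, typical_samples)
    print(layer_similarity(model_a, merged, typical_samples))
```

Per the abstract, the actual method draws its samples from clustering and performs adaptive, hierarchical fusion rather than this single linear interpolation; the sketch only illustrates how layer-wise representation similarity can steer how evenly each layer's parameters are blended.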
Related papers
- ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction [55.21514454560188]
Unified multimodal models significantly improve visual generation by combining vision-language models (VLMs) with diffusion models. Existing methods struggle to balance sufficient interaction with flexible implementation due to the vast representation difference. We propose ParaUni, which extracts features from various layers of the VLM in a parallel way for comprehensive information interaction.
arXiv Detail & Related papers (2025-12-05T04:41:57Z) - Intrinsic Training Signals for Federated Learning Aggregation [13.540945877050525]
Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. This work demonstrates that effective model merging can be achieved solely through existing training signals.
arXiv Detail & Related papers (2025-07-09T13:03:23Z) - Query-Based Adaptive Aggregation for Multi-Dataset Joint Training Toward Universal Visual Place Recognition [10.8843105310375]
Query-based Adaptive Aggregation (QAA) is a novel feature aggregation technique that leverages learned queries as reference codebooks. We show that QAA outperforms state-of-the-art models, achieving balanced generalization across diverse datasets while maintaining peak performance comparable to dataset-specific models.
arXiv Detail & Related papers (2025-07-04T22:40:03Z) - Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data [16.462869377794316]
Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features. Recent studies have dedicated efforts to merging the parameters of multiple independently trained models into a unified model for MTL. We propose LwPTV (Layer-wise Pruning Task Vector), which builds a saliency score measuring the redundancy of parameters in task vectors.
arXiv Detail & Related papers (2025-06-10T11:34:23Z) - LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing. A minimal illustrative sketch of this layer-scaling idea is given after this list.
arXiv Detail & Related papers (2024-10-22T16:26:05Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks [3.776249047528669]
We leverage the abundance of freely trained models to introduce a cost-free approach to model merging.
It aims to maintain the distinctiveness of the task-specific final layers while unifying the initial layers.
This approach ensures parameter consistency across all layers, which is essential for boosting performance.
arXiv Detail & Related papers (2024-09-24T07:19:30Z) - Diverse Representation Embedding for Lifelong Person Re-Identification [10.824003066938234]
Lifelong Person Re-Identification (LReID) aims to continuously learn from successive data streams, matching individuals across multiple cameras.
Existing methods based on CNN backbones are insufficient for exploring the representation of each instance from different perspectives.
We propose a Diverse Representations Embedding (DRE) framework, the first to explore a pure transformer for LReID.
arXiv Detail & Related papers (2024-03-24T04:22:37Z) - Identifying Factual Inconsistencies in Summaries: Grounding LLM Inference via Task Taxonomy [48.29181662640212]
Factual inconsistencies pose a significant hurdle to faithful summarization by generative models.
We consolidate key error types of inconsistent facts in summaries, and incorporate them to facilitate both the zero-shot and supervised paradigms of LLMs.
arXiv Detail & Related papers (2024-02-20T08:41:23Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z) - HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
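To make the layer-scaling idea from the LiNeS entry above concrete, here is a minimal assumption-based sketch (not the LiNeS reference implementation): the fine-tuned model's parameter update over the pre-trained weights is kept with a factor that grows linearly with layer depth, so shallow layers stay close to the pre-trained model while deeper layers retain more of the task-specific update. The linear schedule, its range, and the per-layer state-dict layout are illustrative choices.

```python
# Assumption-based sketch of post-training, depth-dependent scaling of fine-tuned
# updates (in the spirit of the LiNeS entry above); not the reference code.
import torch


@torch.no_grad()
def scale_updates_by_depth(pretrained_layers, finetuned_layers, min_scale=0.1, max_scale=1.0):
    """pretrained_layers / finetuned_layers: lists (ordered shallow -> deep) of
    per-layer state dicts. Layer i's update is retained with a factor that grows
    linearly from min_scale to max_scale with depth (illustrative schedule)."""
    n = len(pretrained_layers)
    edited = []
    for i, (pre, ft) in enumerate(zip(pretrained_layers, finetuned_layers)):
        scale = min_scale + (max_scale - min_scale) * i / max(n - 1, 1)
        edited.append({name: pre[name] + scale * (ft[name] - pre[name]) for name in pre})
    return edited


if __name__ == "__main__":
    pre = [{"weight": torch.zeros(2, 2)} for _ in range(4)]
    ft = [{"weight": torch.ones(2, 2)} for _ in range(4)]
    for i, layer in enumerate(scale_updates_by_depth(pre, ft)):
        print(i, layer["weight"][0, 0].item())  # 0.1, 0.4, 0.7, 1.0
```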
This list is automatically generated from the titles and abstracts of the papers on this site.