More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs
- URL: http://arxiv.org/abs/2405.17830v2
- Date: Wed, 02 Oct 2024 02:31:04 GMT
- Title: More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs
- Authors: Chengyuan Liu, Yangyang Kang, Shihang Wang, Lizhi Qing, Fubang Zhao, Changlong Sun, Kun Kuang, Fei Wu
- Abstract summary: The performance of Large Language Models (LLMs) on general tasks decreases after they are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF).
This paper presents a challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI).
The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks.
- Score: 40.54076184225558
- Abstract: The performance of Large Language Models (LLMs) on general tasks decreases after they are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF). However, this paper presents a further challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which necessitates the integration of both general capabilities and domain knowledge within a single instance. The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks. Taking the legal domain as an example, we carefully design three groups of practical training and testing tasks and construct the corresponding datasets. To better incorporate general capabilities across domain-specific scenarios, we introduce ALoRA, which adds a multi-head attention module on top of LoRA, facilitating direct information transfer from preceding tokens to the current one. This enhancement permits the representation to dynamically switch between domain-specific knowledge and general competencies according to the attention. Extensive experiments are conducted on the proposed tasks. The results demonstrate the significance of our setting and the effectiveness of our method.
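As a rough illustration of the ALoRA idea described in the abstract, the following PyTorch snippet layers causal multi-head attention over a standard LoRA update, so each token's adapter output can draw on preceding tokens. Every class name, dimension, and wiring choice here is an illustrative assumption; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class ALoRALayer(nn.Module):
    """Hypothetical ALoRA-style adapter: a LoRA update routed through
    causal multi-head attention so each position can draw information
    from preceding tokens. Illustrative sketch only."""

    def __init__(self, d_model: int, rank: int = 8, n_heads: int = 4):
        super().__init__()
        self.lora_a = nn.Linear(d_model, rank, bias=False)   # down-projection
        self.lora_b = nn.Linear(rank, d_model, bias=False)   # up-projection
        nn.init.zeros_(self.lora_b.weight)                   # zero update at init, as in LoRA
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) hidden states from the frozen base model.
        update = self.lora_b(self.lora_a(x))                 # standard low-rank LoRA update
        seq_len = x.size(1)
        # Boolean causal mask: True entries are blocked, so token i only
        # attends to tokens at positions <= i.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1
        )
        mixed, _ = self.attn(update, update, update, attn_mask=causal)
        return x + mixed                                     # residual combination

# Usage sketch:
layer = ALoRALayer(d_model=768)
hidden = torch.randn(2, 16, 768)   # (batch, seq_len, d_model)
out = layer(hidden)                # same shape: (2, 16, 768)
```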
Related papers
- Role Prompting Guided Domain Adaptation with General Capability Preserve for Large Language Models [55.51408151807268]
When tailored to specific domains, Large Language Models (LLMs) tend to experience catastrophic forgetting.
Crafting a versatile model for multiple domains simultaneously often results in a decline in overall performance.
We present the RolE Prompting Guided Multi-Domain Adaptation (REGA) strategy.
arXiv Detail & Related papers (2024-03-05T08:22:41Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way (a minimal sketch of this three-step flow appears after this list).
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Domain Generalization for Domain-Linked Classes [8.738092015092207]
In the real world, classes may often be domain-linked, i.e., expressed only in a specific domain.
We propose a Fair and cONtrastive feature-space regularization algorithm for Domain-linked DG, FOND.
arXiv Detail & Related papers (2023-06-01T16:39:50Z)
- Domain generalization Person Re-identification on Attention-aware multi-operation strategery [8.90472129039969]
Domain generalization person re-identification (DG Re-ID) aims to directly deploy a model trained on the source domain to the unseen target domain with good generalization.
In existing DG Re-ID methods, invariant operations are effective for extracting domain-generalizable features.
An Attention-aware Multi-operation Strategery (AMS) for DG Re-ID is proposed to extract more generalized features.
arXiv Detail & Related papers (2022-10-19T09:18:46Z)
- Compound Domain Generalization via Meta-Knowledge Encoding [55.22920476224671]
We introduce Style-induced Domain-specific Normalization (SDNorm) to re-normalize the multi-modal underlying distributions.
We harness prototype representations, the centroids of classes, to perform relational modeling in the embedding space (see the prototype sketch after this list).
Experiments on four standard Domain Generalization benchmarks reveal that COMEN exceeds state-of-the-art performance without the need for domain supervision.
arXiv Detail & Related papers (2022-03-24T11:54:59Z)
- Unsupervised Domain Generalization for Person Re-identification: A Domain-specific Adaptive Framework [50.88463458896428]
Domain generalization (DG) has attracted much attention in person re-identification (ReID) recently.
Existing methods usually need the source domains to be labeled, which could be a significant burden for practical ReID tasks.
We propose a simple and efficient domain-specific adaptive framework, and realize it with an adaptive normalization module.
arXiv Detail & Related papers (2021-11-30T02:35:51Z)
- Exploiting Domain-Specific Features to Enhance Domain Generalization [10.774902700296249]
Domain Generalization (DG) aims to train a model, from multiple observed source domains, in order to perform well on unseen target domains.
Prior DG approaches have focused on extracting domain-invariant information across sources to generalize on target domains.
We propose meta-Domain Specific-Domain Invariant (mD), a novel, theoretically sound framework.
arXiv Detail & Related papers (2021-10-18T15:42:39Z)
- Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
arXiv Detail & Related papers (2021-07-12T17:57:46Z)
- Generalized Domain Conditioned Adaptation Network [33.13337928537281]
Domain Adaptation (DA) attempts to transfer knowledge learned in a labeled source domain to an unlabeled but related target domain.
Recent advances in DA mainly proceed by aligning the source and target distributions.
We develop Generalized Domain Conditioned Adaptation Network (GDCAN) to automatically determine whether domain channel activations should be separately modeled in each attention module.
arXiv Detail & Related papers (2021-03-23T06:24:26Z)
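For the Knowledge Plugins (DOKE) entry above, here is a minimal sketch of the three-step pipeline (prepare, select, express) as summarized in its abstract. The function names and the word-overlap retriever are illustrative assumptions standing in for the paper's actual components.

```python
def prepare_knowledge(corpus: list[str]) -> list[str]:
    # Step 1: distill the domain corpus into candidate knowledge snippets.
    return [doc.strip() for doc in corpus if doc.strip()]

def select_knowledge(sample: str, snippets: list[str], k: int = 3) -> list[str]:
    # Step 2: pick the snippets most relevant to this sample. A trivial
    # word-overlap score stands in for a real retriever here.
    def overlap(snippet: str) -> int:
        return len(set(sample.lower().split()) & set(snippet.lower().split()))
    return sorted(snippets, key=overlap, reverse=True)[:k]

def express_knowledge(sample: str, selected: list[str]) -> str:
    # Step 3: serialize the selected knowledge into an LLM-readable prompt.
    facts = "\n".join(f"- {s}" for s in selected)
    return f"Known domain facts:\n{facts}\n\nQuestion: {sample}"

def build_doke_prompt(sample: str, corpus: list[str]) -> str:
    snippets = prepare_knowledge(corpus)
    selected = select_knowledge(sample, snippets)
    return express_knowledge(sample, selected)

# Usage: pass the augmented prompt to any LLM.
print(build_doke_prompt(
    "What damages follow a breach of contract?",
    ["A breach of contract may entitle the injured party to damages.",
     "Statutes of limitations vary by jurisdiction."],
))
```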
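For the COMEN entry, this short snippet illustrates prototype representations, i.e., class centroids in the embedding space, and a simple relational score against them. It is a generic sketch of the centroid idea, not COMEN's implementation.

```python
import torch
import torch.nn.functional as F

def class_prototypes(embeddings: torch.Tensor, labels: torch.Tensor,
                     n_classes: int) -> torch.Tensor:
    # embeddings: (N, d); labels: (N,). Returns (n_classes, d) centroids.
    # Assumes every class index appears at least once in `labels`.
    protos = torch.zeros(n_classes, embeddings.size(1))
    for c in range(n_classes):
        protos[c] = embeddings[labels == c].mean(dim=0)
    return protos

emb = torch.randn(100, 64)              # sample embeddings
lab = torch.randint(0, 5, (100,))       # labels for 5 classes
protos = class_prototypes(emb, lab, 5)  # (5, 64) class centroids
# Relational scores: cosine similarity of each sample to each prototype.
scores = F.cosine_similarity(emb.unsqueeze(1), protos.unsqueeze(0), dim=-1)  # (100, 5)
```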