G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks
- URL: http://arxiv.org/abs/2212.03613v3
- Date: Sat, 17 Feb 2024 19:31:47 GMT
- Title: G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks
- Authors: Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu
- Abstract summary: We propose a new framework, the General Memory-Augmented Pre-trained Language Model (G-MAP).
G-MAP augments the domain-specific PLM with a memory representation built from the frozen general PLM, without losing any general knowledge.
We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds of tasks (text classification, QA, NER).
- Score: 68.87524746922263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, domain-specific PLMs have been proposed to boost the task
performance of specific domains (e.g., biomedical and computer science) by
continuing to pre-train general PLMs with domain-specific corpora. However,
this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to
forget the previous general knowledge acquired by general PLMs, which leads to
a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate
this problem, we propose a new framework, the General Memory-Augmented
Pre-trained Language Model (G-MAP), which augments the domain-specific PLM with
a memory representation built from the frozen general PLM, so that no general
knowledge is lost. Specifically, we propose a new memory-augmented layer and,
based on it, explore different augmentation strategies to build the memory
representation and adaptively fuse it into the domain-specific PLM. We
demonstrate the effectiveness of G-MAP on various domains (biomedical and
computer science publications, news, and reviews) and different kinds of tasks
(text classification, QA, NER), and extensive results show that the proposed
G-MAP achieves SOTA results on all tasks.
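To make the fusion idea concrete, below is a minimal PyTorch sketch of one way such a memory-augmented layer could look: the trainable domain-specific PLM cross-attends to hidden states produced by the frozen general PLM and mixes them in through a learned gate. The class name, the gating mechanism, and the shapes are illustrative assumptions, not the paper's exact implementation, which explores several augmentation strategies for building the memory representation.

```python
# Illustrative sketch only, assuming a gated cross-attention fusion of frozen
# general-PLM hidden states ("memory") into a domain-specific PLM.
import torch
import torch.nn as nn

class MemoryAugmentedLayer(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int = 12):
        super().__init__()
        # Cross-attention: domain hidden states query the general-PLM memory.
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Per-token gate deciding how much general knowledge to mix in.
        self.gate = nn.Linear(2 * hidden_size, 1)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, domain_hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # domain_hidden: (batch, seq, hidden) from the trainable domain-specific PLM
        # memory:        (batch, mem_len, hidden) from the frozen general PLM
        attended, _ = self.cross_attn(domain_hidden, memory, memory)
        g = torch.sigmoid(self.gate(torch.cat([domain_hidden, attended], dim=-1)))
        return self.norm(domain_hidden + g * attended)

if __name__ == "__main__":
    layer = MemoryAugmentedLayer(hidden_size=768)
    domain_hidden = torch.randn(2, 16, 768)   # from the domain-specific encoder
    with torch.no_grad():                     # frozen general PLM forward pass
        memory = torch.randn(2, 16, 768)
    print(layer(domain_hidden, memory).shape)  # torch.Size([2, 16, 768])
```

Because the general PLM is only run under `no_grad`, its parameters and knowledge stay untouched; gradients flow solely through the domain-specific PLM and the fusion layer.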
Related papers
- Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler [45.71475375161575]
In Open-Set Domain Generalization, the model is exposed to both new variations of data appearance (domains) and open-set conditions.
We propose the Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS) to achieve an adaptive domain scheduler.
arXiv Detail & Related papers (2024-09-26T05:57:35Z)
- More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs [40.54076184225558]
The performance of Large Language Models (LLMs) on general tasks decreases after they are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF).
This paper presents a challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI).
The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks.
arXiv Detail & Related papers (2024-05-28T05:00:12Z)
- Memory-Efficient Prompt Tuning for Incremental Histopathology Classification [69.46798702300042]
We present a memory-efficient prompt tuning framework that cultivates model generalization potential at an economical memory cost.
We have extensively evaluated our framework with two histopathology tasks, i.e., breast cancer metastasis classification and epithelium-stroma tissue classification.
arXiv Detail & Related papers (2024-01-22T03:24:45Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
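As a rough illustration of that three-step flow, here is a small Python sketch, assuming a naive keyword-overlap selector and plain prompt injection as the "expression" step; the function names and the example facts are hypothetical, not taken from the DOKE paper.

```python
# Hypothetical sketch of a prepare / select / express knowledge pipeline.
from typing import List

def prepare_knowledge(corpus: List[str]) -> List[str]:
    # Step 1: prepare candidate domain knowledge for the task.
    return [fact.strip() for fact in corpus if fact.strip()]

def select_knowledge(sample: str, facts: List[str], k: int = 2) -> List[str]:
    # Step 2: pick the facts most relevant to this specific sample (naive word overlap).
    words = set(sample.lower().split())
    return sorted(facts, key=lambda f: -len(words & set(f.lower().split())))[:k]

def express_knowledge(sample: str, selected: List[str]) -> str:
    # Step 3: express the knowledge in a form the unmodified LLM can consume (prompt text).
    context = "\n".join(f"- {fact}" for fact in selected)
    return f"Domain knowledge:\n{context}\n\nQuestion: {sample}"

if __name__ == "__main__":
    facts = prepare_knowledge(["Item A is popular with new users.", "Item B pairs well with item A."])
    query = "Recommend something for a new user."
    print(express_knowledge(query, select_knowledge(query, facts)))
```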
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- DoGE: Domain Reweighting with Generalization Estimation [42.32000165235568]
We propose DOmain reweighting with Generalization Estimation (DoGE).
In our experiments, we extensively show how DoGE improves the generalization of the base model to any target data mixture.
DoGE can effectively identify inter-domain dependencies, and consistently achieves better test perplexity on the target domain.
arXiv Detail & Related papers (2023-10-23T22:51:58Z)
- On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey [15.533482481757353]
We propose a taxonomy of domain adaptation approaches from a machine learning system view.
We discuss and compare those methods and suggest promising future research directions.
arXiv Detail & Related papers (2022-11-06T15:32:00Z)
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome domain shift by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)
- KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z)
- META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to test across unseen domains without access to the target domain data at training time.
This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
- Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization [25.367775241988618]
Domain generalization is a difficult transfer learning problem that aims to learn a model which generalizes to unseen domains.
Recent massive pre-trained models such as CLIP and GPT-3 have been shown to be robust to many distribution shifts.
We propose AP (Amortized Prompt) as a novel approach for domain inference in the form of prompt generation.
arXiv Detail & Related papers (2021-11-25T00:25:54Z)