Knowledge Rumination for Pre-trained Language Models
- URL: http://arxiv.org/abs/2305.08732v3
- Date: Wed, 11 Oct 2023 10:51:12 GMT
- Title: Knowledge Rumination for Pre-trained Language Models
- Authors: Yunzhi Yao, Peng Wang, Shengyu Mao, Chuanqi Tan, Fei Huang, Huajun
Chen, Ningyu Zhang
- Abstract summary: We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
- Score: 77.55888291165462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous studies have revealed that vanilla pre-trained language models
(PLMs) lack the capacity to handle knowledge-intensive NLP tasks alone; thus,
several works have attempted to integrate external knowledge into PLMs.
However, despite these promising outcomes, we empirically observe that PLMs
may have already encoded rich knowledge in their pre-trained parameters but
fail to fully exploit it when applied to knowledge-intensive tasks. In this
paper, we propose a new paradigm dubbed Knowledge Rumination to help the
pre-trained language model utilize related latent knowledge without
retrieving it from an external corpus. By simply adding a prompt such as "As
far as I know" to the PLM, we encourage the model to review related latent
knowledge, which is then injected back into the model for knowledge
consolidation. We apply the proposed
knowledge rumination to various language models, including RoBERTa, DeBERTa,
and GPT-3. Experimental results on six commonsense reasoning tasks and GLUE
benchmarks demonstrate the effectiveness of our proposed approach, which proves
that the knowledge stored in PLMs can be better exploited to enhance
performance. Code is available at
https://github.com/zjunlp/knowledge-rumination.
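The rumination idea can be illustrated at the prompt level: first ask the PLM to state what it already knows about the question (e.g., by appending "As far as I know"), then condition the final answer on that recalled text. The sketch below is a minimal, model-agnostic illustration with hypothetical helper names; the paper's actual method consolidates the reviewed knowledge inside the model rather than purely via prompt concatenation, so treat this only as a sketch of the idea.
```python
from typing import Callable

# Minimal, text-level sketch of the rumination idea (hypothetical helper names).
# A real setup would wrap RoBERTa, DeBERTa, or GPT-3 behind `generate`; the
# paper itself injects the recalled knowledge back into the model internally,
# so this prompt-only version is purely illustrative.

REVIEW_PROMPT = "As far as I know,"  # the review prompt mentioned in the abstract


def ruminate(question: str, generate: Callable[[str], str]) -> str:
    """Ask the model to surface its own latent knowledge about the question."""
    return generate(f"Question: {question}\n{REVIEW_PROMPT}")


def answer_with_rumination(question: str, generate: Callable[[str], str]) -> str:
    """Condition the final answer on the knowledge the model just recalled."""
    recalled = ruminate(question, generate)
    return generate(
        f"Background knowledge: {recalled}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Stub backend so the sketch runs standalone; replace with a real PLM call.
    def stub_generate(prompt: str) -> str:
        return f"<model continuation of: {prompt.splitlines()[-1]!r}>"

    print(answer_with_rumination("Why do birds migrate in winter?", stub_generate))
```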
Related papers
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models [59.771098292611846]
Large language models (LLMs) have shown superior performance without task-specific fine-tuning.
Retrieval-based methods can offer non-parametric world knowledge and improve the performance on tasks such as question answering.
Self-Knowledge guided Retrieval augmentation (SKR) is a simple yet effective method that lets LLMs refer to questions they have previously encountered.
arXiv Detail & Related papers (2023-10-08T04:22:33Z)
- Thrust: Adaptively Propels Large Language Models with External Knowledge [58.72867916604562]
Large-scale pre-trained language models (PTLMs) are shown to encode rich knowledge in their model parameters.
The inherent knowledge in PTLMs can be opaque or static, making external knowledge necessary.
We propose the instance-level adaptive propulsion of external knowledge (IAPEK), where retrieval is performed only when necessary (a toy gating sketch appears after this related-papers list).
arXiv Detail & Related papers (2023-07-19T20:16:46Z)
- UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models [100.4659557650775]
We propose a UNified knowledge inTERface, UNTER, to provide a unified perspective to exploit both structured knowledge and unstructured knowledge.
With both forms of knowledge injected, UNTER gains continuous improvements on a series of knowledge-driven NLP tasks.
arXiv Detail & Related papers (2023-05-02T17:33:28Z) - A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z) - LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z)
- DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning [18.838291575019504]
Pre-trained language models (PLMs) are shown to be lacking in knowledge when dealing with knowledge-driven tasks.
We propose DictBERT, a novel approach that enhances PLMs with dictionary knowledge.
We evaluate our approach on a variety of knowledge-driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE.
arXiv Detail & Related papers (2022-08-01T06:43:19Z)
- MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base [11.016827497014821]
Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement.
We propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu, and introduce a two-stage entity replacement strategy.
Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.
arXiv Detail & Related papers (2022-07-28T07:39:30Z)
- Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base [51.55027623439027]
We develop two solutions to help the model learn more knowledge from unstructured text in a fully self-supervised manner.
To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
arXiv Detail & Related papers (2022-04-17T12:33:34Z)
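As referenced in the Thrust/IAPEK entry above, retrieval can be gated per instance so the model falls back on its parametric knowledge when it is already confident. The sketch below is a generic confidence-gated stand-in, not the paper's actual Thrust score (which is estimated from the model's internal representations); all callables, names, and the entropy threshold are illustrative assumptions.
```python
import math
from typing import Callable, List

# Toy instance-level gate: consult external knowledge only when the model's
# own answer distribution looks uncertain. The entropy gate is a stand-in for
# the paper's Thrust score; every callable here is a hypothetical interface.


def predictive_entropy(probs: List[float]) -> float:
    """Shannon entropy of the answer distribution (higher = less confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def answer_adaptively(
    question: str,
    answer_probs: Callable[[str], List[float]],
    answer: Callable[[str], str],
    retrieve: Callable[[str], str],
    entropy_threshold: float = 1.0,
) -> str:
    """Skip retrieval when the model is confident; otherwise augment with evidence."""
    if predictive_entropy(answer_probs(question)) <= entropy_threshold:
        return answer(question)  # confident enough: rely on parametric knowledge
    evidence = retrieve(question)  # uncertain: fetch external knowledge first
    return answer(f"Context: {evidence}\nQuestion: {question}")
```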