Knowledge Infused Decoding
- URL: http://arxiv.org/abs/2204.03084v1
- Date: Wed, 6 Apr 2022 20:58:32 GMT
- Title: Knowledge Infused Decoding
- Authors: Ruibo Liu, Guoqing Zheng, Shashank Gupta, Radhika Gaonkar, Chongyang
Gao, Soroush Vosoughi, Milad Shokouhi, Ahmed Hassan Awadallah
- Abstract summary: Knowledge Infused Decoding (KID) is a novel decoding algorithm for generative language models (LMs)
KID dynamically infuses external knowledge into each step of the LM decoding.
Human evaluation confirms KID's ability to generate more relevant and factual language for the input context.
- Score: 46.09844215234235
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained language models (LMs) have been shown to memorize a substantial
amount of knowledge from the pre-training corpora; however, they are still
limited in recalling factually correct knowledge given a certain context.
Hence, they tend to suffer from counterfactual or hallucinatory generation when
used in knowledge-intensive natural language generation (NLG) tasks. Recent
remedies to this problem focus on modifying either the pre-training or task
fine-tuning objectives to incorporate knowledge, which normally require
additional costly training or architecture modification of LMs for practical
applications. We present Knowledge Infused Decoding (KID) -- a novel decoding
algorithm for generative LMs, which dynamically infuses external knowledge into
each step of the LM decoding. Specifically, we maintain a local knowledge
memory based on the current context, interacting with a dynamically created
external knowledge trie, and continuously update the local memory as a
knowledge-aware constraint to guide decoding via reinforcement learning. On six
diverse knowledge-intensive NLG tasks, task-agnostic LMs (e.g., GPT-2 and BART)
armed with KID outperform many task-optimized state-of-the-art models, and show
particularly strong performance in few-shot scenarios over seven related
knowledge-infusion techniques. Human evaluation confirms KID's ability to
generate more relevant and factual language for the input context when compared
with multiple baselines. Finally, KID also alleviates exposure bias and
provides stable generation quality when generating longer sequences. Code for
KID is available at https://github.com/microsoft/KID.
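The abstract only gestures at the mechanics of the decoding loop, so the following is a minimal, self-contained sketch of the general idea rather than the released implementation: retrieved knowledge is indexed in a token-level trie, a small local memory of recent tokens queries that trie at each step, and matching continuations receive a log-count bonus in place of the paper's reinforcement-learning update. The toy whitespace tokenizer, the stand-in lm_logits function, and all parameter names are assumptions made for illustration; see the linked repository for the actual algorithm.

```python
# Illustrative sketch only: knowledge-guided greedy decoding with a trie-based
# bonus standing in for KID's reinforcement-learning objective.
import math
from collections import defaultdict


class KnowledgeTrie:
    """Token-level trie over retrieved knowledge passages (all suffixes inserted)."""

    def __init__(self):
        self.children = defaultdict(KnowledgeTrie)
        self.count = 0

    def insert(self, tokens):
        # Insert every suffix so any recent-context window can match a path.
        for i in range(len(tokens)):
            node = self
            for tok in tokens[i:]:
                node = node.children[tok]
                node.count += 1

    def continuations(self, prefix):
        """Tokens (with counts) that extend `prefix` inside the trie."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return {}
            node = node.children[tok]
        return {tok: child.count for tok, child in node.children.items()}


def lm_logits(context_tokens, vocab):
    """Stand-in language model: uniform scores. Swap in a real LM (e.g., GPT-2) here."""
    return {tok: 0.0 for tok in vocab}


def knowledge_guided_decode(context, trie, vocab, max_new_tokens=8,
                            alpha=2.0, memory_size=4):
    """Greedy decoding with a knowledge-aware bias.

    `memory` plays the role of the local knowledge memory (the most recent tokens);
    each step queries the trie with it, and matching continuations get a log-count bonus.
    """
    tokens = context.split()
    memory = tokens[-memory_size:]
    for _ in range(max_new_tokens):
        scores = lm_logits(tokens, vocab)
        # Back off to shorter suffixes of the memory until the trie has a match.
        conts = {}
        for start in range(len(memory)):
            conts = trie.continuations(memory[start:])
            if conts:
                break
        for tok, count in conts.items():
            if tok in scores:
                scores[tok] += alpha * math.log1p(count)  # knowledge-aware constraint
        tokens.append(max(scores, key=scores.get))
        memory = tokens[-memory_size:]  # continuously refresh the local memory
    return " ".join(tokens)


if __name__ == "__main__":
    trie = KnowledgeTrie()
    trie.insert("the eiffel tower is in paris".split())
    vocab = {"the", "eiffel", "tower", "is", "in", "paris", "london"}
    print(knowledge_guided_decode("the eiffel", trie, vocab, max_new_tokens=4))
```

In the actual system, lm_logits would come from GPT-2 or BART, the trie would be built dynamically from knowledge retrieved for the current context, and the guidance would be realized through the policy-gradient objective described in the paper; the per-step trie query and memory refresh are the parts this sketch preserves.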
Related papers
- Resolving Editing-Unlearning Conflicts: A Knowledge Codebook Framework for Large Language Model Updating [61.70705744491162]
Large Language Models (LLMs) excel in natural language processing by encoding extensive human knowledge.
Updating LLMs involves two key tasks simultaneously: unlearning to remove unwanted knowledge and editing to incorporate new information.
We propose LOKA, a conflict-free framework for LLM updating based on a knowledge codebook.
arXiv Detail & Related papers (2025-01-31T20:48:46Z)
- KaLM: Knowledge-aligned Autoregressive Language Modeling via Dual-view Knowledge Graph Contrastive Learning [74.21524111840652]
This paper proposes KaLM, a Knowledge-aligned Language Modeling approach.
It fine-tunes autoregressive large language models to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment.
Notably, our method achieves a significant performance boost in evaluations of knowledge-driven tasks.
arXiv Detail & Related papers (2024-12-06T11:08:24Z)
- KIF: Knowledge Identification and Fusion for Language Model Continual Learning [41.28933724210434]
We introduce Knowledge Identification and Fusion (KIF), a novel framework for language models.
KIF segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control.
It employs a novel group-wise knowledge identification technique to ascertain the importance distribution of skill units for a new task.
As a result, KIF achieves an optimal balance between retaining prior knowledge and excelling in new tasks.
arXiv Detail & Related papers (2024-08-09T17:44:45Z)
- TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models [31.209774088374374]
This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models.
We employ a robust approach to inject knowledge triples and a knowledge-augmented memory bank to capture valuable information.
We show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
arXiv Detail & Related papers (2024-03-17T13:04:35Z)
- InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration [58.61492157691623]
Methods have been developed that integrate knowledge by augmenting LLMs with domain-specific knowledge graphs through external modules.
Our research focuses on a novel problem: efficiently integrating unknown knowledge into LLMs without unnecessary overlap of known knowledge.
A risk of introducing new knowledge is the potential forgetting of existing knowledge.
arXiv Detail & Related papers (2024-02-18T03:36:26Z)
- UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models [100.4659557650775]
We propose a UNified knowledge inTERface, UNTER, to provide a unified perspective to exploit both structured knowledge and unstructured knowledge.
With both forms of knowledge injected, UNTER gains continuous improvements on a series of knowledge-driven NLP tasks.
arXiv Detail & Related papers (2023-05-02T17:33:28Z)
- LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z)
- TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
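A minimal, hypothetical sketch of this retrieve-then-inject pattern is given after this list.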
arXiv Detail & Related papers (2022-03-16T10:37:59Z)
- DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that decomposes the knowledge injection process of pre-trained language models across the pre-training, fine-tuning, and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z)
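Below is a hypothetical sketch of the retrieve-then-inject pattern referenced in the TegTok entry above. Nothing here is taken from that paper: the bag-of-words embedding, the cosine retrieval, and the [TASK]/[OPEN] markers are stand-ins chosen for illustration, and only injection into the input encoding stage is shown (TegTok also injects knowledge at the output decoding stage).

```python
# Hypothetical retrieve-then-inject sketch: dense retrieval from a task-specific
# store and an open-world store, with the retrieved entries prepended to the
# generator input. Embedding, scoring, and markers are illustrative stand-ins.
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would use a learned dense encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, store, k=1):
    """Return the k knowledge entries most similar to `query`."""
    q = embed(query)
    ranked = sorted(store, key=lambda entry: cosine(q, embed(entry)), reverse=True)
    return ranked[:k]


def build_augmented_input(query, task_store, open_store, k=1):
    """Inject retrieved task-specific and open-world knowledge into the input."""
    parts = ["[TASK] " + e for e in retrieve(query, task_store, k)]
    parts += ["[OPEN] " + e for e in retrieve(query, open_store, k)]
    parts.append("[QUERY] " + query)
    return " ".join(parts)


if __name__ == "__main__":
    task_store = ["the user previously asked about trains to paris"]
    open_store = ["paris is the capital of france", "the eiffel tower is in paris"]
    print(build_augmented_input("when does the next train to paris leave",
                                task_store, open_store))
```

The learned dense retrievers and the decoding-stage injection are omitted; the sketch only conveys the shape of the two-source retrieval and input-augmentation step.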
This list is automatically generated from the titles and abstracts of the papers on this site.