Knowledge-Infused Self Attention Transformers
- URL: http://arxiv.org/abs/2306.13501v1
- Date: Fri, 23 Jun 2023 13:55:01 GMT
- Title: Knowledge-Infused Self Attention Transformers
- Authors: Kaushik Roy, Yuxin Zi, Vignesh Narayanan, Manas Gaur, Amit Sheth
- Abstract summary: Transformer-based language models have achieved impressive success in various natural language processing tasks.
This paper introduces a systematic method for infusing knowledge into different components of a transformer-based model.
- Score: 11.008412414253662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models have achieved impressive success in various
natural language processing tasks due to their ability to capture complex
dependencies and contextual information using self-attention mechanisms.
However, they are not without limitations. These limitations include
hallucinations, where they produce incorrect outputs with high confidence, and
alignment issues, where they generate unhelpful and unsafe outputs for human
users. These limitations stem from context that is left implicit in, or missing
from, the data alone. To address this, researchers have explored augmenting these
models with external knowledge from knowledge graphs to provide the necessary
additional context. However, the ad-hoc nature of existing methods makes it
difficult to properly analyze the effects of knowledge infusion on the many
moving parts or components of a transformer. This paper introduces a systematic
method for infusing knowledge into different components of a transformer-based
model. A modular framework is proposed to identify specific components within
the transformer architecture, such as the self-attention mechanism, encoder
layers, or the input embedding layer, where knowledge infusion can be applied.
Additionally, extensive experiments are conducted on the General Language
Understanding Evaluation (GLUE) benchmark tasks, and the findings are reported.
This systematic approach aims to enable more principled ways of incorporating
knowledge into language model architectures.
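As an illustration of the kind of infusion points the framework considers, below is a minimal, hypothetical PyTorch-style sketch (not the authors' released code): knowledge-graph entity embeddings are added at the input embedding layer, and a knowledge-derived bias is added to the self-attention scores. All class names, argument names, and shapes are illustrative assumptions.

```python
# Hypothetical sketch of two knowledge-infusion points in a transformer encoder:
# (1) adding linked-entity embeddings at the input embedding layer and
# (2) biasing self-attention logits with a knowledge-derived matrix.
# Illustration of the general idea only, not the paper's implementation.
import math
import torch
import torch.nn as nn


class KnowledgeInfusedEmbedding(nn.Module):
    """Input-embedding infusion: sum token embeddings with linked KG entity embeddings."""

    def __init__(self, vocab_size, num_entities, dim):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.ent = nn.Embedding(num_entities + 1, dim, padding_idx=0)  # index 0 = no entity link

    def forward(self, token_ids, entity_ids):
        # entity_ids holds the KG entity linked to each token (0 if none).
        return self.tok(token_ids) + self.ent(entity_ids)


class KnowledgeBiasedSelfAttention(nn.Module):
    """Self-attention infusion: add a knowledge-derived bias to the attention logits."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, knowledge_bias=None):
        # x: (batch, seq, dim); knowledge_bias: (batch, seq, seq), e.g. 1.0 where two
        # tokens are linked by a KG relation and 0.0 otherwise.
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if knowledge_bias is not None:
            scores = scores + knowledge_bias.unsqueeze(1)  # broadcast over heads
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, s, d)
        return self.out(ctx)


if __name__ == "__main__":
    emb = KnowledgeInfusedEmbedding(vocab_size=1000, num_entities=50, dim=64)
    attn = KnowledgeBiasedSelfAttention(dim=64, num_heads=4)
    tokens = torch.randint(0, 1000, (2, 8))
    entities = torch.randint(0, 51, (2, 8))
    kg_bias = torch.zeros(2, 8, 8)
    print(attn(emb(tokens, entities), knowledge_bias=kg_bias).shape)  # torch.Size([2, 8, 64])
```

In this reading, each component-level hook can be enabled or ablated independently, which is the kind of controlled, per-component comparison a modular infusion framework is meant to support.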
Related papers
- Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization [10.944365976254442]
Methods for knowledge editing and unlearning in large language models seek to edit or remove undesirable knowledge without compromising performance.
We find a stark difference in unlearning and edit robustness when training components localized by different methods.
arXiv Detail & Related papers (2024-10-16T18:35:02Z)
- Knowledge Circuits in Pretrained Transformers [47.342682123081204]
The inner workings of how modern large language models store knowledge have long been a subject of intense interest and investigation among researchers.
In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge.
We evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies.
arXiv Detail & Related papers (2024-05-28T08:56:33Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate such limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Augmenting LLMs with Knowledge: A survey on hallucination prevention [0.0]
This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources.
While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules.
arXiv Detail & Related papers (2023-09-28T14:09:58Z)
- UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models [100.4659557650775]
We propose a UNified knowledge inTERface, UNTER, to provide a unified perspective to exploit both structured knowledge and unstructured knowledge.
With both forms of knowledge injected, UNTER gains continuous improvements on a series of knowledge-driven NLP tasks.
arXiv Detail & Related papers (2023-05-02T17:33:28Z) - LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z) - Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's ability and improve current knowledge fusion methods (a minimal FFN-injection sketch appears after this list).
arXiv Detail & Related papers (2022-01-15T03:00:27Z) - KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z) - Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z) - Data Mining in Clinical Trial Text: Transformers for Classification and
Question Answering Tasks [2.127049691404299]
This research applies advances in natural language processing to evidence synthesis based on medical texts.
The main focus is on information characterized via the Population, Intervention, Comparator, and Outcome (PICO) framework.
Recent neural network architectures based on transformers show capacities for transfer learning and increased performance on downstream natural language processing tasks.
arXiv Detail & Related papers (2020-01-30T11:45:59Z)
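As referenced in the Kformer entry above, the following is a minimal sketch of feed-forward-layer knowledge injection, assuming only the general recipe stated in that abstract (retrieved knowledge embeddings contribute extra key/value slots inside the FFN); the class name, shapes, and fusion rule are illustrative assumptions, not that paper's exact formulation.

```python
# Hypothetical sketch of FFN-level knowledge injection in the spirit of Kformer:
# retrieved knowledge embeddings act as extra key/value slots of the feed-forward
# layer, so the hidden state also "attends" over knowledge inside the FFN.
# Names and shapes are illustrative, not the paper's released code.
import torch
import torch.nn as nn


class KnowledgeFFN(nn.Module):
    def __init__(self, dim, hidden_dim, knowledge_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim)        # standard first FFN projection
        self.w2 = nn.Linear(hidden_dim, dim)        # standard second FFN projection
        self.k_key = nn.Linear(knowledge_dim, dim)  # project knowledge into key space
        self.k_val = nn.Linear(knowledge_dim, dim)  # project knowledge into value space
        self.act = nn.GELU()

    def forward(self, h, knowledge):
        # h: (batch, seq, dim); knowledge: (batch, n_knowledge, knowledge_dim)
        ffn_out = self.w2(self.act(self.w1(h)))            # (batch, seq, dim)
        keys = self.k_key(knowledge)                       # (batch, n_k, dim)
        vals = self.k_val(knowledge)                       # (batch, n_k, dim)
        scores = self.act(h @ keys.transpose(-2, -1))      # (batch, seq, n_k)
        knowledge_out = scores @ vals                      # (batch, seq, dim)
        return ffn_out + knowledge_out                     # fused output


if __name__ == "__main__":
    ffn = KnowledgeFFN(dim=64, hidden_dim=256, knowledge_dim=32)
    h = torch.randn(2, 8, 64)
    knowledge = torch.randn(2, 4, 32)
    print(ffn(h, knowledge).shape)  # torch.Size([2, 8, 64])
```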