DP-KB: Data Programming with Knowledge Bases Improves Transformer Fine
Tuning for Answer Sentence Selection
- URL: http://arxiv.org/abs/2203.09598v1
- Date: Thu, 17 Mar 2022 20:23:52 GMT
- Title: DP-KB: Data Programming with Knowledge Bases Improves Transformer Fine
Tuning for Answer Sentence Selection
- Authors: Nic Jedema, Thuy Vu, Manish Gupta, and Alessandro Moschitti
- Abstract summary: Transformers demonstrate impressive performance on many knowledge intensive (KI) tasks.
However, their ability to serve as implicit knowledge bases (KBs) remains limited.
We implement an efficient, data-programming technique that enriches training data with KB-derived context.
- Score: 96.84143731242119
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While transformers demonstrate impressive performance on many knowledge
intensive (KI) tasks, their ability to serve as implicit knowledge bases (KBs)
remains limited, as shown on several slot-filling, question-answering (QA),
fact verification, and entity-linking tasks. In this paper, we implement an
efficient, data-programming technique that enriches training data with
KB-derived context and improves transformer utilization of encoded knowledge
when fine-tuning for a particular QA task, namely answer sentence selection
(AS2). Our method outperforms the state-of-the-art transformer approach on WikiQA
and TrecQA, two widely studied AS2 benchmarks, increasing P@1 by 2.0%, MAP by 1.3%,
and MRR by 1.1% on WikiQA, and P@1 by 4.4%, MAP by 0.9%, and MRR by 2.4% on TrecQA.
To demonstrate our improvements in an industry setting, we additionally evaluate
our approach on a proprietary dataset of Alexa QA pairs, and show an increase of
2.3% F1 and 2.0% MAP. We further find that these improvements remain even when KB
context is omitted at inference time, allowing for the use of our models within
existing transformer workflows without additional latency or deployment costs.
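As a concrete (and deliberately simplified) picture of the enrichment step, the sketch below attaches verbalized KB facts to each question/candidate pair at training time and drops them at inference time. It is only an illustration under assumed names: the toy entity linker, the toy KB, and every identifier are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of KB-based enrichment of AS2 training data (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class AS2Example:
    question: str
    candidate: str        # candidate answer sentence
    label: int            # 1 if the sentence answers the question, else 0
    kb_context: str = ""  # KB-derived context attached during training

TOY_KB = {  # toy stand-in for a real knowledge base
    "Rome": "Rome is the capital of Italy.",
    "Italy": "Italy is a country in southern Europe.",
}

def link_entities(text: str) -> List[str]:
    """Toy entity linker: treat capitalized tokens as entity mentions."""
    return [tok.strip(".,?!") for tok in text.split() if tok[:1].isupper()]

def enrich(example: AS2Example, max_facts: int = 3) -> AS2Example:
    """Data-programming-style enrichment: verbalize a few KB facts about
    entities mentioned in the question or the candidate sentence."""
    entities = link_entities(example.question) + link_entities(example.candidate)
    facts = [TOY_KB[e] for e in dict.fromkeys(entities) if e in TOY_KB]
    example.kb_context = " ".join(facts[:max_facts])
    return example

def to_model_input(example: AS2Example, use_kb: bool = True) -> str:
    """Serialize one pair for a cross-encoder; the KB context can simply be
    omitted at inference time (use_kb=False) without changing the model."""
    parts = [example.question, example.candidate]
    if use_kb and example.kb_context:
        parts.append(example.kb_context)
    return " [SEP] ".join(parts)

if __name__ == "__main__":
    ex = enrich(AS2Example("What is the capital of Italy?",
                           "Rome has been the capital of Italy since 1871.", label=1))
    print(to_model_input(ex))                 # training-time input with KB context
    print(to_model_input(ex, use_kb=False))   # inference-time input without it
```

The `use_kb=False` path mirrors the finding reported above: because the gains persist when KB context is omitted at inference, deployed models can keep their existing inputs, latency, and workflows.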
Related papers
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning [90.98423540361946]
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores.
Existing approaches either require expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store, which leads to suboptimal performance.
We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option.
arXiv Detail & Related papers (2023-10-02T17:16:26Z)
- PAT: Position-Aware Transformer for Dense Multi-Label Action Detection [36.39340228621982]
We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video.
We embed relative positional encoding in the self-attention mechanism and exploit multi-scale temporal relationships.
We evaluate the performance of our proposed approach on two challenging dense multi-label benchmark datasets.
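The relative positional encoding mentioned in this summary can be pictured, in a framework-agnostic way, as a learned bias added to the attention logits and indexed by the offset between two time steps. The sketch below illustrates only that generic mechanism with made-up shapes; it is not PAT's actual architecture or multi-scale design.

```python
# Generic sketch of a relative positional bias in self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_relative_bias(q, k, v, rel_bias):
    """q, k, v: (T, d) arrays; rel_bias: (2T-1,) learnable bias indexed by the
    relative offset j - i, letting attention favor nearby (or periodic) frames."""
    T, d = q.shape
    logits = q @ k.T / np.sqrt(d)                             # content term
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]   # j - i
    logits = logits + rel_bias[offsets + (T - 1)]             # position term
    return softmax(logits) @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 8, 16
    q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
    out = self_attention_with_relative_bias(q, k, v, rng.standard_normal(2 * T - 1))
    print(out.shape)  # (8, 16)
```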
arXiv Detail & Related papers (2023-08-09T16:29:31Z)
- Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy [68.31760483418901]
Large Language Models (LLMs) struggle to provide current information because their pre-training data is outdated.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in the generalizability of new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy, i.e., the difference between language-modeling probabilities and logical probabilities.
arXiv Detail & Related papers (2023-05-29T19:48:37Z)
- PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation [89.0074567748505]
We propose a new metric to accurately predict prompt transferability, and a novel prompt transfer (PoT) approach, namely PANDA.
Experiments show that: 1) our proposed metric works well to predict prompt transferability; 2) PANDA consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; and 3) with PANDA, prompt tuning can achieve competitive and even better performance than model tuning at various PLM scales.
arXiv Detail & Related papers (2022-08-22T09:14:14Z)
- Improved and Efficient Conversational Slot Labeling through Question Answering [48.670822631047635]
Transformer-based pretrained language models (PLMs) offer unmatched performance across the majority of natural language understanding (NLU) tasks.
We focus on modeling and studying slot labeling (SL), a crucial component of NLU for dialog, through the QA optics.
We demonstrate how QA-tuned PLMs can be applied to the SL task, reaching new state-of-the-art performance.
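As a rough picture of the QA reformulation (not the paper's models, templates, or training setup), each slot can be phrased as a natural-language question and answered extractively by a QA-tuned PLM. The domain, templates, and threshold below are made up for illustration; only the generic Hugging Face `question-answering` pipeline is assumed.

```python
# Illustrative sketch of slot labeling recast as extractive QA (hypothetical domain).
from transformers import pipeline  # assumes the Hugging Face `transformers` package

# Hypothetical slot -> question templates for a flight-booking domain.
SLOT_QUESTIONS = {
    "departure_city": "What is the departure city?",
    "arrival_city": "What is the destination city?",
    "date": "When does the user want to travel?",
}

def label_slots(utterance: str, qa, threshold: float = 0.3):
    """Ask one question per slot; keep extracted spans above a confidence threshold."""
    slots = {}
    for slot, question in SLOT_QUESTIONS.items():
        pred = qa(question=question, context=utterance)
        if pred["score"] > threshold:
            slots[slot] = pred["answer"]
    return slots

if __name__ == "__main__":
    qa = pipeline("question-answering")  # any extractive QA checkpoint works here
    print(label_slots("I need a flight from Boston to Denver next Friday", qa))
```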
arXiv Detail & Related papers (2022-04-05T11:34:35Z)
- ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer.
This paper proposes a well-designed model named ERNIE-Sparse.
It consists of two distinctive parts: (i) Hierarchical Sparse Transformer (HST) to sequentially unify local and global information, and (ii) Self-Attention Regularization (SAR) to minimize the distance for transformers with different attention topologies.
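One way to picture the SAR component is as an auxiliary loss on the distance between attention distributions computed under a sparse and a full topology. The KL-based sketch below only illustrates that notion with arbitrary choices (a local-window mask, one KL direction); it is not the ERNIE-Sparse implementation.

```python
# Illustrative sketch of regularizing the distance between attention topologies.
import numpy as np

def attention_probs(q, k, mask=None):
    """Row-wise softmax of scaled dot-product logits; `mask` (bool, T x T) encodes
    which positions a given attention topology lets each token attend to."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        logits = np.where(mask, logits, -1e9)
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

def attention_consistency_loss(q, k, sparse_mask):
    """KL(sparse || full) averaged over query positions: one plausible 'distance
    between transformers with different attention topologies'."""
    p_sparse = attention_probs(q, k, sparse_mask)
    p_full = attention_probs(q, k)
    kl = np.sum(p_sparse * np.log((p_sparse + 1e-9) / (p_full + 1e-9)), axis=-1)
    return float(kl.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 6, 8
    q, k = rng.standard_normal((T, d)), rng.standard_normal((T, d))
    local = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :]) <= 1  # banded topology
    print(attention_consistency_loss(q, k, local))
```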
arXiv Detail & Related papers (2022-03-23T08:47:01Z)
- Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection [14.678153928301493]
Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task that requires external world knowledge to correctly answer a text question about an associated image.
Recent text-only work has shown that knowledge injection into pre-trained language models, specifically entity-enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks.
arXiv Detail & Related papers (2021-12-13T18:45:42Z)
- DoT: An efficient Double Transformer for NLP tasks with tables [3.0079490585515343]
DoT is a double transformer model that decomposes the problem into two sub-tasks.
We show that for a small drop in accuracy, DoT improves training and inference time by at least 50%.
arXiv Detail & Related papers (2021-06-01T13:33:53Z)
- On the Generalization Effects of Linear Transformations in Data Augmentation [32.01435459892255]
Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks.
We consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting.
We propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data.
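A minimal sketch of that uncertainty-driven selection idea, under assumed details (predictive entropy as the uncertainty measure, a dummy linear classifier, made-up transforms), could look like the following; it illustrates the general scheme, not the paper's algorithm or theory.

```python
# Illustrative sketch of uncertainty-driven augmentation selection (hypothetical details).
import numpy as np

def predictive_entropy(probs: np.ndarray) -> float:
    """Entropy of the model's predicted class distribution (higher = more uncertain)."""
    p = np.clip(probs, 1e-9, 1.0)
    return float(-(p * np.log(p)).sum())

def most_uncertain_augmentation(x, transforms, predict_proba):
    """Apply each candidate label-preserving transform to `x`, score the model's
    uncertainty on the result, and keep the sample the model is least sure about."""
    candidates = [t(x) for t in transforms]
    scores = [predictive_entropy(predict_proba(c)) for c in candidates]
    return candidates[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((2, 4))  # dummy 2-class linear classifier
    predict = lambda x: np.exp(W @ x) / np.exp(W @ x).sum()
    transforms = [lambda x: 0.9 * x, lambda x: x + 0.1, lambda x: x[::-1].copy()]
    print(most_uncertain_augmentation(rng.standard_normal(4), transforms, predict))
```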
arXiv Detail & Related papers (2020-05-02T04:10:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.