DP-KB: Data Programming with Knowledge Bases Improves Transformer Fine
Tuning for Answer Sentence Selection
- URL: http://arxiv.org/abs/2203.09598v1
- Date: Thu, 17 Mar 2022 20:23:52 GMT
- Title: DP-KB: Data Programming with Knowledge Bases Improves Transformer Fine
Tuning for Answer Sentence Selection
- Authors: Nic Jedema, Thuy Vu, Manish Gupta, and Alessandro Moschitti
- Abstract summary: Transformers demonstrate impressive performance on many knowledge intensive (KI) tasks.
However, their ability to serve as implicit knowledge bases (KBs) remains limited.
We implement an efficient, data-programming technique that enriches training data with KB-derived context.
- Score: 96.84143731242119
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While transformers demonstrate impressive performance on many knowledge
intensive (KI) tasks, their ability to serve as implicit knowledge bases (KBs)
remains limited, as shown on several slot-filling, question-answering (QA),
fact verification, and entity-linking tasks. In this paper, we implement an
efficient, data-programming technique that enriches training data with
KB-derived context and improves transformer utilization of encoded knowledge
when fine-tuning for a particular QA task, namely answer sentence selection
(AS2). Our method outperforms the state-of-the-art transformer approach on WikiQA
and TrecQA, two widely studied AS2 benchmarks, increasing P@1 by 2.0%, MAP by 1.3%,
and MRR by 1.1% on WikiQA, and P@1 by 4.4%, MAP by 0.9%, and MRR by 2.4% on TrecQA.
To demonstrate our improvements in an industry setting, we additionally evaluate
our approach on a proprietary dataset of Alexa QA pairs, and show an increase of
2.3% F1 and 2.0% MAP. We further find that these improvements remain even when KB
context is omitted at inference time, allowing for the use of our models within
existing transformer workflows without additional latency or deployment costs.
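As a concrete (and deliberately simplified) picture of the enrichment step, the sketch below attaches verbalized KB facts to each question/candidate pair at training time and drops them at inference time. It is only an illustration under assumed names: the toy entity linker, the toy KB, and every identifier are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of KB-based enrichment of AS2 training data (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class AS2Example:
    question: str
    candidate: str        # candidate answer sentence
    label: int            # 1 if the sentence answers the question, else 0
    kb_context: str = ""  # KB-derived context attached during training

TOY_KB = {  # toy stand-in for a real knowledge base
    "Rome": "Rome is the capital of Italy.",
    "Italy": "Italy is a country in southern Europe.",
}

def link_entities(text: str) -> List[str]:
    """Toy entity linker: treat capitalized tokens as entity mentions."""
    return [tok.strip(".,?!") for tok in text.split() if tok[:1].isupper()]

def enrich(example: AS2Example, max_facts: int = 3) -> AS2Example:
    """Data-programming-style enrichment: verbalize a few KB facts about
    entities mentioned in the question or the candidate sentence."""
    entities = link_entities(example.question) + link_entities(example.candidate)
    facts = [TOY_KB[e] for e in dict.fromkeys(entities) if e in TOY_KB]
    example.kb_context = " ".join(facts[:max_facts])
    return example

def to_model_input(example: AS2Example, use_kb: bool = True) -> str:
    """Serialize one pair for a cross-encoder; the KB context can simply be
    omitted at inference time (use_kb=False) without changing the model."""
    parts = [example.question, example.candidate]
    if use_kb and example.kb_context:
        parts.append(example.kb_context)
    return " [SEP] ".join(parts)

if __name__ == "__main__":
    ex = enrich(AS2Example("What is the capital of Italy?",
                           "Rome has been the capital of Italy since 1871.", label=1))
    print(to_model_input(ex))                 # training-time input with KB context
    print(to_model_input(ex, use_kb=False))   # inference-time input without it
```

The `use_kb=False` path mirrors the finding reported above: because the gains persist when KB context is omitted at inference, deployed models can keep their existing inputs, latency, and workflows.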
Related papers
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning [90.98423540361946]
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores.
Existing approaches either require expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store, which leads to suboptimal performance.
We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option.
arXiv Detail & Related papers (2023-10-02T17:16:26Z)
- PAT: Position-Aware Transformer for Dense Multi-Label Action Detection [36.39340228621982]
We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video.
We embed relative positional encoding in the self-attention mechanism and exploit multi-scale temporal relationships.
We evaluate the performance of our proposed approach on two challenging dense multi-label benchmark datasets.
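The relative positional encoding mentioned in this summary can be pictured, in a framework-agnostic way, as a learned bias added to the attention logits and indexed by the offset between two time steps. The sketch below illustrates only that generic mechanism with made-up shapes; it is not PAT's actual architecture or multi-scale design.

```python
# Generic sketch of a relative positional bias in self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_relative_bias(q, k, v, rel_bias):
    """q, k, v: (T, d) arrays; rel_bias: (2T-1,) learnable bias indexed by the
    relative offset j - i, letting attention favor nearby (or periodic) frames."""
    T, d = q.shape
    logits = q @ k.T / np.sqrt(d)                             # content term
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]   # j - i
    logits = logits + rel_bias[offsets + (T - 1)]             # position term
    return softmax(logits) @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 8, 16
    q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
    out = self_attention_with_relative_bias(q, k, v, rng.standard_normal(2 * T - 1))
    print(out.shape)  # (8, 16)
```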
arXiv Detail & Related papers (2023-08-09T16:29:31Z)
- Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy [68.31760483418901]
Large Language Models (LLMs) struggle to provide current information because their pre-training data is outdated.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in the generalizability of new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy, i.e., the difference between language-modeling probabilities and logical probabilities.
arXiv Detail & Related papers (2023-05-29T19:48:37Z)
- PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation [89.0074567748505]
We propose a new metric to accurately predict prompt transferability, and a novel prompt transfer (PoT) approach, namely PANDA.
Experiments show that: 1) our proposed metric works well to predict prompt transferability; 2) PANDA consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; and 3) with PANDA, prompt tuning can achieve competitive and even better performance than model tuning at various PLM scales.
arXiv Detail & Related papers (2022-08-22T09:14:14Z)
- Improved and Efficient Conversational Slot Labeling through Question Answering [48.670822631047635]
Transformer-based pretrained language models (PLMs) offer unmatched performance across the majority of natural language understanding (NLU) tasks.
We focus on modeling and studying slot labeling (SL), a crucial component of NLU for dialog, through the QA optics.
We demonstrate how QA-tuned PLMs can be applied to the SL task, reaching new state-of-the-art performance.
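As a rough picture of the QA reformulation (not the paper's models, templates, or training setup), each slot can be phrased as a natural-language question and answered extractively by a QA-tuned PLM. The domain, templates, and threshold below are made up for illustration; only the generic Hugging Face `question-answering` pipeline is assumed.

```python
# Illustrative sketch of slot labeling recast as extractive QA (hypothetical domain).
from transformers import pipeline  # assumes the Hugging Face `transformers` package

# Hypothetical slot -> question templates for a flight-booking domain.
SLOT_QUESTIONS = {
    "departure_city": "What is the departure city?",
    "arrival_city": "What is the destination city?",
    "date": "When does the user want to travel?",
}

def label_slots(utterance: str, qa, threshold: float = 0.3):
    """Ask one question per slot; keep extracted spans above a confidence threshold."""
    slots = {}
    for slot, question in SLOT_QUESTIONS.items():
        pred = qa(question=question, context=utterance)
        if pred["score"] > threshold:
            slots[slot] = pred["answer"]
    return slots

if __name__ == "__main__":
    qa = pipeline("question-answering")  # any extractive QA checkpoint works here
    print(label_slots("I need a flight from Boston to Denver next Friday", qa))
```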
arXiv Detail & Related papers (2022-04-05T11:34:35Z)
- ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer.
This paper proposes a well-designed model named ERNIE-Sparse.
It consists of two distinctive parts: (i) Hierarchical Sparse Transformer (HST) to sequentially unify local and global information, and (ii) Self-Attention Regularization (SAR) to minimize the distance for transformers with different attention topologies.
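One way to picture the SAR component is as an auxiliary loss on the distance between attention distributions computed under a sparse and a full topology. The KL-based sketch below only illustrates that notion with arbitrary choices (a local-window mask, one KL direction); it is not the ERNIE-Sparse implementation.

```python
# Illustrative sketch of regularizing the distance between attention topologies.
import numpy as np

def attention_probs(q, k, mask=None):
    """Row-wise softmax of scaled dot-product logits; `mask` (bool, T x T) encodes
    which positions a given attention topology lets each token attend to."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        logits = np.where(mask, logits, -1e9)
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

def attention_consistency_loss(q, k, sparse_mask):
    """KL(sparse || full) averaged over query positions: one plausible 'distance
    between transformers with different attention topologies'."""
    p_sparse = attention_probs(q, k, sparse_mask)
    p_full = attention_probs(q, k)
    kl = np.sum(p_sparse * np.log((p_sparse + 1e-9) / (p_full + 1e-9)), axis=-1)
    return float(kl.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 6, 8
    q, k = rng.standard_normal((T, d)), rng.standard_normal((T, d))
    local = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :]) <= 1  # banded topology
    print(attention_consistency_loss(q, k, local))
```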
arXiv Detail & Related papers (2022-03-23T08:47:01Z)
- Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection [14.678153928301493]
Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task that requires external world knowledge to correctly answer a text question about an associated image.
Recent text-only work has shown that knowledge injection into pre-trained language models, specifically entity-enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks.
arXiv Detail & Related papers (2021-12-13T18:45:42Z)
- DoT: An efficient Double Transformer for NLP tasks with tables [3.0079490585515343]
DoT is a double transformer model that decomposes the problem into two sub-tasks.
We show that for a small drop in accuracy, DoT improves training and inference time by at least 50%.
arXiv Detail & Related papers (2021-06-01T13:33:53Z)
- On the Generalization Effects of Linear Transformations in Data Augmentation [32.01435459892255]
Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks.
We consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting.
We propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data.
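A minimal sketch of that uncertainty-driven selection idea, under assumed details (predictive entropy as the uncertainty measure, a dummy linear classifier, made-up transforms), could look like the following; it illustrates the general scheme, not the paper's algorithm or theory.

```python
# Illustrative sketch of uncertainty-driven augmentation selection (hypothetical details).
import numpy as np

def predictive_entropy(probs: np.ndarray) -> float:
    """Entropy of the model's predicted class distribution (higher = more uncertain)."""
    p = np.clip(probs, 1e-9, 1.0)
    return float(-(p * np.log(p)).sum())

def most_uncertain_augmentation(x, transforms, predict_proba):
    """Apply each candidate label-preserving transform to `x`, score the model's
    uncertainty on the result, and keep the sample the model is least sure about."""
    candidates = [t(x) for t in transforms]
    scores = [predictive_entropy(predict_proba(c)) for c in candidates]
    return candidates[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((2, 4))  # dummy 2-class linear classifier
    predict = lambda x: np.exp(W @ x) / np.exp(W @ x).sum()
    transforms = [lambda x: 0.9 * x, lambda x: x + 0.1, lambda x: x[::-1].copy()]
    print(most_uncertain_augmentation(rng.standard_normal(4), transforms, predict))
```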
arXiv Detail & Related papers (2020-05-02T04:10:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.