Related papers: Distilling Knowledge for Fast Retrieval-based Chat-bots

URL: http://arxiv.org/abs/2004.11045v1
Date: Thu, 23 Apr 2020 09:41:37 GMT
Title: Distilling Knowledge for Fast Retrieval-based Chat-bots
Authors: Amir Vakili Tahami, Kamyar Ghajar, Azadeh Shakery
Abstract summary: We propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time.
Score: 6.284464997330884
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Response retrieval is a subset of neural ranking in which a model selects a suitable response from a set of candidates given a conversation history. Retrieval-based chat-bots are typically employed in information seeking conversational systems such as customer support agents. In order to make pairwise comparisons between a conversation history and a candidate response, two approaches are common: cross-encoders performing full self-attention over the pair and bi-encoders encoding the pair separately. The former gives better prediction quality but is too slow for practical use. In this paper, we propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time. We perform a detailed analysis of this approach on three response retrieval datasets.
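The distillation idea in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a commonly used objective that mixes cross-entropy on the gold response with a mean-squared error between student (bi-encoder) and teacher (cross-encoder) scores. The function names, the `alpha` weight, and the toy vectors are all hypothetical, and the teacher scores are taken as given rather than computed by a real cross-encoder.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def bi_encoder_scores(context_vec, candidate_vecs):
    """Score each candidate against the context.

    The context and the candidates are encoded separately, so the
    candidate vectors could be precomputed before inference.
    """
    return [dot(context_vec, c) for c in candidate_vecs]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_scores, gold_index):
    """Negative log-probability of the gold response under the student."""
    return -math.log(softmax(student_scores)[gold_index])

def mse(student_scores, teacher_scores):
    """Mean squared error between student and teacher scores."""
    n = len(student_scores)
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / n

def distillation_loss(student_scores, teacher_scores, gold_index, alpha=0.5):
    """Assumed objective: alpha * CE(gold) + (1 - alpha) * MSE(student, teacher)."""
    return (alpha * cross_entropy(student_scores, gold_index)
            + (1 - alpha) * mse(student_scores, teacher_scores))

# Toy example: one context vector and three candidate responses.
context_vec = [1.0, 0.0]
candidate_vecs = [[2.0, 1.0], [0.5, 3.0], [-1.0, 0.0]]
student = bi_encoder_scores(context_vec, candidate_vecs)
teacher = [3.0, 0.0, -2.0]  # hypothetical cross-encoder scores
loss = distillation_loss(student, teacher, gold_index=0)
print(student, loss)
```

In this sketch only the student is used at inference time, which is why distillation adds no inference cost: the teacher scores appear in the training loss and nowhere else.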

Related papers

Triple-Encoders: Representations That Fire Together, Wire Together [51.15206713482718]
Contrastive Learning is a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder. This study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances. We find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models.
arXiv Detail & Related papers (2024-02-19T18:06:02Z)
RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation [30.245143345565758]
We propose a new retrieval-enhanced approach for personalized response generation. We design a hierarchical transformer retriever trained on dialogue domain data to perform personalized retrieval and a context-aware prefix encoder that fuses the retrieved information to the decoder more effectively. We quantitatively evaluate our model's performance under a suite of human and automatic metrics and find it to be superior compared to state-of-the-art baselines on English Reddit conversations.
arXiv Detail & Related papers (2023-06-12T16:10:21Z)
ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking. We finetune a pretrained encoder-decoder model in the form of document-to-query generation. We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
Human-Object Interaction Detection via Disentangled Transformer [63.46358684341105]
We present Disentangled Transformer, where both encoder and decoder are disentangled to facilitate learning of two sub-tasks. Our method outperforms prior work on two public HOI benchmarks by a sizeable margin.
arXiv Detail & Related papers (2022-04-20T08:15:04Z)
A Speaker-aware Parallel Hierarchical Attentive Encoder-Decoder Model for Multi-turn Dialogue Generation [13.820298189734686]
This paper presents a novel open-domain dialogue generation model emphasizing the differentiation of speakers in multi-turn conversations. Our empirical results show that PHAED outperforms the state-of-the-art in both automatic and human evaluations.
arXiv Detail & Related papers (2021-10-13T16:08:29Z)
Building an Efficient and Effective Retrieval-based Dialogue System via Mutual Learning [27.04857039060308]
We propose to combine the best of both worlds to build a retrieval system. We employ a fast bi-encoder to replace the traditional feature-based pre-retrieval model. We train the pre-retrieval model and the re-ranking model at the same time via mutual learning.
arXiv Detail & Related papers (2021-10-01T01:32:33Z)
Question Answering Infused Pre-training of General-Purpose Contextualized Representations [70.62967781515127]
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations. We accomplish this goal by training a bi-encoder QA model, which independently encodes passages and questions, to match the predictions of a more accurate cross-encoder model. We show large improvements over both RoBERTa-large and previous state-of-the-art results on zero-shot and few-shot paraphrase detection.
arXiv Detail & Related papers (2021-06-15T14:45:15Z)
A Template-guided Hybrid Pointer Network for Knowledge-based Task-oriented Dialogue Systems [15.654119998970499]
We propose a template-guided hybrid pointer network for the knowledge-based task-oriented dialogue system. We design a memory pointer network model with a gating mechanism to fully exploit the semantic correlation between the retrieved answers and the ground-truth response.
arXiv Detail & Related papers (2021-06-10T15:49:26Z)
Improving Response Quality with Backward Reasoning in Open-domain Dialogue Systems [53.160025961101354]
We propose to train the generation model in a bidirectional manner by adding a backward reasoning step to the vanilla encoder-decoder training. The proposed backward reasoning step pushes the model to produce more informative and coherent content. Our method can improve response quality without introducing side information.
arXiv Detail & Related papers (2021-04-30T20:38:27Z)
Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system. We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting. For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals. For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
arXiv Detail & Related papers (2020-05-05T14:30:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.