Toward Interpretability of Dual-Encoder Models for Dialogue Response
Suggestions
- URL: http://arxiv.org/abs/2003.04998v1
- Date: Mon, 2 Mar 2020 21:26:06 GMT
- Title: Toward Interpretability of Dual-Encoder Models for Dialogue Response
Suggestions
- Authors: Yitong Li, Dianqi Li, Sushant Prakash and Peng Wang
- Abstract summary: We present an attentive dual encoder model that includes an attention mechanism on top of the extracted word-level features from two encoders.
We design a novel regularization loss to minimize the mutual information between unimportant words and desired labels.
Experiments demonstrate the effectiveness of the proposed model in terms of better Recall@1 accuracy and visualized interpretability.
- Score: 18.117115200484708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work shows how to improve and interpret the commonly used dual encoder
model for response suggestion in dialogue. We present an attentive dual encoder
model that includes an attention mechanism on top of the extracted word-level
features from two encoders, one for the context and one for the label. To
improve the interpretability in the dual encoder models, we design a novel
regularization loss to minimize the mutual information between unimportant
words and desired labels, in addition to the original attention method, so that
important words are emphasized while unimportant words are de-emphasized. This
can help not only with model interpretability, but can also further improve
model accuracy. We propose an approximation method that uses a neural network
to calculate the mutual information. Furthermore, by adding a residual layer
between raw word embeddings and the final encoded context feature, word-level
interpretability is preserved at the final prediction of the model. We compare
the proposed model with existing methods for the dialogue response task on two
public datasets (Persona and Ubuntu). The experiments demonstrate the
effectiveness of the proposed model in terms of better Recall@1 accuracy and
visualized interpretability.
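
The abstract leaves the encoder architecture unspecified. As one concrete reading, here is a minimal PyTorch sketch of an attentive dual encoder with the residual word-embedding connection described above; the bi-GRU encoders, dot-product attention, and all names and dimensions are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveDualEncoder(nn.Module):
    """Dual encoder with word-level cross-attention and a residual
    connection from raw word embeddings (architecture details assumed)."""

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Separate word-level encoders for context and label.
        self.context_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                                  bidirectional=True)
        self.label_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
        # Project bi-GRU features back to the embedding dimension so the
        # residual addition with raw embeddings is well defined.
        self.proj = nn.Linear(2 * hidden_dim, embed_dim)

    def forward(self, context_ids, label_ids):
        ctx_emb = self.embed(context_ids)                    # (B, Tc, E)
        lbl_emb = self.embed(label_ids)                      # (B, Tl, E)
        ctx_feat = self.proj(self.context_enc(ctx_emb)[0])   # (B, Tc, E)
        lbl_feat = self.proj(self.label_enc(lbl_emb)[0])     # (B, Tl, E)

        # Attention on top of the word-level features of the two encoders:
        # each context word is scored by its best match among label words.
        scores = torch.bmm(ctx_feat, lbl_feat.transpose(1, 2))  # (B, Tc, Tl)
        ctx_attn = F.softmax(scores.max(dim=2).values, dim=1)   # (B, Tc)

        # Residual layer between raw word embeddings and the encoded
        # features, preserving word-level interpretability downstream.
        ctx_res = ctx_feat + ctx_emb
        ctx_vec = torch.bmm(ctx_attn.unsqueeze(1), ctx_res).squeeze(1)  # (B, E)
        lbl_vec = lbl_feat.mean(dim=1)                                  # (B, E)
        return ctx_vec, lbl_vec, ctx_attn

# Response suggestion: rank candidate labels by the dot product of
# ctx_vec and lbl_vec, then measure Recall@1 over the candidate set.
```

The returned `ctx_attn` weights are what make the model inspectable: high-weight context words can be visualized directly against the suggested response.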
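The abstract states only that a neural network approximates the mutual information term. One standard choice is a MINE-style estimator built on the Donsker-Varadhan bound (Belghazi et al., 2018); the sketch below assumes that reading, and treats the "unimportant" context vector as the attention-complement pooling of word features, which is likewise an assumption.

```python
import math
import torch
import torch.nn as nn

class NeuralMI(nn.Module):
    """MINE-style neural lower bound on I(X; Y) in Donsker-Varadhan form;
    the paper's exact approximation may differ."""

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 64):
        super().__init__()
        self.critic = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # I(X;Y) >= E_p(x,y)[T(x,y)] - log E_p(x)p(y)[exp(T(x,y'))]
        joint = self.critic(torch.cat([x, y], dim=-1)).mean()
        y_perm = y[torch.randperm(y.size(0))]   # shuffle to break the pairing
        marg = self.critic(torch.cat([x, y_perm], dim=-1)).squeeze(-1)
        return joint - (torch.logsumexp(marg, dim=0) - math.log(y.size(0)))

# Assumed usage: x is the attention-complement ("unimportant") context
# vector, y is the encoded label. The critic maximizes the bound while the
# encoder minimizes it, giving loss = ranking_loss + lambda_mi * mi_bound,
# which de-emphasizes unimportant words at the final prediction.
```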
Related papers
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z)
- Can Your Model Tell a Negation from an Implicature? Unravelling Challenges With Intent Encoders [24.42199777529863]
Large Language Models (LLMs) produce embeddings whose semantics can be adjusted over the embedding space using prompts.
Traditional evaluation benchmarks rely solely on task metrics that do not directly measure gaps in semantic understanding.
We propose an intent semantic toolkit that gives a more holistic view of intent embedding models.
arXiv Detail & Related papers (2024-03-07T08:32:17Z)
- Co-guiding for Multi-intent Spoken Language Understanding [53.30511968323911]
We propose a novel model termed Co-guiding Net, which implements a two-stage framework that achieves mutual guidance between the two tasks.
For the first stage, we propose single-task supervised contrastive learning, and for the second stage, we propose co-guiding supervised contrastive learning.
Experiment results on multi-intent SLU show that our model outperforms existing models by a large margin.
arXiv Detail & Related papers (2023-11-22T08:06:22Z)
- Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z)
- Distilled Dual-Encoder Model for Vision-Language Understanding [50.42062182895373]
We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks.
We show that applying the cross-modal attention distillation for both pre-training and fine-tuning stages achieves further improvements.
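The summary does not give the distillation objective; a generic formulation would match the student's attention maps to the teacher's cross-modal attention with a KL term, as in this hypothetical sketch:

```python
import torch

def attention_distillation_loss(teacher_attn: torch.Tensor,
                                student_attn: torch.Tensor) -> torch.Tensor:
    """KL divergence pushing a dual-encoder student's attention maps toward
    a fusion-encoder teacher's cross-modal attention maps. Shapes assumed:
    (batch, heads, query_len, key_len), rows already softmax-normalized."""
    t = teacher_attn.clamp_min(1e-8)
    s = student_attn.clamp_min(1e-8)
    return (t * (t.log() - s.log())).sum(dim=-1).mean()

# Per the abstract, such a term is applied in both pre-training and
# fine-tuning, added to the task loss with a hypothetical weight alpha:
#   loss = task_loss + alpha * attention_distillation_loss(t_attn, s_attn)
```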
arXiv Detail & Related papers (2021-12-16T09:21:18Z)
- Tracing Origins: Coref-aware Machine Reading Comprehension [43.352833140317486]
We imitate the human reading process by connecting anaphoric expressions, and leverage coreference information to enhance the word embeddings from the pre-trained model.
We demonstrate that explicitly incorporating coreference information at the fine-tuning stage performs better than incorporating it during pre-training of the language model.
arXiv Detail & Related papers (2021-10-15T09:28:35Z)
- Cross Modification Attention Based Deliberation Model for Image Captioning [11.897899189552318]
We propose a universal two-pass decoding framework for image captioning.
A single-pass decoding based model first generates a draft caption according to an input image.
A Deliberation Model then performs the polishing process to refine the draft caption to a better image description.
arXiv Detail & Related papers (2021-09-17T08:38:08Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
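The token-level uncertainty studied there is the entropy of the decoder's per-step output distribution; a minimal sketch (names and shapes assumed):

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-step predictive entropy (in nats) of a seq2seq decoder.
    logits: (batch, time, vocab) from any autoregressive model."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)   # (batch, time)

# Dummy check: higher values mark decoding steps the model is unsure about.
print(token_entropy(torch.randn(2, 5, 100)))
```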
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, VAEs tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)