Dial BeInfo for Faithfulness: Improving Factuality of
Information-Seeking Dialogue via Behavioural Fine-Tuning
- URL: http://arxiv.org/abs/2311.09800v2
- Date: Mon, 4 Mar 2024 14:35:59 GMT
- Title: Dial BeInfo for Faithfulness: Improving Factuality of
Information-Seeking Dialogue via Behavioural Fine-Tuning
- Authors: Evgeniia Razumovskaia, Ivan Vulić, Pavle Marković, Tomasz Cichy,
Qian Zheng, Tsung-Hsien Wen, Paweł Budzianowski
- Abstract summary: We introduce BeInfo, a method that applies behavioural tuning to aid information-seeking dialogue systems.
We show that models tuned with BeInfo become considerably more faithful to the knowledge source.
We also show that models with 3B parameters tuned with BeInfo demonstrate strong performance on data from real 'production' conversations.
- Score: 55.96744451743273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factuality is a crucial requirement in information-seeking dialogue: the
system should respond to the user's queries so that the responses are
meaningful and aligned with the knowledge provided to the system. However, most
modern large language models suffer from hallucinations, that is, they generate
responses not supported by or contradicting the knowledge source. To mitigate
the issue and increase faithfulness of information-seeking dialogue systems, we
introduce BeInfo, a simple yet effective method that applies behavioural tuning
to aid information-seeking dialogue. Relying on three standard datasets, we
show that models tuned with BeInfo become considerably more faithful to the
knowledge source, both on datasets and domains seen during BeInfo-tuning and
on unseen domains when applied in a zero-shot manner. In addition, we
show that the models with 3B parameters (e.g., Flan-T5) tuned with BeInfo
demonstrate strong performance on data from real 'production' conversations and
outperform GPT-4 when tuned on a limited amount of such realistic in-domain
dialogues.
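The abstract describes behavioural tuning only at a high level. A minimal sketch of the kind of training-example construction such tuning could rely on, where abstention is taught as an explicit target behaviour; the prompt template, function name, and abstention string below are hypothetical illustrations, not taken from the paper:

```python
# Hypothetical sketch of behavioural training-example construction:
# alongside standard (knowledge, query, grounded response) examples,
# include examples where the knowledge does NOT answer the query and
# the target response is an explicit abstention.

ABSTAIN = "I'm sorry, the provided source does not contain that information."

def build_example(knowledge, query, grounded_response=None):
    """Return a (prompt, target) pair for behavioural fine-tuning."""
    prompt = f"Knowledge: {knowledge}\nUser: {query}\nSystem:"
    # When the knowledge does not support an answer, the target teaches
    # the model to abstain rather than hallucinate.
    target = grounded_response if grounded_response is not None else ABSTAIN
    return prompt, target

# Grounded example: the response is supported by the knowledge.
pos = build_example(
    "The Eiffel Tower is 330 metres tall.",
    "How tall is the Eiffel Tower?",
    "It is 330 metres tall.",
)

# Behavioural example: the knowledge is irrelevant, so the target abstains.
neg = build_example(
    "The Eiffel Tower is 330 metres tall.",
    "Who designed the Louvre pyramid?",
)
```

Pairs like these would then be fed to ordinary supervised fine-tuning of the base model (e.g., Flan-T5).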
Related papers
- Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets [9.78470355087662]
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction.
This paper provides insights into automatic enhancement of spoken dialogue datasets' semantic representations.
arXiv Detail & Related papers (2024-06-19T06:59:57Z)
- Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive
Learning [71.8876256714229]
We propose an entity-based contrastive learning framework for improving the robustness of knowledge-grounded dialogue systems.
Our method achieves new state-of-the-art performance in terms of automatic evaluation scores.
arXiv Detail & Related papers (2024-01-09T05:16:52Z)
- Improving Factual Consistency for Knowledge-Grounded Dialogue Systems
via Knowledge Enhancement and Alignment [77.56326872997407]
Pretrained language models (PLMs) based knowledge-grounded dialogue systems are prone to generate responses that are factually inconsistent with the provided knowledge source.
Inspired by previous work which identified that feed-forward networks (FFNs) within Transformers are responsible for factual knowledge expressions, we investigate two methods to efficiently improve the factual expression capability.
arXiv Detail & Related papers (2023-10-12T14:44:05Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual
Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog
Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
arXiv Detail & Related papers (2022-11-21T16:21:41Z)
- Weakly Supervised Data Augmentation Through Prompting for Dialogue
Understanding [103.94325597273316]
We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
arXiv Detail & Related papers (2022-10-25T17:01:30Z)
- Language Model as an Annotator: Exploring DialoGPT for Dialogue
Summarization [29.887562761942114]
We show how DialoGPT, a pre-trained model for conversational response generation, can be developed as an unsupervised dialogue annotator.
We apply DialoGPT to label three types of features on two dialogue summarization datasets, SAMSum and AMI, and employ pre-trained and non-pre-trained models as our summarizers.
arXiv Detail & Related papers (2021-05-26T13:50:13Z)
- Robustness Testing of Language Understanding in Dialog Systems [33.30143655553583]
We conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models.
We introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation.
We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems.
arXiv Detail & Related papers (2020-12-30T18:18:47Z)
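The last entry above describes approximating natural perturbations to stress-test dialogue understanding. A minimal sketch of one such perturbation, spoken-style disfluency insertion; this is an illustration of the general idea only, not LAUG's actual API:

```python
import random

# Illustrative perturbation: insert spoken-style disfluencies (fillers,
# word repetitions) into a clean utterance, to probe an NLU model's
# robustness to speech characteristics.

FILLERS = ["uh", "um", "you know"]

def add_disfluencies(utterance, rate=0.3, seed=0):
    """After each word, with probability `rate`, insert a filler or repeat."""
    rng = random.Random(seed)  # seeded for reproducible test sets
    out = []
    for word in utterance.split():
        out.append(word)
        if rng.random() < rate:
            # Either repeat the word or insert a random filler.
            out.append(word if rng.random() < 0.5 else rng.choice(FILLERS))
    return " ".join(out)

print(add_disfluencies("book a table for two at seven"))
```

A robustness evaluation would compare intent/slot accuracy on the clean and perturbed versions of the same test set.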
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.