Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression
- URL: http://arxiv.org/abs/2501.12698v2
- Date: Sat, 25 Jan 2025 17:29:06 GMT
- Title: Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression
- Authors: Kai Yoshida, Masahiro Mizukami, Seiya Kawano, Canasai Kruengkrai, Hiroaki Sugiyama, Koichiro Yoshino
- Abstract summary: This study prepared reward models corresponding to 12 metrics related to the impression of the entire dialogue for evaluating dialogue responses.
We tuned our dialogue models using the reward model signals as feedback to improve the impression of the system.
- Score: 9.005722141359675
- Abstract: To improve user engagement during conversations with dialogue systems, we must improve both individual dialogue responses and dialogue impressions such as consistency, personality, and empathy throughout the entire dialogue. While such dialogue systems have been developing rapidly with the help of large language models (LLMs), reinforcement learning from AI feedback (RLAIF) has attracted attention as a way to align LLM-based dialogue models with such dialogue impressions. In RLAIF, a reward model based on another LLM creates a training signal for an LLM-based dialogue model using zero-shot/few-shot prompting techniques. However, evaluating an entire dialogue only by prompting LLMs is challenging. In this study, we prepared reward models through supervised fine-tuning (SFT) of LLMs; these models correspond to 12 metrics related to the impression of the entire dialogue and are used to evaluate dialogue responses. We tuned our dialogue models using the reward model signals as feedback to improve the impression of the system. The results of automatic and human evaluations showed that tuning the dialogue model using our reward models corresponding to dialogue impression improved the evaluation of individual metrics and the naturalness of the dialogue response.
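As a rough illustration of the idea in the abstract (several per-metric reward models producing a feedback signal for a candidate response), the sketch below averages the scores of a few reward models into one scalar. Everything here is an assumption made for illustration: the `RewardModel` type, the `combined_reward` function, the toy scoring functions, and the choice of a simple mean are hypothetical and are not taken from the paper, which uses SFT-trained LLMs as reward models and may combine or apply their signals differently.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical interface: a reward model maps (dialogue history, candidate
# response) to a score. In the paper these are SFT-trained LLMs; here they
# are toy stand-ins so the example runs on its own.
RewardModel = Callable[[List[str], str], float]


def toy_empathy(history: List[str], response: str) -> float:
    # Toy heuristic (assumption): reward responses that acknowledge the user.
    return 1.0 if "sorry" in response.lower() else 0.0


def toy_consistency(history: List[str], response: str) -> float:
    # Toy constant score (assumption): stands in for a learned metric.
    return 0.5


def combined_reward(
    history: List[str],
    response: str,
    reward_models: Dict[str, RewardModel],
) -> float:
    # Score the response under every metric, then average the scores.
    # Averaging is an illustrative choice, not the paper's stated method.
    scores = {name: rm(history, response) for name, rm in reward_models.items()}
    return mean(scores.values())


models = {"empathy": toy_empathy, "consistency": toy_consistency}
history = ["I lost my keys this morning."]
print(combined_reward(history, "I'm sorry to hear that.", models))  # 0.75
print(combined_reward(history, "Okay.", models))  # 0.25
```

In an RLAIF-style setup, a scalar like this would serve as the feedback signal when tuning the dialogue model; the tuning algorithm itself is outside the scope of this sketch.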
Related papers
- DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling [73.08187964426823]
Dialogue systems enabled by large language models (LLMs) have become one of the central modes of human-machine interaction.
This paper introduces a new research task: Dialogue Element MOdeling.
We propose a novel benchmark, DEMO, designed for comprehensive dialogue modeling and assessment.
arXiv Detail & Related papers (2024-12-06T10:01:38Z)
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten is an end-to-end GPT-based model capable of effectively modeling the complex behaviors inherent in natural conversations with low latency.
Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z)
- DialogBench: Evaluating LLMs as Human-like Dialogue Systems [16.997134341787486]
Large language models (LLMs) have achieved remarkable breakthroughs in new dialogue capabilities by leveraging instruction tuning.
In this paper, we propose DialogBench, a dialogue evaluation benchmark that contains 12 dialogue tasks.
We show that instruction tuning improves the human likeness of LLMs to a certain extent, but most LLMs still have much room for improvement as human-like dialogue systems.
arXiv Detail & Related papers (2023-11-03T02:59:56Z)
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension [42.57581945778631]
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing.
We propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization.
We show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.
arXiv Detail & Related papers (2022-12-24T04:39:54Z)
- Post-Training Dialogue Summarization using Pseudo-Paraphrasing [12.083992819138716]
We propose to post-train pretrained language models (PLMs) to rephrase from dialogue to narratives.
Comprehensive experiments show that our approach significantly improves vanilla PLMs on dialogue summarization.
arXiv Detail & Related papers (2022-04-28T13:42:19Z)
- Response Generation with Context-Aware Prompt Learning [19.340498579331555]
We present a novel approach for pre-trained dialogue modeling that casts the dialogue generation problem as a prompt-learning task.
Instead of fine-tuning on limited dialogue data, our approach, DialogPrompt, learns continuous prompt embeddings optimized for dialogue contexts.
Our approach significantly outperforms the fine-tuning baseline and the generic prompt-learning methods.
arXiv Detail & Related papers (2021-11-04T05:40:13Z)
- DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It is capable not only of performing turn-level evaluation but also of holistically considering the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z)
- Modeling and Utilizing User's Internal State in Movie Recommendation Dialogue [17.87695990289955]
We model the user's internal state (UIS) in dialogues and construct a dialogue system that changes its response based on the UIS.
We train the UIS estimators on a dialogue corpus with the modeled UIS's annotations.
We also design response change rules that change the system's responses according to each UIS.
arXiv Detail & Related papers (2020-12-05T20:50:53Z)
- Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.