Large Language Models Meet Harry Potter: A Bilingual Dataset for
Aligning Dialogue Agents with Characters
- URL: http://arxiv.org/abs/2211.06869v4
- Date: Mon, 9 Oct 2023 05:08:23 GMT
- Title: Large Language Models Meet Harry Potter: A Bilingual Dataset for
Aligning Dialogue Agents with Characters
- Authors: Nuo Chen, Yan Wang, Haiyun Jiang, Deng Cai, Yuhan Li, Ziyang Chen,
Longyue Wang and Jia Li
- Abstract summary: We introduce the Harry Potter Dialogue dataset, designed to advance the study of dialogue agents and character alignment.
The dataset encompasses all dialogue sessions (in both English and Chinese) from the Harry Potter series.
It is annotated with vital background information, including dialogue scenes, speakers, character relationships, and attributes.
- Score: 70.84938803753062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Dialogue-style Large Language Models (LLMs) such as ChatGPT
and GPT4 have demonstrated immense potential in constructing open-domain
dialogue agents. However, aligning these agents with specific characters or
individuals remains a considerable challenge due to the complexities of
character representation and the lack of comprehensive annotations. In this
paper, we introduce the Harry Potter Dialogue (HPD) dataset, designed to
advance the study of dialogue agents and character alignment. The dataset
encompasses all dialogue sessions (in both English and Chinese) from the Harry
Potter series and is annotated with vital background information, including
dialogue scenes, speakers, character relationships, and attributes. These
extensive annotations may empower LLMs to unlock character-driven dialogue
capabilities. Furthermore, it can serve as a universal benchmark for evaluating
how well can a LLM aligning with a specific character. We benchmark LLMs on HPD
using both fine-tuning and in-context learning settings. Evaluation results
reveal that although there is substantial room for improvement in generating
high-quality, character-aligned responses, the proposed dataset is valuable in
guiding models toward responses that better align with the character of Harry
Potter.
Related papers
- What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models [0.0]
We introduce a dialogue filler framework that utilizes large language models (LLMs) to generate dynamic and contextually appropriate character interactions.
We test this framework within the environments of Final Fantasy VII Remake and Pokemon.
This study aims to assist developers in crafting more nuanced filler dialogues, thereby enriching player immersion and enhancing the overall RPG experience.
arXiv Detail & Related papers (2024-07-29T19:12:18Z) - PersonalityChat: Conversation Distillation for Personalized Dialog
Modeling with Facts and Traits [5.447308344436046]
PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset.
We show that the personality trait labels can be used for trait-based personalization of generative dialogue models.
arXiv Detail & Related papers (2024-01-14T20:35:33Z) - CharacterGLM: Customizing Chinese Conversational AI Characters with
Large Language Models [66.4382820107453]
We present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters.
Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs.
arXiv Detail & Related papers (2023-11-28T14:49:23Z) - BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforms its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z) - Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue
Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thoughts (textitCue-CoT) to provide a more personalized and engaging response.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate our proposed textitCue-CoT method outperforms standard prompting methods in terms of both textithelpfulness and textitacceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z) - SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
arXiv Detail & Related papers (2023-05-15T06:08:01Z) - A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction.
Most existing RL approaches to DM train the agent at the word-level, and thus, have to deal with aly complex action space even for a medium-size vocabulary.
We develop a RL-based DM using a novel mixture of expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a
arXiv Detail & Related papers (2022-05-31T19:00:41Z) - RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich
Semantic Annotations for Task-Oriented Dialogue Modeling [35.75880078666584]
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic s.
It contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains.
arXiv Detail & Related papers (2020-10-17T08:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.