Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function
- URL: http://arxiv.org/abs/2303.13797v1
- Date: Fri, 24 Mar 2023 04:33:40 GMT
- Title: Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function
- Authors: A.B. Siddique, M.H. Maqbool, Kshitija Taywade, Hassan Foroosh
- Abstract summary: We propose a novel framework, P-ToD, to personalize task-oriented dialog systems.
P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases.
Our novel reward function can quantify the quality of the generated responses even for unseen profiles.
- Score: 19.652303125864204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task-oriented dialog systems enable users to accomplish tasks using natural
language. State-of-the-art systems respond to users in the same way regardless
of their personalities, although personalizing dialogues can lead to higher
levels of adoption and better user experiences. Building personalized dialog
systems is an important, yet challenging endeavor and only a handful of works
took on the challenge. Most existing works rely on supervised learning
approaches and require laborious and expensive labeled training data for each
user profile. Additionally, collecting and labeling data for each user profile
is virtually impossible. In this work, we propose a novel framework, P-ToD, to
personalize task-oriented dialog systems capable of adapting to a wide range of
user profiles in an unsupervised fashion using a zero-shot generalizable reward
function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three
phases. Phase one performs task-specific training. Phase two kicks off
unsupervised personalization by leveraging the proximal policy optimization
algorithm that performs policy gradients guided by the zero-shot generalizable
reward function. Our novel reward function can quantify the quality of the
generated responses even for unseen profiles. The optional final phase
fine-tunes the personalized model using a few labeled training examples. We
conduct extensive experimental analysis using the personalized bAbI dialogue
benchmark for five tasks and up to 180 diverse user profiles. The experimental
results demonstrate that P-ToD, even when it had access to zero labeled
examples, outperforms state-of-the-art supervised personalization models and
achieves competitive performance on BLEU and ROUGE metrics when compared to a
strong fully-supervised GPT-2 baseline.
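To make the three-phase workflow described in the abstract easier to follow, below is a minimal, self-contained Python sketch. It is not the paper's implementation: the names (`ToyPolicy`, `profile_reward`, the phase functions) are hypothetical, a tiny GRU language model stands in for the GPT-2 backbone, a token-overlap score stands in for the learned zero-shot generalizable reward model, and a plain REINFORCE update stands in for proximal policy optimization.

```python
# Hypothetical sketch of the three-phase P-ToD workflow (not the authors' code).
import torch
import torch.nn as nn

VOCAB = ["<bos>", "<eos>", "hello", "doctor", "table", "book", "please", "sir", "madam"]
STOI = {w: i for i, w in enumerate(VOCAB)}

class ToyPolicy(nn.Module):
    """Tiny autoregressive LM standing in for the GPT-2 backbone."""
    def __init__(self, vocab_size, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)  # next-token logits, shape (batch, seq, vocab)

def profile_reward(response_tokens, profile_tokens):
    """Toy stand-in reward: token overlap between response and profile cues.
    P-ToD instead learns a reward function that generalizes to unseen profiles."""
    overlap = len(set(response_tokens) & set(profile_tokens))
    return overlap / max(len(profile_tokens), 1)

def phase1_task_training(policy, dialogs, epochs=1, lr=1e-3):
    """Phase 1: supervised task-specific training (teacher-forced cross-entropy)."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens in dialogs:                      # tokens: (1, seq_len) LongTensor
            x, y = tokens[:, :-1], tokens[:, 1:]
            logits = policy(x)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()

def phase2_personalization(policy, profiles, steps=10, max_len=5, lr=1e-3):
    """Phase 2: unsupervised personalization driven by the reward signal.
    A plain REINFORCE update replaces the paper's PPO objective."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        for profile in profiles:
            generated, log_probs = [STOI["<bos>"]], []
            for _ in range(max_len):                # sample a response token by token
                logits = policy(torch.tensor([generated]))[0, -1]
                dist = torch.distributions.Categorical(logits=logits)
                tok = dist.sample()
                log_probs.append(dist.log_prob(tok))
                generated.append(int(tok))
            reward = profile_reward(generated, profile)
            loss = -reward * torch.stack(log_probs).sum()   # policy-gradient loss
            opt.zero_grad(); loss.backward(); opt.step()

def phase3_finetune(policy, few_labeled_dialogs):
    """Optional phase 3: fine-tune on a few labeled personalized examples."""
    phase1_task_training(policy, few_labeled_dialogs, epochs=1)

if __name__ == "__main__":
    policy = ToyPolicy(len(VOCAB))
    dialogs = [torch.tensor([[STOI["<bos>"], STOI["hello"], STOI["doctor"], STOI["<eos>"]]])]
    profiles = [[STOI["sir"], STOI["doctor"]], [STOI["madam"], STOI["book"]]]
    phase1_task_training(policy, dialogs)
    phase2_personalization(policy, profiles)
    phase3_finetune(policy, dialogs)
```

In the paper, phase two optimizes the GPT-2 policy with PPO guided by a learned reward that scores responses against arbitrary, including unseen, user profiles; the toy overlap reward above only illustrates where that signal plugs in.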
Related papers
- Personalized Visual Instruction Tuning [30.677058613937067]
Multimodal large language models (MLLMs) can engage in general conversations but fail to conduct personalized dialogues targeting specific individuals.
This deficiency hinders the application of MLLMs in personalized settings, such as tailored visual assistants on mobile devices.
We introduce Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals within an image.
arXiv Detail & Related papers (2024-10-09T17:46:53Z)
- PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization [9.594958534074074]
We introduce the PEFT-U Benchmark: a new dataset for building and evaluating NLP models for user personalization.
We explore the challenge of efficiently personalizing LLMs to accommodate user-specific preferences in the context of diverse user-centered tasks.
arXiv Detail & Related papers (2024-07-25T14:36:18Z)
- Step-Back Profiling: Distilling User History for Personalized Scientific Writing [50.481041470669766]
Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals.
We introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles.
Our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark.
arXiv Detail & Related papers (2024-06-20T12:58:26Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic and personalized generative framework that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to unveil how vulnerable TOD systems are to realistic scenarios.
Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system.
We discover a novel "pretending" behavior, in which the system pretends to handle the user requests even though they are beyond the system's capabilities.
arXiv Detail & Related papers (2023-05-23T09:24:53Z)
- Task-Optimized Adapters for an End-to-End Task-Oriented Dialogue System [0.0]
We propose an end-to-end TOD system with Task-Optimized Adapters that learn independently per task, adding only a small number of parameters after fixed layers of the pre-trained network.
Our method is model-agnostic and does not require prompt-tuning, using only input data without prompts.
arXiv Detail & Related papers (2023-05-04T00:17:49Z)
- MCP: Self-supervised Pre-training for Personalized Chatbots with Multi-level Contrastive Sampling [18.40883902610959]
We propose a self-supervised learning framework for capturing better representations from users' dialogue history for personalized chatbots.
Specifically, we apply contrastive sampling methods to leverage the supervised signals hidden in user dialog history.
Experimental results on two real-world datasets show a significant improvement in our proposed model MCP compared with the existing methods.
arXiv Detail & Related papers (2022-10-17T05:16:23Z)
- A Cooperative Memory Network for Personalized Task-oriented Dialogue Systems with Incomplete User Profiles [55.951126447217526]
We study personalized Task-oriented Dialogue Systems without assuming that user profiles are complete.
We propose a Cooperative Memory Network (CoMemNN) that has a novel mechanism to gradually enrich user profiles.
CoMemNN is able to enrich user profiles effectively, which results in an improvement of 3.06% in terms of response selection accuracy.
arXiv Detail & Related papers (2021-02-16T18:05:54Z)
- RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z)