Using Cognitive Models to Train Warm Start Reinforcement Learning Agents
for Human-Computer Interactions
- URL: http://arxiv.org/abs/2103.06160v1
- Date: Wed, 10 Mar 2021 16:20:02 GMT
- Title: Using Cognitive Models to Train Warm Start Reinforcement Learning Agents
for Human-Computer Interactions
- Authors: Chao Zhang, Shihan Wang, Henk Aarts and Mehdi Dastani
- Abstract summary: We propose a novel approach of using cognitive models to pre-train RL agents before they are applied to real users.
We present our general methodological approach, followed by two case studies from our previous and ongoing projects.
- Score: 6.623676799228969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) agents in human-computer interaction (HCI)
applications require repeated user interactions before they can perform well.
To address this "cold start" problem, we propose a novel approach of using
cognitive models to pre-train RL agents before they are applied to real users.
After briefly reviewing relevant cognitive models, we present our general
methodological approach, followed by two case studies from our previous and
ongoing projects. We hope this position paper stimulates conversations between
RL, HCI, and cognitive science researchers in order to explore the full
potential of the approach.
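
The abstract itself contains no code; the following is a minimal, hypothetical sketch of the warm-start recipe it describes: a cognitive model stands in for the user as a simulator, a standard RL agent (tabular Q-learning here) is pre-trained against it, and the resulting value estimates carry over when real users arrive. The names (CognitiveUserModel, respond, q_learning) and all dynamics are illustrative assumptions, not the authors' implementation.

```python
"""Illustrative warm-start recipe (not the authors' code): pre-train a
tabular Q-learning agent against a cognitive model that simulates user
responses, then reuse the learned Q-table when real users arrive."""
import random
from collections import defaultdict


class CognitiveUserModel:
    """Toy stand-in for a cognitive model of the user.

    Maps the interaction state and the agent's action to a simulated
    next state, reward, and termination flag. A real cognitive model
    (e.g. a habit or memory model) would replace this logic.
    """

    def __init__(self, n_states=10, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)

    def respond(self, state, action):
        # Hypothetical dynamics: action 1 tends to move the simulated
        # user toward a "goal" state; other actions drift randomly.
        if action == 1 and self.rng.random() < 0.8:
            next_state = min(state + 1, self.n_states - 1)
        else:
            drift = self.rng.choice([-1, 0, 1])
            next_state = max(0, min(self.n_states - 1, state + drift))
        reward = 1.0 if next_state == self.n_states - 1 else 0.0
        done = next_state == self.n_states - 1
        return next_state, reward, done


def q_learning(env, q, episodes, actions=(0, 1), alpha=0.1, gamma=0.95, eps=0.1):
    """Standard tabular Q-learning against any environment exposing respond()."""
    rng = random.Random(1)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection over the current Q-table.
            if rng.random() < eps:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.respond(state, action)
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


# 1) Warm start: pre-train on the cognitive model (no real users involved).
q_table = q_learning(CognitiveUserModel(), defaultdict(float), episodes=500)

# 2) Deployment: keep updating the same Q-table from real interactions, so
#    the first real users already face a reasonable policy (interface is
#    hypothetical and therefore left commented out).
# q_table = q_learning(real_user_interface, q_table, episodes=...)
```

Any simulator exposing the same respond interface could be swapped in; the point is only that real users never have to sit through the agent's earliest, effectively random behaviour.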
Related papers
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent)
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Planning with RL and episodic-memory behavioral priors [0.20305676256390934]
Learning from behavioral priors is a promising way to bootstrap agents with a better-than-random exploration policy.
We present a planning-based approach that can use these behavioral priors for effective exploration and learning in a reinforcement learning environment.
arXiv Detail & Related papers (2022-07-05T07:11:05Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring the novelty based on learned reward.
Our experiments show that an exploration bonus from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms; a rough sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning [85.3987745097806]
Offline reinforcement learning can be used to train dialogue agents entirely from static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)
- Towards Interactive Reinforcement Learning with Intrinsic Feedback [1.7117805951258132]
Reinforcement learning (RL) and brain-computer interfaces (BCI) have experienced significant growth over the past decade.
With rising interest in human-in-the-loop (HITL) approaches, incorporating human input into RL algorithms has given rise to the sub-field of interactive RL.
We denote this new and emerging medium of feedback, obtained implicitly from brain signals via a BCI, as intrinsic feedback.
arXiv Detail & Related papers (2021-12-02T19:29:26Z)
- Accelerating the Convergence of Human-in-the-Loop Reinforcement Learning with Counterfactual Explanations [1.8275108630751844]
Human-in-the-loop Reinforcement Learning (HRL) combines human feedback with reinforcement learning techniques to accelerate convergence.
We extend the existing TAMER Framework with the possibility to enhance human feedback with two different types of counterfactual explanations.
arXiv Detail & Related papers (2021-08-03T08:27:28Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Accelerating Reinforcement Learning Agent with EEG-based Implicit Human Feedback [10.138798960466222]
Reinforcement Learning (RL) agents with human feedback can dramatically improve various aspects of learning.
Previous methods require a human observer to give inputs explicitly, burdening the human in the loop of the RL agent's learning process.
We investigate capturing the human's intrinsic reactions as implicit (and natural) feedback through EEG, in the form of error-related potentials (ErrP).
arXiv Detail & Related papers (2020-06-30T03:13:37Z)
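
For the "Reward Uncertainty for Exploration in Preference-based Reinforcement Learning" entry above, one common way to turn uncertainty in a learned reward into an exploration bonus is ensemble disagreement. The sketch below assumes an ensemble of small reward networks trained from preference labels (training omitted); the ensemble size, network shape, and the beta coefficient are illustrative assumptions, not the cited paper's exact recipe.

```python
"""Hypothetical sketch of an uncertainty-based exploration bonus for
preference-based RL: disagreement across an ensemble of learned reward
models is added to the predicted reward."""
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, ENSEMBLE = 8, 2, 5


def make_reward_net():
    # Small MLP mapping (state, action) -> scalar reward estimate.
    return nn.Sequential(
        nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )


# Each member would be trained on human preference labels (e.g. with a
# Bradley-Terry style loss); that training loop is omitted here.
reward_ensemble = [make_reward_net() for _ in range(ENSEMBLE)]


def shaped_reward(state, action, beta=0.1):
    """Return the mean learned reward plus an exploration bonus.

    The bonus is the standard deviation of the ensemble's predictions:
    high disagreement marks state-action pairs the reward models are
    uncertain about, which the agent is then encouraged to visit.
    """
    x = torch.cat([state, action], dim=-1)
    with torch.no_grad():
        preds = torch.stack([net(x).squeeze(-1) for net in reward_ensemble])
    return preds.mean(0) + beta * preds.std(0)


# Example call on a batch of two transitions.
s = torch.randn(2, STATE_DIM)
a = torch.randn(2, ACTION_DIM)
print(shaped_reward(s, a))  # tensor of shape (2,)
```

In a full preference-based RL loop, this shaped reward would stand in for the environment reward during policy updates, and beta would typically be annealed toward zero as the reward models converge, so the bonus fades once the learned reward is trusted.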