Towards customizable reinforcement learning agents: Enabling preference
specification through online vocabulary expansion
- URL: http://arxiv.org/abs/2210.15096v1
- Date: Thu, 27 Oct 2022 00:54:14 GMT
- Title: Towards customizable reinforcement learning agents: Enabling preference
specification through online vocabulary expansion
- Authors: Utkarsh Soni, Sarath Sreedharan, Mudit Verma, Lin Guan, Matthew
Marquez, Subbarao Kambhampati
- Abstract summary: We propose PRESCA, a system that allows users to specify their preferences in terms of concepts that they understand.
We evaluate PRESCA by using it on a Minecraft environment and show that it can be effectively used to make the agent align with the user's preference.
- Score: 25.053927377536905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing interest in developing automated agents that can work
alongside humans. In addition to completing the assigned task, such an agent
will undoubtedly be expected to behave in a manner that is preferred by the
human. This requires the human to communicate their preferences to the agent.
To achieve this, current approaches either require the user to specify the
reward function or learn the preference interactively from queries that ask
the user to compare trajectories. The former can be challenging if the
internal representation used by the agent is inscrutable to the human, while
the latter is unnecessarily cumbersome for the user if their preference can be
specified more easily in symbolic terms. In this work, we propose PRESCA
(PREference Specification through Concept Acquisition), a system that allows
users to specify their preferences in terms of concepts that they understand.
PRESCA maintains a set of such concepts in a shared vocabulary. If the relevant
concept is not in the shared vocabulary, then it is learned. To make learning a
new concept more efficient, PRESCA leverages causal associations between the
target concept and concepts that are already known. Additionally, the effort of
learning the new concept is amortized by adding the concept to the shared
vocabulary for supporting preference specification in future interactions. We
evaluate PRESCA by using it on a Minecraft environment and show that it can be
effectively used to make the agent align with the user's preference.
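To make the workflow concrete, here is a minimal, hypothetical sketch of a concept-based preference loop in the spirit of PRESCA. The names (ConceptVocabulary, train_concept_classifier, shaped_reward), the toy lookup classifier, and the penalty-based use of the concept are illustrative assumptions rather than the paper's implementation, and the causal-association machinery that makes concept learning efficient is omitted.

```python
# Hypothetical sketch of a concept-based preference loop in the spirit of
# PRESCA; class and function names are illustrative, not the paper's API.
from typing import Callable, Dict

State = dict  # stand-in for an environment state representation


class ConceptVocabulary:
    """Shared vocabulary mapping concept names to binary state classifiers."""

    def __init__(self) -> None:
        self.classifiers: Dict[str, Callable[[State], bool]] = {}

    def has(self, concept: str) -> bool:
        return concept in self.classifiers

    def add(self, concept: str, classifier: Callable[[State], bool]) -> None:
        # Once learned, the concept stays in the vocabulary, so the learning
        # effort is amortized over future preference specifications.
        self.classifiers[concept] = classifier

    def holds(self, concept: str, state: State) -> bool:
        return self.classifiers[concept](state)


def train_concept_classifier(positive_states, negative_states):
    """Placeholder for learning a new concept from user-labeled states.
    (PRESCA additionally exploits causal links to already-known concepts to
    reduce the labeling effort; that machinery is omitted here.)"""
    positives = {tuple(sorted(s.items())) for s in positive_states}
    return lambda state: tuple(sorted(state.items())) in positives


def shaped_reward(base_reward, state, vocab, avoid_concept, penalty=1.0):
    """Penalize states in which the user's undesired concept holds."""
    return base_reward - penalty if vocab.holds(avoid_concept, state) else base_reward


# Usage: the user wants the agent to avoid states where "lava_nearby" holds.
vocab = ConceptVocabulary()
if not vocab.has("lava_nearby"):
    classifier = train_concept_classifier(
        positive_states=[{"cell": "lava"}],
        negative_states=[{"cell": "grass"}],
    )
    vocab.add("lava_nearby", classifier)

print(shaped_reward(1.0, {"cell": "lava"}, vocab, "lava_nearby"))   # -> 0.0
print(shaped_reward(1.0, {"cell": "grass"}, vocab, "lava_nearby"))  # -> 1.0
```

The point the sketch tries to capture is the amortization argument above: once a concept classifier is added to the shared vocabulary, later preference specifications can reuse it without further learning effort.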
Related papers
- SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm.
We introduce SmartAgent, an agent framework that perceives cyber environments and reasons about personalized requirements.
Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z)
- Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation [55.5687800992432]
We propose a plug-and-play framework that synergizes Large Language Models (LLMs) and Knowledge Graphs (KGs) to unveil user preferences.
This enables the LLM to transform KG entities into concise natural language descriptions, allowing it to comprehend domain-specific knowledge.
arXiv Detail & Related papers (2024-11-16T11:47:21Z)
- Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
- AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
- To the Noise and Back: Diffusion for Shared Autonomy [2.341116149201203]
We present a new approach to shared autonomy that employs a modulation of the forward and reverse diffusion process of diffusion models.
Our framework learns a distribution over a space of desired behaviors.
It then employs a diffusion model to translate the user's actions to a sample from this distribution.
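As a rough illustration of this idea (not the authors' implementation): partially noise the user's action with the forward diffusion process, then run a reverse process fit to desired behaviors, so the output retains some of the user's intent while being pulled toward the learned behavior distribution. The Gaussian "denoiser", the noise schedule, and all numbers below are toy stand-ins for a learned diffusion model.

```python
# Toy illustration (not the paper's implementation) of diffusion-based shared
# autonomy: partially noise the user's action via the forward process, then
# denoise it under a model of desired behaviors. A simple pull toward a
# desired-action mean stands in for the learned reverse-diffusion model.
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.05, T)      # forward-process noise schedule
alpha_bars = np.cumprod(1.0 - betas)

desired_mean = np.array([1.0, 0.0])     # stand-in for the desired-behavior distribution


def forward_diffuse(action, k):
    """Noise the action up to step k (standard DDPM-style forward process)."""
    noise = rng.normal(size=action.shape)
    return np.sqrt(alpha_bars[k]) * action + np.sqrt(1.0 - alpha_bars[k]) * noise


def toy_denoise(x_k, k):
    """Toy reverse process: rescale the noised action and pull it toward the
    desired-behavior mean. A learned diffusion model would instead iteratively
    predict and remove the noise."""
    w = 1.0 - alpha_bars[k]             # deeper noising -> stronger pull to the prior
    return (1.0 - w) * (x_k / np.sqrt(alpha_bars[k])) + w * desired_mean


user_action = np.array([0.2, 0.8])      # the action the user commanded
k = 30                                  # partial-noising depth
shared_action = toy_denoise(forward_diffuse(user_action, k), k)
print(shared_action)
```

The single knob k controls the trade-off: k near 0 leaves the user's action essentially untouched, while k near T yields an action drawn almost entirely from the distribution of desired behaviors.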
arXiv Detail & Related papers (2023-02-23T18:58:36Z)
- Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform.
Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
- Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences [19.70421486855437]
Non-expert users can convey complex objectives by expressing preferences over short clips of agent behaviors.
Relative Behavioral Attributes acts as a middle ground between exact goal specification and reward learning purely from preference labels.
We propose two different parametric methods that can potentially encode any kind of behavioral attributes from ordered behavior clips.
arXiv Detail & Related papers (2022-10-28T05:25:23Z)
- Zero-Shot Prompting for Implicit Intent Prediction and Recommendation with Commonsense Reasoning [28.441725610692714]
This paper proposes a framework for multi-domain dialogue systems that can automatically infer implicit intents from user utterances.
The framework is shown to be effective at realizing implicit intents and recommending associated bots in a zero-shot manner.
arXiv Detail & Related papers (2022-10-12T03:33:49Z)
- Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors [34.56323846959459]
Interactive recommender systems allow users to express intent, preferences, constraints, and contexts in a richer fashion.
One challenge is inferring a user's semantic intent from the open-ended terms or attributes often used to describe a desired item.
We develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in recommender systems.
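As a rough, generic sketch of the concept-activation-vector idea (leaving out the personalization machinery this paper adds): fit a linear probe that separates item embeddings described with a soft attribute from those that are not, and use the probe's weight vector as a direction along which candidate items can be scored. All data and names below are synthetic placeholders.

```python
# Generic CAV-style probe for a soft attribute such as "cozy" (illustrative,
# not the paper's code): learn a linear direction in an item-embedding space
# and rank items by their projection onto it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 16

# Hypothetical pretrained item embeddings; in practice these would come from
# the recommender model.
cozy_items = rng.normal(loc=0.5, size=(100, dim))      # items users call "cozy"
other_items = rng.normal(loc=-0.5, size=(100, dim))    # items not described that way

X = np.vstack([cozy_items, other_items])
y = np.array([1] * 100 + [0] * 100)

probe = LogisticRegression().fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # unit concept direction

# Score new candidate items by how strongly they express the attribute.
candidates = rng.normal(size=(5, dim))
scores = candidates @ cav
print(np.argsort(-scores))  # candidates ranked from most to least "cozy"
```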
arXiv Detail & Related papers (2022-02-06T18:45:15Z)
- From Implicit to Explicit feedback: A deep neural network for modeling sequential behaviours and long-short term preferences of online users [3.464871689508835]
Implicit and explicit feedback play different roles in producing useful recommendations.
We start from the hypothesis that a user's preference at a given time is a combination of long-term and short-term interests.
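A minimal sketch of that hypothesis (the paper's deep network is replaced here by a fixed weighted sum over synthetic embeddings): score a candidate item against both a long-term interest vector built from the whole interaction history and a short-term vector built from the most recent interactions.

```python
# Illustrative sketch (not the paper's model) of combining long-term and
# short-term interests into a single preference score.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
history = rng.normal(size=(50, dim))    # embeddings of all past interactions

long_term = history.mean(axis=0)        # slowly changing interests
short_term = history[-5:].mean(axis=0)  # interests from the last few interactions


def preference_score(item_emb, w_short=0.7):
    """Blend short- and long-term interest; w_short is a hypothetical knob."""
    return w_short * item_emb @ short_term + (1.0 - w_short) * item_emb @ long_term


candidate = rng.normal(size=dim)
print(preference_score(candidate))
```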
arXiv Detail & Related papers (2021-07-26T16:59:20Z)
- A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) is able to extend the predefined user persona description with semantically correlated content.
PEE consists of two main modules: persona exploration and persona exploitation.
Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)