Context-Aware Composition of Agent Policies by Markov Decision Process
Entity Embeddings and Agent Ensembles
- URL: http://arxiv.org/abs/2308.14521v2
- Date: Wed, 30 Aug 2023 11:56:45 GMT
- Title: Context-Aware Composition of Agent Policies by Markov Decision Process
Entity Embeddings and Agent Ensembles
- Authors: Nicole Merkle, Ralf Mikut
- Abstract summary: Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts.
In order to perform services and carry out activities in a goal-oriented manner, agents require prior knowledge.
We propose a novel simulation-based approach that enables the representation of heterogeneous contexts.
- Score: 1.124711723767572
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Computational agents support humans in many areas of life and are therefore
found in heterogeneous contexts. This means they operate in rapidly changing
environments and can be confronted with huge state and action spaces. In order
to perform services and carry out activities in a goal-oriented manner, agents
require prior knowledge and therefore have to develop and pursue
context-dependent policies. However, prescribing policies in advance is limited
and inflexible, especially in dynamically changing environments. Moreover, the
context of an agent determines its choice of actions. Since the environments
can be stochastic and complex in terms of the number of states and feasible
actions, activities are usually modelled in a simplified way as Markov decision
processes so that, e.g., agents with reinforcement learning are able to learn
policies that help them capture the context and act accordingly to perform
activities optimally. However, training policies for all possible contexts using
reinforcement learning is time-consuming. A requirement and challenge for
agents is to learn strategies quickly and respond immediately in cross-context
environments and applications, e.g., the Internet, service robotics,
cyber-physical systems. In this work, we propose a novel simulation-based
approach that enables a) the representation of heterogeneous contexts through
knowledge graphs and entity embeddings and b) the context-aware composition of
policies on demand by ensembles of agents running in parallel. The evaluation
we conducted with the "Virtual Home" dataset indicates that agents that need
to switch seamlessly between different contexts can request on-demand composed
policies that lead to the successful completion of context-appropriate
activities, without having to learn these policies in lengthy training steps and
episodes, in contrast to agents that use reinforcement learning.
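The composition idea in the abstract can be sketched in a few lines: contexts are represented as embedding vectors (standing in for the paper's knowledge-graph entity embeddings), and an ensemble of agents, each trained in one context, contributes its policy weighted by how similar its training context is to the query context. All names, shapes, and the softmax-over-cosine-similarity weighting below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of on-demand policy composition from an agent
# ensemble. Random vectors stand in for knowledge-graph entity
# embeddings of contexts; each member policy is a state x action
# probability table.

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_AGENTS, EMB_DIM = 5, 3, 4, 8

# Each agent was trained in one context: store its context embedding
# and its stochastic policy (rows are valid action distributions).
agent_contexts = rng.normal(size=(N_AGENTS, EMB_DIM))
agent_policies = rng.dirichlet(np.ones(N_ACTIONS), size=(N_AGENTS, N_STATES))

def compose_policy(query_context: np.ndarray) -> np.ndarray:
    """Blend member policies via a softmax over the cosine similarity
    between the query context and each agent's training context."""
    sims = agent_contexts @ query_context
    sims /= np.linalg.norm(agent_contexts, axis=1) * np.linalg.norm(query_context)
    weights = np.exp(sims) / np.exp(sims).sum()
    # Weighted mixture of the members' per-state action distributions.
    return np.einsum("a,asx->sx", weights, agent_policies)

# Request a composed policy for a previously unseen context.
policy = compose_policy(rng.normal(size=EMB_DIM))
```

A convex mixture of valid action distributions is itself a valid distribution per state, so the composed policy needs no retraining step, which is the point of composing on demand rather than learning each context from scratch.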
Related papers
- I Know How: Combining Prior Policies to Solve New Tasks [17.214443593424498]
Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios.
Learning from scratch for each new task is not a viable or sustainable option.
We propose a new framework, I Know How, which provides a common formalization.
arXiv Detail & Related papers (2024-06-14T08:44:51Z)
- Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions via a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
These task solvers necessitate manually crafted prompts to inform task rules and regulate behaviors.
We propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies [13.410372954752496]
We present an investigation into how context should be incorporated into behaviour learning to improve generalisation.
We introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information.
We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance.
arXiv Detail & Related papers (2023-10-25T14:50:05Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
- Policy Supervectors: General Characterization of Agents by their Behaviour [18.488655590845163]
We propose policy supervectors for characterizing agents by the distribution of states they visit.
Policy supervectors can characterize policies regardless of their design philosophy and scale to thousands of policies on a single workstation machine.
We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training, and imitation learning.
arXiv Detail & Related papers (2020-12-02T14:43:16Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
- Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [116.804536884437]
We propose an opposite behavior aware framework for policy learning in goal-oriented dialogues.
We estimate the opposite agent's policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy.
arXiv Detail & Related papers (2020-04-21T03:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.