Context-Aware Composition of Agent Policies by Markov Decision Process
Entity Embeddings and Agent Ensembles
- URL: http://arxiv.org/abs/2308.14521v2
- Date: Wed, 30 Aug 2023 11:56:45 GMT
- Title: Context-Aware Composition of Agent Policies by Markov Decision Process
Entity Embeddings and Agent Ensembles
- Authors: Nicole Merkle, Ralf Mikut
- Abstract summary: Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts.
In order to perform services and carry out activities in a goal-oriented manner, agents require prior knowledge.
We propose a novel simulation-based approach that enables the representation of heterogeneous contexts.
- Score: 1.124711723767572
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Computational agents support humans in many areas of life and are therefore
found in heterogeneous contexts. This means they operate in rapidly changing
environments and can be confronted with huge state and action spaces. In order
to perform services and carry out activities in a goal-oriented manner, agents
require prior knowledge and therefore have to develop and pursue
context-dependent policies. However, prescribing policies in advance is limited
and inflexible, especially in dynamically changing environments. Moreover, the
context of an agent determines its choice of actions. Since the environments
can be stochastic and complex in terms of the number of states and feasible
actions, activities are usually modelled in a simplified way as Markov decision
processes so that, e.g., agents with reinforcement learning are able to learn
policies that help them capture the context and act accordingly to perform
activities optimally. However, training policies for all possible contexts using
reinforcement learning is time-consuming. A requirement and challenge for
agents is to learn strategies quickly and respond immediately in cross-context
environments and applications, e.g., the Internet, service robotics,
cyber-physical systems. In this work, we propose a novel simulation-based
approach that enables a) the representation of heterogeneous contexts through
knowledge graphs and entity embeddings and b) the context-aware composition of
policies on demand by ensembles of agents running in parallel. The evaluation
we conducted with the "Virtual Home" dataset indicates that agents that need
to switch seamlessly between different contexts can request on-demand composed
policies that lead to the successful completion of context-appropriate
activities, without having to learn these policies in lengthy training steps and
episodes, in contrast to agents that use reinforcement learning.
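The composition idea in the abstract can be sketched in a few lines: contexts are represented as embedding vectors (standing in for the paper's knowledge-graph entity embeddings), and an ensemble of agents, each trained in one context, contributes its policy weighted by how similar its training context is to the query context. All names, shapes, and the softmax-over-cosine-similarity weighting below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of on-demand policy composition from an agent
# ensemble. Random vectors stand in for knowledge-graph entity
# embeddings of contexts; each member policy is a state x action
# probability table.

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_AGENTS, EMB_DIM = 5, 3, 4, 8

# Each agent was trained in one context: store its context embedding
# and its stochastic policy (rows are valid action distributions).
agent_contexts = rng.normal(size=(N_AGENTS, EMB_DIM))
agent_policies = rng.dirichlet(np.ones(N_ACTIONS), size=(N_AGENTS, N_STATES))

def compose_policy(query_context: np.ndarray) -> np.ndarray:
    """Blend member policies via a softmax over the cosine similarity
    between the query context and each agent's training context."""
    sims = agent_contexts @ query_context
    sims /= np.linalg.norm(agent_contexts, axis=1) * np.linalg.norm(query_context)
    weights = np.exp(sims) / np.exp(sims).sum()
    # Weighted mixture of the members' per-state action distributions.
    return np.einsum("a,asx->sx", weights, agent_policies)

# Request a composed policy for a previously unseen context.
policy = compose_policy(rng.normal(size=EMB_DIM))
```

A convex mixture of valid action distributions is itself a valid distribution per state, so the composed policy needs no retraining step, which is the point of composing on demand rather than learning each context from scratch.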
Related papers
- I Know How: Combining Prior Policies to Solve New Tasks [17.214443593424498]
Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios.
Learning from scratch for each new task is not a viable or sustainable option.
We propose a new framework, I Know How, which provides a common formalization.
arXiv Detail & Related papers (2024-06-14T08:44:51Z)
- Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions via a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
These task solvers necessitate manually crafted prompts to inform task rules and regulate behaviors.
We propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies [13.410372954752496]
We present an investigation into how context should be incorporated into behaviour learning to improve generalisation.
We introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information.
We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance.
arXiv Detail & Related papers (2023-10-25T14:50:05Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
- Policy Supervectors: General Characterization of Agents by their Behaviour [18.488655590845163]
We propose policy supervectors for characterizing agents by the distribution of states they visit.
Policy supervectors can characterize policies regardless of their design philosophy and scale to thousands of policies on a single workstation machine.
We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training, and imitation learning.
arXiv Detail & Related papers (2020-12-02T14:43:16Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
- Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [116.804536884437]
We propose an opposite behavior aware framework for policy learning in goal-oriented dialogues.
We estimate the opposite agent's policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy.
arXiv Detail & Related papers (2020-04-21T03:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.