Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State
- URL: http://arxiv.org/abs/2102.05261v2
- Date: Thu, 11 Feb 2021 16:49:32 GMT
- Title: Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State
- Authors: Shi Dong, Benjamin Van Roy, Zhengyuan Zhou
- Abstract summary: We design a simple reinforcement learning agent that can operate in any environment.
The agent maintains only visitation counts and value estimates for each agent-state-action pair.
There is no further dependence on the number of environment states or mixing times associated with other policies or statistics of history.
- Score: 35.69801203107371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design a simple reinforcement learning agent that, with a specification
only of agent state dynamics and a reward function, can operate with some
degree of competence in any environment. The agent maintains only visitation
counts and value estimates for each agent-state-action pair. The value function
is updated incrementally in response to temporal differences and optimistic
boosts that encourage exploration. The agent executes actions that are greedy
with respect to this value function. We establish a regret bound demonstrating
convergence to near-optimal per-period performance, where the time taken to
achieve near-optimality is polynomial in the number of agent states and
actions, as well as the reward mixing time of the best policy within the
reference policy class, which is comprised of those that depend on history only
through agent state. Notably, there is no further dependence on the number of
environment states or mixing times associated with other policies or statistics
of history. Our result sheds light on the potential benefits of (deep)
representation learning, which has demonstrated the capability to extract
compact and relevant features from high-dimensional interaction histories.
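The abstract pins the agent down only at this level: per-pair visitation counts and value estimates, incremental temporal-difference updates with an optimistic boost, and greedy action selection. A minimal tabular sketch of that loop follows; the discounted update, the count-based bonus of the form c / sqrt(n), and all names are illustrative assumptions, not the paper's exact algorithm (which targets average per-period reward rather than a discounted objective).

```python
import numpy as np

class OptimisticAgentStateQL:
    """Tabular sketch: visitation counts plus value estimates per pair."""

    def __init__(self, n_agent_states, n_actions,
                 lr=0.1, gamma=0.99, bonus_c=1.0):
        self.q = np.zeros((n_agent_states, n_actions))       # value estimates
        self.counts = np.zeros((n_agent_states, n_actions))  # visitation counts
        self.lr, self.gamma, self.bonus_c = lr, gamma, bonus_c

    def act(self, s):
        # Greedy with respect to the current (optimistically boosted) values.
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s_next):
        self.counts[s, a] += 1
        # Optimistic boost that shrinks as the pair is visited more often;
        # the paper's actual schedule differs (c / sqrt(n) is an assumption).
        boost = self.bonus_c / np.sqrt(self.counts[s, a])
        target = r + boost + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.lr * (target - self.q[s, a])
```

Note that the state index `s` here is the agent state, a compact summary of history, not the environment state; the regret bound above depends only on the number of these agent states and actions.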
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
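StepAgent's summary above says only that step-wise rewards replace sparse episode-level feedback; the mechanics are not given. As a generic illustration of why per-step rewards sharpen credit assignment, here is a small returns-to-go computation; treating `step_rewards` as the output of a learned step-wise reward model is a hypothetical assumption, not StepAgent's actual algorithm.

```python
# Generic step-wise vs. episode-level credit assignment; illustrative only.

def returns_to_go(step_rewards, gamma=1.0):
    """Per-step targets: each action is credited with the reward that follows it."""
    g, out = 0.0, []
    for r in reversed(step_rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

# Episode-level signal: every step inherits the same sparse terminal reward.
print(returns_to_go([0.0, 0.0, 0.0, 1.0]))  # [1.0, 1.0, 1.0, 1.0]

# Step-wise signal: intermediate rewards differentiate the steps.
print(returns_to_go([0.2, -0.1, 0.4, 1.0]))  # [1.5, 1.3, 1.4, 1.0]
```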
- Towards a more efficient computation of individual attribute and policy contribution for post-hoc explanation of cooperative multi-agent systems using Myerson values [0.0]
A quantitative assessment of the global importance of an agent in a team is as valuable as gold for strategists, decision-makers, and sports coaches.
We propose a method to determine a Hierarchical Knowledge Graph of agents' policies and features in a Multi-Agent System.
We test the proposed approach in a proof-of-case environment deploying both hardcoded policies and policies obtained via Deep Reinforcement Learning.
arXiv Detail & Related papers (2022-12-06T15:15:00Z)
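The entry above quantifies an agent's global importance via Myerson values, which restrict the classical Shapley value to coalitions feasible under a graph. As a reference point, here is a sketch of exact Shapley values by coalition enumeration, tractable only for small teams; the toy worth table is invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating all coalitions.

    value_fn(frozenset) returns a coalition's worth, e.g. team reward when
    only those agents follow their learned policies. Myerson values would
    additionally restrict coalitions to connected subgraphs.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coal in combinations(others, k):
                s = frozenset(coal)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value_fn(s | {p}) - value_fn(s))
    return phi

# Toy team: agent 'a' contributes 1.0 alone; 'a' and 'b' together reach 3.0.
worth = {frozenset(): 0.0, frozenset('a'): 1.0,
         frozenset('b'): 0.0, frozenset('ab'): 3.0}
print(shapley_values(['a', 'b'], lambda s: worth[s]))  # {'a': 2.0, 'b': 1.0}
```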
In multi-agent reinforcement learning systems, the actions of one agent can have a negative impact on the rewards of other agents.
This work applies a trading approach to a simulated scheduling environment, where the agents are responsible for the assignment of incoming jobs to compute cores.
The agents can trade the usage right of computational cores to process high-priority, high-reward jobs faster than low-priority, low-reward jobs.
arXiv Detail & Related papers (2022-07-05T13:50:18Z)
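The trading mechanism above is described only at a high level. A toy sealed-bid exchange of one core time slot gives the flavor: the holder sells the slot when another agent's bid exceeds the value of its own pending job. Everything here, the function, the structures, the prices, is hypothetical rather than the paper's protocol.

```python
# Toy illustration of trading usage rights for a compute core's next slot.

def run_slot_auction(holder_value, bids):
    """bids: {agent_id: bid}. Returns (winner, price) or (None, 0.0)."""
    if not bids:
        return None, 0.0
    winner = max(bids, key=bids.get)
    if bids[winner] > holder_value:
        return winner, bids[winner]  # usage right changes hands
    return None, 0.0                 # holder keeps the core

print(run_slot_auction(holder_value=0.3, bids={"agent_2": 0.9, "agent_3": 0.5}))
# ('agent_2', 0.9): the high-priority, high-reward job jumps the queue
```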
- Multi-agent Actor-Critic with Time Dynamical Opponent Model [16.820873906787906]
In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other.
We propose a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time.
We show empirically that TDOM achieves superior opponent behavior prediction during test time.
arXiv Detail & Related papers (2022-04-12T07:16:15Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al.) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
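APS makes the mutual information tractable by pairing successor features with a nonparametric, particle-based entropy term. Below is a sketch of the k-nearest-neighbor entropy reward that this family of methods uses; the constants, the choice of `k`, and the stand-in feature embeddings are assumptions rather than the paper's exact estimator.

```python
import numpy as np

def knn_entropy_reward(features, k=12):
    """Per-state intrinsic reward ~ log distance to the k-th nearest neighbor.

    features: (N, d) array of encoded states from a replay buffer. Sparse
    regions of feature space (large k-NN distance) earn high reward, pushing
    the policy toward unvisited states.
    """
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k]  # column 0 is the zero self-distance
    return np.log(1.0 + kth)

feats = np.random.randn(256, 16)  # stand-in for successor-feature embeddings
print(knn_entropy_reward(feats).shape)  # (256,)
```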
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copulas, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
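The copula factorization above separates each agent's marginal behavior from the dependence structure. A minimal Gaussian-copula sampling sketch shows the separation concretely: a correlation matrix carries the coordination while per-agent marginals can be swapped independently. The marginals and correlation values below are placeholders, and the paper itself is not limited to Gaussian copulas.

```python
import numpy as np
from scipy.stats import norm

def sample_joint_actions(corr, marginal_ppfs, n=5, seed=0):
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    z = rng.standard_normal((n, len(marginal_ppfs))) @ L.T  # correlated normals
    u = norm.cdf(z)  # uniforms that retain the dependence structure
    cols = [ppf(u[:, i]) for i, ppf in enumerate(marginal_ppfs)]
    return np.stack(cols, axis=1)  # each column: one agent's actions

corr = np.array([[1.0, 0.8], [0.8, 1.0]])  # strongly coordinated pair
# Agent 0: continuous action; agent 1: 4 discrete actions via quantiles.
marginals = [lambda u: norm.ppf(u, loc=0.0, scale=2.0),
             lambda u: np.floor(u * 4).astype(int)]
print(sample_joint_actions(corr, marginals))
```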
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
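ΨΦ-learning builds on successor features: ψ(s, a) accumulates expected discounted state features φ, so any reward of the form r = φ·w yields Q(s, a) = ψ(s, a)·w. A plain tabular version of that machinery is sketched below; how ITD actually infers w and ψ from reward-free demonstrations is the paper's contribution and is not reproduced here.

```python
import numpy as np

# Plain successor-feature machinery; shapes and learning rate illustrative.
n_states, n_actions, d = 10, 4, 8
psi = np.zeros((n_states, n_actions, d))  # successor features
w = np.zeros(d)                           # reward weights (ITD infers these)

def td_update_psi(s, a, phi_s, s_next, a_next, alpha=0.1, gamma=0.99):
    # Same TD recursion as Q-learning, but over feature vectors.
    target = phi_s + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

def q_values(s):
    return psi[s] @ w  # one value per action, for any reward r = phi . w
```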
- Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
- Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
In a multi-agent system, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions, which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
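For background on the counterfactual joint actions the last entry refers to, here is a COMA-style counterfactual advantage, where one agent's action is marginalized under its own policy while the other agents' joint action is held fixed. This only illustrates the baseline idea the paper refines; its approximatively synchronous estimator is not shown.

```python
import numpy as np

def counterfactual_advantage(q_i, pi_i, a_i):
    """q_i: Q(s, (a_i', a_-i)) for each of agent i's actions a_i', with the
    other agents' joint action fixed; pi_i: agent i's policy over actions."""
    baseline = np.dot(pi_i, q_i)  # expected value over the agent's own actions
    return q_i[a_i] - baseline

q_i = np.array([1.0, 2.0, 0.5])
pi_i = np.array([0.2, 0.5, 0.3])
print(counterfactual_advantage(q_i, pi_i, a_i=1))  # 2.0 - 1.35 = 0.65
```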
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.