An agent design with goal reaching guarantees for enhancement of learning
- URL: http://arxiv.org/abs/2405.18118v3
- Date: Wed, 21 Aug 2024 20:43:36 GMT
- Title: An agent design with goal reaching guarantees for enhancement of learning
- Authors: Pavel Osinenko, Grigory Yaremenko, Georgiy Malaniya, Anton Bolychev, Alexander Gepperth
- Abstract summary: Reinforcement learning is concerned with problems of maximizing accumulated rewards in Markov decision processes.
We suggest a fairly flexible algorithm that can be used to augment practically any agent, as long as it comprises a critic.
- Score: 40.76517286989928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is commonly concerned with problems of maximizing accumulated rewards in Markov decision processes. Oftentimes, a certain goal state or a subset of the state space attains maximal reward. In such a case, the environment may be considered solved when the goal is reached. Whereas numerous techniques, learning-based or not, exist for solving environments, doing so optimally is the biggest challenge. For instance, one may choose a reward rate that penalizes action effort. Reinforcement learning is currently among the most actively developed frameworks for solving environments optimally by virtue of maximizing accumulated reward, in other words, returns. Yet, tuning agents is a notoriously hard task, as reported in a series of works. Our aim here is to help the agent learn a near-optimal policy efficiently while ensuring the goal reaching property of some basis policy that merely solves the environment. We suggest a fairly flexible algorithm that can be used to augment practically any agent, as long as it comprises a critic. A formal proof of the goal reaching property is provided. Comparative experiments on several problems under popular baseline agents provided empirical evidence that learning can indeed be boosted while the goal reaching property is ensured.
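To make the mechanism concrete, below is a minimal sketch of a critic-gated fallback in Python. The policy and critic interfaces and the decrease threshold `eps` are illustrative assumptions; the paper's actual construction and its formal goal-reaching certificate are more involved.

```python
def goal_reaching_step(state, learned_policy, basis_policy, critic, v_prev, eps=1e-3):
    """Act with the learned policy while the critic certifies progress
    toward the goal; otherwise fall back to a basis policy that merely
    solves the environment. (Sketch: the decrease condition stands in
    for a formal goal-reaching certificate.)"""
    action = learned_policy(state)
    v_new = critic(state, action)
    if v_new <= v_prev - eps:
        # Certified progress: keep the learned (potentially near-optimal) action.
        return action, v_new
    # No certified progress: defer to the goal-reaching basis policy and
    # carry the previous certificate value forward.
    return basis_policy(state), v_prev
```

In this form, the guarantee is inherited from the basis policy, while the learned policy is free to act whenever it provably does no harm to goal reaching.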
Related papers
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
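As a point of reference, the classic additive way to fold designer knowledge into a primary reward is potential-based shaping, sketched below; the framework here instead learns how the auxiliary signal should be weighted, so the fixed blend weight is an assumption for illustration.

```python
def shaped_reward(r_env, potential, s, s_next, gamma=0.99, lam=1.0):
    """Blend the environment's primary reward with an auxiliary signal via
    potential-based shaping, which provably preserves optimal policies.
    `potential` encodes designer domain knowledge; the fixed weight `lam`
    stands in for whatever adaptation rule the framework would learn."""
    return r_env + lam * (gamma * potential(s_next) - potential(s))
```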
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Reinforcement Learning with Non-Cumulative Objective [12.906500431427716]
In reinforcement learning, the objective is almost always defined as a *cumulative* function over the rewards along the process.
In this paper, we propose a modification to existing algorithms for optimizing such objectives.
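For instance (an assumption about which non-cumulative objective is meant), when the objective is the maximum reward attained along a trajectory, the usual cumulative Q-learning target becomes a max; a tabular sketch:

```python
import numpy as np

def q_update_max_objective(Q, s, a, r, s_next, alpha=0.1):
    """One tabular Q-learning step for a maximum-reward (bottleneck-style)
    objective: the cumulative target r + gamma * max_a' Q[s', a'] is
    replaced by max(r, max_a' Q[s', a']). Q is an (n_states, n_actions)
    array, updated in place; discounting is omitted for simplicity."""
    target = max(r, np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
```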
arXiv Detail & Related papers (2023-07-11T01:20:09Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
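The goal-conditioned off-policy machinery such methods build on typically relies on goal relabeling; a minimal hindsight-style sketch (a background technique with an assumed tuple layout, not the paper's distillation method):

```python
import random

def relabel_episode(episode):
    """Hindsight-style goal relabeling for goal-conditioned off-policy RL:
    replay each transition as if a state reached later in the episode had
    been the goal, turning sparse failures into successes. `episode` is a
    list of (s, a, r, s_next, g) tuples (assumed layout)."""
    relabeled = []
    for t, (s, a, _, s_next, _) in enumerate(episode):
        new_goal = random.choice(episode[t:])[3]  # a future achieved state
        reward = 1.0 if s_next == new_goal else 0.0
        relabeled.append((s, a, reward, s_next, new_goal))
    return relabeled
```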
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Parametrically Retargetable Decision-Makers Tend To Seek Power [91.93765604105025]
In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive.
We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment.
We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power.
arXiv Detail & Related papers (2022-06-27T17:39:23Z)
- Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning [19.519440854957633]
We propose a new multi-agent policy gradient method called Robust Local Advantage (ROLA) Actor-Critic.
ROLA allows each agent to learn an individual action-value function as a local critic while ameliorating environment non-stationarity.
We show ROLA's robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
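A counterfactual-style sketch of what a per-agent local critic can buy (the joint-action critic interface is assumed, and ROLA's exact estimator differs):

```python
import numpy as np

def local_advantage(q_i, state, joint_action, agent, n_actions):
    """Advantage of `agent`'s chosen action under its own critic q_i,
    measured against the average over its alternatives with the other
    agents' actions held fixed. A generic counterfactual-baseline
    sketch of the 'local critic' idea, not ROLA's exact estimator."""
    def with_action(a):
        ja = list(joint_action)
        ja[agent] = a
        return q_i(state, tuple(ja))
    baseline = np.mean([with_action(a) for a in range(n_actions)])
    return with_action(joint_action[agent]) - baseline
```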
arXiv Detail & Related papers (2021-10-16T19:03:34Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
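The BROIL-style objective blends expected return with conditional value at risk (CVaR) over a posterior of reward hypotheses; a sketch of the objective only (PG-BROIL itself optimizes it with a policy gradient):

```python
import numpy as np

def broil_objective(policy_returns, lam=0.5, alpha=0.95):
    """lam * mean + (1 - lam) * CVaR_alpha, where `policy_returns` holds
    the policy's return under each reward hypothesis sampled from the
    posterior. lam = 1 is risk-neutral; lam = 0 is fully risk-averse."""
    r = np.asarray(policy_returns, dtype=float)
    k = max(1, int(np.ceil((1 - alpha) * r.size)))  # size of the worst tail
    cvar = np.sort(r)[:k].mean()
    return lam * r.mean() + (1 - lam) * cvar
```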
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel quantity, the free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
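Schematically, active inference scores each candidate action by an expected free energy that trades risk (divergence of predicted from preferred observations) against expected information gain; the toy decision rule below is a coarse sketch, and the paper's free energy of the expected future refines this quantity.

```python
import numpy as np

def expected_free_energy(predicted_obs, preferred_obs, info_gain):
    """G = KL(predicted || preferred) - expected information gain; lower
    is better. The observation arguments are probability vectors and
    `info_gain` is the action's epistemic value."""
    p = np.asarray(predicted_obs, dtype=float)
    q = np.asarray(preferred_obs, dtype=float)
    risk = float(np.sum(p * np.log(p / q)))
    return risk - info_gain

def select_action(candidates, preferred_obs):
    """`candidates`: iterable of (action, predicted_obs, info_gain)."""
    return min(candidates,
               key=lambda c: expected_free_energy(c[1], preferred_obs, c[2]))[0]
```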
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
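The curriculum can be sketched as sampling goals in proportion to the disagreement of an ensemble of value functions, so goals of intermediate difficulty, where the ensemble is most uncertain, are trained on most often (the ensemble interface below is assumed):

```python
import numpy as np

def sample_training_goal(goal_pool, value_ensemble, rng=None):
    """Sample a goal with probability proportional to the standard
    deviation of an ensemble of value functions evaluated at that
    goal; high disagreement marks goals of intermediate difficulty."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.array([[v(g) for v in value_ensemble] for g in goal_pool])
    disagreement = values.std(axis=1)
    probs = disagreement / disagreement.sum()
    return goal_pool[rng.choice(len(goal_pool), p=probs)]
```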
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent [21.548271801592907]
Reinforcement learners are agents that learn to pick actions that lead to high reward.
We show that if an agent is guaranteed to be "asymptotically optimal" in any environment, then, subject to an assumption about the true environment, this agent will be either "destroyed" or "incapacitated".
We present an agent, Mentee, with the modest guarantee of approaching the performance of a mentor, doing safe exploration instead of reckless exploration.
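In spirit, Mentee-style safe exploration replaces random exploration with queries to a mentor; the decaying schedule below is an assumption for illustration, whereas the actual agent decides when to defer from its own uncertainty.

```python
import random

def mentee_action(state, learner_policy, mentor_policy, step, c=10.0):
    """Defer to the mentor with a probability that decays over time,
    otherwise act with the learned policy: exploration stays within
    behavior a trusted mentor would exhibit instead of being random."""
    if random.random() < c / (c + step):
        return mentor_policy(state)
    return learner_policy(state)
```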
arXiv Detail & Related papers (2020-06-05T10:42:29Z)