Measuring Exploration in Reinforcement Learning via Optimal Transport in
Policy Space
- URL: http://arxiv.org/abs/2402.09113v1
- Date: Wed, 14 Feb 2024 11:55:50 GMT
- Title: Measuring Exploration in Reinforcement Learning via Optimal Transport in
Policy Space
- Authors: Reabetswe M. Nkhumise, Debabrota Basu, Tony J. Prescott, Aditya Gilra
- Abstract summary: We quantify and compare the amount of exploration and learning accomplished by a Reinforcement Learning (RL) algorithm.
Specifically, we propose a novel measure, named Exploration Index, that quantifies the relative effort of knowledge transfer (transferability) by an RL algorithm in comparison to supervised learning (SL).
The comparison is established by formulating learning in RL as a sequence of SL tasks, and using optimal transport based metrics to compare the total path traversed by the RL and SL algorithms in the data distribution space.
- Score: 9.208078107007942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploration is the key ingredient of reinforcement learning (RL) that
determines the speed and success of learning. Here, we quantify and compare the
amount of exploration and learning accomplished by a Reinforcement Learning
(RL) algorithm. Specifically, we propose a novel measure, named Exploration
Index, that quantifies the relative effort of knowledge transfer
(transferability) by an RL algorithm in comparison to supervised learning (SL)
that transforms the initial data distribution of RL to the corresponding final
data distribution. The comparison is established by formulating learning in RL
as a sequence of SL tasks, and using optimal transport based metrics to compare
the total path traversed by the RL and SL algorithms in the data distribution
space. We perform extensive empirical analysis on various environments and with
multiple algorithms to demonstrate that the exploration index yields insights
about the exploration behaviour of any RL algorithm, and also allows us to
compare the exploratory behaviours of different RL algorithms.
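
The comparison described in the abstract, a total path traversed through distribution space versus a direct transformation from the initial to the final distribution, can be illustrated with a small sketch. It is not the authors' implementation: it assumes each stage of learning is summarised by an equal-size sample of scalars, uses the 1-D Wasserstein-1 distance as the optimal transport metric, and takes the index to be the ratio of the summed step-by-step distance to the direct distance. The paper's exact definition (normalisation, direction of the ratio, choice of space) may differ, and the names `wasserstein_1d` and `path_length_ratio` are hypothetical.

```python
from typing import List, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples the optimal coupling matches sorted values,
    so the distance is the mean absolute difference after sorting.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def path_length_ratio(snapshots: List[Sequence[float]]) -> float:
    """Ratio of the path length through all snapshots to the direct distance.

    `snapshots` is an assumed stand-in for the sequence of data
    distributions visited during learning (first = initial, last = final).
    A value of 1.0 means the path is as short as the direct route; larger
    values mean more detours.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least an initial and a final snapshot")
    # Total distance travelled, summed over consecutive snapshots.
    total = sum(wasserstein_1d(a, b) for a, b in zip(snapshots, snapshots[1:]))
    # Distance of the direct route from the first to the last snapshot.
    direct = wasserstein_1d(snapshots[0], snapshots[-1])
    if direct == 0.0:
        raise ValueError("initial and final snapshots coincide")
    return total / direct
```

Under these assumptions, a learner whose snapshots move monotonically from the initial to the final distribution gets a ratio of 1.0, while one that overshoots and returns gets a ratio above 1.0.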
Related papers
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL).
Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms.
We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks.
arXiv Detail & Related papers (2024-01-30T14:09:35Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component in the evolutionary algorithm has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Transferred Q-learning [79.79659145328856]
We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
arXiv Detail & Related papers (2022-02-09T20:08:19Z)
- POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance to leverage expert demonstration to improve SRL interpretations.
We empirically verify POAR to efficiently handle tasks in high dimensions and facilitate training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- PBCS : Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning [8.176152440971897]
"Plan, Backplay, Chain Skills" combines motion planning and reinforcement learning to solve hard exploration environments.
We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes.
arXiv Detail & Related papers (2020-04-24T11:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.