Discovering Intrinsic Reward with Contrastive Random Walk
- URL: http://arxiv.org/abs/2204.10976v1
- Date: Sat, 23 Apr 2022 02:24:38 GMT
- Title: Discovering Intrinsic Reward with Contrastive Random Walk
- Authors: Zixuan Pan, Zihao Wei, Yidong Huang, Aditya Gupta
- Abstract summary: Contrastive Random Walk defines the transition matrix of a random walk with the help of neural networks.
Our method works well in non-tabular sparse reward scenarios.
We also find that adaptive restart and appropriate temperature are crucial to the performance of Contrastive Random Walk.
- Score: 2.5960593866103014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of this paper is to demonstrate the efficacy of using Contrastive
Random Walk as a curiosity method to achieve faster convergence to the optimal
policy. Contrastive Random Walk defines the transition matrix of a random walk
with the help of neural networks and learns a meaningful state representation
via a closed-loop objective. The loss of Contrastive Random Walk serves as an
intrinsic reward and is added to the environment reward. Our method works well
in non-tabular sparse-reward scenarios, in the sense that it attains the highest
reward within the same number of iterations as competing methods. Contrastive
Random Walk is also more robust: its performance changes little across different
random initializations of the environment. We also find that adaptive restart
and an appropriate temperature are crucial to the performance of Contrastive
Random Walk.
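The loop described in the abstract can be pictured concretely. Below is a minimal PyTorch-style sketch, written under assumptions rather than taken from the paper: the names `transition_matrix`, `crw_loss`, `shaped_reward`, and the hyperparameters `tau` (softmax temperature) and `beta` (bonus scale) are illustrative. It shows the general Contrastive Random Walk recipe, where a temperature-scaled softmax over embedding similarities defines the transition matrix, a forward-then-backward walk forms the closed loop, and the detached loss value is added to the environment reward as a curiosity bonus.

```python
# Minimal sketch (assumed implementation, not the authors' code) of a
# contrastive random walk over state embeddings whose loss doubles as an
# intrinsic reward.
import torch
import torch.nn.functional as F


def transition_matrix(x, y, tau=0.07):
    """Row-stochastic transition matrix between two batches of embeddings.

    Row i gives the probability of walking from node i in `x` to each node
    in `y`, via a temperature-scaled softmax over cosine similarities.
    """
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    return F.softmax(x @ y.t() / tau, dim=-1)


def crw_loss(embeddings, tau=0.07):
    """Contrastive Random Walk loss over a sequence of embedding batches.

    Walk forward through the sequence and then back again (the closed loop);
    the round-trip transition matrix should be close to the identity, so the
    loss is the negative log-likelihood of returning to the starting node.
    """
    forward = [transition_matrix(embeddings[t], embeddings[t + 1], tau)
               for t in range(len(embeddings) - 1)]
    backward = [transition_matrix(embeddings[t + 1], embeddings[t], tau)
                for t in reversed(range(len(embeddings) - 1))]
    roundtrip = forward[0]
    for m in forward[1:] + backward:
        roundtrip = roundtrip @ m
    targets = torch.arange(roundtrip.size(0), device=roundtrip.device)
    return F.nll_loss(torch.log(roundtrip + 1e-8), targets)


def shaped_reward(env_reward, crw_loss_value, beta=0.1):
    """Add the CRW loss as a curiosity bonus to the environment reward.

    `beta` is an assumed scaling coefficient; states the encoder represents
    poorly yield a larger loss and therefore a larger exploration bonus.
    """
    return env_reward + beta * float(crw_loss_value.detach())
```

In a training loop, `crw_loss` would also be backpropagated through the state encoder so the representation keeps improving, while only its detached value shapes the reward that the agent maximizes.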
Related papers
- Learning Randomized Algorithms with Transformers [8.556706939126146]
In this paper, we enhance deep neural networks, in particular transformer models, with randomization.
We demonstrate for the first time that randomized algorithms can be instilled in transformers through learning, in a purely data- and objective-driven manner.
arXiv Detail & Related papers (2024-08-20T13:13:36Z) - Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization [39.740287682191884]
In robust Markov decision processes (RMDPs) it is assumed that the reward and the transition dynamics lie in a given uncertainty set.
This so-called rectangularity condition is solely motivated by computational concerns.
We introduce a policy-gradient method and prove its convergence.
arXiv Detail & Related papers (2023-09-03T07:34:26Z) - Random Boxes Are Open-world Object Detectors [71.86454597677387]
We show that classifiers trained with random region proposals achieve state-of-the-art Open-world Object Detection (OWOD) performance.
We propose RandBox, a Fast R-CNN based architecture trained on random proposals at each training iteration.
RandBox significantly outperforms the previous state-of-the-art in all metrics.
arXiv Detail & Related papers (2023-07-17T05:08:32Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Pre-training of Deep RL Agents for Improved Learning under Domain Randomization [63.09932240840656]
We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
arXiv Detail & Related papers (2021-04-29T14:54:11Z) - Improved device-independent randomness expansion rates using two sided randomness [3.4376560669160394]
A device-independent randomness expansion protocol aims to take an initial random string and generate a longer one.
We investigate the possible improvement that could be gained using the two-sided randomness.
We also consider a modified protocol in which the input randomness is recycled.
arXiv Detail & Related papers (2021-03-12T19:49:17Z) - Scalable Bayesian Inverse Reinforcement Learning [93.27920030279586]
We introduce Approximate Variational Reward Imitation Learning (AVRIL).
Our method addresses the ill-posed nature of the inverse reinforcement learning problem.
Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods.
arXiv Detail & Related papers (2021-02-12T12:32:02Z) - Interpretable random forest models through forward variable selection [0.0]
We develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function.
We demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands.
arXiv Detail & Related papers (2020-05-11T13:56:49Z) - Vertex-reinforced Random Walk for Network Embedding [42.99597051744645]
We study the fundamental problem of random walk for network embedding.
We introduce an exploitation-exploration mechanism to help the random walk jump out of the stuck set.
Experimental results show that our proposed approach reinforce2vec can outperform state-of-the-art random walk based embedding methods by a large margin.
arXiv Detail & Related papers (2020-02-11T15:58:31Z)