Topological Experience Replay
- URL: http://arxiv.org/abs/2203.15845v3
- Date: Mon, 26 Jun 2023 21:12:17 GMT
- Title: Topological Experience Replay
- Authors: Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal
- Abstract summary: Deep Q-learning methods update Q-values using state transitions sampled from the experience replay buffer.
We organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states.
We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks.
- Score: 22.84244156916668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art deep Q-learning methods update Q-values using state
transition tuples sampled from the experience replay buffer. This strategy
often uniformly and randomly samples or prioritizes data sampling based on
measures such as the temporal difference (TD) error. Such sampling strategies
can be inefficient at learning the Q-function because a state's Q-value depends on
the Q-value of successor states. If the data sampling strategy ignores the
precision of the Q-value estimate of the next state, it can lead to useless and
often incorrect updates to the Q-values. To mitigate this issue, we organize
the agent's experience into a graph that explicitly tracks the dependency
between Q-values of states. Each edge in the graph represents a transition
between two states by executing a single action. We perform value backups via a
breadth-first search that expands vertices in the graph starting
from the set of terminal states and successively moving backward. We
empirically show that our method is substantially more data-efficient than
several baselines on a diverse range of goal-reaching tasks. Notably, the
proposed method also outperforms baselines that consume more batches of
training experience and operates from high-dimensional observational data such
as images.
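The backup procedure described in the abstract (organize transitions into a graph, then propagate values backward from terminal states via breadth-first search) can be illustrated with a short, self-contained sketch. This is not the authors' implementation: the function name, the tabular Q-table, and the discount factor are assumptions made purely for illustration, whereas the paper applies the idea with deep Q-learning from high-dimensional observations such as images.

```python
from collections import defaultdict, deque


def topological_value_backup(transitions, gamma=0.99, num_actions=4):
    """Illustrative graph-based value backup (a sketch, not the paper's code).

    transitions: iterable of (state, action, reward, next_state, done)
    tuples with hashable (e.g. discretized) states.
    """
    # Reverse adjacency list: next_state -> incoming transitions (graph edges).
    reverse_edges = defaultdict(list)
    terminal_states = set()
    for s, a, r, s_next, done in transitions:
        reverse_edges[s_next].append((s, a, r, done))
        if done:
            terminal_states.add(s_next)

    # Tabular Q-values purely for illustration; the paper uses deep Q-networks.
    Q = defaultdict(lambda: [0.0] * num_actions)

    # Breadth-first search backward from terminal states, so every backup
    # bootstraps from a successor whose Q-values were already refreshed.
    queue = deque(terminal_states)
    visited = set(terminal_states)
    while queue:
        s_next = queue.popleft()
        for s, a, r, done in reverse_edges[s_next]:
            bootstrap = 0.0 if done else gamma * max(Q[s_next])
            Q[s][a] = r + bootstrap
            if s not in visited:
                visited.add(s)
                queue.append(s)
    return Q


# Toy usage: a 3-state chain s0 -> s1 -> s2 (terminal, reward 1).
demo = [("s0", 0, 0.0, "s1", False), ("s1", 0, 1.0, "s2", True)]
print(topological_value_backup(demo, gamma=0.9, num_actions=1))
```

Because each state is expanded only after its successor, every update uses a successor Q-value that has already been backed up, which is exactly the dependency between Q-values that the abstract argues uniform or TD-error-based sampling ignores.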
Related papers
- State-Action Similarity-Based Representations for Off-Policy Evaluation [7.428147895832805]
We introduce an OPE-tailored state-action behavioral similarity metric, and use this metric and the fixed dataset to learn an encoder that models this metric.
We show that our state-action representation method boosts the data-efficiency of FQE and reduces OPE error relative to other OPE-based representation learning methods on challenging OPE tasks.
arXiv Detail & Related papers (2023-10-27T18:00:57Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- OpenPI-C: A Better Benchmark and Stronger Baseline for Open-Vocabulary State Tracking [55.62705574507595]
OpenPI is the only dataset annotated for open-vocabulary state tracking.
We categorize three types of problems at the procedure, step, and state-change levels respectively.
For the evaluation metric, we propose a cluster-based metric to fix the original metric's preference for repetition.
arXiv Detail & Related papers (2023-06-01T16:48:20Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- State estimation with limited sensors -- A deep learning based approach [0.0]
We propose a novel deep learning based state estimation framework that learns from sequential data.
We illustrate that utilizing sequential data allows for state recovery from only one or two sensors.
arXiv Detail & Related papers (2021-01-27T16:14:59Z)
- Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning [33.31762612175859]
In state of the art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates.
We represent these transitions in a data graph and link its structure to soft divergence.
We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem.
arXiv Detail & Related papers (2020-07-15T10:01:32Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)