Topological Experience Replay
- URL: http://arxiv.org/abs/2203.15845v3
- Date: Mon, 26 Jun 2023 21:12:17 GMT
- Title: Topological Experience Replay
- Authors: Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal
- Abstract summary: Deep Q-learning methods update Q-values using state transitions sampled from the experience replay buffer.
We organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states.
We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks.
- Score: 22.84244156916668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art deep Q-learning methods update Q-values using state
transition tuples sampled from the experience replay buffer. This strategy
often uniformly and randomly samples or prioritizes data sampling based on
measures such as the temporal difference (TD) error. Such sampling strategies
can be inefficient at learning the Q-function because a state's Q-value depends on
the Q-value of successor states. If the data sampling strategy ignores the
precision of the Q-value estimate of the next state, it can lead to useless and
often incorrect updates to the Q-values. To mitigate this issue, we organize
the agent's experience into a graph that explicitly tracks the dependency
between Q-values of states. Each edge in the graph represents a transition
between two states by executing a single action. We perform value backups via a
breadth-first search that expands vertices in the graph starting
from the set of terminal states and successively moving backward. We
empirically show that our method is substantially more data-efficient than
several baselines on a diverse range of goal-reaching tasks. Notably, the
proposed method also outperforms baselines that consume more batches of
training experience and operates from high-dimensional observational data such
as images.
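The backup procedure described in the abstract (organize transitions into a graph, then propagate values backward from terminal states via breadth-first search) can be illustrated with a short, self-contained sketch. This is not the authors' implementation: the function name, the tabular Q-table, and the discount factor are assumptions made purely for illustration, whereas the paper applies the idea with deep Q-learning from high-dimensional observations such as images.

```python
from collections import defaultdict, deque


def topological_value_backup(transitions, gamma=0.99, num_actions=4):
    """Illustrative graph-based value backup (a sketch, not the paper's code).

    transitions: iterable of (state, action, reward, next_state, done)
    tuples with hashable (e.g. discretized) states.
    """
    # Reverse adjacency list: next_state -> incoming transitions (graph edges).
    reverse_edges = defaultdict(list)
    terminal_states = set()
    for s, a, r, s_next, done in transitions:
        reverse_edges[s_next].append((s, a, r, done))
        if done:
            terminal_states.add(s_next)

    # Tabular Q-values purely for illustration; the paper uses deep Q-networks.
    Q = defaultdict(lambda: [0.0] * num_actions)

    # Breadth-first search backward from terminal states, so every backup
    # bootstraps from a successor whose Q-values were already refreshed.
    queue = deque(terminal_states)
    visited = set(terminal_states)
    while queue:
        s_next = queue.popleft()
        for s, a, r, done in reverse_edges[s_next]:
            bootstrap = 0.0 if done else gamma * max(Q[s_next])
            Q[s][a] = r + bootstrap
            if s not in visited:
                visited.add(s)
                queue.append(s)
    return Q


# Toy usage: a 3-state chain s0 -> s1 -> s2 (terminal, reward 1).
demo = [("s0", 0, 0.0, "s1", False), ("s1", 0, 1.0, "s2", True)]
print(topological_value_backup(demo, gamma=0.9, num_actions=1))
```

Because each state is expanded only after its successor, every update uses a successor Q-value that has already been backed up, which is exactly the dependency between Q-values that the abstract argues uniform or TD-error-based sampling ignores.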
Related papers
- State-Action Similarity-Based Representations for Off-Policy Evaluation [7.428147895832805]
We introduce an OPE-tailored state-action behavioral similarity metric, and use this metric and the fixed dataset to learn an encoder that models this metric.
We show that our state-action representation method boosts the data-efficiency of FQE and reduces OPE error relative to other OPE-based representation learning methods on challenging OPE tasks.
arXiv Detail & Related papers (2023-10-27T18:00:57Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- OpenPI-C: A Better Benchmark and Stronger Baseline for Open-Vocabulary State Tracking [55.62705574507595]
OpenPI is the only dataset annotated for open-vocabulary state tracking.
We categorize three types of problems at the procedure, step, and state-change levels respectively.
For the evaluation metric, we propose a cluster-based metric to fix the original metric's preference for repetition.
arXiv Detail & Related papers (2023-06-01T16:48:20Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- State estimation with limited sensors -- A deep learning based approach [0.0]
We propose a novel deep learning based state estimation framework that learns from sequential data.
We illustrate that utilizing sequential data allows for state recovery from only one or two sensors.
arXiv Detail & Related papers (2021-01-27T16:14:59Z)
- Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning [33.31762612175859]
In state of the art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates.
We represent these transitions in a data graph and link its structure to soft divergence.
We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem.
arXiv Detail & Related papers (2020-07-15T10:01:32Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)