Graph Backup: Data Efficient Backup Exploiting Markovian Transitions
- URL: http://arxiv.org/abs/2205.15824v1
- Date: Tue, 31 May 2022 14:26:00 GMT
- Title: Graph Backup: Data Efficient Backup Exploiting Markovian Transitions
- Authors: Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette
- Abstract summary: A key to data-efficient RL is good value estimation, but current methods fail to fully utilise the structure of the trajectory data gathered from the environment.
In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation.
Our method, when combined with popular value-based methods, provides improved performance over one-step and multi-step methods on a suite of data-efficient RL benchmarks.
- Score: 24.765707880860543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The successes of deep Reinforcement Learning (RL) are limited to settings
where we have a large stream of online experiences, but applying RL in the
data-efficient setting with limited access to online interactions is still
challenging. A key to data-efficient RL is good value estimation, but current
methods in this space fail to fully utilise the structure of the trajectory
data gathered from the environment. In this paper, we treat the transition data
of the MDP as a graph, and define a novel backup operator, Graph Backup, which
exploits this graph structure for better value estimation. Compared to
multi-step backup methods such as $n$-step $Q$-Learning and TD($\lambda$),
Graph Backup can perform counterfactual credit assignment and gives stable
value estimates for a state regardless of which trajectory the state is sampled
from. Our method, when combined with popular value-based methods, provides
improved performance over one-step and multi-step methods on a suite of
data-efficient RL benchmarks including MiniGrid, MinAtar and Atari100K. We
further analyse the reasons for this performance boost through a novel
visualisation of the transition graphs of Atari games.
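The following is a minimal tabular sketch of this idea, not the paper's implementation: transitions from all trajectories are pooled into a graph keyed by state, and the backup target for a state averages over every observed outcome of each action before taking a max, so the estimate no longer depends on which single trajectory the state was sampled from. The names (`TransitionGraph`, `graph_backup`), the depth-limited recursion, and the assumption that the tabular Q-function `q` covers every visited state are all illustrative.

```python
from collections import defaultdict

class TransitionGraph:
    """Pooled transition data: state -> action -> list of observed (reward, next_state)."""

    def __init__(self):
        self.succ = defaultdict(lambda: defaultdict(list))

    def add(self, s, a, r, s2):
        # Transitions from *any* trajectory are merged here; the Markov
        # property makes them interchangeable for backup purposes.
        self.succ[s][a].append((r, s2))

def graph_backup(graph, q, s, depth, gamma=0.99):
    """Depth-limited backup target for state `s`, bootstrapping from a
    tabular Q-function `q` (dict: state -> action -> value) at the leaves."""
    if depth == 0 or s not in graph.succ:
        return max(q[s].values())
    action_values = []
    for a, outcomes in graph.succ[s].items():
        # Average over every observed outcome of (s, a), regardless of which
        # trajectory produced it -- a single-trajectory n-step backup would
        # instead commit to the one successor on the sampled trajectory.
        est = sum(r + gamma * graph_backup(graph, q, s2, depth - 1, gamma)
                  for r, s2 in outcomes) / len(outcomes)
        action_values.append(est)
    return max(action_values)
```

Because the target depends only on the pooled graph, recomputing it for the same state after sampling a different trajectory gives the same value, which is the stability property contrasted with $n$-step targets above.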
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method, SUPE (Skills from Unlabeled Prior data for Exploration), demonstrates that carefully combining skill pretraining on such prior data with online exploration compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs [27.169892145194638]
GraphCLIP is a framework to learn graph foundation models with strong cross-domain zero/few-shot transferability.
We generate and curate large-scale graph-summary pair data with the assistance of LLMs.
For few-shot learning, we propose a novel graph prompt tuning technique aligned with our pretraining objective.
arXiv Detail & Related papers (2024-10-14T09:40:52Z)
- GLBench: A Comprehensive Benchmark for Graph with Large Language Models [41.89444363336435]
We introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios.
GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks.
arXiv Detail & Related papers (2024-07-10T08:20:47Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
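Since the hidden factor changes only between episodes, one way to read this is as a contrastive objective in which transitions from the same episode form positive pairs, so the learned per-episode embedding captures what changed. The sketch below is an assumption-laden InfoNCE-style illustration (the `EpisodeEncoder` module and its dimensions are invented for the example), not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EpisodeEncoder(nn.Module):
    """Maps a flattened transition (s, a, r, s') vector to a unit-norm latent."""
    def __init__(self, transition_dim, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, transitions):           # (batch, transition_dim)
        return F.normalize(self.net(transitions), dim=-1)

def infonce_loss(encoder, anchors, positives, temperature=0.1):
    """anchors[i] and positives[i] come from the same episode; all other
    pairings in the batch act as negatives."""
    za, zp = encoder(anchors), encoder(positives)
    logits = za @ zp.t() / temperature        # (batch, batch) similarities
    labels = torch.arange(za.size(0))         # the matching index is positive
    return F.cross_entropy(logits, labels)
```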
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions [34.99034454081842]
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited.
In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve its scalability.
We also introduce LogIX, a software package that can transform existing training code into data valuation code with minimal effort.
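As a rough illustration, an influence score under the common identity-Hessian approximation reduces to a dot product between the test-loss gradient and each training example's gradient. The sketch below shows that baseline form only; it is not LogIX's API, and LogIX's actual approximations are more sophisticated. `loss_fn(model, batch)` is an assumed helper returning a scalar loss.

```python
import torch

def flat_grad(loss, params):
    # Flatten per-parameter gradients into one vector for dot products.
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, loss_fn, test_batch, train_examples):
    params = [p for p in model.parameters() if p.requires_grad]
    g_test = flat_grad(loss_fn(model, test_batch), params)
    scores = []
    for example in train_examples:
        g_train = flat_grad(loss_fn(model, example), params)
        # A large positive score marks a training example whose removal
        # would increase the test loss, i.e. a "helpful" example.
        scores.append(torch.dot(g_test, g_train).item())
    return scores
```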
arXiv Detail & Related papers (2024-05-22T19:39:05Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
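For reference, the DPO objective on a step-level preference pair needs only the summed token log-probabilities of the chosen and rejected steps under the trained policy and a frozen reference model. A hedged sketch of the standard loss follows; the argument names are illustrative and the MCTS collection loop is omitted.

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward of each response = beta * log(pi / pi_ref); the loss
    # pushes the chosen step's implicit reward above the rejected one's.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```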
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching [50.30124426442228]
Training on large-scale graphs has achieved remarkable results in graph representation learning, but its computational and storage costs have raised growing concerns.
We propose a novel graph condensation method, CrafTing RationaL (CTRL), which offers an optimized starting point closer to the original dataset's feature distribution.
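One gradient-matching step of the kind this line of work builds on can be sketched as below, assuming a `loss_fn(model, batch)` helper: gradients computed on the small synthetic set are pushed toward gradients computed on the real data. CTRL's specific contributions (the optimized initialisation and its matching criterion) are not implemented here; this is only the generic baseline step.

```python
import torch

def gradient_matching_loss(model, loss_fn, real_batch, synth_batch):
    params = [p for p in model.parameters() if p.requires_grad]
    # Real-data gradients are treated as fixed targets.
    g_real = torch.autograd.grad(loss_fn(model, real_batch), params)
    # create_graph=True keeps these gradients differentiable with respect
    # to the synthetic data, which is what gets optimized.
    g_synth = torch.autograd.grad(loss_fn(model, synth_batch), params,
                                  create_graph=True)
    # Cosine-style distance between the two gradient sets, layer by layer.
    dist = 0.0
    for gr, gs in zip(g_real, g_synth):
        gr, gs = gr.flatten(), gs.flatten()
        dist = dist + (1 - torch.dot(gr, gs) /
                       (gr.norm() * gs.norm() + 1e-8))
    return dist
```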
arXiv Detail & Related papers (2024-02-07T14:49:10Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $2.5\times$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learning policy and the dataset.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
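A return-based rebalance in this spirit can indeed be tiny: the sketch below resamples episodes with probability proportional to their shifted returns, keeping every episode's probability positive so the support of the dataset is unchanged. This is an illustrative reconstruction under those assumptions, not the authors' code.

```python
import numpy as np

def return_weighted_indices(episode_returns, num_samples, rng=None):
    rng = rng or np.random.default_rng()
    r = np.asarray(episode_returns, dtype=np.float64)
    # Shift so the lowest-return episode still has positive probability,
    # which reweights the data without shrinking its support.
    weights = r - r.min() + 1e-3
    probs = weights / weights.sum()
    return rng.choice(len(r), size=num_samples, p=probs)

# Example: higher-return episodes are drawn more often, but the
# lowest-return episode can still be sampled.
idx = return_weighted_indices([1.0, 5.0, 10.0], num_samples=8)
```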
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective [6.526790418943535]
We propose a two-fold taxonomy for existing offline RL algorithms.
We explore the correlation between the performance of different types of algorithms and the distribution of actions under states.
We create a benchmark platform on the Atari domain, entitled easy go (RLEG), at an estimated cost of more than 0.3 million dollars.
arXiv Detail & Related papers (2021-05-12T07:17:06Z)