Offline Reinforcement Learning for Road Traffic Control
- URL: http://arxiv.org/abs/2201.02381v1
- Date: Fri, 7 Jan 2022 09:55:21 GMT
- Title: Offline Reinforcement Learning for Road Traffic Control
- Authors: Mayuresh Kunjir and Sanjay Chawla
- Abstract summary: We build a model-based learning framework, A-DAC, which infers a Markov Decision Process (MDP) from a dataset, with pessimistic costs built in to deal with data uncertainties.
A-DAC is evaluated on a complex signalized roundabout using multiple datasets varying in size and in batch collection policy.
- Score: 12.251816544079306
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traffic signal control is an important problem in urban mobility with
significant potential for economic and environmental impact. While there is a
growing interest in Reinforcement Learning (RL) for traffic control, the work
so far has focussed on learning through interactions which, in practice, is
costly. Instead, real experience data on traffic is available and could be
exploited at minimal costs. Recent progress in offline or batch RL has enabled
just that. Model-based offline RL methods, in particular, have been shown to
generalize to the experience data much better than others. We build a
model-based learning framework, A-DAC, which infers a Markov Decision Process
(MDP) from a dataset, with pessimistic costs built in to deal with data
uncertainties. The costs are modeled through an adaptive shaping of rewards in
the MDP which provides better regularization of data compared to the prior
related work. A-DAC is evaluated on a complex signalized roundabout using
multiple datasets varying in size and in batch collection policy. The
evaluation results show that it is possible to build high performance control
policies in a data efficient manner using simplistic batch collection policies.
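The abstract's core idea, inferring an MDP from a fixed batch and making poorly covered actions look less attractive, can be illustrated with a minimal tabular sketch. This is not A-DAC itself: the fixed count-based penalty `beta / sqrt(n)`, the function names, and the toy batch are all assumptions chosen for illustration, whereas the paper describes an adaptive shaping of rewards.

```python
import math
from collections import defaultdict


def build_pessimistic_mdp(transitions, beta=1.0):
    """Estimate a tabular MDP from (state, action, reward, next_state) tuples.

    Each (state, action) reward is lowered by beta / sqrt(count). This fixed
    count-based penalty is an illustrative assumption, not the A-DAC rule.
    """
    counts = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for s, a, r, s_next in transitions:
        counts[(s, a)] += 1
        reward_sum[(s, a)] += r
        next_counts[(s, a)][s_next] += 1

    mdp = {}
    for sa, n in counts.items():
        mean_reward = reward_sum[sa] / n
        probs = {s2: c / n for s2, c in next_counts[sa].items()}
        mdp[sa] = (mean_reward - beta / math.sqrt(n), probs)
    return mdp


def q_value(reward, probs, values, gamma):
    """One-step lookahead value of an action under the estimated model."""
    return reward + gamma * sum(p * values[s2] for s2, p in probs.items())


def value_iteration(mdp, gamma=0.9, iters=200):
    """Run value iteration on the estimated MDP and return (values, policy)."""
    states = {s for s, _ in mdp} | {s2 for _, pr in mdp.values() for s2 in pr}
    values = {s: 0.0 for s in states}
    for _ in range(iters):
        updated = {}
        for s in states:
            qs = [q_value(r, pr, values, gamma)
                  for (s0, _), (r, pr) in mdp.items() if s0 == s]
            # States with no recorded action keep a value of zero.
            updated[s] = max(qs) if qs else 0.0
        values = updated

    policy = {}
    for s in states:
        options = [(q_value(r, pr, values, gamma), a)
                   for (s0, a), (r, pr) in mdp.items() if s0 == s]
        if options:
            policy[s] = max(options)[1]
    return values, policy


# Hypothetical batch: "keep" is observed 16 times, "switch" only once.
batch = [("s", "keep", -1.5, "s")] * 16 + [("s", "switch", -1.0, "s")]
_, greedy_policy = value_iteration(build_pessimistic_mdp(batch, beta=0.0))
_, cautious_policy = value_iteration(build_pessimistic_mdp(batch, beta=2.0))
```

With `beta=0.0` the single lucky observation of `"switch"` wins; with `beta=2.0` the penalty on the rarely seen action outweighs its higher average reward and the policy falls back to `"keep"`, which is the kind of conservatism that pessimistic costs are meant to provide.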
Related papers
- Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling [34.547551367941246]
Real-world data collected from sensors or humans often contains noise and errors.
Traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption.
We propose Robust Decision Transformer (RDT) by incorporating several robust techniques.
arXiv Detail & Related papers (2024-07-05T06:34:32Z)
- A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning [18.2541182874636]
We propose a fully Data-Driven and simulator-free framework for realistic Traffic Signal Control (D2TSC).
We combine well-established traffic flow theory with machine learning to infer the reward signals from coarse-grained traffic data.
Our approach achieves superior performance over conventional and offline RL baselines, and also enjoys much better real-world applicability.
arXiv Detail & Related papers (2023-11-27T15:29:21Z)
- Model-based Trajectory Stitching for Improved Offline Reinforcement Learning [7.462336024223669]
We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories.
TS introduces unseen actions joining previously disconnected states.
We show that using this data augmentation strategy jointly with behavioural cloning (BC) leads to improvements over the behaviour-cloned policy.
arXiv Detail & Related papers (2022-11-21T16:00:39Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning [119.85598717477016]
We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks.
We develop a simple technique for data-sharing in multi-task offline RL that routes data based on the improvement over the task-specific data.
arXiv Detail & Related papers (2021-09-16T17:34:06Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.