Offline Reinforcement Learning for Road Traffic Control
- URL: http://arxiv.org/abs/2201.02381v1
- Date: Fri, 7 Jan 2022 09:55:21 GMT
- Title: Offline Reinforcement Learning for Road Traffic Control
- Authors: Mayuresh Kunjir and Sanjay Chawla
- Abstract summary: We build a model-based learning framework, A-DAC, which infers a Markov Decision Process (MDP) from the dataset, with pessimistic costs built in to deal with data uncertainties.
A-DAC is evaluated on a complex signalized roundabout using multiple datasets varying in size and in batch collection policy.
- Score: 12.251816544079306
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traffic signal control is an important problem in urban mobility with a
significant potential of economic and environmental impact. While there is a
growing interest in Reinforcement Learning (RL) for traffic control, the work
so far has focussed on learning through interactions which, in practice, is
costly. Instead, real experience data on traffic is available and could be
exploited at minimal costs. Recent progress in offline or batch RL has enabled
just that. Model-based offline RL methods, in particular, have been shown to
generalize to the experience data much better than others. We build a
model-based learning framework, A-DAC, which infers a Markov Decision Process
(MDP) from the dataset, with pessimistic costs built in to deal with data
uncertainties. The costs are modeled through an adaptive shaping of rewards in
the MDP which provides better regularization of data compared to the prior
related work. A-DAC is evaluated on a complex signalized roundabout using
multiple datasets varying in size and in batch collection policy. The
evaluation results show that it is possible to build high performance control
policies in a data efficient manner using simplistic batch collection policies.
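A minimal sketch of the general recipe described above is given below: infer an empirical (tabular) MDP from logged transitions, shape rewards pessimistically where the data gives little support, and solve the result by value iteration. This is not the paper's exact A-DAC construction; the count-based penalty, the kappa coefficient and the toy problem sizes are assumptions made only for illustration.

```python
import numpy as np

def pessimistic_mdp_from_batch(transitions, n_states, n_actions,
                               kappa=1.0, gamma=0.99, iters=500):
    """Infer a tabular MDP from logged (s, a, r, s') tuples and solve it with
    value iteration, penalizing poorly supported (s, a) pairs.

    NOTE: a simplified illustration of the 'empirical MDP + pessimism' recipe;
    A-DAC's adaptive reward shaping is more involved than this count-based form.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r

    n_sa = counts.sum(axis=2)                     # visit count per (s, a)
    visited = n_sa > 0
    # Empirical transition model and mean reward where data exists.
    P = np.where(visited[..., None], counts / np.maximum(n_sa, 1)[..., None], 0.0)
    R = np.where(visited, reward_sum / np.maximum(n_sa, 1), 0.0)
    # Count-based pessimism: an uncertainty penalty that shrinks with more
    # observations of (s, a) (assumed form, not the paper's).
    R = R - kappa / np.sqrt(np.maximum(n_sa, 1))
    R[~visited] = R.min() - kappa                 # heavily penalize unseen pairs

    V = np.zeros(n_states)
    for _ in range(iters):                        # standard value iteration
        Q = R + gamma * P @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1), Q                    # greedy policy and Q-values

# Toy usage with a random logged batch over 5 states and 2 actions.
rng = np.random.default_rng(0)
batch = [(int(rng.integers(5)), int(rng.integers(2)),
          float(rng.normal()), int(rng.integers(5))) for _ in range(200)]
policy, Q = pessimistic_mdp_from_batch(batch, n_states=5, n_actions=2)
```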
Related papers
- OffRIPP: Offline RL-based Informative Path Planning [12.705099730591671]
IPP is a crucial task in robotics, where agents must design paths to gather valuable information about a target environment.
We propose an offline RL-based IPP framework that optimizes information gain without requiring real-time interaction during training.
We validate the framework through extensive simulations and real-world experiments.
arXiv Detail & Related papers (2024-09-25T11:30:59Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning [18.2541182874636]
We propose a fully Data-Driven and simulator-free framework for realistic Traffic Signal Control (D2TSC).
We combine well-established traffic flow theory with machine learning to infer the reward signals from coarse-grained traffic data.
Our approach achieves superior performance over conventional and offline RL baselines, and also enjoys much better real-world applicability.
arXiv Detail & Related papers (2023-11-27T15:29:21Z)
- Model-based Trajectory Stitching for Improved Offline Reinforcement Learning [7.462336024223669]
We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories.
TS introduces unseen actions that join previously disconnected states (a simplified stitching sketch appears after this list).
We show that using this data augmentation strategy jointly with behavioural cloning (BC) leads to improvements over the behaviour-cloned policy.
arXiv Detail & Related papers (2022-11-21T16:00:39Z)
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning [119.85598717477016]
We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks.
We develop a simple technique for data-sharing in multi-task offline RL that routes data based on the improvement over the task-specific data.
arXiv Detail & Related papers (2021-09-16T17:34:06Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces (an advantage-weighting sketch covering CRR and AWAC appears after this list).
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales (the advantage-weighting sketch after this list applies here as well).
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms (a minimal dataset-loading example appears after this list).
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
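For the Model-based Trajectory Stitching entry above, the following is a heavily simplified sketch of the stitching idea: when a state in one logged trajectory lies close to a state in another, splice the two with a proposed connecting action, producing new trajectories that behavioural cloning can train on. The nearest-state matching, the propose_action callable and the distance threshold are stand-ins for illustration; the paper's method uses learned dynamics, action-proposal and value models rather than this shortcut.

```python
import numpy as np

def stitch_trajectories(trajectories, propose_action, dist_thresh=0.1):
    """Toy trajectory stitching: when a state in one trajectory is close to a
    state in another, splice the two with a proposed connecting action.

    trajectories: list of lists of (state, action) pairs, states as np arrays.
    propose_action: callable (state_from, state_to) -> action; stands in for
    the learned action-proposal model used in the actual method.
    """
    stitched = []
    for i, traj_a in enumerate(trajectories):
        for j, traj_b in enumerate(trajectories):
            if i == j:
                continue
            for t, (s_a, _) in enumerate(traj_a):
                for u, (s_b, _) in enumerate(traj_b):
                    if np.linalg.norm(s_a - s_b) < dist_thresh:
                        # Prefix of trajectory A, a new connecting action,
                        # then the suffix of trajectory B.
                        bridge = (s_a, propose_action(s_a, s_b))
                        stitched.append(traj_a[:t] + [bridge] + traj_b[u:])
    return stitched

# Toy usage: random 2-D states, midpoint 'action' as a stand-in proposal.
rng = np.random.default_rng(0)
trajs = [[(rng.normal(size=2), rng.normal(size=2)) for _ in range(10)]
         for _ in range(3)]
new_trajs = stitch_trajectories(trajs, propose_action=lambda s, t: 0.5 * (s + t),
                                dist_thresh=1.0)
```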
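For the Critic Regularized Regression and AWAC entries above, both methods extract a policy by weighting behavioural cloning with a function of an estimated advantage: an exponential weight exp(A / temperature) in AWAC and, among other choices, a binary indicator 1[A > 0] in CRR. The sketch below only shows how such per-sample weights would enter a weighted regression loss; the random advantages, the linear deterministic policy and the clipping constant are assumptions for illustration rather than either paper's full actor-critic training loop.

```python
import numpy as np

def advantage_weights(advantages, temperature=1.0, binary=False):
    """Per-sample weights for advantage-weighted behavioural cloning.
    binary=False -> exp(A / temperature), as in AWAC;
    binary=True  -> 1[A > 0], the indicator variant discussed in CRR."""
    if binary:
        return (advantages > 0).astype(float)
    w = np.exp(advantages / temperature)
    return np.clip(w, 0.0, 20.0)  # clipping for numerical stability (assumed)

def weighted_bc_loss(W, states, actions, weights):
    """Weighted regression of dataset actions onto a linear policy a = W s."""
    pred = states @ W.T
    per_sample = ((pred - actions) ** 2).sum(axis=1)
    return float((weights * per_sample).mean())

# Toy usage on random data; the real methods estimate A(s, a) with a learned critic.
rng = np.random.default_rng(0)
states = rng.normal(size=(128, 4))
actions = rng.normal(size=(128, 2))
advantages = rng.normal(size=128)
w = advantage_weights(advantages, temperature=0.5)
loss = weighted_bc_loss(np.zeros((2, 4)), states, actions, w)
```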
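For the D4RL entry above, a minimal dataset-loading example, assuming the d4rl package and its Gym wrappers are installed and that the 'halfcheetah-medium-v2' dataset name exists in the installed version:

```python
import gym
import d4rl  # registers the offline datasets as Gym environments

# Load a benchmark dataset as plain numpy arrays.
env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()  # dict with observations, actions, rewards, terminals

# qlearning_dataset additionally aligns next_observations for TD-style methods.
batch = d4rl.qlearning_dataset(env)
print(batch["observations"].shape, batch["actions"].shape)
```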
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.