Assessment of Reward Functions for Reinforcement Learning Traffic Signal
Control under Real-World Limitations
- URL: http://arxiv.org/abs/2008.11634v2
- Date: Mon, 12 Oct 2020 16:00:15 GMT
- Title: Assessment of Reward Functions for Reinforcement Learning Traffic Signal
Control under Real-World Limitations
- Authors: Alvaro Cabrejas-Egea, Shaun Howell, Maksis Knutins and Colm
Connaughton
- Abstract summary: This paper compares the performance of agents using different reward functions in a simulation of a junction in Greater Manchester, UK.
We find that speed maximisation resulted in the lowest average waiting times across all demand levels, displaying significantly better performance than other rewards previously introduced in the literature.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive traffic signal control is one key avenue for mitigating the growing
consequences of traffic congestion. Incumbent solutions such as SCOOT and SCATS
require regular and time-consuming calibration, cannot optimise well for
multiple road-use modalities, and require the manual curation of many
implementation plans. A recent alternative to these approaches is deep
reinforcement learning, in which an agent learns how to take the most
appropriate action for a given state of the system. The learning is guided by
neural networks approximating a reward function that provides feedback to the
agent on the performance of the actions taken, making the agent's behaviour
sensitive to the specific reward function chosen. Several authors have surveyed the reward
functions used in the literature, but attributing outcome differences to reward
function choice across works is problematic as there are many uncontrolled
differences, as well as different outcome metrics. This paper compares the
performance of agents using different reward functions in a simulation of a
junction in Greater Manchester, UK, across various demand profiles, subject to
real-world constraints: realistic sensor inputs, controllers, calibrated
demand, intergreen times and stage sequencing. The reward metrics considered
are based on the time spent stopped, lost time, change in lost time, average
speed, queue length, junction throughput and variations of these magnitudes.
The performance of these reward functions is compared in terms of total waiting
time. We find that speed maximisation resulted in the lowest average waiting
times across all demand levels, displaying significantly better performance
than other rewards previously introduced in the literature.
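To make the compared reward families concrete, the sketch below shows how a few of the listed quantities (average speed, queue length, junction throughput and change in lost time) could be computed from a single sensor reading. This is a minimal illustration only: the DetectorSnapshot structure, function names and normalisations are assumptions made for this sketch, not the paper's implementation, and the paper's exact definitions may differ.

```python
# Hypothetical sketch of candidate reward signals for an RL traffic-signal agent.
# Structure and normalisations are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import List


@dataclass
class VehicleReading:
    speed: float        # current speed in m/s
    is_stopped: bool    # True if effectively stationary (e.g. speed < 0.1 m/s)


@dataclass
class DetectorSnapshot:
    vehicles: List[VehicleReading]  # vehicles currently observed by the sensors
    departed: int                   # vehicles that cleared the junction since the last decision
    speed_limit: float              # free-flow speed in m/s, used for normalisation


def reward_speed_maximisation(s: DetectorSnapshot) -> float:
    """Mean observed speed, normalised by the speed limit (higher is better)."""
    if not s.vehicles:
        return 0.0
    return sum(v.speed for v in s.vehicles) / (len(s.vehicles) * s.speed_limit)


def reward_queue_length(s: DetectorSnapshot) -> float:
    """Negative count of stopped vehicles (longer queues are penalised)."""
    return -float(sum(v.is_stopped for v in s.vehicles))


def reward_throughput(s: DetectorSnapshot) -> float:
    """Vehicles that passed through the junction since the last decision."""
    return float(s.departed)


def reward_delta_lost_time(previous_lost_time: float, current_lost_time: float) -> float:
    """Change in lost time between decisions; a reduction yields a positive reward."""
    return previous_lost_time - current_lost_time
```

Each function maps a snapshot to a scalar, so alternative reward signals of this kind can be swapped into the same training loop and compared on a common outcome metric such as total waiting time.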
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- A Comparative Study of Loss Functions: Traffic Predictions in Regular and Congestion Scenarios [0.0]
We explore various loss functions inspired by heavy tail analysis and imbalanced classification problems to address this issue.
We discover that when optimizing for Mean Absolute Error (MAE), the MAE-Focal Loss function stands out as the most effective.
This research enhances deep learning models' capabilities in forecasting sudden speed changes due to congestion.
arXiv Detail & Related papers (2023-08-29T17:44:02Z)
- Dynamic Decision Frequency with Continuous Options [11.83290684845269]
In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals.
We propose a framework called Continuous-Time Continuous-Options (CTCO) where the agent chooses options as sub-policies of variable durations.
We show that our algorithm's performance is not affected by the choice of environment interaction frequency.
arXiv Detail & Related papers (2022-12-06T19:51:12Z)
- Cooperative Reinforcement Learning on Traffic Signal Control [3.759936323189418]
Traffic signal control is a challenging real-world problem aiming to minimize overall travel time by coordinating vehicle movements at road intersections.
Existing traffic signal control systems in use still rely heavily on oversimplified information and rule-based methods.
This paper proposes a cooperative, multi-objective architecture with age-decaying weights to better estimate multiple reward terms for traffic signal control optimization.
arXiv Detail & Related papers (2022-05-23T13:25:15Z)
- AI-aided Traffic Control Scheme for M2M Communications in the Internet of Vehicles [61.21359293642559]
The dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies.
We consider a hybrid traffic control scheme and use the proximal policy optimization (PPO) method to tackle it.
arXiv Detail & Related papers (2022-03-05T10:54:05Z)
- AutoLoss: Automated Loss Function Search in Recommendations [34.27873944762912]
We propose an AutoLoss framework that can automatically and adaptively search for the appropriate loss function from a set of candidates.
Unlike existing algorithms, the proposed controller can adaptively generate the loss probabilities for different data examples according to their varied convergence behaviors.
arXiv Detail & Related papers (2021-06-12T08:15:00Z)
- A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
- Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
- DORB: Dynamically Optimizing Multiple Rewards with Bandits [101.68525259222164]
Policy-based reinforcement learning has proven to be a promising approach for optimizing non-differentiable evaluation metrics for language generation tasks.
We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit).
We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks.
arXiv Detail & Related papers (2020-11-15T21:57:47Z)
- Assessment of Reward Functions in Reinforcement Learning for Multi-Modal Urban Traffic Control under Real-World limitations [0.0]
This paper robustly evaluates 30 different Reinforcement Learning reward functions for controlling intersections serving pedestrians and vehicles.
We use a calibrated model in terms of demand, sensors, green times and other operational constraints of a real intersection in Greater Manchester, UK.
arXiv Detail & Related papers (2020-10-17T16:20:33Z)
- Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.