Time critic policy gradient methods for traffic signal control in complex and congested scenarios

Stefano Giovanni Rizzo, Giovanna Vantini, Sanjay Chawla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

Employing an optimal traffic light control policy has the potential of having a positive impact, both economic and environmental, on urban mobility. Reinforcement learning techniques have shown promising results in optimizing control policies for basic intersections and low volume traffic. This paper addresses the traffic light control problem in a complex scenario, such as a signalized roundabout with heavy traffic volumes, with the aim of maximizing throughput and avoiding traffic jams. We formulate the environment with a realistic representation of states and actions and a capacity-based reward. We enforce episode terminal conditions to avoid unwanted states, such as long queues interfering with other junctions in the vehicular network. A time-dependent baseline is proposed to reduce the variance of Policy Gradient updates in the setting of episodic conditions, thus improving the algorithm convergence to an optimal solution. We evaluate the method on real data and highly congested traffic, implementing a signalized simulated roundabout with 11 phases. The proposed method is able to avoid traffic jams and achieves higher performance than traditional time-splitting policies and standard Policy Gradient on average delay and effective capacity, while drastically decreasing the emissions.
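
The variance-reduction idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the exact form of the baseline, so the code below assumes one common choice, where the baseline at time step t is the mean return-to-go over the sampled episodes that are still running at step t. The function names `returns_to_go` and `time_dependent_advantages`, the `gamma` parameter, and the toy rewards are all hypothetical.

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    """Discounted return G_t for every step t of a single episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def time_dependent_advantages(episodes, gamma=1.0):
    """Subtract a per-time-step baseline b_t from each return G_t.

    Assumption (not taken from the paper): b_t is the mean of G_t over the
    episodes that reached step t, so episodes that terminated early do not
    contribute to the baseline at later steps.
    """
    returns = [returns_to_go(np.asarray(r, dtype=float), gamma) for r in episodes]
    horizon = max(len(g) for g in returns)
    baseline = np.zeros(horizon)
    for t in range(horizon):
        alive = [g[t] for g in returns if len(g) > t]
        baseline[t] = np.mean(alive)
    # Each advantage would weight the log-probability gradient of the action
    # taken at step t in a Policy Gradient update.
    advantages = [g - baseline[: len(g)] for g in returns]
    return advantages, baseline


# Toy batch: one episode of 3 steps and one that terminated after 2 steps.
advantages, baseline = time_dependent_advantages([[1, 1, 1], [1, 1]])
```

Indexing the baseline by time step matters when episodes can end early, as with the terminal conditions described above: a single constant baseline would compare returns from the start of an episode with returns from near its end, which differ in scale simply because fewer steps remain.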

Original language: English
Title of host publication: KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1654-1664
Number of pages: 11
ISBN (Electronic): 9781450362016
State: Published - Jul 25 2019
Externally published: Yes
Event: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019 - Anchorage, United States
Duration: Aug 4 2019 → Aug 8 2019

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Country/Territory: United States
City: Anchorage
Period: 08/4/19 → 08/8/19

Keywords

  • Policy gradient
  • Reinforcement learning
  • Roundabout modeling
  • Traffic light control
