A Comparison of Reinforcement Learning Agents Applied to Traffic Signal Optimisation





Traditional methods for traffic signal control at an urban intersection do not control traffic flow effectively under dynamic traffic demand, which leads to negative environmental, psychological and financial impacts for all parties involved. Urban traffic management is a complex problem with multiple factors affecting the control of traffic flow. With recent advancements in machine learning (ML), especially reinforcement learning (RL), there is potential to solve this problem. The idea is to allow an agent to learn optimal behaviour to maximise specific metrics through trial and error. In this paper we apply two RL algorithms, one policy-based and the other value-based, to solve this problem in simulation. For the simulation, we use an open-source traffic simulator, Simulation of Urban MObility (SUMO), packaged as an OpenAI Gym environment. We train the agents on different traffic patterns on a simulated intersection. We compare the performance of the resultant policies to traditional approaches such as the Webster and vehicle-actuated (VA) methods. We also examine and contrast the policies learned by the RL agents and evaluate how well they generalise to different traffic patterns.






How to Cite

Louw, C., Labuschagne, L., & Woodley, T. (2022). A Comparison of Reinforcement Learning Agents Applied to Traffic Signal Optimisation. SUMO Conference Proceedings, 3, 15–43. https://doi.org/10.52825/scp.v3i.116



Conference papers