CORDIS - EU research results

Neural Video Processing and Streaming for Real-time Traffic Monitoring


Quality real-time video-based traffic monitoring thanks to AI

The VISIONS project brings smart cities a step closer with a real-time traffic monitoring system benefiting from artificial intelligence-enabled high-quality video processing and streaming.

Transport and Mobility

Growing urban populations, coupled with increasing vehicle ownership, have prompted the development and installation of traffic monitoring systems to counter congestion and ensure road safety. While roads are increasingly covered by cameras, currently the bandwidth of most communications networks around the world is too limited to transmit high-quality traffic monitoring video, with poorer quality compromising the decision-making of traffic operators. The VISIONS project, funded by the Marie Skłodowska-Curie Actions (MSCA) programme, has applied machine learning (ML) methods to video processing and streaming, to offer quality real-time video-based traffic monitoring. In the future, the VISIONS algorithm will be available as a software package, which can be downloaded to operational cameras or integrated into new cameras, helping support the EU’s ambition of reducing road deaths to zero by 2050.

End-to-end video optimisation

To maximise the network bandwidth available to the traffic monitoring system, the VISIONS project explored ML for both video processing and streaming. For video processing, cameras upload video at a lower resolution, which the VISIONS algorithm then improves by reconstructing it with techniques such as super resolution. For video streaming, VISIONS uses deep reinforcement learning (DRL) to adjust the video bit rate in real time, allowing the system to accommodate unexpected network dynamics (such as competing demands from other services) and improve users’ experience. “Given the limited computation capacity and energy consumption of traffic monitoring cameras, our neural network model can run reliably on cameras with limited computing resources,” notes MSCA fellow Xu Zhang. In the future, to give users the highest-quality viewing possible while minimising bandwidth use, the system will compute a trade-off between client-side computation and network load, as Zhang explains: “If the end user facilities are powerful, VISIONS will transmit low-resolution videos in the network, reconstructing the video to increase its quality on the client side, so using less network bandwidth. If the end user facilities have fewer computational resources, higher-resolution videos have to be transmitted, consuming much higher bandwidth.”
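The real-time bit rate adjustment described above can be illustrated with a simplified sketch. VISIONS trains a deep reinforcement learning policy for this decision; purely for illustration, the sketch below replaces the learned policy with a simple rule-based stand-in that takes the same kinds of inputs (estimated throughput and playback buffer level). The function name `choose_bitrate` and the bit rate ladder values are hypothetical, not taken from the project:

```python
# Illustrative sketch of adaptive bit rate selection for a camera stream.
# VISIONS learns this decision with deep reinforcement learning; the
# heuristic below is a hand-written stand-in with similar inputs/outputs.

BITRATE_LADDER_KBPS = [300, 750, 1200, 2400, 4800]  # hypothetical rungs

def choose_bitrate(throughput_kbps: float, buffer_s: float) -> int:
    """Pick the highest ladder rung the estimated throughput can sustain,
    dropping one level when the playback buffer runs low."""
    # Apply a safety margin so short throughput dips do not stall playback.
    sustainable = [r for r in BITRATE_LADDER_KBPS if r <= 0.8 * throughput_kbps]
    level = len(sustainable) - 1 if sustainable else 0
    if buffer_s < 2.0 and level > 0:  # near-empty buffer: be conservative
        level -= 1
    return BITRATE_LADDER_KBPS[level]
```

A learned policy replaces the hand-tuned margin and buffer threshold with behaviour optimised over many network traces, which is what lets the system react to unexpected network dynamics rather than fixed rules.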

Simultaneous algorithm testing

The system was developed using TensorFlow’s Python API, with a simulation environment built on the video ingestion process of actual traffic video streaming services. The team trained several models simultaneously, each informed by different network and video data from public data sets, making the system more robust overall. These included broadband upload data from the Federal Communications Commission (FCC), 4G wireless bandwidth data collected on mobile devices in Ghent, and 3G HSDPA bandwidth logs from mobile HTTP streaming scenarios. To evaluate performance, the VISIONS algorithm was compared with other state-of-the-art approaches in terms of bandwidth consumption and video smoothness, alongside lost frames and image freezing, amongst other criteria. “Our algorithm can reduce lost frames and image freezing by 24 % and 15.5 %, respectively, without needing more bandwidth,” says Zhang.
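The trace-driven evaluation idea can be sketched in miniature: replay a recorded bandwidth trace second by second and count how often a fixed-bit-rate stream would exceed the available capacity, i.e. the moments where frames could be lost or the image could freeze. The `simulate_stalls` helper and the trace values below are illustrative only, not the project's actual simulation environment or data:

```python
# Toy trace-driven simulator: replay a bandwidth trace (kbps, one sample
# per second) and count the seconds in which a constant-bit-rate stream
# exceeds capacity -- candidate moments for lost frames or image freezing.

def simulate_stalls(trace_kbps, stream_kbps):
    """Return (stall_seconds, total_seconds) for a fixed-bit-rate stream."""
    stalls = sum(1 for bw in trace_kbps if bw < stream_kbps)
    return stalls, len(trace_kbps)

# Hypothetical ten-second mobile bandwidth trace.
trace = [1800, 2200, 900, 2500, 3100, 700, 2000, 2400, 1500, 2600]

stalls, total = simulate_stalls(trace, stream_kbps=1200)
print(f"{stalls}/{total} seconds under capacity")  # -> 2/10 seconds under capacity
```

Running several models against different public traces (FCC broadband, 4G Ghent, 3G HSDPA) in this fashion is what exposes a policy to diverse network conditions and makes the trained system more robust.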

Relevance for other multimedia systems

VISIONS focused on streaming video to control centres, helping operators remotely observe traffic flow to identify and respond quickly to problems such as emergencies or congestion. “In the future, we will explore the transmission of traffic video to AI systems for them to analyse and flag concerns. In the meantime, our results could benefit other systems relying on multimedia applications such as virtual reality applications, distance education, and healthcare,” concludes Zhang.

Keywords

VISIONS, video, bit rate, artificial intelligence, machine learning, traffic, road, monitoring, streaming, bandwidth
