Periodic Reporting for period 1 - VUAD (Video Understanding for Autonomous Driving)
Reporting period: 2020-04-01 to 2022-03-31
Self-driving cars are poised to become a trillion-pound market in the next few decades, driven by the needs of commuters and logistics chains worldwide. More importantly, they would address two pressing problems of our society. Each year, 1.25 million people die in car accidents due to human error, and about 35 million are severely injured, a toll rivaling the worst diseases. Another often-ignored fact is that the average car commuter spends 52 minutes per day driving to or from work, amounting to 5.4% of their waking time lost to a menial task. Enabling an Artificial Intelligence (AI) system to understand and drive in complex urban environments now seems a largely solved problem for the most common scenarios. Companies such as Waymo (US), Tesla (US), and Wayve (UK) routinely test on public roads. This achievement was made possible, largely, by advances in deep neural networks.
Computer Vision methods achieve impressive results on a single image for various tasks such as object detection. For instance, pedestrian detectors now boast over 98% accuracy on the widely acknowledged KITTI benchmark. However, this success has not yet been fully extended to sequences. It is commonly acknowledged that video understanding falls years behind single-image understanding, mainly for two reasons: the processing power required for reasoning across multiple frames, and the difficulty of obtaining ground truth for every frame in a sequence, especially for pixel-level tasks. Based on these observations, there are two likely directions to boost the performance of video understanding tasks: unsupervised learning and object-level reasoning. We work on both perspectives in this project. We present deep learning solutions for dynamic scene understanding by detecting and tracking multiple people in street scenes, i.e. multi-object tracking (MOT), as well as by modeling the apparent motion of the static parts of the scene that arises from camera motion.
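To illustrate the object-level reasoning behind multi-object tracking, the sketch below shows a minimal greedy IoU-based tracker. This is not the project's actual method, only a simplified, self-contained example of how per-frame detections can be associated into tracks; the box format, threshold, and function names are assumptions for illustration.

```python
# Hypothetical minimal MOT sketch: greedy frame-to-frame association
# by intersection-over-union (IoU). Boxes are (x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def track(frames, iou_thresh=0.3):
    """Associate detections across frames: each detection joins the
    unclaimed track whose last box has the highest IoU above the
    threshold, otherwise it starts a new track.
    Returns {track_id: [(frame_index, box), ...]}."""
    tracks, last_box, next_id = {}, {}, 0
    for t, detections in enumerate(frames):
        taken = set()  # tracks already matched in this frame
        for box in detections:
            best_id, best_iou = None, iou_thresh
            for tid, prev in last_box.items():
                if tid in taken:
                    continue
                score = iou(box, prev)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no good match: start a new track
                best_id, next_id = next_id, next_id + 1
                tracks[best_id] = []
            tracks[best_id].append((t, box))
            last_box[best_id] = box
            taken.add(best_id)
    return tracks
```

For example, a pedestrian box that shifts slightly between two frames keeps its track identity, while a distant new detection opens a second track. Real trackers replace the greedy matching with optimal assignment, motion models, and learned appearance features, which is where deep learning enters.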
For dissemination, we attended at least two conferences in the field every year. During these conferences, we also participated in workshops and tutorials related to the project. We presented our work on the first objective at a graph learning workshop, and the results of the third objective at a conference as well as another workshop. We also presented the project in several invited talks, both internationally and nationally, addressing the scientific community as well as university and high school students. We were an active part of the AI community at the host university, organizing weekly AI talks and teaching a graduate-level course on self-driving vehicles covering topics related to the project. We additionally carried out high school visits and, with the help of a non-profit organization, ran a training program open to university students from all over the country. We believe that these activities helped foster the research and teaching environment at the host university, and to some extent at the national level.
In the big picture, self-driving will reduce the number of deaths due to traffic accidents. Most traffic accidents stem from driver-related factors such as fatigue, substance use, or medical conditions; self-driving can therefore create a safer traffic environment. It is also expected to expand shared vehicle systems and, by enabling new ways of transportation, reduce the negative impact of the climate crisis. These changes will save time for everyone and will also provide mobility for the disabled and the elderly. In terms of wider societal implications, self-driving is expected to bring economic gains and create new jobs.