Final Report Summary - IARTIST (Industry-Academia Research on Three-dimensional Image Sensing for Transportation)
a) Hardware components
A new pixel architecture was designed for wide dynamic range imaging. CMOS pixels typically have a linear response with a limited dynamic range of 2-3 decades. This is a serious limitation for transportation applications, where scenes often contain bright light from vehicles as well as sun glare. A new technique for achieving high dynamic range using matrix filling and compressive sensing was developed. Even when such high dynamic range images are captured, they are difficult to reproduce on a screen, whose dynamic range is much smaller. Doing so requires a complex operation, known as tone mapping, to be applied to each pixel's response. We have developed a new pixel technology that produces real-time tone-mapped images at the focal plane itself.
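The focal-plane tone mapping described above is implemented in pixel hardware and is not reproduced here. As a software illustration of what tone mapping does, the minimal sketch below measures a scene's dynamic range in decades and then compresses it with the well-known Reinhard global operator, a swapped-in textbook technique rather than the project's method. The function names, the key value 0.18 and the synthetic scene values are assumptions chosen for illustration.

```python
import math


def dynamic_range_decades(values):
    """Dynamic range in decades: log10(max / min) over the positive values."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))


def reinhard_global(luminance, key=0.18, eps=1e-6):
    """Compress HDR luminance into [0, 1) with the Reinhard global operator."""
    # Log-average luminance summarises the overall brightness of the scene.
    log_avg = math.exp(sum(math.log(v + eps) for v in luminance) / len(luminance))
    mapped = []
    for v in luminance:
        scaled = key * v / log_avg              # map the log-average to the "key" value
        mapped.append(scaled / (1.0 + scaled))  # ~linear when dark, saturates towards 1
    return mapped


def to_8bit(display):
    """Quantise display values in [0, 1) to 8-bit codes for a standard screen."""
    return [min(255, round(d * 255)) for d in display]


# Hypothetical scene from a dark road surface (0.01) up to glare (10000).
scene = [10.0 ** e for e in range(-2, 5)]
print(dynamic_range_decades(scene))   # approximately 6.0
print(to_8bit(reinhard_global(scene)))
```

A linear pixel limited to 2-3 decades would clip either the dark end or the bright end of such a scene; the compressive curve instead keeps every level distinct and monotonically ordered within the 8-bit output range.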
At the camera level, monocular cameras are the mainstay of intelligent transportation systems (ITS). However, they do not provide depth information, which limits their ability to classify objects and to measure sizes and distances. A new binocular imaging system built from off-the-shelf components has been developed and has shown promising results. We have identified a new error limit for camera calibration using a multi-sensor approach. We designed various stages of the ITS backbone, including software to identify the logos and colours of vehicles in traffic scenes; this software has been verified on typical traffic scenes. In addition, we have developed two sampling systems to reduce the amount of data produced by cameras.
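A binocular system recovers the depth that a monocular camera lacks by triangulation. The minimal sketch below assumes an ideal, rectified stereo pair (parallel optical axes, focal length known in pixels, known baseline), which the summary does not specify; the function names and numbers are hypothetical and do not describe the project's calibration.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite distance")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """Approximate depth uncertainty for a disparity error: dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


def object_size(size_px, depth_m, focal_px):
    """Metric size of an object roughly parallel to the image plane: S = s * Z / f."""
    return size_px * depth_m / focal_px


# Hypothetical rig: f = 1000 px, 0.3 m baseline, 20 px disparity.
z = stereo_depth(1000.0, 0.3, 20.0)          # about 15 m
print(z, object_size(120.0, z, 1000.0))      # a 120 px tall object is about 1.8 m
print(depth_error(1000.0, 0.3, z))           # about 0.75 m per pixel of disparity error
```

Because the depth error grows with the square of the distance, the baseline and calibration accuracy largely determine how far a stereo rig can measure sizes and distances reliably.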
b) Algorithms
The iARTIST project led to a host of software suites with algorithms for detecting and describing various kinds of traffic violations. Among these, special attention was given to complex scenarios involving the detection and recognition of vehicles and of pedestrians and other vulnerable road users. Algorithms based on convolutional neural networks (CNNs) were explored extensively. Building on research into CNNs for feature extraction, the partnership developed a new system to detect vehicles and people on the road. A single CNN was trained for both pedestrian and vehicle detection, so both object classes are detected in a single run of the detector, and the detector performs well for both. Splitting pedestrians into two classes based on their height made it possible to distinguish between adults and children: with proper camera calibration from known dimensions of the scene, the system measures the height of each pedestrian and decides whether the pedestrian is a child. Detecting complex events such as failure to yield right of way requires detecting several features and their mutual relations, such as the traffic light colour, the type of moving object, the position of the stop line at the intersection, and the movement itself. Additionally, a subsystem of two CNNs, one recognizing vehicle make and model and the other matching vehicle colour, has been built to verify the identity of vehicles whose number plates are recognized by the industrial partner's legacy systems.
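Two of the steps above, height-based child/adult classification and combining per-frame features into an event, can be sketched in a few lines. This is a simplified illustration, not the project's software: it assumes image scale is calibrated from a scene element of known length at roughly the pedestrian's distance, uses a hypothetical 1.40 m cut-off, and expresses a red-light crossing rule (one simple instance of the feature combinations described) over hypothetical per-frame observations whose light colour, object class and position along the lane would come from upstream detectors.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the report does not state the threshold that was used.
CHILD_HEIGHT_THRESHOLD_M = 1.40


def metres_per_pixel(reference_length_m, reference_length_px):
    """Image scale from a scene element of known size (e.g. a lane marking).

    Assumes the reference lies at roughly the same distance from the camera
    as the pedestrian, so one scale factor applies to both.
    """
    return reference_length_m / reference_length_px


def classify_pedestrian(box_height_px, scale_m_per_px,
                        threshold_m=CHILD_HEIGHT_THRESHOLD_M):
    """Return ("child" | "adult", estimated height in metres) for one detection."""
    height_m = box_height_px * scale_m_per_px
    label = "child" if height_m < threshold_m else "adult"
    return label, height_m


@dataclass
class Observation:
    light: str        # traffic light colour in this frame: "red", "amber" or "green"
    object_type: str  # detector class, e.g. "vehicle" or "pedestrian"
    s_m: float        # position along the lane in metres; 0 at the stop line, > 0 past it


def crossed_on_red(track, min_step_m=0.1):
    """True if a vehicle moves across the stop line while the light is red."""
    for prev, cur in zip(track, track[1:]):
        moving = cur.s_m - prev.s_m > min_step_m   # ignore jitter of a stationary car
        crossed = prev.s_m < 0.0 <= cur.s_m        # stop line passed between frames
        if cur.object_type == "vehicle" and cur.light == "red" and moving and crossed:
            return True
    return False


scale = metres_per_pixel(3.0, 150.0)         # a 3 m marking spanning 150 px
print(classify_pedestrian(60.0, scale))      # about 1.2 m, labelled "child"
track = [Observation("red", "vehicle", -1.0), Observation("red", "vehicle", 0.5)]
print(crossed_on_red(track))
```

Keeping the rule separate from the detectors means the same CNN outputs can feed several event definitions, each expressed as a small predicate over tracked observations.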
Research was also carried out on vehicle re-identification without using personal (and therefore sensitive) data, i.e. using only low-resolution visual information. In the targeted application domains, it is common to have multiple observations of the same object of interest, so we proposed LFTD (Learning Features in Temporal Domain), a method for aggregating features in the temporal domain. The proposed aggregation weights different elements of the feature vectors with different weights and is trained end-to-end using a Siamese network. The experimental results show that the new method outperforms existing methods for temporal feature aggregation on both vehicle and person re-identification tasks. Furthermore, to advance research in vehicle re-identification, we collected and introduced a novel dataset, CarsReId30k. The dataset is not limited to frontal or rear viewpoints: it was captured by 24 cameras from various angles and contains 8,343 unique vehicles, 29,676 observed tracks and 92,846 positive pairs. The dataset will be made public along with the publication of an article describing this research.
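The key idea of element-wise temporal aggregation can be illustrated with a minimal sketch. It is not the published LFTD architecture: here the per-element scores are passed in directly, whereas in a trained system they would be produced by a learned branch and optimised end-to-end. The contrastive loss shown is the standard Siamese formulation and is an assumption, since the summary does not name the loss used. All function names are hypothetical.

```python
import math


def aggregate_features(features, scores):
    """Collapse T per-frame feature vectors (T x D) into one D-dimensional vector.

    Each element d gets its own set of temporal weights: a softmax over t of
    scores[t][d], so different frames can dominate different feature elements.
    """
    num_frames, dim = len(features), len(features[0])
    aggregated = []
    for d in range(dim):
        column = [scores[t][d] for t in range(num_frames)]
        peak = max(column)                               # for numerical stability
        exps = [math.exp(c - peak) for c in column]
        total = sum(exps)
        aggregated.append(sum(e / total * features[t][d] for t, e in enumerate(exps)))
    return aggregated


def cosine_similarity(a, b):
    """Similarity used to compare two aggregated track descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(a, b, same_identity, margin=1.0):
    """Standard Siamese contrastive loss on the Euclidean distance of two embeddings."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if same_identity:
        return dist ** 2                    # pull matching tracks together
    return max(0.0, margin - dist) ** 2     # push non-matching tracks beyond the margin


track = [[1.0, 0.0], [3.0, 2.0]]
print(aggregate_features(track, [[0.0, 0.0], [0.0, 0.0]]))   # equal scores: plain mean [2.0, 1.0]
print(aggregate_features(track, [[0.0, 0.0], [10.0, 0.0]]))  # frame 2 dominates element 0 only
```

With equal scores the aggregation reduces to average pooling; learned scores let a sharp frame dominate some feature elements while a blurred or occluded frame is down-weighted, which plain averaging cannot do.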
Finally, transportation cameras perform poorly in low-light conditions, where colour reproduction is very poor. Low-light colour extraction software based on the Retinex algorithm was developed to recover meaningful colour from dark images captured by traffic cameras.
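The summary does not say which Retinex variant was used. The minimal sketch below implements the classic single-scale Retinex (SSR), which estimates the smooth illumination with a Gaussian blur and removes it in the log domain, applied to each colour channel independently and stretched back to 8 bits. The sigma and eps values and all function names are illustrative assumptions, and images are plain nested lists so the example needs only the standard library; a real system would use an image-processing library and typically a multi-scale variant with colour restoration.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def blur(img, sigma):
    """Separable Gaussian blur with clamp-to-edge borders; img is a list of rows."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass on the intermediate result.
    tmp = [[sum(k[j] * img[y][min(w - 1, max(0, x + j - r))] for j in range(len(k)))
            for x in range(w)] for y in range(h)]
    return [[sum(k[j] * tmp[min(h - 1, max(0, y + j - r))][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def single_scale_retinex(channel, sigma=15.0, eps=1.0):
    """R = log(I) - log(G_sigma * I): divide out the smooth illumination estimate."""
    illum = blur(channel, sigma)
    return [[math.log(p + eps) - math.log(q + eps) for p, q in zip(row, irow)]
            for row, irow in zip(channel, illum)]


def stretch_to_8bit(r):
    """Linearly rescale a Retinex output to the 0..255 range."""
    flat = [v for row in r for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in row] for row in r]


def retinex_colour(rgb, sigma=15.0):
    """Apply SSR to each channel of an image given as three 2D lists (R, G, B)."""
    return [stretch_to_8bit(single_scale_retinex(ch, sigma)) for ch in rgb]


# Hypothetical dark 9x9 frame: background level 5 with a slightly brighter 3x3 patch.
dark = [[20 if 3 <= x <= 5 and 3 <= y <= 5 else 5 for x in range(9)] for y in range(9)]
enhanced = retinex_colour([dark, dark, dark], sigma=2.0)
print(enhanced[0][4])   # the patch now spans a much wider output range
```

Stretching each channel separately can shift colour balance; that trade-off is why practical systems often add a colour-restoration step on top of this basic form.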
c) Knowledge Transfer
Last, and most importantly, this project led to significant knowledge transfer between the academic and industrial partners. It provided training for scientists at all three stages of their careers (early-stage, intermediate and experienced), enabling them to learn and transfer knowledge in an intersectoral setting, something that would not otherwise have been possible.
The project has a website at www.eng.ox.ac.uk/iartist