Event-Driven Compressive Vision for Multimodal Interaction with Mobile Devices

Periodic Reporting for period 3 - ECOMODE (Event-Driven Compressive Vision for Multimodal Interaction with Mobile Devices)

Reporting period: 2017-07-01 to 2018-12-31

The visually impaired and the elderly, who often suffer from mild speech and/or motor disabilities, face a significant and growing barrier in accessing ICT technology and services. Yet, in order to participate in a modern, interconnected society, these user groups also need access to ICT, in particular to mobile platforms such as tablet computers or smartphones. Smartphones have become indispensable everyday devices for most people, used by many almost around the clock. However, most of these products are designed and marketed for the young, tech-savvy and multimedia-oriented. To the visually and motor impaired, handling these mobile devices can be overwhelming, confusing, and unnecessarily difficult. The project aims at developing and exploiting the recently matured and quickly advancing biologically inspired technology of event-driven, compressive sensing (EDC) of audio-visual information to realize a new generation of low-power, multimodal human-computer interfaces for mobile devices. We will demonstrate that the proposed technology is particularly suited to supporting the visually and motor impaired in their interaction with modern ICT devices, and hence improves their potential to participate in a modern, interconnected society. Operation of the proposed devices will be largely independent of the environment, in particular offering unrestricted use of the mobile device under uncontrolled lighting and background-noise conditions such as those present in inner-city outdoor scenarios.
Even four years after the project started, there is still no other technique that enables human-machine interfaces to work as efficiently indoors and outdoors as the one developed within the ECOMODE project. We have shown that a set of algorithms operating entirely without cloud support, on the power budget of a mobile phone alone, is feasible. The interface introduces several features that go beyond any existing platform. We were able to develop a new EDC sensor, connect it to a mobile platform, and develop a whole new range of visual processing that runs on that limited power budget. The interface provides reliable results and, more importantly, introduces new features for both vision and speech. Until now, no one had shown that a camera can help improve speech recognition, nor an algorithm that can remove the background and at the same time recognize gestures on the power budget of a mobile phone. When the project started, EDC technology was available only at the laboratory level. Some labs had custom USB interfaces to Windows, Linux or macOS computers, but there was no way to interface with Android-based systems, so exploiting this technology on portable, edge Android-based devices was not feasible. ECOMODE has now made available a highly compact and integrated plug-in module that interfaces directly through the USB connector of an Android device. This enabled the full development of Work Packages 3 to 8, resulting in useful handheld devices tested with the target users and offering commercial possibilities. During the final year, we also worked intensively on integrating all components of the work packages into the mobile phone with our partner Experis. We consolidated the final databases from recordings made by Streetlab and FBK with real end users, and we improved the recognition module by fine-tuning its parameters and by filtering camera noise more efficiently.
An integrated system has been produced that brings together most of the work carried out during the project within the different technical work packages. The facilitator consists of six modules that cover the main functionalities required by the end users. All these modules coexist inside the facilitator and remain isolated from each other thanks to the modularized architecture implemented within the Android application. This facilitates maintenance, the implementation of improvements, and independent branch testing, ensuring the expected behaviour and speeding up bug resolution. A mobile application has also been developed that provides a set of native applications, also called applets, offering the same functionalities through a redesigned visual interface that is more intuitive for the end users. The multimodality of the ECOMODE project thus makes it possible to navigate these applications by combining gestures and speech commands, selected after analysing aspects such as cognitive demand, with the aim of improving communication between mobile devices and people who are not familiar with emerging technologies. These applets were designed in consultation with the end users' representatives.
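The modular isolation described above can be illustrated with a minimal sketch: each applet sits behind a common interface, and a facilitator routes recognized multimodal commands to the appropriate module. All class and method names here are hypothetical, chosen for illustration only; they do not reflect the project's actual Android code.

```python
from typing import Dict


class Applet:
    """Common interface every module implements (illustrative name)."""
    name = "base"

    def handle(self, command: str) -> str:
        raise NotImplementedError


class PhoneApplet(Applet):
    """Example module; the real facilitator contains six such modules."""
    name = "phone"

    def handle(self, command: str) -> str:
        return f"phone applet handling: {command}"


class Facilitator:
    """Keeps modules isolated; they share only this registration point."""

    def __init__(self) -> None:
        self._applets: Dict[str, Applet] = {}

    def register(self, applet: Applet) -> None:
        self._applets[applet.name] = applet

    def dispatch(self, applet_name: str, command: str) -> str:
        # A gesture or speech command, once recognized, is routed to
        # the target module without the modules knowing about each other.
        return self._applets[applet_name].handle(command)
```

Because each module is reachable only through the facilitator, one module can be tested or replaced on its own branch without touching the others, which is the maintainability benefit the paragraph above describes.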
The results of ECOMODE have been extensively disseminated through publications and participation in international conferences. A full exploitation plan is included in D8.3 Exploitation Plan and Updates.
By the end of the project, the consortium had made significant progress beyond the state of the art and the expected results, further increasing the potential impact of ECOMODE:
-DVS cameras are becoming commercially attractive. Besides Chronocam (now called Prophesee), several other companies commercialize variants of DVS cameras: Inivation (Zurich), Celex Ltd. (Singapore/China), and even Samsung has announced DVS camera developments. In ECOMODE, for the first time, this type of sensor has been used on a mobile platform running Android for gesture recognition for visually impaired people. We expect this to be one of many application domains for this new type of camera sensor. Making such low-cost systems available to visually impaired people has a high potential for societal impact. At the more technological level, the specific developments within WP2 open new ways for other application domains (autonomous cars, robots, surveillance, smart vision-triggered always-on IoT devices, etc.).
The developments within WP2 have advanced with respect to the state-of-the-art in the following aspects:
-A compact USB plug-in DVS camera module is available for edge devices (tablets or mobile phones);
-A USB interface for Android systems is available;
-A DVS-MIPI interface module that can be synthesized for ASIC or FPGA is available.
We have also advanced knowledge in the field of event-based computation in several respects. We improved the machine learning algorithm based on time surfaces (HOTS). We added suppression of redundant and overlapping events by centering events on main-activity events. We also explored the parameter space to define an optimal architecture that allows robust recognition of the collected database. We introduced background removal while the phone is in motion, indoors and outdoors. This method is genuinely new, as it requires neither the computation of optical flow nor the phone's inertial measurement unit. We showed for the first time that the relative event activity of the background and of the foreground (where gestures happen) can be a reliable measure for separating a gesture from a dynamic background. We implemented and validated the approach and, above all, managed to integrate all of these components into the mobile phone.
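The activity-based separation idea above can be sketched as follows. This is an illustrative toy, not the project's implementation: the grid size, the mean-activity threshold, and all names are assumptions. The only point it demonstrates is that regions where a gesture happens generate far more DVS events than a drifting background, so a simple per-region event count can split foreground from background without optical flow or IMU data.

```python
import numpy as np


def split_gesture_events(events, grid=8, width=128, height=128):
    """Label each DVS event (x, y, t) as foreground (True) or background
    (False) by comparing the event activity of its spatial cell against
    the mean activity over all cells. Illustrative sketch only."""
    cell_w, cell_h = width // grid, height // grid

    # Accumulate per-cell event counts (a coarse activity map).
    counts = np.zeros((grid, grid))
    for x, y, _t in events:
        counts[min(y // cell_h, grid - 1), min(x // cell_w, grid - 1)] += 1

    # Cells markedly busier than the average are treated as gesture regions.
    threshold = counts.mean()

    labels = []
    for x, y, _t in events:
        cy = min(y // cell_h, grid - 1)
        cx = min(x // cell_w, grid - 1)
        labels.append(bool(counts[cy, cx] > threshold))
    return labels
```

A dense cluster of events (a hand waving in one region) is labeled foreground, while sparse events scattered by camera ego-motion fall below the mean-activity threshold and are labeled background.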