Periodic Reporting for period 1 - ENVISION (Enabling Visual IoT Applications with Advanced Network Coding Algorithms)

Reporting period: 2018-01-15 to 2020-01-14

Summary of the context and overall objectives of the project

The latest advances in, and the integration of, several key technologies, such as wireless communications, low-power sensing, Internet protocols and cloud computing, have enabled the emergence of the Internet of Things (IoT) paradigm. The ever-growing number of visual sensing applications within IoT deployments already strains the network and cloud infrastructures used to deliver and store massive amounts of visual data. Along with conventional image and video content, we are witnessing a surge in 360-degree video traffic originating from AR/VR applications, as well as in interactive multiview video. At the same time, novel sensing paradigms are emerging that depart from conventional frame-based sensing. A prominent example is neuromorphic visual sensors (NVS). NVS devices record the pixel coordinates and timestamps of reflectance events in an asynchronous manner, thereby offering substantial improvements in sampling speed and power consumption. To accommodate the surge in visual content and to deal with emerging visual data types, appropriate transmission and storage mechanisms need to be developed. The ENVISION project aimed at developing such data-driven delivery and storage algorithms, based on advanced coding techniques, for data acquired by both conventional frame-based video cameras and NVS devices. Specifically, ENVISION pursued the following interconnected research objectives: (i) design of advanced content-driven delivery mechanisms for the transmission of visual content captured by both neuromorphic and conventional visual sensors to the cloud service; (ii) development of novel data-driven methods for the storage of visual content; and (iii) design of low-complexity encoding/decoding techniques for robust data representation.
Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The work performed during the ENVISION project aimed at developing a series of advanced methods and algorithms for the efficient representation, storage and delivery of visual content pertaining to emerging visual applications in the context of the IoT paradigm.

We studied the problem of cache-aided delivery of interactive multiview video content to wireless users and developed a greedy joint caching and scheduling policy that takes into account both the way users navigate through the scene and the video delivery constraints [J2, C4]. The performance of this greedy policy is within provable bounds of the optimal solution.

We also studied the problem of caching 360-degree video, which arises in the context of virtual and augmented reality applications [J3, C2]. To maximize the use of caching resources and ultimately ensure the best possible QoE for users, the proposed caching algorithm exploits the encoding of 360-degree video into multiple tiles and quality layers to make fine-grained decisions about which content to cache in each base station and where to deliver the content from, while exploiting collaboration among the base stations.

Next, we developed a joint source-channel coding (JSCC) algorithm for the wireless transmission of images captured by conventional active pixel sensing cameras [J4, C3]. We proposed a radically novel approach based on the autoencoder framework and convolutional neural networks, and showed that, through the use of deep models and the backpropagation algorithm, compact source representations can be learned that are also robust to variations in channel quality.

Finally, we investigated the problem of representation learning for neuromorphic vision sensing (NVS) data [J1, C1].
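To illustrate the structure of the deep JSCC pipeline described above, the sketch below shows its three stages: an encoder that maps the image to a reduced number of channel symbols, a power-normalization step that enforces the transmit-power constraint, and an AWGN channel followed by a decoder. This is a minimal numpy sketch under stated assumptions: the random linear encoder/decoder is an untrained stand-in for the convolutional autoencoder of [J4, C3], and the dimensions, SNR value and helper names (`power_normalize`, `awgn`) are illustrative, not those of the published model.

```python
import numpy as np

rng = np.random.default_rng(0)

def power_normalize(z, power=1.0):
    # Scale the latent vector so its average per-symbol power equals the
    # channel power constraint (here: unit power).
    k = z.size
    return z * np.sqrt(k * power) / np.linalg.norm(z)

def awgn(z, snr_db):
    # Additive white Gaussian noise channel; unit signal power is assumed,
    # so the noise variance follows directly from the SNR.
    sigma = np.sqrt(10.0 ** (-snr_db / 10.0))
    return z + sigma * rng.standard_normal(z.shape)

# Toy "autoencoder": a random linear map standing in for the trained CNNs.
n, k = 64, 16                         # source dimension, channel bandwidth (k < n)
W_enc = rng.standard_normal((k, n)) / np.sqrt(n)
W_dec = W_enc.T                       # transpose as a crude, untrained decoder

x = rng.standard_normal(n)            # flattened image patch (illustrative input)
z = power_normalize(W_enc @ x)        # analog channel symbols, power-constrained
z_hat = awgn(z, snr_db=10.0)          # symbols corrupted by the channel
x_hat = W_dec @ z_hat                 # reconstruction at the receiver

mse = np.mean((x - x_hat) ** 2)       # in [J4, C3] this loss drives backpropagation
```

In the actual scheme, training the encoder and decoder end to end through the (differentiable) noisy channel is what yields representations that degrade gracefully with channel quality rather than exhibiting a cliff effect.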
We developed a novel framework comprising a compact graph representation for NVS data combined with a spatio-temporal feature learning module based on residual graph-convolutional neural networks. This framework allows for efficient end-to-end learning of features directly from NVS data.

Overall, the work conducted during the ENVISION project resulted in four peer-reviewed papers presented at leading international conferences/workshops and two peer-reviewed articles published in high-impact-factor IEEE Transactions journals. Two more journal papers are under review.

[J1] Y. Bi, A. Chadha, A. Abbas, E. Bourtsoulatze and Y. Andreopoulos, "Graph-based Spatio-Temporal Feature Learning for Neuromorphic Vision Sensing," submitted, Nov. 2019
[J2] E. Bourtsoulatze and D. Gunduz, "Cache-Aided Interactive Multiview Video Streaming in Small Cell Wireless Networks," submitted, Oct. 2019
[J3] P. Maniotis, E. Bourtsoulatze and N. Thomos, "Tile-Based Joint Caching and Delivery of 360° Videos in Heterogeneous Networks," IEEE Trans. on Multimedia, Dec. 2019
[J4] E. Bourtsoulatze, D. Burth Kurka and D. Gunduz, "Deep Joint Source-Channel Coding for Wireless Image Transmission," IEEE Trans. on Cognitive Comms. and Netw., Sept. 2019
[C1] Y. Bi, A. Chadha, A. Abbas, E. Bourtsoulatze and Y. Andreopoulos, "Graph-Based Object Classification for Neuromorphic Vision Sensing," in Proc. of IEEE ICCV, Oct. 2019
[C2] P. Maniotis, E. Bourtsoulatze and N. Thomos, "Tile-Based Joint Caching and Delivery of 360° Videos in Heterogeneous Networks," in Proc. of IEEE MMSP, Sept. 2019
[C3] E. Bourtsoulatze, D. Burth Kurka and D. Gunduz, "Deep Joint Source-Channel Coding for Wireless Image Transmission," in Proc. of ICASSP, May 2019
[C4] E. Bourtsoulatze and D. Gunduz, "Cache-Aided Interactive Multiview Video Streaming in Small Cell Wireless Networks," in Proc. of PIMRC, Sept.
2018

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

The research conducted in the context of the ENVISION project has advanced the state of the art in several important directions.

Our first major contribution is a novel joint source-channel coding (JSCC) method for still images. The proposed JSCC algorithm goes beyond state-of-the-art source and channel coding approaches, which build upon the separation principle. Instead, our approach is the first to exploit deep learning and the autoencoder framework to implement the encoding and decoding functions. It eliminates the "cliff effect", a long-standing problem in digital communications. This is one of the seminal works in the field and has generated significant interest in the research community, as evidenced by the number of follow-up studies and citations. It is expected to develop further into a standalone research field and to bring about a major paradigm shift in the design of source and channel coding for visual content.

Our second significant contribution is the development of caching and delivery policies for emerging multimedia applications, including interactive multiview and 360-degree video streaming. Caching of popular content at the wireless edge is considered by mobile operators a prominent solution for facilitating the delivery of massive multimedia content. We have proposed caching and delivery algorithms which, unlike state-of-the-art solutions, take into account not only bandwidth and caching constraints but also constraints arising from application-specific characteristics.

Finally, we have made significant contributions towards the learning of feature representations for neuromorphic vision sensing, which has so far lagged far behind its active-pixel-sensing counterpart, resulting in lower performance in high-level vision tasks.
Specifically, we have proposed a novel graph-based representation for neuromorphic events and a new graph-based spatio-temporal feature learning module, which efficiently models coarse temporal dependencies over multiple graphs and enables fast end-to-end graph-based training and inference.
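The idea of turning asynchronous NVS events into a graph can be sketched as follows: each event is a node, and two events are linked when they are close in both space and time. This is a simplified illustration with assumed parameters (a naive O(N²) neighbour search, and made-up `radius`/`tau` thresholds); the compact representation and residual graph-CNN feature learning of [J1, C1] are more elaborate.

```python
import numpy as np

def events_to_graph(events, radius=3.0, tau=1e3):
    """Build an undirected spatio-temporal graph over NVS events.

    events: (N, 4) array of (x, y, t, polarity) rows. Two events are
    connected when their spatial distance is at most `radius` pixels and
    their time difference is at most `tau` (here: microseconds). Both
    thresholds are illustrative assumptions, not the published settings.
    """
    xy = events[:, :2].astype(float)
    t = events[:, 2].astype(float)
    n = len(events)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if (np.linalg.norm(xy[i] - xy[j]) <= radius
                    and abs(t[i] - t[j]) <= tau):
                edges.append((i, j))
    return np.array(edges, dtype=int).reshape(-1, 2)

# Four events: three close together in space and time, one isolated.
events = np.array([
    [10, 10,    0, 1],
    [11, 10,  100, 1],
    [10, 12,  200, 0],
    [50, 50, 5000, 1],
])
edges = events_to_graph(events)  # links only the three nearby events
```

The resulting edge list (together with per-node features such as polarity) is what a graph-convolutional network then operates on, which is what makes end-to-end learning directly from sparse event streams possible without converting them to dense frames.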