Periodic Reporting for period 1 - DRVis (Dynamical Recurrent Visual Perceiver)
Reporting period: 2022-07-01 to 2023-12-31
1. Agriculture: In this area, computer-vision-based solutions are used for mapping insect pests and plant disease symptoms, for optimizing spraying strategies and for recognizing pollinator bees. In all these tasks, which involve identification and classification of tiny objects (insects), making better use of limited sensor resolution will lead to dramatic savings in time, pollution and costs.
2. Aids for the blind and visually impaired: The obvious incentive to miniaturize cameras is countered by sensor quality: cameras cannot be shrunk significantly without sacrificing resolution. Crucially, current technology cannot deliver both the desired performance and the desired small camera size.
3. Unmanned Aerial Vehicles (UAVs): Here the trends of miniaturization (“micro” and “nano” drones) and autonomous piloting have dominated the market over the last decade. This necessitates low-resolution, embedded cameras in challenging scenarios such as GPS-denied navigation, aerial conflict detection and landmark-guided landing. Specifically, miniaturizing UAVs to such “micro” and “nano” dimensions requires an order-of-magnitude reduction in camera resolution, a reduction that makes training the autonomous agents significantly harder.
To conclude, there is a multi-billion market for computer vision systems that are currently constrained by sensor optical performance and resolution.
The aims of this project were:
1. To advance the major computer vision tasks required for the applications described above, including segmentation, classification and identification, with low-resolution cameras.
2. To build a prototype of an edge, event-based module for active visual perception, building on results obtained with standard frames and simulated motion.
In a recent computational work we used recurrent neural networks to illustrate how sensor motion, combined with recurrent computations, dramatically improves the ability to perceive small images. Indeed, we demonstrated that this dynamical recurrent classifier (DRC) can nearly fully recover the recognition capability that was impaired by decreased sensor resolution. The idea is to use a series of low-resolution frames acquired by a moving camera, rather than a single high-resolution image. We term this algorithm, whether implemented in software or hardware, DRVis. Our solution is applicable to a wide spectrum of image-processing tasks in settings where sensor quality is low but multiple time samples are available.
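To make the approach concrete, the following is a minimal sketch of a dynamical recurrent classifier, not the project's actual implementation: a small convolutional encoder feeds a GRU that integrates a sequence of low-resolution glimpses sampled along a simulated camera trajectory over a single image. All module names, trajectory parameters and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicalRecurrentClassifier(nn.Module):
    """Sketch of a DRC: a GRU integrates features from successive low-res frames."""
    def __init__(self, n_classes=10, hidden=128):
        super().__init__()
        # Per-frame encoder for small grayscale frames (e.g. 8x8 pixels).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.rnn = nn.GRU(32 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frames):
        # frames: (batch, time, 1, H, W) low-resolution frames from a moving sensor.
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out[:, -1])  # classify from the final hidden state

def simulate_motion(image, n_steps=8, crop=20, out=8):
    # image: (1, H, W). Sample low-res glimpses along a random walk,
    # mimicking frames acquired by a small moving camera.
    _, H, W = image.shape
    x = torch.randint(0, W - crop + 1, (1,)).item()
    y = torch.randint(0, H - crop + 1, (1,)).item()
    frames = []
    for _ in range(n_steps):
        x = min(max(x + torch.randint(-2, 3, (1,)).item(), 0), W - crop)
        y = min(max(y + torch.randint(-2, 3, (1,)).item(), 0), H - crop)
        glimpse = image[:, y:y + crop, x:x + crop].unsqueeze(0)
        frames.append(F.interpolate(glimpse, size=(out, out),
                                    mode="bilinear", align_corners=False))
    return torch.cat(frames).unsqueeze(0)  # (1, time, 1, out, out)

# Example: classify a 28x28 image (e.g. an MNIST digit) from 8x8 glimpses.
img = torch.rand(1, 28, 28)
model = DynamicalRecurrentClassifier()
logits = model(simulate_motion(img))  # shape (1, 10)
```

The key design point is that the recurrent state accumulates evidence across the motion trajectory, so the effective spatial information exceeds that of any single low-resolution frame.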
Our solution will save the costs associated with high-resolution cameras as well as the computation, power and time required for high-resolution mappings. More importantly, it will enable miniaturization to scales otherwise unachievable due to camera constraints. In relation to the examples listed above, our solution will enable low-cost and time-efficient agricultural solutions, ranging from surveillance for insect pests to smart spraying. It will enable efficient and low-cost reading aids for blind or partially sighted individuals. Furthermore, DRVis-based systems will enable the order-of-magnitude reduction in camera resolution described above while preserving the performance of the autonomous UAV.
We believe that the dynamical aspect is central to pushing the boundaries of computer vision and will drive business inventiveness and innovation well beyond this specific PoC.
The bio-mimetic, active-vision, event-based robotic platform (SYCLOP) was modified and tested in order to compare several configurations with different motors. The platform was used to acquire event-based datasets of tiny images derived from popular visual classification datasets (MNIST and FASHION-MNIST). These datasets were used to train DNNs that were adapted for event-based data processing and included advanced network modules such as Transformers. Adding temporal noise to the event-based data and retraining the tested DNNs reduced their classification performance; this result suggests that the trained networks learned to exploit the additional spatiotemporal information embedded in the event-based visual data for tiny-image recognition. A sketch of such a temporal-noise manipulation is given below.
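As an illustration of the temporal-noise manipulation described above, the following sketch (not the project's actual code; the event layout and noise scale are assumptions) jitters event timestamps with Gaussian noise and restores chronological order, degrading the temporal structure while leaving the spatial statistics of the events intact.

```python
import numpy as np

def add_temporal_noise(events, sigma_us=1000.0, rng=None):
    """Jitter event timestamps to probe a network's use of temporal structure.

    events: structured array with fields 'x', 'y', 't' (microseconds), 'p',
    as commonly used for event-camera recordings (layout assumed here).
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = events.copy()
    noisy["t"] = noisy["t"] + rng.normal(0.0, sigma_us, size=len(noisy))
    noisy["t"] -= noisy["t"].min()    # keep timestamps non-negative
    return np.sort(noisy, order="t")  # downstream pipelines expect time order

# Example with synthetic events spanning one second (1e6 microseconds).
dtype = [("x", "u2"), ("y", "u2"), ("t", "f8"), ("p", "i1")]
ev = np.zeros(1000, dtype=dtype)
ev["t"] = np.sort(np.random.uniform(0, 1e6, 1000))
ev_noisy = add_temporal_noise(ev, sigma_us=5000.0)
```

If retraining on such jittered data lowers accuracy, the drop quantifies how much the classifier relied on precise event timing rather than on spatial event statistics alone.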
In terms of the commercialization plans for our technology, we organized an event to which leading industry figures with experience in relevant fields were invited, and at which we presented the developed technology. The discussion following our presentation resulted in several potential commercial applications based on our technology.
The commercialization track resulted in several potential collaborations with companies from different fields, where our first aim is to adjust, test and prove the advantage of our technology on ‘real-life’ problems these companies face, using their datasets.
The system was tested on a public IVF dataset containing time-lapse images of developing embryos, and we initiated a collaboration with a hospital in Israel that is expected to supply us with private IVF embryo imagery to further test and validate the advantages of the DRC.
We discussed the establishment of a spin-off company with Weizmann’s relevant offices (BINA & YEDA) and filed our first patent.
In summary, during the project we advanced both in developing our technology, taking another biomimetic step by using an event-based sensor, and in validating the technology with real-life, industrially relevant datasets and tasks. Further advancement will come from follow-up research in the lab to scale up the technology, as well as from additional industry collaborations and the development of commercial assets (e.g. patents and datasets) that will allow the foundation of a spin-off company.