Periodic Reporting for period 1 - DRVis (Dynamical Recurrent Visual Perceiver)
Reporting period: 2022-07-01 to 2023-12-31
1. Agriculture: In this area, computer-vision-based solutions are used for mapping insect pests and plant disease symptoms, for optimizing spraying strategies and for recognizing pollinator bees. In all these tasks, which involve identification and classification of tiny objects (insects), making better use of limited sensor resolution will lead to dramatic savings in time, pollution and costs.
2. Aids for the blind and visually impaired: The obvious incentive to miniaturize cameras is countered by sensor quality: cameras cannot be shrunk significantly without sacrificing resolution. Crucially, current technology cannot deliver both the desired performance and the desired small camera size.
3. Unmanned Aerial Vehicles (UAVs): Here the trends of miniaturization (“micro” and “nano” drones) and autonomous piloting have dominated the market over the last decade. This necessitates low-resolution, embedded cameras in challenging scenarios such as GPS-denied navigation, aerial conflict detection and landmark-guided landing. Specifically, miniaturizing UAVs to such “micro” and “nano” dimensions requires an order-of-magnitude reduction in camera resolution, a reduction that makes training the autonomous agents significantly harder.
To conclude, there is a multi-billion market for computer vision systems that are currently constrained by sensor optical performance and resolution.
The aims of this project were:
1. To advance the major computer vision tasks required for the applications described above, including segmentation, classification and identification, with low-resolution cameras.
2. To build a prototype of an edge, event-based module for active visual perception, building on results obtained with standard frames and simulated motion.
In a recent computational work we used recurrent neural networks to illustrate how sensor motion, combined with recurrent computations, dramatically improves the ability to perceive small images. Indeed, we demonstrated that this dynamical recurrent classifier (DRC) can nearly fully recover the recognition capability that was impaired by decreased sensor resolution. The idea is to use a series of low-resolution frames acquired by a moving camera, rather than a single high-resolution image. We term this algorithm, whether implemented in software or hardware, DRVis. Our solution is applicable to a wide spectrum of image-processing tasks in settings where sensor quality is low but multiple time samples are available.
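To make the approach concrete, the following is a minimal sketch of a dynamical recurrent classifier, not the project's actual implementation: a small convolutional encoder feeds a GRU that integrates a sequence of low-resolution glimpses sampled along a simulated camera trajectory over a single image. All module names, trajectory parameters and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicalRecurrentClassifier(nn.Module):
    """Sketch of a DRC: a GRU integrates features from successive low-res frames."""
    def __init__(self, n_classes=10, hidden=128):
        super().__init__()
        # Per-frame encoder for small grayscale frames (e.g. 8x8 pixels).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.rnn = nn.GRU(32 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frames):
        # frames: (batch, time, 1, H, W) low-resolution frames from a moving sensor.
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out[:, -1])  # classify from the final hidden state

def simulate_motion(image, n_steps=8, crop=20, out=8):
    # image: (1, H, W). Sample low-res glimpses along a random walk,
    # mimicking frames acquired by a small moving camera.
    _, H, W = image.shape
    x = torch.randint(0, W - crop + 1, (1,)).item()
    y = torch.randint(0, H - crop + 1, (1,)).item()
    frames = []
    for _ in range(n_steps):
        x = min(max(x + torch.randint(-2, 3, (1,)).item(), 0), W - crop)
        y = min(max(y + torch.randint(-2, 3, (1,)).item(), 0), H - crop)
        glimpse = image[:, y:y + crop, x:x + crop].unsqueeze(0)
        frames.append(F.interpolate(glimpse, size=(out, out),
                                    mode="bilinear", align_corners=False))
    return torch.cat(frames).unsqueeze(0)  # (1, time, 1, out, out)

# Example: classify a 28x28 image (e.g. an MNIST digit) from 8x8 glimpses.
img = torch.rand(1, 28, 28)
model = DynamicalRecurrentClassifier()
logits = model(simulate_motion(img))  # shape (1, 10)
```

The key design point is that the recurrent state accumulates evidence across the motion trajectory, so the effective spatial information exceeds that of any single low-resolution frame.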
Our solution will save the costs associated with high-resolution cameras as well as the computation, power and time required for high-resolution mappings. More importantly, it will enable miniaturization to scales otherwise unachievable due to camera constraints. In relation to the examples listed above, our solution will enable low-cost and time-efficient agricultural solutions, ranging from surveillance for insect pests to smart spraying. It will enable efficient and low-cost reading aids for blind or partially sighted individuals. Furthermore, DRVis-based systems will enable the order-of-magnitude reduction in camera resolution described above while preserving the performance of the autonomous UAV.
We believe that the dynamical aspect is central to pushing the boundaries of computer vision and will drive business inventiveness and innovation well beyond this specific PoC.
The bio-mimetic, active-vision, event-based robotic platform (SYCLOP) was modified and tested in order to compare several configurations with different motors. The platform was used to acquire event-based datasets of tiny images derived from popular visual classification datasets (MNIST and FASHION-MNIST). These datasets were used to train DNNs that were adapted for event-based data processing and included advanced network modules such as Transformers. Adding temporal noise to the event-based data and retraining the tested DNNs reduced their classification performance; this result suggests that the trained networks learned to exploit the additional spatiotemporal information embedded in the event-based visual data for tiny-image recognition. A sketch of such a temporal-noise manipulation is given below.
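As an illustration of the temporal-noise manipulation described above, the following sketch (not the project's actual code; the event layout and noise scale are assumptions) jitters event timestamps with Gaussian noise and restores chronological order, degrading the temporal structure while leaving the spatial statistics of the events intact.

```python
import numpy as np

def add_temporal_noise(events, sigma_us=1000.0, rng=None):
    """Jitter event timestamps to probe a network's use of temporal structure.

    events: structured array with fields 'x', 'y', 't' (microseconds), 'p',
    as commonly used for event-camera recordings (layout assumed here).
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = events.copy()
    noisy["t"] = noisy["t"] + rng.normal(0.0, sigma_us, size=len(noisy))
    noisy["t"] -= noisy["t"].min()    # keep timestamps non-negative
    return np.sort(noisy, order="t")  # downstream pipelines expect time order

# Example with synthetic events spanning one second (1e6 microseconds).
dtype = [("x", "u2"), ("y", "u2"), ("t", "f8"), ("p", "i1")]
ev = np.zeros(1000, dtype=dtype)
ev["t"] = np.sort(np.random.uniform(0, 1e6, 1000))
ev_noisy = add_temporal_noise(ev, sigma_us=5000.0)
```

If retraining on such jittered data lowers accuracy, the drop quantifies how much the classifier relied on precise event timing rather than on spatial event statistics alone.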
In terms of the commercialization plans for our technology, we organized an event to which leading industry figures with experience in relevant fields were invited, and at which we presented the developed technology. The discussion following our presentation resulted in several potential commercial applications based on our technology.
The commercialization track resulted in several potential collaborations with companies from different fields, where our first aim is to adjust, test and prove the advantage of our technology on ‘real-life’ problems these companies face, using their datasets.
The system was tested on a public IVF dataset containing time-lapse images of developing embryos, and we initiated a collaboration with a hospital in Israel that is expected to supply us with private IVF embryo imagery to further test and validate the advantages of the DRC.
We discussed the establishment of a spin-off company with Weizmann’s relevant offices (BINA & YEDA) and filed our first patent.
In summary, during the project we advanced both in developing our technology, taking another biomimetic step by using an event-based sensor, and in validating the technology with real-life, industrially relevant datasets and tasks. Further advancement will come from follow-up research in the lab to scale up the technology, as well as from additional industry collaborations and the development of commercial assets (e.g. patents and datasets) that will allow the foundation of a spin-off company.