Going Deep and Blind with Internal Statistics

Periodic Reporting for period 1 - DeepInternal (Going Deep and Blind with Internal Statistics)

Reporting period: 2018-05-01 to 2019-10-31

In the past few years, since the revival of Deep Neural Networks (DNNs), there has been unprecedented progress and breakthrough results in Computer Vision, in both high-level and low-level vision tasks. Nevertheless, most of this impressive performance stems from the ability to train DNNs on huge amounts of training data (often tediously hand-labelled). This restricts the applicability of current Deep-Learning methods to specific problems and domains where enough training data exist. This limitation renders them inapplicable to domains where very little or no training data are available, or where data labeling (manual or automatic) is ill-defined.

Moreover, most of the success of Deep Learning (DL) in Computer Vision has so far been exhibited primarily on image data, whereas progress in video analysis lags dramatically behind. This is because the data complexity of video is orders of magnitude higher than that of images (due to the combinatorial complexity of Spatial Appearance × Temporal Dynamics). In addition, labeling video data is much more difficult than labeling images, and is often ill-defined, due to the continuous nature of video dynamics (as opposed to the discrete nature of objects in images). This strong reliance on labeled training data has so far hampered the progress of Deep Learning in the area of video analysis.

In this project I show how very little training data can be used to train DNNs, often with no training examples whatsoever. I show that DNNs can be trained on examples extracted directly from the single available test image. I strive to combine the power of unsupervised Internal Data Recurrence with the sophistication and inference power of Deep Learning, to obtain the best of both worlds. I anticipate that this self-supervised learning approach will have a high impact on the scientific community (both Computer Vision and Deep Learning), as well as far-reaching applications for society. Some of these are described next.

During the first period of the project, we developed new approaches and theories for Self-supervised Deep Learning, by exploiting the internal redundancy inside a single natural image. We coined the term “Deep Internal Learning” for this. The strong recurrence of information inside a single natural image provides powerful internal examples which suffice for training Deep Networks, without any prior examples or training data. This new “Deep Internal Learning” paradigm gives rise to true “Zero-Shot Learning”. We have so far demonstrated the power of this approach on a range of problems, including image segmentation, transparent layer separation and blind image dehazing (CVPR’2019), image retargeting (ICCV’2019), blind super-resolution (NeurIPS’2019), and more. We have also shown how such self-supervision can be used for reconstructing images from brain recordings (fMRI) with very few external training data (NeurIPS’2019).
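
To make the idea of "internal examples" concrete, the following is a minimal sketch, not the project's published code. It assumes a super-resolution-style setting: the single input image is downscaled, and patches of the downscaled copy are paired with the matching patches of the original, so that every training pair comes from the test image itself. The function names (downscale2x, crop, internal_pairs), the 2x averaging kernel, the patch size and the toy image are all illustrative assumptions, and no network is trained here.

```python
import random


def downscale2x(img):
    """Average each 2x2 block. img is a list of equal-length rows of floats."""
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def crop(img, top, left, size):
    """Return the size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def internal_pairs(img, patch=4, count=8, seed=0):
    """Sample (low-res patch, high-res patch) pairs from one image only."""
    rng = random.Random(seed)
    small = downscale2x(img)
    max_top = len(small) - patch
    max_left = len(small[0]) - patch
    pairs = []
    for _ in range(count):
        top = rng.randint(0, max_top)
        left = rng.randint(0, max_left)
        low = crop(small, top, left, patch)
        # The matching region in the original image is twice as large.
        high = crop(img, 2 * top, 2 * left, 2 * patch)
        pairs.append((low, high))
    return pairs


# Hypothetical 16x16 grayscale "image" used only for illustration.
image = [[float((x * y) % 7) for x in range(16)] for y in range(16)]
pairs = internal_pairs(image)
```

In a setup like this, the pairs would serve as the only supervision for a small image-specific network, which is then applied to the original image itself. The other applications listed above would need different ways of forming internal examples.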

During the reporting period the team published 5 research papers summarizing different projects that cover and extend different aspects of the project goals. We are making very good progress compared to the original plan. The papers were all peer-reviewed and published in leading international journals and/or conferences, including: Nature Communications, the Annual Conference on Neural Information Processing Systems (NeurIPS), the Conference on Computer Vision and Pattern Recognition (CVPR), and the International Conference on Computer Vision (ICCV). We also have new papers currently under submission/review (not yet published), which contain additional exciting breakthroughs. We have made the code and data of the published papers available (where applicable).

We expect to extend these theories and applications to new types of signals, domains, and capabilities. In particular, we plan to:

• Extend the notion of Deep Internal Learning to video data. This will provide a platform for substantial progress in video analysis (which has thus far lagged behind due to its strong reliance on large amounts of supervised training data).
• Extend the notion of Deep Internal Learning to higher-level vision tasks.
• Extend the fMRI reconstruction work to exploit information from multiple brains.
• Provide a continuum between Internal and External training. This in turn will provide a platform to explore theoretical and practical tradeoffs between the amount of available training data and the optimal proportion of Internal-vs-External training.
• Support Transfer Learning and adaptation of existing powerful architectures to new domains where there is little or no training data.