CORDIS - EU research results

Information Theory beyond Communications: Distributed Representations and Deep Learning

Periodic Reporting for period 2 - STRUDEL (Information Theory beyond Communications: Distributed Representations and Deep Learning)

Reporting period: 2019-09-01 to 2020-08-31

Artificial Intelligence (AI) and deep learning components are present in many of today’s autonomous and intelligent systems and inevitably affect the safety, assurance, privacy, and performance of these systems as they interact with uncertain and dynamic environments. For instance, autonomous cars use deep neural networks to classify and detect obstacles or pedestrians on a road; AI techniques are used in healthcare for diagnosis and in developing algorithms for medical devices; and domestic robots and assistive devices leverage AI algorithms to safely interact with humans.

To provide any correctness guarantees for such systems, we first need to understand and formalize the desired, unexpected, or malicious behaviors these systems could produce. These properties may specify the functionality of the inner AI components (e.g. intermediate representations of a multilayer neural network) by defining their input-output behavior. Alternatively, the properties may be stated at the level of the overall system, which encompasses multiple AI components interacting with one another and with other decision-making components. Although some explanations appear to be solidly grounded, there is little mathematical understanding of representation learning.

This project capitalizes on powerful and fertile concepts from information theory, and information measures in particular, to advance the state of the art in deep learning. The overall project goal is to develop novel information-theoretic tools and an understanding of deep learning based on information measures. The proposed framework is expected to bridge the gap between theory and practice, facilitating a more thorough understanding and hence improved design of deep learning architectures.
An overview of the preliminary results of the project is provided below, following the work planned for the first period:
▪ We identified potential application scenarios where information-theoretic tools, and in particular information measures, can lead to applications of deep learning in engineering areas. The identified application areas include--but are not limited to--statistical data anonymization, Smart Grids, healthcare diagnosis based on Magnetic Resonance Imaging (MRI), and communication networks.
▪ A central part of the work focused on the estimation of information measures of continuous distributions. Estimating information measures from samples is a fundamental problem in statistics and machine learning. In our work, we analyze estimates of Shannon information measures in high-dimensional Euclidean spaces, computed from a finite number of samples. In particular, we show that estimation is infeasible if the corresponding information measure is unbounded, clearly demonstrating the necessity of additional assumptions on the underlying distributions. Subsequently, we derived sufficient conditions that enable confidence bounds for the estimation of differential entropy.
▪ We devised fully parameterized, differentiable estimators of mutual information based on variational bounds. The flexibility of the proposed approach also allows us to construct estimators of mutual information between either discrete or continuous variables. We further apply these estimators to guide the training of neural networks on real-world tasks. Our experiments on a large variety of tasks, including disentangled representations, domain adaptation, and fair classification, demonstrate the effectiveness of information measures for training deep neural networks.
▪ From a theoretical perspective, our work focused on investigating fundamental connections between generalization beyond the training distribution and the information propagated across the layers of a network. Using information-theoretic tools, we derived a mathematical characterization of the sets of feasible (and infeasible) tradeoffs between complexity and relevance for distributed and collaborative information bottleneck problems and for binary detection frameworks.
We expect the following results by the end of the project:
▪ How to learn more with less data? We expect to develop novel methods for few-shot learning that maximize the mutual information between the query features and their label predictions for a given few-shot task.
▪ How to estimate differential entropy? We will construct an estimator of differential entropy and showcase several applications. This new estimator should be general-purpose and should not require any special properties of the learning problem, so that it can be incorporated into any training objective where differential entropy is needed.
▪ How to enhance the robustness of neural networks against adversarial attacks? We will investigate new robustness-regularizer-based methods using concepts from information geometry. The main innovation will be to encourage invariant soft predictions for both natural and adversarial examples while maintaining high performance on natural samples.
▪ How to detect misclassification errors? We will investigate simple and effective methods to detect whether a prediction of a classifier should or should not be trusted.
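The few-shot objective sketched in the list above, maximizing mutual information between queries and their label predictions, has a simple empirical form: the entropy of the average prediction minus the average entropy of individual predictions. The following sketch computes that quantity from raw logits (the function names are illustrative, not the project's API):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prediction_mutual_information(logits):
    """Empirical MI between the query index and its predicted label (nats):
    I = H(mean_i p_i) - mean_i H(p_i).
    High when predictions are confident AND balanced across classes."""
    p = softmax(logits)                                   # (queries, classes)
    p_bar = p.mean(axis=0)                                # marginal prediction
    h_marginal = -np.sum(p_bar * np.log(p_bar + 1e-12))
    h_cond = -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))
    return h_marginal - h_cond

# Confident, class-balanced predictions: MI approaches log(3) ~ 1.0986 nats.
confident = 10.0 * np.vstack([np.eye(3)] * 4)             # 12 queries, 3 classes
mi_hi = prediction_mutual_information(confident)
# Maximally uncertain predictions: MI collapses to 0.
mi_lo = prediction_mutual_information(np.zeros((12, 3)))
```

In a training loop this quantity (negated) would be added to the loss on the unlabeled query set, pushing predictions to be individually confident while remaining diverse across classes.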
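For the misclassification-detection goal, one family of simple baselines scores each prediction by how concentrated its softmax output is and rejects low-confidence ones. The sketch below uses the squared norm of the softmax vector as a rejection score; the function name and threshold convention are illustrative assumptions, not the project's published method:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rejection_score(logits):
    """Score 1 - ||p||_2^2 per sample: 0 for a one-hot (fully confident)
    prediction, up to 1 - 1/K for a uniform prediction over K classes.
    Higher score = less trustworthy prediction."""
    p = softmax(logits)
    return 1.0 - np.sum(p * p, axis=1)

# A near-one-hot prediction scores close to 0; a uniform one scores 2/3.
logits = np.array([[10.0, 0.0, 0.0],
                   [ 0.0, 0.0, 0.0]])
scores = rejection_score(logits)
```

A deployed detector would pick a threshold on this score (e.g. on a validation set) and flag predictions above it as potential misclassifications rather than trusting them.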

The wider societal implications of the project so far can be summarized as follows:
▪ Human beings have always understood the concept of privacy and used it to reveal, or not, various aspects of themselves. Privacy is about choice: the choice to reveal, or not to reveal, details about yourself and your life. Every day, we find ourselves in situations where we disclose to individuals and organizations various pieces of information about who we are, what we do, and how we do it. As we head into an era where we are intrinsically connected, privacy will become an even more disparate and complex landscape. We need to go forward into this new era with a deeper understanding of how to ensure that privacy rights are maintained and respected. Privacy-preserving AI methods offer an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants’ models.
▪ Recent advances in AI and neural networks have created much hope about the possibilities AI presents to various domains that were previously thought to be off-limits to computer software. But there is also concern about the new threats AI will pose to different fields, especially where bad decisions can have very destructive results and thus serious consequences for human beings. Many real-world applications require AI models to be trustworthy, meaning we must be able to understand how an AI model develops its behaviour and how it makes decisions. We must also have tools to evaluate how reliable AI is in various situations and to allow our models to learn to recognize novel objects or (non-stationary) changes in their environments.

Our work on the detection of distribution shift, classification errors, and robustness to adversarial attacks offers an attractive and novel approach that will be developed further during the next year, including to address problems relevant to industry (e.g. Smart Grids, healthcare, autonomous driving devices).
[Image summarizing the main tools used in this project]