Joint Inference with the Universal Schema

Final Report Summary - JOINUS (Joint Inference with the Universal Schema)

The last few years have seen the rise of deep learning as the dominant approach in Machine Learning in general, and in Natural Language Processing in particular. Deep learning and its ability to learn complex behaviour from supervised data have led to breakthrough performance in several tasks. Two of its core (perceived) disadvantages are the lack of interpretability and the difficulty of incorporating prior knowledge. Consequently, a growing number of research efforts investigate how these disadvantages can be overcome. This project is one of them.

At the other end of the AI and machine learning spectrum are logic and symbolic approaches in general. In the early years of AI, these approaches were likely as dominant as deep learning is today. Symbolic approaches, such as Prolog, make the injection of prior knowledge easy: users can simply state their knowledge through logical statements. They can also follow the “reasoning” of the machine by seeing how it mechanically proves new statements from given ones. However, much of human reasoning cannot easily be framed this way (e.g. reasoning under noise and uncertainty), and explicitly and comprehensively framing intelligence in logical formalisms is now assumed to be impossible.

This project made substantial progress in integrating both approaches, keeping their advantages and reducing their disadvantages. In particular, it introduced two ways to bridge both worlds. In the first, neural networks are treated as black boxes. To incorporate background knowledge, the network is paired with an adversary agent that produces training data for it. This training data consists of examples on which the neural network makes mistakes that violate the background knowledge. The network receives a signal that these examples are of the “wrong class” and learns to avoid the mistake. In the next iteration, the adversary agent creates another example that the network fails on, and so on. For example, the background knowledge may require that if geographic area A is contained in area B, and B is contained in C, then A should be contained in C. The network has to predict which areas are contained in which other areas. The adversary finds triples A, B and C for which the network does not follow this rule, and the network is then trained to fix it.
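The adversary loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "network" is replaced by a plain lookup table of scores, the adversary searches all triples exhaustively, and the names `violation`, `worst_triple` and `train` are hypothetical. It is not the project's implementation, which works with real neural networks and does not enumerate entities.

```python
from itertools import permutations

# scores[(a, b)] is the model's belief (0..1) that area a is contained in area b.
# A lookup table stands in for the neural network (assumption for illustration).

def violation(scores, a, b, c):
    """How strongly (a in b) and (b in c) are believed without (a in c)."""
    premise = min(scores[(a, b)], scores[(b, c)])
    return max(0.0, premise - scores[(a, c)])

def worst_triple(scores, areas):
    """Adversary: exhaustively pick the triple that breaks transitivity most."""
    return max(permutations(areas, 3), key=lambda t: violation(scores, *t))

def train(scores, areas, steps=50, lr=0.5):
    """Alternate between the adversary and a corrective update of the model."""
    for _ in range(steps):
        a, b, c = worst_triple(scores, areas)
        gap = violation(scores, a, b, c)
        if gap < 1e-9:  # the adversary finds no meaningful violation any more
            break
        # Move the conclusion score towards the premise score.
        scores[(a, c)] = min(1.0, scores[(a, c)] + lr * gap)
    return scores

areas = ["London", "England", "UK"]
scores = {(x, y): 0.0 for x in areas for y in areas if x != y}
scores[("London", "England")] = 0.9
scores[("England", "UK")] = 0.9

train(scores, areas)
print(round(scores[("London", "UK")], 3))
```

In this toy run the only violated triple is (London, England, UK), so the score for "London is contained in UK" is pushed up until it matches the premises, while the reverse direction is left untouched.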

The above approach is generic and can be applied to a large class of neural networks. It scales extremely well, in that its runtime does not depend on the number of entities the model covers (such as the number of areas in the example above). We believe it can lead to neural networks to which humans can explain concepts (such as the transitivity of the “contains” relation). Policymakers, businesses and citizens can make use of this technology to better control our next generation of AI agents.

The second way of bridging the gap between neural and symbolic approaches relied more directly on work in symbolic reasoning. We used a reasoning algorithm known from Prolog, backward chaining, as the basis of a novel neural network: the neural theorem prover (NTP). This network functions like backward chaining in that it converts logical queries into sub-queries. For example, to figure out whether “A contains C”, it checks whether there is an area B such that A contains B and B contains C. However, in contrast to standard backward chaining, the NTP can be trained to learn new rules from data. Work on learning rules from data has been done before (for example, in the context of Inductive Logic Programming), but in our case, learning can be integrated into deeper neural networks that handle the complete data processing pipeline. One could imagine an end-to-end neural self-driving car that trains a neural theorem prover in its pipeline, or a reasoning system that uses modern neural networks for processing natural language and feeds this data to our prover.
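To make the query-to-sub-query step concrete, here is a minimal sketch of plain symbolic backward chaining for the single transitivity rule from the example. It shows only the classical procedure that the NTP is modelled on, not the neural, trainable version; the fact set, the depth limit and the function name `prove` are assumptions chosen for illustration.

```python
# Known facts, written as (relation, container, contained).
FACTS = {
    ("contains", "UK", "England"),
    ("contains", "England", "London"),
}
AREAS = {"UK", "England", "London"}

def prove(goal, depth=2):
    """Return the list of facts that proves the goal, or None if no proof exists."""
    if goal in FACTS:
        return [goal]          # the goal is a known fact
    if depth == 0:
        return None            # give up: the proof would be too deep
    rel, a, c = goal
    # Rule: contains(A, C) :- contains(A, B), contains(B, C).
    # Turn the query into two sub-queries for every candidate B.
    for b in sorted(AREAS - {a, c}):
        left = prove((rel, a, b), depth - 1)
        if left is None:
            continue
        right = prove((rel, b, c), depth - 1)
        if right is not None:
            return left + right
    return None

proof = prove(("contains", "UK", "London"))
print(proof)
# [('contains', 'UK', 'England'), ('contains', 'England', 'London')]
print(prove(("contains", "London", "UK")))
# None
```

The returned list of facts doubles as a human-readable proof, which is the kind of explanation a purely black-box model does not give.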

We tested the neural theorem prover on a range of benchmark problems, where it outperformed previous purely neural approaches. In addition, and in contrast to the first line of work discussed above, it also provides explanations and proofs for its decisions. We believe that, for the stakeholders mentioned above, this is at least as important as controlling AI.

In addition to the research described above, this Marie Curie action has also helped us to develop two machine learning libraries. The first, wolfe, was designed to enable researchers to quickly compose novel machine learning models, focussing on the problem and data, and getting inference and learning algorithms for free. This library is available online. It was the basis of several publications as well as a tutorial we gave at the ACL conference. It also served as the main tool for investigating the combination of so-called matrix and tensor factorization methods for knowledge base population in the first half of this project. We also developed a novel machine reading framework that we will release soon. This framework makes it easy to build and test question answering models, natural language inference systems and knowledge base population approaches.