Artificial Intelligence for Large-Scale Computer-Assisted Reasoning

Periodic Reporting for period 4 - AI4REASON (Artificial Intelligence for Large-Scale Computer-Assisted Reasoning)

Reporting period: 2020-03-01 to 2020-10-31

The AI4REASON project targets a very hard problem in AI and automated reasoning: automatically proving theorems in large and complex theories.

Such complex formal theories arise in projects aimed at verification of today's advanced mathematics such as the Formal Proof of the Kepler Conjecture (Flyspeck), verification of software and hardware designs such as the seL4 operating system kernel, and verification of other advanced systems and technologies of today's information society.

Designing an explicitly programmed solution to this problem seems extremely complex and unlikely to succeed. However, we have recently shown that the performance of existing approaches can be multiplied by data-driven AI methods that learn reasoning guidance from large proof corpora. The AI4REASON project focuses on developing such novel AI methods.

The project has indeed produced such breakthrough methods, increasing the performance of theorem proving over large theories by 40-70% compared to the pre-project status. It has also opened a number of new research directions in AI and reasoning, and helped to create a larger research community around these topics.

The research was carried out in the following five areas and work packages:

WP1: High-Level Premise Selection
WP2: Internal Proof Guidance
WP3: Lemmatization, Conjecturing, and Concept Introduction
WP4: Self-Improving AI Systems Combining Deduction and Learning
WP5: Deployment and Cross-Corpora Reuse

The project has achieved several breakthroughs in these topics, leading to large improvements in theorem proving performance over the state of the art. Combining learning and reasoning seems to be a very viable approach to building stronger AI and reasoning systems.

In WP1, we have worked on novel learning architectures for premise selection, applying deep neural networks to this task jointly with a new team at Google. We have also explored various boosting methods between the learning and proving systems. This has led to improvements in premise selection and consequently to stronger theorem proving in large theories. The strongest method developed uses fast graph neural networks and improves on the pre-project results by about 25%.
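
To make the premise selection task concrete, the following is a minimal sketch that ranks library facts by how relevant they look for a given conjecture. It assumes a simple symbol-overlap (Jaccard) score in place of the graph neural networks mentioned above; the function names and the toy library are hypothetical and only illustrate the shape of the task.

```python
import re


def symbols(formula):
    """Crude feature extractor: every identifier occurring in the formula."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", formula))


def rank_premises(conjecture, library, k=2):
    """Return the names of the k library facts most similar to the conjecture.

    Similarity is the Jaccard overlap of symbol sets (an assumption made for
    this sketch, not the project's learned model). Ties are broken by name.
    """
    goal = symbols(conjecture)

    def score(name):
        feats = symbols(library[name])
        union = goal | feats
        return len(goal & feats) / len(union) if union else 0.0

    return sorted(library, key=lambda name: (-score(name), name))[:k]


# Hypothetical toy library of named facts.
LIBRARY = {
    "add_comm": "add(X,Y) = add(Y,X)",
    "mul_comm": "mul(X,Y) = mul(Y,X)",
    "add_zero": "add(X,zero) = X",
}

selected = rank_premises("add(zero,X) = X", LIBRARY, k=2)
```

Only the selected facts would then be handed to the theorem prover, which keeps its search space small even when the library is very large.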

In WP2, we have substantially improved the internal guidance of automated theorem provers (ATPs) using machine learning and reinforcement learning methods. We implemented selection mechanisms based on gradient-boosted decision trees, developed representations of clauses by feature vectors with fast feature hashing, and developed guidance based on neural networks. In a large evaluation, our methods improved the best open system by 70%, going from about 15 thousand proved theorems to about 25 thousand in the same time. This is an unusually high improvement and a clear breakthrough in theorem proving. We also developed a new kind of learning-guided tactical theorem proving for the HOL4 and Coq systems, implemented in the TacticToe and Tactician systems respectively, which helps formalization in these major systems in practice.
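
The idea of representing clauses by hashed feature vectors can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the features (tokens plus adjacent token pairs), the use of CRC32 as the hash, and the vector size of 32 are all choices made for this example.

```python
import re
import zlib


def clause_features(clause):
    """Assumed feature set: tokens and adjacent token pairs of the clause."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[~=|]", clause)
    pairs = [f"{a}>{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + pairs


def hashed_vector(clause, dim=32):
    """Map a clause to a fixed-length count vector by hashing each feature.

    zlib.crc32 is used because it is stable across runs, unlike Python's
    built-in hash() for strings. Distinct features may collide in one bucket,
    which is the usual trade-off of feature hashing.
    """
    vec = [0] * dim
    for feat in clause_features(clause):
        vec[zlib.crc32(feat.encode("utf-8")) % dim] += 1
    return vec


vector = hashed_vector("p(X) | ~q(X,a)")
```

A vector of this kind has the same length for every clause, so it can be fed directly to a learner such as a gradient-boosted tree model without keeping a growing dictionary of features.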

In WP3, we have developed several lemmatization and conjecturing methods, using statistical-symbolic analogies, neural networks, and deep reinforcement learning.

In WP4, we have developed several kinds of feedback loops between proving with learning-based internal guidance of ATPs and learning from the resulting proofs, as well as strategy invention loops. We have introduced reinforcement learning for theorem proving and obtained several strong results. Our MaLARea AI system has twice dominated the large-theory division of the world championship in theorem proving.
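
The general shape of such a feedback loop between proving and learning can be sketched with a toy example. Everything here is an assumption made for illustration: the "prover" simply checks whether the needed premises were selected, the "learner" boosts premises used in proofs of similar problems, and the problems and premises are hypothetical. It is not a description of how MaLARea itself works.

```python
# Hypothetical premises, each described by the symbols it mentions.
PREMISES = {
    "add_zero": {"add", "zero"},
    "add_succ": {"add", "succ"},
    "mul_zero": {"mul", "zero"},
    "mul_succ": {"mul", "succ"},
}

# Hypothetical problems: (features of the conjecture, premises a proof needs).
PROBLEMS = {
    "t1": ({"mul", "succ"}, {"mul_succ", "add_succ"}),
    "t2": ({"mul", "zero", "succ"}, {"mul_succ", "add_succ"}),
}


def rank(features, proofs, k):
    """Select k premises: symbol overlap plus a boost learned from proofs."""
    def score(name):
        total = len(features & PREMISES[name])
        for solved, used in proofs.items():
            if name in used:
                # Premises used for similar solved problems rank higher.
                total += len(features & PROBLEMS[solved][0])
        return total

    return sorted(PREMISES, key=lambda name: (-score(name), name))[:k]


def prove(selected, needed):
    """Toy prover: succeeds only if every needed premise was selected."""
    return needed if needed <= set(selected) else None


def feedback_loop(k=2, max_rounds=10):
    """Alternate proving and learning until a round finds no new proof."""
    proofs, history = {}, []
    for _ in range(max_rounds):
        known = dict(proofs)  # the model is fixed during a round
        for name, (features, needed) in PROBLEMS.items():
            if name not in proofs:
                used = prove(rank(features, known, k), needed)
                if used is not None:
                    proofs[name] = used
        if len(proofs) == len(known):
            break
        history.append(len(proofs))
    return proofs, history


proofs, history = feedback_loop()
```

In this toy run, only `t1` is proved in the first round; the proof of `t1` then changes the premise ranking for `t2`, which is proved in the second round. The loop stops when a round adds no new proof.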

In WP5, we have developed the first autoformalization methods by combining neural and probabilistic context-free parsing with symbolic methods such as formal type-checking and theorem proving. This research opens the way to large-scale automated processing (e.g. searching, checking and reviewing) of the large body of human-written mathematics and related sciences.

Several of our ATP systems won multiple divisions of the yearly competition of automated theorem provers (CASC) in 2016-2020.

We have organized several conferences and workshops in the field and given a number of invited talks, growing a larger research community around these topics.

Our project and domain are unique in connecting two major AI fields: Automated Reasoning and Machine Learning. This produces new methods in Automated Reasoning, as well as new tasks and issues in Machine Learning. Particularly interesting and important are combinations of learning and reasoning methods into larger AI metasystems where the learning and reasoning components inform and improve each other's work in various feedback loops.

In more detail, the major novel aspects and progress beyond state of the art include:

- equipping a number of theorem provers with a guiding component based on learning from previous proofs,
- application of deep learning and other advanced learning methods to theorem proving,
- defining the autoformalization task and building the first corpora and systems for autoformalization, and
- building several AI metasystems that combine learning and reasoning in various feedback loops.

These methods have led to a significant improvement of the performance of automated reasoning and autoformalization tools on several standard benchmarks as well as to new results in automatically assisted research-level mathematics.

The AI4REASON team gained leading positions in the ATP System Competition in 2020.
The AI4REASON team won the ATP System Competition (CASC LTB Division) in 2018.