CORDIS - EU research results

Fast Interactive Verification through Strong Higher-Order Automation

Periodic Reporting for period 4 - Matryoshka (Fast Interactive Verification through Strong Higher-Order Automation)

Reporting period: 2021-09-01 to 2022-02-28

Proof assistants are increasingly used to verify hardware and software and to formalize mathematics. However, despite the success stories, they remain very laborious to use. The situation has improved with the integration of first-order automatic theorem provers, namely superposition provers and SMT (satisfiability modulo theories) solvers, through middleware such as Sledgehammer for Isabelle/HOL and HOLyHammer for HOL Light and HOL4; but this research has now reached the point of diminishing returns. Only so much can be done when viewing automatic provers as black boxes.

To make interactive verification more cost-effective, we proposed to deliver very high levels of automation to users of proof assistants by fusing and extending two lines of research: automatic and interactive theorem proving. This was our grand challenge. Our starting point was that first-order automatic provers are the best tools available for performing most of the logical work. Our approach was to enrich superposition and SMT with higher-order reasoning carefully, so as to preserve their desirable properties. We designed proof rules and strategies, guided by representative benchmarks from interactive verification.

We developed a prototype of higher-order SMT and made further progress with higher-order superposition. Specifically, we developed two highly automatic higher-order provers by extending the modern superposition provers E and Zipperposition. To reach end users, these new provers were integrated into the Isabelle/HOL proof assistant and are available as backends to more specialized verification tools. The users of proof assistants and similar tools stand to experience substantial productivity gains. For example, in a recent unpublished empirical evaluation, we found that higher-order reasoning increased the success rate of automatic provers by about three percentage points, in an area where every percentage point counts.

Superposition and SMT (satisfiability modulo theories) are currently the most successful proof calculi for reasoning about classical first-order logic. Despite some convergence, they remain very different technologies, with complementary strengths and weaknesses.

Our main contribution has been to extend superposition and SMT to handle higher-order constructs. These constructs are present in nearly all proof assistants (Coq, Isabelle, ...) and hence are needed for practical reasoning. In particular, higher-order logic supports functions as arguments, quantification over functions, and binders (e.g., the notations for summation and integration). For superposition and SMT, we now support full higher-order logic. These techniques are implemented in the SMT solver veriT and the superposition provers E and Zipperposition.
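To make the three constructs concrete, here is a small Lean 4 sketch of our own; it is not taken from the project. It assumes core Lean 4 only (no mathlib), and the names `twice`, `twice_of_fixed`, and `sumUpTo` are hypothetical; `sumUpTo` merely stands in for a summation binder. Goals of this shape are what a higher-order prover has to handle natively rather than after an encoding into first-order logic.

```lean
-- Functions as arguments: the hypothetical `twice` takes a function `f`
-- and applies it two times.
def twice (f : Nat → Nat) (x : Nat) : Nat := f (f x)

-- Quantification over functions: the statement ranges over every
-- `f : Nat → Nat` that fixes all inputs.
theorem twice_of_fixed : ∀ f : Nat → Nat, (∀ x, f x = x) → ∀ x, twice f x = x := by
  intro f h x
  simp [twice, h]

-- Binders: `fun i => i * i` is a lambda-abstraction, the mechanism behind
-- notations such as summation. The hypothetical `sumUpTo n g` computes
-- g 0 + g 1 + ... + g n.
def sumUpTo (n : Nat) (g : Nat → Nat) : Nat :=
  (List.range (n + 1)).foldl (fun acc i => acc + g i) 0

-- 0 + 1 + 4 + 9 = 14, checked by evaluation.
example : sumUpTo 3 (fun i => i * i) = 14 := rfl
```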

The new versions of the provers E and Zipperposition are now included with the Isabelle/HOL proof assistant, where they are used to discharge proof obligations automatically via a tool called Sledgehammer. This helps make hundreds of users more productive. Instead of having to spend perhaps 15 or 30 minutes on a manual proof, users can simply press a button and wait up to 30 seconds for an automatically generated proof. The provers E and Zipperposition also won trophies at CASC, the annual theorem proving competition: Zipperposition won the higher-order theorem division in 2020 and 2021, and E won the Sledgehammer division. This demonstrated the superiority of our approach to the entire theorem proving community.

We have also made some contributions to the formal mathematical library of the Lean proof assistant. Notably, we have developed a formal theory of p-adic numbers and a formal proof of the solution of the cap set problem. Some of these contributions are now part of the Lean mathlib library, which is used and developed largely by mathematicians.

On the theoretical side, our main contribution has been the development of the lambda-superposition proof calculus, which forms the basis of the new E and Zipperposition. Lambda-superposition is both sound and complete: It can be used to prove a problem if and only if that problem is provable. Soundness (the "only if" direction of the previous sentence) is relatively easy to establish, but completeness (the "if" direction) is quite a challenge. Our solution has been to identify three intermediate milestones, corresponding to fragments of higher-order logic, and for each milestone to design a calculus and prove it sound and complete. Moreover, we structured each of the proofs in a layered fashion, with three modular layers, to keep the complexity under control.
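Which property covers which direction can be pinned down in a few lines of Lean 4. This is an abstract sketch of our own, not the project's formalization: `Formula`, `derivable` (the calculus proves the formula), and `valid` (the formula is provable in the semantic sense) are hypothetical placeholders, and nothing about lambda-superposition itself is modeled.

```lean
-- Hypothetical abstract setting: `derivable φ` means the calculus proves φ;
-- `valid φ` means φ holds semantically.
variable {Formula : Type} (derivable valid : Formula → Prop)

-- Soundness, the "only if" direction: everything derived is valid.
abbrev CalculusSound : Prop := ∀ φ, derivable φ → valid φ

-- Completeness, the "if" direction: everything valid can be derived.
abbrev CalculusComplete : Prop := ∀ φ, valid φ → derivable φ

-- Together, the two directions are exactly the "if and only if" statement.
theorem sound_and_complete_iff :
    (CalculusSound derivable valid ∧ CalculusComplete derivable valid) ↔
      ∀ φ, (derivable φ ↔ valid φ) :=
  ⟨fun ⟨hs, hc⟩ φ => ⟨hs φ, hc φ⟩,
   fun h => ⟨fun φ => (h φ).mp, fun φ => (h φ).mpr⟩⟩
```

The asymmetry described in the text is invisible at this level of abstraction: both directions are one-line definitions here, whereas proving `CalculusComplete` for a concrete calculus is where the layered, milestone-based proof work goes.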

Before the pandemic, we organized two project workshops, Matryoshka 2018 and Matryoshka 2019, to which we also invited developers of competing higher-order approaches. To foster collaborations, we also organized the 2018 edition of WAIT (Workshop on Automated (Co)induction Theorem Proving) in the same week as Matryoshka 2018, and we merged Matryoshka 2019 with the VeriDis 2019 retreat, which brought together the theorem proving groups of Stephan Merz at Inria Nancy and Christoph Weidenbach at MPI Saarbrücken.

We now have a new generation of provers (namely, E, veriT, and Zipperposition) that support a richer logic and hence can be used more easily and efficiently in proof assistants.

The provers E and Zipperposition are now the state of the art in terms of performance on higher-order problems. Higher-order reasoning has long been seen as a "holy grail" in the automated deduction community. Our work has shown that it can be tackled efficiently, just like first-order reasoning, and that first-order techniques can be adapted to it, whether it be the underlying proof calculus, the completeness proof, the data structures and algorithms, or the provers themselves.

Similarly, the lambda-superposition calculus that underlies E and Zipperposition is now the state-of-the-art proof calculus for higher-order reasoning, ahead of Vampire's combinatory superposition, Satallax's SAT-based tableaux, and Leo-III's higher-order paramodulation in empirical evaluations. One reason for lambda-superposition's strength is that it gracefully generalizes one of the most successful first-order calculi, standard superposition. This concept of graceful generalization has been key: Starting from a position of strength, we considered only extensions that help solve new formulas without weakening the calculus's or the prover's performance on formulas it already handled successfully.

On the SMT front, we have obtained promising results with the veriT prover, but more research will be necessary, especially in the area of quantifier instantiation, before this approach can compete with lambda-superposition-based provers such as E and Zipperposition. We have also improved veriT's proof output, showing how to generate fine-grained proofs that are easier to reconstruct in a proof assistant, and we have redesigned the integration of SMT solvers in the TLA+ Proof System.

Finally, our contributions to the Lean mathlib library have helped establish this library as the leading place for collaboration among mathematicians interested in formalized mathematics. In 2017, when Matryoshka started, the library was very small, and one of the two main contributors to mathlib belonged to Matryoshka.
Image: Nested provers of increased powers