Automated Program Analysis for Advanced Web Applications

Periodic Reporting for period 3 - PAW (Automated Program Analysis for Advanced Web Applications)

Reporting period: 2018-08-01 to 2020-01-31

Web applications that execute in the user's web browser constitute a substantial part of modern software. JavaScript is the main programming language of the web, although alternatives are emerging, in particular TypeScript and Dart. Despite advances in the design of languages and libraries, it is difficult to prevent errors when programming such web applications. Although the basic principles of software verification have been known for decades and researchers have developed an abundance of techniques for formal reasoning about programs, modern software still contains many errors, as everyday users can testify.

The PAW project is creating novel automated program analysis algorithms for preventing errors in web applications. Our approach involves a mix of static and dynamic analysis techniques. Our prototype implementations are made openly available to facilitate reuse.

The overall objectives of the project are to:
1) enable analysis of programs that use new programming language features,
2) develop analysis abstractions that enable analysis of complex libraries and frameworks,
3) expand the capabilities of automated testing techniques,
4) support migration and evolution of software, and
5) provide reusable program analysis infrastructure.

So far, we have produced a number of scientific results that span all five objectives and that have been published in top conferences and journals:

Type Test Scripts for TypeScript Testing, Kristensen and Møller, OOPSLA 2017.
"TypeScript applications often use untyped JavaScript libraries. To support static type checking of such applications, the typed APIs of the libraries are expressed as separate declaration files. This raises the challenge of checking that the declaration files are correct with respect to the library implementations. Previous work has shown that mismatches are frequent and cause TypeScript's type checker to misguide the programmers by rejecting correct applications and accepting incorrect ones.
This paper shows how feedback-directed random testing, which is an automated testing technique that has mostly been used for testing Java libraries, can be adapted to effectively detect such type mismatches. Given a JavaScript library with a TypeScript declaration file, our tool TSTEST generates a type test script, which is an application that interacts with the library and tests that it behaves according to the type declarations. Compared to alternative solutions that involve static analysis, this approach finds significantly more mismatches in a large collection of real-world JavaScript libraries with TypeScript declaration files, and with fewer false positives. It also has the advantage that reported mismatches are easily reproducible with concrete executions, which aids diagnosis and debugging."
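The idea of a type test script can be illustrated with a minimal sketch. It is not TSTEST's implementation: the names `TypeDesc`, `FunctionDecl`, `LibraryEntry`, `entries`, and `runTypeTest` are hypothetical, the "library" is a two-function stand-in, and the feedback-directed part (reusing values returned by earlier calls) is omitted. The sketch only shows the core check: call each function with random arguments of the declared parameter types and compare the runtime type of the result with the declared return type.

```typescript
// Hypothetical runtime description of a declared primitive type.
type TypeDesc = "number" | "string" | "boolean";

// Hypothetical declaration of one function: parameter types and return type.
interface FunctionDecl {
  params: TypeDesc[];
  returns: TypeDesc;
}

// One library function paired with its declaration (both are stand-ins).
interface LibraryEntry {
  name: string;
  impl: (...args: any[]) => any;
  decl: FunctionDecl;
}

const entries: LibraryEntry[] = [
  { name: "add", impl: (a, b) => a + b,
    decl: { params: ["number", "number"], returns: "number" } },
  // Mismatch: the implementation returns a string, the declaration says number.
  { name: "count", impl: (s) => String(s.length),
    decl: { params: ["string"], returns: "number" } },
];

// Generate a random value of the requested type.
function randomValue(t: TypeDesc): number | string | boolean {
  switch (t) {
    case "number": return Math.floor(Math.random() * 100);
    case "string": return Math.random().toString(36).slice(2);
    case "boolean": return Math.random() < 0.5;
  }
}

// Call every function with random well-typed arguments and collect the names
// of functions whose result does not match the declared return type.
function runTypeTest(iterations: number): string[] {
  const mismatches: string[] = [];
  for (const entry of entries) {
    for (let i = 0; i < iterations; i++) {
      const args = entry.decl.params.map((p) => randomValue(p));
      const result = entry.impl(...args);
      if (typeof result !== entry.decl.returns) {
        mismatches.push(entry.name);
        break; // one concrete witness per function is enough
      }
    }
  }
  return mismatches;
}
```

Each reported mismatch comes with a concrete call that exhibits it, which is the reproducibility advantage the abstract mentions.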

Practical Initialization Race Detection for JavaScript Web Applications, Adamsen, Møller, and Tip, OOPSLA 2017 (ACM SIGPLAN Distinguished Paper).
"Event races are a common source of subtle errors in JavaScript web applications. Several automated tools for detecting event races have been developed, but experiments show that their accuracy is generally quite low. We present a new approach that focuses on three categories of event race errors that often appear during the initialization phase of web applications: form-input-overwritten errors, late-event-handler-registration errors, and access-before-definition errors. The approach is based on a dynamic analysis that uses a combination of adverse and approximate execution. Among the strengths of the approach are that it does not require browser modifications, expensive model checking, or static analysis. In an evaluation on 100 widely used websites, our tool InitRacer reports 1085 initialization races, while providing informative explanations of their causes and effects. A manual study of 218 of these reports shows that 111 of them lead to uncaught exceptions and at least 47 indicate errors that affect the functionality of the websites."

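As an illustration of one of the three categories, a late-event-handler-registration error can be simulated without a browser. The sketch below is an assumption-laden toy, not InitRacer: `TinyTarget`, `Action`, and `clickHandled` are hypothetical names, and the race is modeled simply as two possible orderings of "register the handler" and "the user clicks".

```typescript
// Hypothetical minimal event target: an event fired before any handler
// is registered is silently lost, as with DOM events.
class TinyTarget {
  private handlers: Array<() => void> = [];
  addHandler(h: () => void): void {
    this.handlers.push(h);
  }
  fire(): void {
    for (const h of this.handlers) h();
  }
}

// The two actions that race during page initialization.
type Action = "register" | "userClick";

// Run the actions in the given order and report whether the click was handled.
function clickHandled(order: Action[]): boolean {
  const button = new TinyTarget();
  let handled = false;
  for (const action of order) {
    if (action === "register") {
      button.addHandler(() => { handled = true; });
    } else {
      button.fire();
    }
  }
  return handled;
}

// Handler registered before the click: the click is handled.
const goodOrder = clickHandled(["register", "userClick"]);
// Click arrives before the script has registered the handler: the click is lost.
const badOrder = clickHandled(["userClick", "register"]);
```

Exploring the unfavorable ordering deliberately, rather than waiting for it to occur by chance, is the intuition behind testing under adverse conditions.
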
A Survey of Dynamic Analysis and Test Generation for JavaScript, Andreasen, Gong, Møller, Pradel, Selakovic, Sen, and Staicu, ACM Computing Surveys, 50(5).
"JavaScript has become one of the most prevalent programming languages. Unfortunately, some of the unique properties that contribute to this popularity also make JavaScript programs prone to errors and difficult for program analyses to reason about. These properties include the highly dynamic nature of the language, a set of unusual language features, a lack of encapsulation mechanisms, and the 'no crash' philosophy. This paper surveys dynamic program analysis and test generation techniques for JavaScript targeted at improving the correctness, reliability, performance, security, and privacy of JavaScript-based software."

Systematic Approaches for Increasing Soundness and Precision of Static Analyzers, Andreasen, Møller, and Nielsen, SOAP 2017.
"Building static analyzers for modern programming languages is difficult. Often soundness is a requirement, perhaps with some well-defined exceptions, and precision must be adequate for producing useful results on realistic input programs. Formally proving such properties of a complex static analysis implementation is rarely an option in practice, which raises the challenge of how to identify causes and importance of soundness and precision problems. Through a series of examples, we present our experience with semi-automated methods based on delta debugging and dynamic analysis for increasing soundness and precision of a static analyzer for JavaScript. The individual methods are well known, but to our knowledge rarely used systematically and in combination."

Systematic Black-Box Analysis of Collaborative Web Applications, Billes, Møller, and Pradel, PLDI 2017.

QuickChecking Static Analysis Properties, Midtgaard and Møller, Software Testing, Verification and Reliability, 27(6).
"A static analysis can check programs for potential errors. A natural question that arises is therefore: who checks the checker? Researchers have given this question varying attention, ranging from basic testing techniques, informal monotonicity arguments, thorough pen-and-paper soundness proofs, to verified fixed point checking. In this paper we demonstrate how quickchecking can be useful for testing a range of static analysis properties with limited effort. We show how to check a range of algebraic lattice properties, to help ensure that an implementation follows the formal specification of a lattice. Moreover, we offer a number of generic, type-safe combinators to check transfer functions and operators on lattices, to help ensure that these are, e.g., monotone, strict, or invariant. We substantiate our claims by quickchecking a type analysis for the Lua programming language."

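The kind of property checking described in this abstract can be sketched on a small example. The code below is not the paper's combinator library: it assumes a hypothetical five-element sign lattice, a hand-written random generator in place of a QuickCheck framework, and the illustrative names `Sign`, `join`, `leq`, `negate`, and `forAll`. It checks three algebraic properties of the join operator and two properties (monotonicity and strictness) of one transfer function.

```typescript
// Hypothetical flat sign lattice: bot below neg/zero/pos, all below top.
type Sign = "bot" | "neg" | "zero" | "pos" | "top";
const elements: Sign[] = ["bot", "neg", "zero", "pos", "top"];

// Least upper bound.
function join(a: Sign, b: Sign): Sign {
  if (a === b) return a;
  if (a === "bot") return b;
  if (b === "bot") return a;
  return "top";
}

// Partial order derived from join: a is below b iff join(a, b) equals b.
function leq(a: Sign, b: Sign): boolean {
  return join(a, b) === b;
}

// Example transfer function: abstract arithmetic negation.
function negate(a: Sign): Sign {
  switch (a) {
    case "neg": return "pos";
    case "pos": return "neg";
    default: return a;
  }
}

// Pick a random lattice element.
function randomSign(): Sign {
  return elements[Math.floor(Math.random() * elements.length)];
}

// Evaluate a property on `trials` random triples; true if it always holds.
function forAll(
  trials: number,
  prop: (a: Sign, b: Sign, c: Sign) => boolean
): boolean {
  for (let i = 0; i < trials; i++) {
    if (!prop(randomSign(), randomSign(), randomSign())) return false;
  }
  return true;
}

// Algebraic lattice properties of join.
const commutative = forAll(1000, (a, b) => join(a, b) === join(b, a));
const associative = forAll(1000, (a, b, c) =>
  join(join(a, b), c) === join(a, join(b, c)));
const idempotent = forAll(1000, (a) => join(a, a) === a);

// Properties of the transfer function.
const negateMonotone = forAll(1000, (a, b) =>
  !leq(a, b) || leq(negate(a), negate(b)));
const negateStrict = negate("bot") === "bot";
```

A bug such as returning `"top"` for `join("bot", "neg")` would make the idempotence and monotonicity checks fail with high probability after a handful of trials, which is the "limited effort" benefit the abstract points to.
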
Repairing Event Race Errors by Controlling Nondeterminism, Adamsen, Møller, Karim, Sridharan, Tip, and Sen, ICSE 2017.
"Modern web applications are written in an event-driven style, in which event handlers execute asynchronously in response to user or system events. The nondeterminism arising from this programming style can lead to pernicious errors. Recent work focuses on detecting event races and classifying them as harmful or harmless. However, since modifying the source code to prevent harmful races can be a difficult and error-prone task, it may be preferable to steer away from the bad executions. In this paper, we present a technique for automated repair of event race errors in JavaScript web applications."

Inference and Evolution of TypeScript Declaration Files, Kristensen and Møller, FASE 2017.
"TypeScript is a typed extension of JavaScript that has become widely used. More than 2000 JavaScript libraries now have publicly available TypeScript declaration files, which allows the libraries to be used when programming TypeScript applications. Such declaration files are written manually, however, and they are often lagging behind the continuous development of the libraries, thereby hindering their usability. The existing tool TSCheck is capable of detecting mismatches between the libraries and their declaration files, but it is less suitable when creating and evolving declaration files. In this work we present the tools TSInfer and TSEvolve that are designed to assist the construction of new TypeScript declaration files and support the co-evolution of the declaration files as the underlying JavaScript libraries evolve."

Message Safety in Dart, Ernst, Møller, Schwarz, and Strocco, Science of Computer Programming, 133(1).
"Unlike traditional static type checking, the type system in the Dart programming language is unsound by design, even for fully annotated programs. The rationale has been that this allows compile-time detection of likely errors and enables code completion in integrated development environments, without being restrictive on programmers. Despite unsoundness, judicious use of type annotations can ensure useful properties of the runtime behavior of Dart programs. We present a formal model of a core of Dart with a focus on its type system, which allows us to elucidate the causes of unsoundness. Our main contribution is a characterization of message-safe programs and a theorem stating that such programs will never encounter 'message-not-understood' errors at runtime."

Type Safety Analysis for Dart, Heinze, Møller, and Strocco, DLS 2016.
"Optional typing is traditionally viewed as a compromise between static and dynamic type checking, where code without type annotations is not checked until runtime. We demonstrate that optional type annotations in Dart programs can be integrated into a flow analysis, in order to provide static type safety guarantees also for dynamically typed code. We explore two approaches: one that uses type annotations for filtering, and one that uses them as specifications."

Type Unsoundness in Practice: An Empirical Study of Dart, Mezzetti, Møller, and Strocco, DLS 2016.
"The type system in the Dart programming language is deliberately designed to be unsound: for a number of reasons, it may happen that a program encounters type errors at runtime although the static type checker reports no warnings. According to the language designers, this ensures a pragmatic balance between the ability to catch bugs statically and allowing a flexible programming style without burdening the programmer with a lot of spurious type warnings. In this work, we attempt to experimentally validate these design choices. Our results show that some, but not all, sources of unsoundness can be justified. In particular, we find that unsoundness caused by bivariant function subtyping and method overriding does not seem to help programmers."

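To make "bivariant function subtyping" concrete, here is an analogous example in TypeScript rather than Dart. This is an assumption of the illustration, not material from the study: TypeScript also compares the parameters of methods declared with method syntax bivariantly, so it exhibits a similar kind of unsoundness. The names `Animal`, `Dog`, `Handler`, and `callWithPlainAnimal` are hypothetical.

```typescript
class Animal {
  name = "animal";
}

class Dog extends Animal {
  bark(): string {
    return "woof";
  }
}

// Declared with method syntax, so TypeScript compares the parameter
// type bivariantly when checking assignability.
interface Handler<T> {
  handle(value: T): string;
}

// A handler that relies on receiving a Dog.
const dogHandler: Handler<Dog> = {
  handle(dog) {
    return dog.bark();
  },
};

// Accepted by the type checker, although a Handler<Animal> may be
// called with any Animal, not only with a Dog.
const animalHandler: Handler<Animal> = dogHandler;

// The call type-checks but fails at runtime, because a plain Animal
// has no bark method.
function callWithPlainAnimal(): string {
  try {
    return animalHandler.handle(new Animal());
  } catch {
    return "runtime type error";
  }
}
```

The program passes the static check and still fails at runtime, which is the trade-off between flexibility and safety that the study evaluates empirically for Dart.
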
Analyzing Test Completeness for Dynamic Languages, Adamsen, Mezzetti, and Møller, ISSTA 2016.
"In dynamically typed programming languages, type errors can occur at runtime. Executing the test suites that often accompany programs may provide some confidence about absence of such errors, but generally without any guarantee. We present a program analysis that can check whether a test suite has sufficient coverage to prove a given type-related property, which is particularly challenging for program code with overloading and value dependent types. The analysis achieves a synergy between scalable static analysis and dynamic analysis that goes beyond what can be accomplished by the static analysis alone. Additionally, the analysis provides a new coverage adequacy metric for the completeness of a test suite regarding a family of type-related properties."

Feedback-Directed Instrumentation for Deployed JavaScript Applications, Madsen, Tip, Andreasen, Sen, and Møller, ICSE 2016.
"Many bugs in JavaScript applications manifest themselves as objects that have incorrect property values when a failure occurs. For such errors, stack traces and log files are often insufficient for diagnosing problems. In such cases, it is helpful for developers to know the control flow path from the creation of an object to a crashing statement. Such crash paths are useful for understanding where the object originated and whether any properties of the object were corrupted since its creation. We present a feedback-directed instrumentation technique for computing crash paths that allows the instrumentation overhead to be distributed over a crowd of users and to reduce it for users who do not encounter the crash."

In addition to these peer-reviewed publications, we have developed a comprehensive collection of fundamental static analysis algorithms for teaching purposes, openly available at http://cs.au.dk/~amoeller/spa/.
We expect to continue developing novel program analysis techniques for web-based software. In particular, we plan to continue the development of the TAJS analyzer and to investigate possibilities for analyzing Node.js software.