CORDIS - EU research results

Fine-Grained Analysis of Software Ecosystems as Networks

Periodic Reporting for period 2 - FASTEN (Fine-Grained Analysis of Software Ecosystems as Networks)

Reporting period: 2020-07-01 to 2022-06-30

A popular form of software reuse involves linking open source software (OSS) libraries hosted on centralized code repositories, such as Maven or PyPI. Developers only need to declare dependencies on external libraries, and automated tools make them available in the project workspace. Despite the undoubted benefits for software reuse, these dependency relations introduce significant operational and compliance risks for the dependent projects. Dependencies on networks of external libraries can introduce privacy, security, and compliance concerns that are difficult to assess with current tooling. Recent history has demonstrated the urgency for action through events such as the left-pad incident, which caused hundreds of thousands of websites to stop working, and the Equifax data breach, which exposed the personal data of well over a hundred million people.

The main objective of FASTEN is to mitigate these challenges and allow software development companies to reuse OSS code with confidence, which would unlock large potential for production efficiency and quality improvement. To this end, the FASTEN project introduces fine-grained tracking of dependencies at the method level to complement existing dependency management networks. Specifically, the project tracks dependencies at the function call-graph level and performs sophisticated analyses of i) security vulnerability propagation, ii) licensing compliance, and iii) dependency risk profiles. To facilitate adoption, FASTEN puts these analyses in the hands of developers by integrating the analysis services into popular package managers for the Java, C, and Python programming languages.
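
The difference between package-level and call-graph-level tracking can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the package and method names, the `package:method` naming convention, and the helper functions are hypothetical and are not taken from the FASTEN code base.

```python
from collections import deque

# Hypothetical method-level call graph (caller -> callees).
# Names follow an invented "package:method" convention.
CALL_GRAPH = {
    "app:main": ["app:render", "libjson:parse"],
    "app:render": ["libtext:pad"],
    "libjson:parse": ["libjson:read_token"],
    "libjson:read_token": [],
    "libjson:load_yaml": ["libyaml:unsafe_load"],  # never called by app
    "libtext:pad": [],
    "libyaml:unsafe_load": [],  # assumed to be the vulnerable method
}


def package_of(method):
    """Return the package part of a 'package:method' name."""
    return method.split(":", 1)[0]


def reachable(graph, start):
    """Breadth-first search: all nodes reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def package_graph(call_graph):
    """Collapse the call graph into a coarse package dependency graph."""
    deps = {}
    for caller, callees in call_graph.items():
        deps.setdefault(package_of(caller), set())
        for callee in callees:
            deps.setdefault(package_of(callee), set())
            if package_of(callee) != package_of(caller):
                deps[package_of(caller)].add(package_of(callee))
    return deps


# Coarse view: app depends (transitively) on the vulnerable package.
package_level = "libyaml" in reachable(package_graph(CALL_GRAPH), "app")
# Fine-grained view: no call path from app:main reaches the vulnerable method.
method_level = "libyaml:unsafe_load" in reachable(CALL_GRAPH, "app:main")
print(package_level, method_level)
```

In this toy graph the package-level check flags the application, while the method-level check shows that no call path leads to the vulnerable method.
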
The work in FASTEN included several challenging tasks, most importantly:

- Develop scalable call graph generators for software artifacts written in C, Java, and Python.
- Gather metadata to enrich the call graphs, such as dependency information, vulnerability information, and licenses.
- Devise a knowledge base that is capable of storing the vast amount of information gathered from the three target ecosystems.
- Optimize data processing and querying of an extensive knowledge base that contains graphs with billions of nodes.
- Validate the project results through use cases that guide the architecture of data structures and tools.
- Generate awareness for FASTEN through communication, dissemination, exploitation, standardization, and training activities to foster the growth of its community.

The results have been published in scientific articles, and all data is available through public datasets and services. Developers benefit from various ready-to-use open-source tools, such as:

- Tooling for downloading and analyzing packages and related artifacts in the Maven, Debian, and PyPI ecosystems.
- Scalable call-graph generators that support large-scale analysis through pre-computation.
- Cleaned metadata for software ecosystems, including vulnerabilities, quality metrics, and licensing.
- Integrations into package managers that let developers use FASTEN analyses in their day-to-day work.
The FASTEN project touches upon three highly relevant and highly active scientific fields (software ecosystems, program analysis, and graph data stores) and pushes the state of the art in all of them.

Existing work has been mostly descriptive and exploratory in nature, aiming to understand how ecosystems are structured and how they work. However, there is a distinct lack of working solutions that help developers solve the identified issues in practice. FASTEN provides an integrated set of tools to solve many practical issues that were identified in the ecosystems literature, and it advances the state of the art in software ecosystems research in multiple ways. It increases the precision of security and freshness analyses and provides a set of algorithms that exploit the underlying fine-grained call graph to accurately pinpoint whether a client program can potentially execute vulnerable or out-of-date code. The underlying call graph can also inform the design of methods for accurate risk profiling, using inputs such as the percentage of a dependency’s public API being used in the client program or the update rate of methods in the dependency code.
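
One of the risk-profiling inputs mentioned above, the share of a dependency's public API that a client uses, can be sketched as follows. This is a minimal illustration under stated assumptions: the edge list, the package names, and the function `public_api_usage` are hypothetical, and only direct calls from the client are counted.

```python
def public_api_usage(call_edges, client_package, public_api):
    """Fraction of the dependency's public methods called directly by the client.

    call_edges: iterable of (caller, callee) pairs named 'package:method'.
    public_api: set of public method names of the dependency.
    """
    if not public_api:
        return 0.0
    prefix = client_package + ":"
    used = {
        callee
        for caller, callee in call_edges
        if caller.startswith(prefix) and callee in public_api
    }
    return len(used) / len(public_api)


# Hypothetical call edges between a client ("app") and a dependency ("libjson").
EDGES = [
    ("app:main", "libjson:parse"),
    ("app:retry", "libjson:parse"),  # the same callee counts only once
    ("app:export", "libjson:dump"),
    ("libjson:parse", "libjson:_read_token"),  # internal call, not client usage
]
LIBJSON_PUBLIC_API = {
    "libjson:parse",
    "libjson:dump",
    "libjson:load_yaml",
    "libjson:validate",
}

usage = public_api_usage(EDGES, "app", LIBJSON_PUBLIC_API)
print(f"{usage:.0%} of the public API is used")
```

A low ratio suggests that only a small part of the dependency matters to the client; how such a number is weighted in an actual risk profile is left open here.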

FASTEN opens the door to new applications on software ecosystems that were previously impossible. Library maintainers are able to “query the ecosystem” in order to assess the impact of their changes. FASTEN can answer queries such as “how many libraries does this change affect?” or “what percentage of my library’s public interface is actually used?”, helping to minimize package breakage caused by library evolution. Library users can use the package manager integrations for real-time monitoring of the security, risk, and compliance of their projects, thereby significantly improving the dependability of OSS as a whole.
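
A query such as “how many libraries does this change affect?” amounts to a backward traversal of the ecosystem-wide call graph. The sketch below shows the idea on a toy edge list; the package names and the function `impacted_packages` are hypothetical and do not reflect the actual FASTEN query interface.

```python
from collections import defaultdict, deque

# Hypothetical cross-package call edges (caller, callee).
CALLS = [
    ("webapp:handler", "libhttp:get"),
    ("libhttp:get", "libnet:connect"),
    ("cli:main", "libnet:connect"),
    ("report:build", "libnet:resolve"),
]


def impacted_packages(edges, changed_method):
    """Packages with at least one method that can transitively call changed_method."""
    # Reverse index: callee -> set of direct callers.
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)

    # Breadth-first search along reversed call edges.
    seen = {changed_method}
    queue = deque([changed_method])
    while queue:
        method = queue.popleft()
        for caller in callers.get(method, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)

    own_package = changed_method.split(":", 1)[0]
    return {m.split(":", 1)[0] for m in seen} - {own_package}


affected = impacted_packages(CALLS, "libnet:connect")
print(len(affected), sorted(affected))
```

Here a change to `libnet:connect` reaches three other packages, while a change to `libnet:resolve` reaches only one.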

Fittingly for an innovation action, FASTEN has advanced the state of the art in call graph-based program analysis through integration and engineering work on two fronts: the combination of static and dynamic analysis and the generation of call graphs across whole ecosystems. Specifically, FASTEN extracts dynamic call graph data from program invocations, and can complement this data with statically extracted call graphs. The two are then combined through their transitive closure in order to complement incomplete static analysis data that is missing edges due to dynamic invocation and pointer aliasing. Finally, FASTEN combines call graphs extracted across packages, which are nowadays typically integrated to form large applications. This combination will traverse API invocations, so that it can, for example, clarify which exact parts of a third-party package an application uses. Such data allows FASTEN to reason and provide advice regarding fine-grained dependencies on specific code, rather than coarse-grained dependencies on complete packages.
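
The following sketch shows one plausible reading of this combination: the static and dynamic edge sets are merged, and reachability is then computed on the merged graph. The edge lists and function names are hypothetical, and the real FASTEN pipeline may combine the graphs differently.

```python
def merge_call_graphs(static_edges, dynamic_edges):
    """Union of two (caller, callee) edge collections as an adjacency dict."""
    merged = {}
    for caller, callee in set(static_edges) | set(dynamic_edges):
        merged.setdefault(caller, set()).add(callee)
        merged.setdefault(callee, set())
    return merged


def reachable(graph, entry):
    """Depth-first search: all methods reachable from entry, including entry."""
    seen = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


# Edges found by a (hypothetical) static analysis. The reflective call from
# app:dispatch to plugin:run is missing because it is resolved only at runtime.
STATIC = [
    ("app:main", "app:dispatch"),
    ("plugin:run", "libcrypto:hash"),
]
# Edge observed while running the program (hypothetical trace).
DYNAMIC = [
    ("app:dispatch", "plugin:run"),
]

static_only = reachable(merge_call_graphs(STATIC, []), "app:main")
combined = reachable(merge_call_graphs(STATIC, DYNAMIC), "app:main")
print(sorted(combined - static_only))
```

The observed dynamic edge connects two otherwise separate static fragments, so the third-party method `libcrypto:hash` becomes visible as a dependency of the application.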

Scaling analyses to graphs the size of an ecosystem required the introduction of specific compressed data structures. FASTEN exploits the distinctive features of call graphs and provides high-performance access methods that are tailored to its search problems (reachability and method ranking). Deeper analysis and ranking are enabled by the possibility of storing additional metadata in a compact form. For example, the user can contribute profiling data that is used to decorate arcs with the estimated number of function executions at a specific location in the code; such information is invaluable in the ranking process, but requires additional storage.
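
To give a feel for compact graph storage with per-arc metadata, the sketch below uses a plain compressed sparse row (CSR) layout built on Python's standard `array` module. This is a generic technique chosen for illustration, not the data structure used by FASTEN; the class name, node identifiers, and call counts are all hypothetical.

```python
from array import array


class CompactCallGraph:
    """CSR layout: offsets[i]..offsets[i + 1] delimit the outgoing arcs of node i."""

    def __init__(self, num_nodes, arcs):
        # arcs: iterable of (caller_id, callee_id, call_count) integer triples.
        ordered = sorted(arcs)
        self.offsets = array("I", [0] * (num_nodes + 1))
        self.targets = array("I", (callee for _, callee, _ in ordered))
        self.counts = array("I", (count for _, _, count in ordered))
        # Count outgoing arcs per caller, then turn counts into prefix sums.
        for caller, _, _ in ordered:
            self.offsets[caller + 1] += 1
        for i in range(num_nodes):
            self.offsets[i + 1] += self.offsets[i]

    def successors(self, node):
        """Return (callee_id, call_count) pairs for node, ordered by callee id."""
        lo, hi = self.offsets[node], self.offsets[node + 1]
        return list(zip(self.targets[lo:hi], self.counts[lo:hi]))

    def hottest_callees(self, node):
        """Callees of node ranked by profiled call count, highest first."""
        return sorted(self.successors(node), key=lambda arc: arc[1], reverse=True)


# Hypothetical profiled arcs: (caller_id, callee_id, estimated executions).
graph = CompactCallGraph(
    num_nodes=4,
    arcs=[(0, 2, 120), (0, 1, 3), (1, 2, 45), (2, 3, 7)],
)
print(graph.successors(0))
print(graph.hottest_callees(0))
```

The extra `counts` array is the storage cost of decorating arcs: one additional integer per arc, kept parallel to the target array so that ranking needs no separate lookup structure.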

The wide adoption of the FASTEN results will have a significant societal and economic impact and affect the daily lives of millions of developers and users. The FASTEN tools help to control and mitigate two existential problems in current and future software-based systems: complexity and security. Allowing software development companies to reuse OSS code with confidence will boost production efficiency and quality.
FASTEN roll-up poster