Cloud services support development of machine learning applications

Programming ordinary software is difficult. Doing so for machine learning supercomputer applications is so difficult that even experts need help.

European research groups increasingly hope to utilise cloud computing services. This might be for processing or storage, or a combination of both. Doing so would normally require dealing with low-level machine configuration, a lengthy and difficult process requiring specialised personnel. The EU-funded DEEP-HybridDataCloud project developed an alternative. Whereas other cloud services essentially provide access to hardware resources, this project developed high-level, research-enabling services. These include software components that allow the development, exploitation and sharing of computing-intensive data science applications. The components simplify the development life cycle for applications featuring artificial intelligence, machine learning or deep learning. The services also provide transparent access to the EU’s e-infrastructures and specialised hardware components (such as accelerators), while hiding the underlying complexity. The project builds on the earlier INDIGO-DataCloud project. Customer feedback obtained during that initiative allowed DEEP-HybridDataCloud researchers to develop the technology needed to meet users’ expectations. “We are exploiting the hybrid cloud service model,” explains project coordinator Álvaro López García, “in order to gather resources seamlessly across different private and public cloud providers, so that they can be exploited transparently for the users.”

Targeting different user types

The project team identified three types of users, depending on how much they know about their scientific domain, machine learning and the underlying technology. The first group has strong knowledge of its field of study, along with a problem that can be solved using the machine learning models the project makes available through its cloud services. These users interact with a black-box service purely to obtain functionality and do not need to know anything about how the models are executed. A second group knows about both its scientific domain and machine learning. These users will typically be data scientists developing a deep learning application. This group knows what accelerator hardware it needs but does not want to worry about obtaining or configuring it. The project provides these users with platform-level tools, with which they can specify their hardware or training requirements. A final group knows about all three areas. These users too will be developing scientific machine learning applications, but they have sufficient technological expertise to know their infrastructure requirements. The project gives these users tailored access to the whole cloud stack.

Diverse scientific applications trialled

Several research groups have successfully used the services. The software components already developed are available via the project’s public DEEP Open Catalog. These address numerous branches of science, including citizen science biodiversity monitoring and species identification, plus related pattern recognition in satellite images. Further themes include cybersecurity and network threat detection monitoring, modules for testing the DEEP framework and multipurpose components. “Our key final result is the inclusion of our services in the European Open Science Cloud portal,” adds López García, “so we have become the provider of machine learning services in the EOSC.” The team will keep supporting the existing services while developing new features requested by users and looking for commercial opportunities. Ongoing work will facilitate the development of machine learning applications in support of European scientific research.