Application Information Services for Distributed Computing Environments

Final Report Summary - AIS-DC (Application Information Services for Distributed Computing Environments)

1. Introduction / Summary
The term cloud computing was introduced in late 2007 and became widely accepted in early 2008. It started with the establishment of Amazon Elastic Compute Cloud (EC2). Over the past two years, every major computer vendor has made a significant effort to participate in and provide cloud computing services. Many see cloud computing as the future of computing where, rather than owning and managing resources, one simply obtains access to the needed services, allowing easy resource scaling and reduced infrastructure maintenance. As this trend continues and additional service providers emerge, competition among service providers will increase. It will become imperative for service providers to maximize the utilization of their resources and provide the desired QoS to their users. For this to happen, users will need the ability to compare offered services easily and coherently. This can be realized only if there is sufficient data about individual providers and the services received, and if that data is in a standardized and portable format. Parallel to the commercial efforts, cloud computing has attracted considerable attention from the academic community with respect to project realisations.
The overall aim of the AIS DC project was to provide a well-defined set of application-specific data collection and retrieval mechanisms that operate in and across cloud environments. In addition, a set of data collection templates and application skeleton wrappers is provided to support out-of-the-box data collection. Once these are available, it will be trivial to deploy a complete and fully functional AIS, which can then be utilized by job submission tools, accounting services, parameter sweep tools, application performance optimization tools, or directly by smart applications. With the expected expansion of cloud computing, new and existing projects will be tailored to the cloud computing paradigm and, with the availability of AIS, those projects can rely on the existence of the needed application-specific data rather than having to devise custom solutions. This will save time, cut costs, and promote result reuse and reproducibility. Although AIS is initially focused on being utilized by tools, as it develops and more tools and data collection templates become available, it is foreseeable that end users will be able to perform application performance analyses across cloud computing providers directly.
Prior to joining the Host, the Fellow successfully completed his postdoctoral training at Emory University, USA. He joined the Host in autumn 2011 and now holds a research scientist position at the Host. During the four years, the Fellow has become an integral part of a growing group and has taken on the role of one of the pillars of the group's future development; the group has developed significant expertise in cloud computing technologies and bioinformatics, largely as a result of the Fellow joining the group.
The specified objectives list the efforts of this project and their immediate impact on the project’s outcome. The project's efforts also have long-term implications, and their results aim to extend beyond this project. These are listed here:

a. The bulk of the Fellow’s effort continues to revolve around his research agenda: accessibility of distributed computing resources. The outcome of this effort is materialised through continued development of the CloudMan software application, where many of the research efforts are implemented and delivered to the broader scientific community. Through collaboration with the Galaxy and Genomics Virtual Laboratory projects, and as an outcome of the effort within this project, the CloudMan application has become a de facto standard for doing bioinformatics data analysis on the cloud; it is used daily by researchers around the world as a means of gaining access to the software applications required to perform biomedical data analysis. This is likely the single most important impact this project has led to.

b. Other project initiatives and realisations directly connected to AIS DC were two bilateral projects, one with the Medical University of Innsbruck and one with the University of Freiburg. The project with the Medical University of Innsbruck, titled “Automated Cloud Service Provisioning for Big Data Applications”, focuses on creating a dashboard of software services for use with next-generation Big Data applications in the cloud. Two grant proposals have been written as direct extensions of this project. The first proposal, under the FP7 programme, was titled “eCloudMan-Easy and Extensible Cloud Manager” and focused on extending the flagship software of this project. The larger one was a Horizon 2020 proposal titled “Multi-Cloud System for Automated Service Provisioning and Sharing”, written in collaboration with seven other partners across the EU and coordinated by RBI. That proposal was a large extension of the topics underlying this project, with a focus on the scalability of biomedical data analysis. Unfortunately, this grant was not awarded. We have also submitted a proposal to the Croatian national ministry of science titled “EXascale PERspekTiva - EXPERT : Analysis of algorithms and heterogeneous architecture to solve complex problems”, where the focus is to grow local expertise among young researchers at RBI. The intent is for Dr. Afgan to be one of the mentors on that project. Based on its expertise in cloud computing, the RBI group became part of the Horizon 2020 INDIGO project, titled “INtegrating Distributed data Infrastructures for Global ExplOitation” (ID 653549), in which the Fellow takes an active part.

c. Overall, the Fellow has integrated well with the Host institution. He has accepted a permanent research scientist position and his research agenda has been adopted by the group as one of its core research pillars. His ongoing collaborations with institutions and research groups from the EU, the USA and Australia have been readily adopted by the members of the Host group. This is evident from 8 joint journal publications and 8 workshop presentations, which have resulted in more than 200 citations.
Today, the science and business sectors require faster and more innovative solutions to meet their day-to-day needs. In addition, new devices, computational resources, services and software solutions appear every day, increasing the heterogeneity that each user has to deal with. To meet these needs, large research efforts have been directed towards enhancing cloud-based services and infrastructure and developing various advanced cloud platforms, including numerous Big Data platforms and specialised cloud-based services and applications. At the same time, increasing attention is being paid to gathering the pieces of scattered solutions under a single, unified control system that will ultimately bring simplicity and uniformity to their usage.
The basic aim of the AIS DC project was to overcome the limitations on sharing applications, services, resources and data between cloud deployments and among different users. Following a bottom-up approach and guided by representative research domains (i.e. bioinformatics and general Big Data analysis) and commercial opportunities (i.e. innovative large-scale data-oriented products and a marketplace for business domains), AIS DC represents a step forward in increasing the socio-economic impact of research and business opportunities, especially for small to medium-sized research groups and for service and cloud providers. It does so by delivering easy deployment of applications and services on the most appropriate heterogeneous cloud resources. Further, as biomedical research moves into the clinical domain, an increasing number of domain-specific users will need access to the appropriate analysis environments. This project has laid the groundwork to enable this transition from the computational standpoint. These ideas have been further supported by close collaborative ties with other large and ongoing projects, namely the Galaxy project led from the Johns Hopkins University, the Genomics Virtual Lab project led by the University of Melbourne, and the Horizon 2020 INDIGO DC project led by INFN Italy.

Project web page: