
DevelOpment of GRID Environment for InteRaCtive ApplicationS

Final results

The flood forecasting application framework, with appropriate simulation models, enables users to easily run the desired sequence of simulations and the respective post-processing tools, browse the results of simulations, register results in the replica management service and the applicable metadata in the metadata catalogue for later search and retrieval. The flood forecasting application consists of several simulation models (meteorological, hydrological and hydraulic) and appropriate post-processing tools connected together, thus constituting a workflow. The meteorological model forecasts precipitation, which is used by the hydrological model to compute the discharge of the river; this in turn is used in the final step by the hydraulics model to compute the possible flood. All the models generate binary output data, which are then used by post-processing tools to generate pictures visualizing the situation. These pictures are then used by the respective experts for situation evaluation. The flood forecasting application has two user interfaces that enable users to interact with the application in a more user-friendly way. One interface is implemented as a web portal accessible with a standard web browser; it consists of a set of portlets, reusable web components that are placed in the portlet portal framework. The other user interface is implemented as a plug-in for the Migrating Desktop (MD), a desktop user environment for working with grids (see the MD results page). While the portal interface focuses mainly on the flood application, the MD is a general tool that enables a user to work with the grid in a flexible way. It also integrates other applications via its plug-in system.
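A minimal sketch of the three-stage workflow described above is shown below. The wrapper functions and file names are hypothetical stand-ins for the real simulation models and post-processing tools, not CrossGrid APIs; each stage consumes the binary output of the previous one.

```python
# Hypothetical sketch of the flood-forecasting workflow: meteorology feeds
# hydrology, hydrology feeds hydraulics, and every stage is post-processed
# into pictures for the experts. Function and file names are illustrative.
def run_meteo(initial_conditions):
    # ... run the meteorological model; return its binary precipitation output
    return "precipitation.bin"

def run_hydrology(precipitation_file):
    # ... compute river discharge from the forecast precipitation
    return "discharge.bin"

def run_hydraulics(discharge_file):
    # ... compute water levels / possible flooding from the discharge
    return "flood.bin"

def postprocess(binary_output):
    # ... render the binary output into a picture for expert evaluation
    return binary_output.replace(".bin", ".png")

def flood_workflow(initial_conditions):
    precipitation = run_meteo(initial_conditions)
    discharge = run_hydrology(precipitation)
    flood = run_hydraulics(discharge)
    # Every stage's output is also visualized for the domain experts.
    return [postprocess(f) for f in (precipitation, discharge, flood)]

print(flood_workflow("radar_and_synoptic_data"))
```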
The global purpose of the PPC tool is to provide performance information about selected computational kernels when they are executed on a grid. In some cases this information is predicted (execution times, communication overheads); in others it is, in fact, real (number of communications, load balance, ...). The predicted data are based on analytical models obtained from exhaustive monitored measurements. The kernels considered include both application-specific and general-purpose ones. In addition, the tool includes a Graphical User Interface (GUI) to help the user establish the features of the grid and then simulate their effects on the target parallel kernels. Information about the behaviour of the kernels under different virtual system configurations can therefore be extracted and visualized. This tool can be run on a single workstation or PC, because no grid computations are involved. The end users of this tool are Grid programmers who want to know the behaviour of their programs under different grid scenarios, as well as resource brokers. Another possible use of this tool is in academic institutions, to study the way in which grids behave.
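To make the idea of an analytical prediction model concrete, here is a minimal sketch of the kind of cost model such a predictor might use. The kernel, the coefficients and the two grid configurations are hypothetical examples (in practice they would be fitted from monitored measurements); this is not the actual PPC model set.

```python
# Minimal sketch of an analytical performance model of the kind the PPC tool
# relies on. Coefficients and the toy kernel are invented for illustration.
def predicted_comm_time(msg_bytes, latency_s, bandwidth_bytes_s):
    # Classic linear point-to-point cost model: startup latency + transfer.
    return latency_s + msg_bytes / bandwidth_bytes_s

def predicted_kernel_time(n, procs, t_flop, latency_s, bandwidth_bytes_s):
    # Hypothetical kernel: O(n^2) work split across 'procs' nodes, plus two
    # boundary exchanges of 8*n bytes per iteration.
    compute = (n * n * t_flop) / procs
    comm = 2 * predicted_comm_time(8 * n, latency_s, bandwidth_bytes_s)
    return compute + comm

# Compare two virtual grid configurations for the same problem size.
fast_lan = predicted_kernel_time(4096, 16, 1e-9, 60e-6, 100e6)
wan_site = predicted_kernel_time(4096, 16, 1e-9, 20e-3, 10e6)
print(f"LAN cluster: {fast_lan*1e3:.2f} ms, cross-site: {wan_site*1e3:.2f} ms")
```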
The OCM-G (OMIS-Compliant Monitoring system for the Grid) is a system for monitoring parallel applications running on the Grid. The OCM-G provides services for collecting and preprocessing information about applications at run-time. It runs as an autonomous infrastructure exposing a standard interface and is designed as a basis for application-development-support tools, such as performance analyzers, debuggers or visualizers. Using the services of the OCM-G, tools can, among other things, obtain performance measurements of the monitored application related to, for example, the delay and volume of communication, CPU usage, etc. Information collected from the OCM-G is typically visualized in the form of graphical charts to show application progress, monitor activities of individual processes, observe communication patterns, detect bottlenecks, etc. Compared to existing systems with a similar purpose, the OCM-G provides some unique capabilities, among them:
- Support for Grid applications running across multiple sites.
- High performance: techniques for data-rate reduction ensure extremely low overhead and high responsiveness, enough even for monitoring interactive applications.
- Flexibility: rather than a fixed set of metrics, the OCM-G provides an extensive set of low-level services; this allows for the construction of a variety of performance metrics with the desired semantics.
- Extendibility: the OCM-G can be extended with additional services, loaded dynamically at run-time.
- Compact and secure design: the OCM-G runs as a set of user processes, which use a lightweight and fast socket-based communication mechanism. At the same time, state-of-the-art techniques are applied to ensure secure communication. No special privileges (special access rights, additional open ports on firewalls, or other potential security holes) are required.
- Design as an autonomous infrastructure exposing a standard interface, OMIS (On-line Monitoring Interface Specification). The services of the OCM-G are available via this interface, which minimizes the effort of porting OMIS-based tools across platforms (basically only the OCM-G needs to be ported).
- Interoperability: thanks to the design as an autonomous service with a well-defined protocol, the OCM-G supports the interoperability of multiple tools monitoring a single application.
The target users of the OCM-G are application developers and tool developers. Application developers need to use the OCM-G if they also use an OCM-G-compliant tool to monitor an application; in this case the usage of the OCM-G is straightforward. The main target user community are tool developers, who can use the OCM-G as a basis for various types of application-development-support tools. The main benefits of using the OCM-G for tool developers are as follows:
- The OCM-G provides an abstraction layer for accessing low-level information about (and performing manipulations on) the target system and applications. Thus, there is no need to develop a tool-specific monitoring layer. Furthermore, portability is greatly increased, since platform-specific issues are hidden in the OCM-G. Consequently, tools automatically support platforms to which the OCM-G is ported.
- The OCM-G, as a common monitoring infrastructure, enables interoperability of multiple tools monitoring a single application. If each tool instead used its own monitoring layer, this would usually be impossible, since different monitors are likely to exclude each other (for example because the usage of system debugging mechanisms such as ptrace is exclusive).
These features allow for substantial savings of resources (time and funds) in the process of tool development. Currently the OCM-G is a fully operational grid-enabled system possessing the above-described features. The target applications are currently MPI-based ones, though the core design and implementation of the OCM-G do not in any way depend on the particular type of application. The OCM-G is a flexible, extendible and powerful system which is currently used as the basis of the G-PM performance analysis tool, and in the future can be used to build various types of tools: not only performance analyzers, but also different types of visualizers, debuggers, load balancers, or other tools. The OCM-G can also easily be integrated as part of a larger infrastructure, for example as part of a generic Grid monitoring and information service. In such a system, the OCM-G could work as one of many systems collecting information about different Grid entities (infrastructure, applications, middleware, etc.).
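To give a feel for the "low overhead through data-rate reduction" point, the sketch below aggregates raw communication events locally and only ships a small per-interval summary to the analysis tool. The record layout and function names are invented for illustration; the real OCM-G exposes OMIS services rather than this class.

```python
# Hypothetical illustration of local data-rate reduction: instead of forwarding
# every communication event to the analysis tool, a local monitor aggregates
# events and exposes only summary counters. Names are invented, not OMIS calls.
from collections import defaultdict

class LocalMonitor:
    def __init__(self):
        self.bytes_sent = defaultdict(int)   # per peer rank
        self.msg_count = defaultdict(int)
        self.comm_time = 0.0

    def record_send(self, peer, nbytes, duration_s):
        # Called on every send event; kept cheap and purely local.
        self.bytes_sent[peer] += nbytes
        self.msg_count[peer] += 1
        self.comm_time += duration_s

    def flush_summary(self):
        # Only this small summary crosses the network to the tool, which keeps
        # monitoring overhead low even for interactive applications.
        summary = {
            "total_bytes": sum(self.bytes_sent.values()),
            "total_msgs": sum(self.msg_count.values()),
            "comm_time_s": self.comm_time,
        }
        self.bytes_sent.clear()
        self.msg_count.clear()
        self.comm_time = 0.0
        return summary
```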
The current status of JIMS, the JMX-based Infrastructure Monitoring System, is as follows. JIMS is the result of three years of development during the CrossGrid project. Its current version, 1.5.32, is available as an open-source project under the CrossGrid Licence, for use with Linux and Unix systems. The last validated version, 1.5.23, is installed in the CrossGrid production and development testbeds. JIMS uses Java Management Extensions, which makes it a platform-independent grid monitoring tool with an interoperable Web Service API (Application Programming Interface) and Java command-line clients. JIMS is aimed primarily at other middleware tools that make use of monitoring parameters, including resource brokers, benchmarks or performance-prediction mechanisms. Its command-line client applications can also be used by network and grid middleware administrators to investigate the infrastructure configuration and resource usage. There are some barriers resulting from the chosen Java technology, concerning memory usage, which make JIMS better suited for systems with enough physical memory for the Java Virtual Machine in which it runs.
The product supports the execution of HLA (High Level Architecture) distributed interactive simulations in the Grid environment. The architecture is based on the Open Grid Services Architecture (OGSA) concept, which allows for modularity and compatibility with Grid Services already developed. As HLA is explicitly designed to support interactive distributed simulations, it provides various services needed for that specific purpose, such as time management useful for time-driven or event-driven interactive simulations. It also takes care of data distribution management and allows all application components to see the entire application data space in an efficient way. On the other hand, the HLA standard does not provide automatic setup of HLA distributed applications, and there is no mechanism for migrating federates according to dynamic changes of host loads or failures, which is essential for Grid applications. Our solution introduces HLA functionality into the Grid Services framework, extended by specialized high-level Grid Services. This allows for execution control through Grid Service interfaces, while the internal control and data of distributed interactive simulations flow through HLA. The design also supports migration of federates (components) of HLA applications according to environmental conditions. The possible end users are simulation developers, developers of defence applications, as well as people from the game industry. The benefits of using the system are clear: applications can take advantage of the Grid without losing efficiency.
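Below is a minimal sketch of the kind of load-driven migration decision described above. All the helper names (get_host_load, save_federate_state, resume_federate) and the threshold are hypothetical placeholders; they stand in for the specialized Grid Services of the real system, while simulation data would keep flowing through HLA.

```python
# Hypothetical sketch of load-driven federate migration. Every name here is
# an invented placeholder, not a Grid Service or HLA RTI interface.
import time

LOAD_THRESHOLD = 0.85   # assumed CPU-load fraction beyond which we migrate

def monitor_and_migrate(federate, hosts, get_host_load,
                        save_federate_state, resume_federate):
    while federate.running:
        if get_host_load(federate.host) > LOAD_THRESHOLD:
            # Checkpoint the federate, pick a less loaded host, resume there;
            # the resumed federate rejoins the same HLA federation.
            state = save_federate_state(federate)
            target = min(hosts, key=get_host_load)
            resume_federate(state, target)
            federate.host = target
        time.sleep(5)   # assumed polling interval
```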
CrossBroker is a system designed to manage jobs that are submitted to the Grid. CrossBroker includes transparent mechanisms for dynamic resource discovery and selection, task mapping and scheduling, and job monitoring and steering. It supports sequential and parallel applications, and its main features are the following:
- Support for parallel jobs developed with the MPICH library and compiled with the ch_p4 device. These applications run on a single cluster.
- Support for MPI applications developed with the MPICH library and compiled with the G2 device. These applications are able to run on resources from multiple clusters.
- Support for computational workflows. Computational workflows are made of separate jobs (sequential or parallel) that exhibit control and data dependencies.
- Support for batch and interactive execution. In batch execution, jobs have their standard input and standard output redirected to files; this mode is intended for applications that run off-line. The interactive execution mode provides input/output streaming in near real time and is intended for applications that run on-line with direct interaction to/from the user. Sequential and MPI applications can use both modes of execution.
- A generic API for interfacing the broker to external monitoring/performance tools. The integration of the Grid Monitoring and Data Analysis Tool (GMDAT) has been implemented using this interface.
- Support for job pre-emption. This is a simple time-sharing mechanism that allows execution of interactive applications when no free resources are available. With the pre-emption mechanism, these interactive applications are able to run immediately on the same machine where a certain batch application is already running.
CrossBroker runs on Grid platforms based on LCG-2 middleware, which currently uses Linux RH7.3 and Globus 2.4.x, and has a specific Job Description Language (JDL) and a Glue Schema for job and resource description, respectively.
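To illustrate the kind of information such a job description carries, here is a hedged sketch written as a plain Python dictionary. The attribute names mirror the general EDG/LCG-2 JDL style; CrossBroker's exact keywords, values and Glue Schema expressions may differ, and the executable and arguments are hypothetical.

```python
# Hedged sketch of a job description for an interactive MPI job, written as a
# Python dict. Attribute names follow the general EDG/LCG-2 JDL style only;
# CrossBroker's exact syntax is not reproduced here.
interactive_mpi_job = {
    "JobType": "Interactive",          # vs. batch: stdin/stdout streamed live
    "Executable": "flood_model",       # hypothetical application binary
    "Arguments": "--basin danube",     # hypothetical arguments
    "NodeNumber": 8,                   # parallel job spread over 8 CPUs
    "InputSandbox": ["flood_model", "config.dat"],
    "OutputSandbox": ["results.bin", "stderr.log"],
    # Requirements/Rank expressions steer resource selection against Glue
    # Schema attributes published by the resources (illustrative only).
    "Requirements": "other.GlueCEInfoTotalCPUs >= 8",
    "Rank": "-other.GlueCEStateEstimatedResponseTime",
}
```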
The next-generation High Energy Physics (HEP) experiments at the Large Hadron Collider (LHC) will produce an unprecedented amount of data. A worldwide community of thousands of physicists willing to analyse that data will profit from grid technologies for this work. We have parallelized some algorithms that could be used during this analysis, which saves a great amount of time for the scientists doing it. The artificial neural network (ANN) training application is an interactive program that trains an ANN with simulated HEP events, in order to be able to distinguish the interesting events (signal) from the already known ones (background). Thanks to the parallelization of the program, and to its good scalability, the wait time for this kind of analysis has been reduced from several hours to a few minutes. A graphical user interface is provided through the CrossGrid Migrating Desktop (MD). Using it, the user can monitor the training process through the evolution of the training error, and can interrupt it or reset the weights to make sure local minima are avoided.
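The sketch below shows data-parallel training of the kind described, assuming mpi4py and NumPy; it is a toy single-layer classifier on synthetic data, not the CrossGrid implementation. Each process computes gradients on its own slice of simulated events and the gradients are summed across processes, so the wall time shrinks roughly with the number of workers.

```python
# Toy data-parallel ANN-style training with mpi4py; not the CrossGrid code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
events = rng.normal(size=(10_000, 16))        # local slice of (fake) events
labels = rng.integers(0, 2, size=10_000)      # 1 = signal, 0 = background
weights = np.zeros(16)                        # single-layer toy "network"

for epoch in range(20):
    # Local gradient of a logistic loss on this process's events.
    pred = 1.0 / (1.0 + np.exp(-events @ weights))
    grad = events.T @ (pred - labels) / len(labels)
    # Sum the gradients from all processes (the only communication step).
    comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)
    weights -= 0.1 * grad / size
    if rank == 0:
        # An interactive GUI would plot this training-error evolution.
        print(f"epoch {epoch}: local error {np.mean((pred - labels) ** 2):.4f}")
```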
The Unified Data Access Layer (UDAL), developed by task 3.4 of the CrossGrid project, provides a flexible architecture for storage nodes and data centres. UDAL allows for very flexible optimization and control of the internal way data is accessed, even when it is stored on heterogeneous devices. It is fully adaptable for future purposes, but is currently used only for estimating the data access latency and bandwidth of internal storage nodes; it is important to note, however, that the provided solution could also be used directly for other purposes. The current version of UDAL is distributed together with a set of specialized components for estimating the cost of access to data stored in secondary and tertiary storage. UDAL, implemented during the lifetime of the CrossGrid project, is intended to simplify access to grid storage, make it more efficient, and hide the heterogeneity of grid storage. The tool is a framework containing plugins, which are automatically selected by a built-in expert system so that they best match the current context. These plugins, called cecomponents, are divided into categories (specializations) that are responsible for the different services offered by UDAL. For instance, the UDAL release bundle includes cecomponents for data-access cost estimation for secondary storage such as HDDs, disk arrays and SAN disks, as well as cecomponents estimating the cost of access to data stored in tertiary storage (HSM) such as DiskXtender (formerly UniTree) and Castor. Since UDAL is a universal platform, it can be used for different purposes; for example, it can serve as a tool for contextual service selection in a grid environment, and one application of this technology in that field is planned for the K-WF Grid project. Cecomponents are also available that unify data access to different data storages behind a universal API. This programming interface hides the heterogeneity of the real storage devices installed in data centres. At the moment, cecomponents are available for accessing data in secondary storage, as well as specialized ones for DiskXtender (available through FTP) and Castor (available through RFIO). The UDAL platform is fully extensible for future purposes: simply writing new cecomponents, or modifying the rules of the built-in expert system, can make the tool applicable to entirely different purposes, and it can be adapted or tuned to new data storage and service devices. The release bundle contains sample cecomponents, which can be used as templates for the development of third-party ones. Importantly, new cecomponents can be developed in almost any programming language, including the whole range of well-known scripting tools. One of the most important benefits of this solution, which includes an expert-system approach, is the ability to adapt UDAL to future storage devices, even ones not taken into account during the development of the tool. Future users can easily build their own cecomponent "plugins", describe their specialization, and finally register them using the UDAL tools. Thanks to the built-in expert system, new plugins are used immediately after their registration whenever the execution context requires them.
UDAL is intended to be exploited by enterprise data storage centres where the cost of access to data must be predicted in advance, and where the heterogeneity of the device layer makes the infrastructure difficult to use. Thanks to its very open and flexible architecture, UDAL can easily be applied to any storage centre, and can easily be extended with new features by its users.
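The sketch below illustrates the cecomponent idea: plugins register themselves with a specialization and a context predicate, and a much simplified "expert system" picks the best-matching one at call time. The registry, the selection rule and the toy cost estimators are invented for illustration; they are not the UDAL API.

```python
# Hypothetical illustration of plugin registration and context-based selection
# in the spirit of UDAL's cecomponents; names and rules are invented.
REGISTRY = []   # entries: (specialization, matches(context) -> bool, component)

def register_cecomponent(specialization, matches, component):
    # Newly registered plugins become eligible immediately.
    REGISTRY.append((specialization, matches, component))

def select_cecomponent(specialization, context):
    for spec, matches, component in REGISTRY:
        if spec == specialization and matches(context):
            return component
    raise LookupError(f"no cecomponent for {specialization} in this context")

# Two toy cost estimators: one for disk, one for tape-backed HSM storage.
register_cecomponent(
    "access-cost-estimation",
    lambda ctx: ctx["storage"] == "disk",
    lambda nbytes: 5e-3 + nbytes / 100e6,          # seek + transfer (seconds)
)
register_cecomponent(
    "access-cost-estimation",
    lambda ctx: ctx["storage"] == "hsm",
    lambda nbytes: 45.0 + nbytes / 30e6,           # tape mount dominates
)

estimator = select_cecomponent("access-cost-estimation", {"storage": "hsm"})
print(f"estimated access time: {estimator(2_000_000_000):.1f} s")
```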
GridBench is a tool for evaluating the performance of Grids and Grid resources through benchmarking. It facilitates the easy definition of parameterized execution of benchmarks on the Grid, while at the same time allowing for the archival and retrieval of results and the creation of customized charts from these results. GridBench comprises a framework of tools and a suite of benchmarks. The tools provide a user-friendly graphical interface for defining, executing and administering benchmarks on the resources of a Virtual Organization (VO), as well as for browsing results. Additionally, it provides tools for archiving and analysing results through the easy construction of custom graphs. GridBench leverages new and existing benchmarks from the HPC community: new benchmarks complement the tried and respected HPC benchmarks, which have been adapted to run on the Grid. GridBench can be used by Grid-infrastructure administrators to assess the performance and functionality of their resources. It can also be used by application developers and end-users who want to rank available resources and decide where to submit their jobs. For more information, please check: http://www.grid.ucy.ac.cy/GridBench/
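A minimal sketch of parameterized benchmark execution and result archival, in the spirit of what GridBench automates, is shown below. The benchmark command and the CSV archive are illustrative stand-ins, not GridBench components.

```python
# Sketch of running one benchmark over a parameter sweep and archiving the
# timings; the command and file names are placeholders, not GridBench.
import csv
import subprocess
import time

def run_benchmark(command, parameter_values, archive="results.csv"):
    with open(archive, "a", newline="") as f:
        writer = csv.writer(f)
        for value in parameter_values:
            start = time.time()
            subprocess.run(command + [str(value)], check=True)
            writer.writerow([" ".join(command), value, time.time() - start])

# Example (hypothetical benchmark binary, swept over problem sizes):
# run_benchmark(["mpirun", "-np", "4", "./bench", "--size"], [1024, 2048, 4096])
```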
We built a prototype of a Grid-based problem-solving environment (PSE) for virtual vascular surgery. We use a set of hardware and software resources available via the CrossGrid infrastructure to build a specific framework that supports vascular surgeons and interventional radiologists in their pre-operative decision-making. We achieved secure Grid access, node discovery and registration, Grid data transfer, application initialization, medical data segmentation, segmented data visualization, computational mesh creation, job submission, distributed blood-flow visualization, and bypass creation. We have incorporated the medical application for vascular reconstruction into the Grid. The input for experiments is data from a medical image repository in Leiden. The patient's blood flow is simulated using Grid resources. An efficient mesoscopic computational haemodynamics solver for blood-flow simulations is based on parallel cellular automata. We are able to simulate pulsatile Newtonian flow in a straight rigid 3D tube. To allow for parallel execution, the simulation volume is divided into several sub-volumes, and each sub-volume is processed concurrently. To ensure a good user experience we built a unique desktop Virtual Reality system, which serves as the interaction-visualization front-end for the user's manipulations over the Grid. End-users can interact with the system via a multi-modal interface, which combines natural input modes of context-sensitive interaction by voice, hand gestures and direct manipulation of virtual 3D objects. We call it the Virtual Operating Theatre, as a user can play the role of a vascular surgeon planning and conducting the treatment of a vascular disease on a virtual, simulated patient.
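The sketch below shows the kind of domain decomposition described: the simulation volume is split into slabs along one axis, each MPI process updates its own slab, and neighbouring processes exchange boundary (ghost) layers each step. It assumes mpi4py and NumPy; the grid size, the placeholder update rule and the periodic wrap-around are illustrative, not the CrossGrid solver.

```python
# Sketch of slab decomposition with ghost-layer exchange; illustrative only.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

NZ, NY, NX = 128, 64, 64                 # full volume (assumed divisible)
local_nz = NZ // size                    # slab owned by this process
field = np.zeros((local_nz + 2, NY, NX)) # +2: one ghost layer on each side

def exchange_ghost_layers(field):
    up, down = (rank + 1) % size, (rank - 1) % size   # periodic for simplicity
    # Send our top real layer up; receive the layer below us into bottom ghost.
    comm.Sendrecv(field[-2].copy(), dest=up, sendtag=0,
                  recvbuf=field[0], source=down, recvtag=0)
    # Send our bottom real layer down; receive the layer above into top ghost.
    comm.Sendrecv(field[1].copy(), dest=down, sendtag=1,
                  recvbuf=field[-1], source=up, recvtag=1)

for step in range(10):
    exchange_ghost_layers(field)
    # Placeholder update; a real lattice/cellular-automaton rule goes here.
    field[1:-1] = 0.5 * (field[:-2] + field[2:])
```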
The SANTA-G NetTracer is a demonstrator of the SANTA-G framework. SANTA-G is a generic framework that was developed to support information sources that generate a large amount of data at a very fast rate, in a form unsuitable for direct insertion into a Grid monitoring system. It does this by allowing direct access to the data through the Grid information system. The NetTracer demonstrates this by allowing users to access log files stored in libpcap (a network packet capture library) format through the EU DataGrid's (EDG) Relational Grid Monitoring Architecture (R-GMA) monitoring and information system. Examples of tools that generate logs in this format are Tcpdump and Snort (a network intrusion detection system). It is aimed at system administrators for network traffic analysis across multiple sites within a Grid, and also for performance analysis. It is also intended to use the Snort functionality of the NetTracer to construct a Grid-wide intrusion detection system. The added benefit of this tool is that it shows those who wish to employ ad-hoc, non-invasive monitoring how they can construct it using this generic grid-enabled framework, e.g. grid sysadmins wishing to monitor the security of sites within the grid infrastructure.
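To give a feel for the data such capture logs contain, here is a small sketch that summarizes a tcpdump-format file with the scapy library. The R-GMA query interface that NetTracer actually exposes is not shown, and "trace.pcap" is a placeholder file name.

```python
# Summarize a libpcap capture (placeholder file name) with scapy; this only
# illustrates the underlying log data, not the NetTracer/R-GMA interface.
from collections import Counter
from scapy.all import IP, rdpcap

packets = rdpcap("trace.pcap")
total_bytes = sum(len(p) for p in packets)
talkers = Counter(p[IP].src for p in packets if IP in p)

print(f"{len(packets)} packets, {total_bytes} bytes captured")
for src, count in talkers.most_common(5):
    print(f"  {src}: {count} packets")
```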
The Migrating Desktop is a ready-to-use GUI framework for accessing grid resources in a uniform way. Users can access their data and run applications from any Internet terminal equipped with a Java Virtual Machine. This facility offers an environment that is fully configurable and adaptable to the user's needs; it gives a transparent user work environment, independent of the system version and hardware. The Migrating Desktop supports all the grid paradigms, such as single sign-on and trust, job monitoring, job and data management, Virtual Organizations, interactive jobs, plugins, etc. This solution is a complete, production-deployed software environment with a special focus on interactive grid applications. These applications are simultaneously compute- and data-intensive and are characterized by interaction with a person in the processing loop. It can be used by the scientific community for simulating complex problems, and it also covers the requirements of the "utility computing" concept in business. We foresee that the Migrating Desktop can give access to a set of business applications by providing a graphical front-end to the applications and database engines distributed across a company. Utility computing is a combination of two approaches: according to the first, companies can call upon a third party to host and manage their IT infrastructure, and according to the second, companies pay for the resources they use. Grid computing is similar to utility computing but takes a different approach: it is a form of virtualization that can handle computation-intensive tasks, using a large number of systems and combining them into one grid. Such grids can include widely distributed systems or systems within one data centre. Grid technology has enabled computing resources to be shared globally and easily managed, and the infrastructure becomes highly flexible. Nowadays the infrastructure is a pool of virtual resources that the user can call on as needed. The developed environment is offered for free under the terms of the CrossGrid License Agreement. This means that there are no additional costs for the installation packages, and the total cost of ownership consists only of the hardware used and the administrative prerequisites of the middleware: the Roaming Access Server and related packages. The Roaming Access Server is an underlying layer that mediates between the Migrating Desktop and grid resources. The entire proposed infrastructure is based on the CrossGrid License Agreement and other open licence models. There is no other grid desktop with support for interactive applications on the market.
GRIPA (G-PM) is a performance evaluation tool for interactive Grid applications (both sequential ones and parallel applications based on MPI). It can be used by the community of program developers, as well as by knowledgeable program users, to investigate the performance of applications and, especially, to help in discovering performance bottlenecks. The tool consists of three components: a Performance Measurement Component (PMC), a High Level Analysis Component (HLAC), and a User Interface and Visualization Component (UIVC). The distinguishing new features of GRIPA (G-PM), and especially of the HLAC, are:
- It supports on-line analysis, i.e. the performance can be evaluated while the program is running. This allows immediate correlation to user interactions and is a prerequisite for interactive application steering.
- It supports both automatic and user-defined instrumentation. In addition, it enables the user to define new (possibly application-specific) metrics, based on the existing metrics and the user-defined instrumentation. Such a feature has never before been implemented in an on-line performance analysis tool.
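The following sketch illustrates the idea of user-defined metrics built from existing ones: a new, application-specific metric is expressed as a formula over already-available measurements. The metric names, the registry and the snapshot data are invented for illustration; they are not the G-PM metric-definition interface.

```python
# Hypothetical illustration of derived, application-specific metrics in the
# spirit of the HLAC; names and formulas are invented, not G-PM's language.
METRICS = {
    # Base metrics, as delivered by the measurement layer for one process.
    "bytes_sent":  lambda p: p["bytes_sent"],
    "comm_time":   lambda p: p["comm_time_s"],
    "events_done": lambda p: p["events_done"],  # from user instrumentation
    "wall_time":   lambda p: p["wall_time_s"],
}

def metric(name, process):
    return METRICS[name](process)

# User-defined metrics, expressed in terms of existing ones:
METRICS["event_rate"] = lambda p: metric("events_done", p) / metric("wall_time", p)
METRICS["send_bandwidth_mb_s"] = (
    lambda p: metric("bytes_sent", p) / metric("comm_time", p) / 1e6)

snapshot = {"bytes_sent": 8.0e8, "comm_time_s": 12.5,
            "events_done": 40_000, "wall_time_s": 90.0}
print(metric("event_rate", snapshot), metric("send_bandwidth_mb_s", snapshot))
```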
The Portal that was developed in the context of the CrossGrid project has as its primary goal the provision of a user-friendly web-based interface through which the potential user is able to submit jobs to the Grid. The Portal co-operates with the Job Submission Services (JSS), software that resides on the Roaming Access Server (RAS); these two components were developed and are maintained by other project partners. The Portal's building blocks are the portlets, all of which are visible on the portal page. Each portlet performs a specific function, and together they provide the user with the ability to submit a job to the Grid, watch its status, and retrieve the output files upon execution. Thus, they cover all steps of the lifecycle of a Grid job. First, there is the Proxy Manager portlet. This has a key role in the Portal, since it is responsible for the authentication and authorization of the user, so that the user can access the rest of the portlets. The authentication and authorization take place as soon as the user has valid credentials delegated to a MyProxy server located in the CrossGrid testbed. The valid proxy certificate is retrieved by the Proxy Manager portlet, and the user automatically gains access to all portlets found on the Portal page. These range from simple ones that submit a simple job to the CrossGrid testbed to portlets that submit specific large applications. As soon as the user submits any kind of job, he is immediately able to watch its status as it is being carried out, until it has finished. There are also some other generic portlets (Job List Match, Job Log Info), which give information about the available testbed machines that are able to run a specific job, or trace each step of the job from the time it is submitted until the time it finishes and some output (or error) has been produced. The Job Get Output portlet is an important function of the Portal. Through it, the user can retrieve the output and/or error from a submitted job. The results can be obtained via HTML links to the output and error files, which are shown inside the portlet area.
The Grid Visualization Kernel (GVK) offers efficient and flexible transport mechanisms for visualization purposes, while using standard interfaces to connect to scientific applications and output devices. With the available interfaces of GVK, grid-enabled applications may use traditional visualization toolkits as well as sophisticated virtual reality devices to present results to the scientific user.
The Message Passing Interface (MPI) is a widely used standard for writing parallel programs using message passing. However, developers of such MPI applications do not only face all the problems that occur in serial programming; in addition, parallel applications are becoming more and more complex and thus also more error-prone. Moreover, MPI programs do not always behave deterministically. Deadlocks or race conditions may appear, depending on the platform environment or on the MPI implementation. What is worse, they may only appear sometimes. Thus, it may take the user or developer quite a long time to even realise that the program gives wrong results, but only sometimes. Another issue is the fact that the MPI standard leaves many decisions to the implementation, which may cause problems when porting an application from one platform to another, for example when porting an application from a local platform to the CrossGrid testbed. Tracking down bugs in a distributed program can be a long and painful task; therefore there is a clear need for tools that support the MPI application developer during the development process. MARMOT is such an MPI application development tool: it checks automatically at run-time whether an application conforms to the MPI standard and uses MPI resources such as communicators, groups or data types in a correct way. It also verifies whether the application contains non-portable constructs, and thus ensures that the application runs seamlessly on any platform in the Grid. The tool can also detect problems such as deadlocks and race conditions. Any CrossGrid application can be run with MARMOT. Using the Migrating Desktop, the application can be launched and the results of MARMOT can be monitored. Currently there are only two tools similar to MARMOT: MPI-CHECK from the University of Iowa, which is restricted to Fortran code, and Umpire from Lawrence Livermore National Laboratory, which is not publicly available. MARMOT supports the MPI-1.2 standard (C and Fortran language bindings) and is freely available. Extending its functionality, e.g. to MPI-2 or hybrid applications, is an ongoing effort.
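A tiny example of the kind of error such run-time checking flags is shown below: every rank posts a blocking receive before its matching send, so the program can deadlock. It is written with mpi4py for brevity; MARMOT itself targets the C and Fortran MPI bindings.

```python
# Deadlock-prone MPI exchange (illustration of an error class MARMOT detects).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                      # assumes exactly 2 processes

# Both ranks block here, each waiting for data that nobody has sent yet.
data = comm.recv(source=peer, tag=0)
comm.send(f"hello from {rank}", dest=peer, tag=0)

# A safe ordering lets one side send first, e.g.:
#   if rank == 0:
#       comm.send(msg, dest=1); data = comm.recv(source=1)
#   else:
#       data = comm.recv(source=0); comm.send(msg, dest=0)
```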
The air pollution application provides a high-performance air quality simulation, executing the STEM-II (Sulphur Transport Eulerian Model 2) program on a Grid platform. The MPI parallel version of the model, previously developed by our research group, was ported to the Grid by making use of the other CrossGrid components. In addition, the Grid-enabled version was enhanced with a graphical user interface that provides interactivity features, and with a fault-tolerance mechanism based on checkpointing. The main benefit of this kind of application is savings in environmental resources due to a better simulation of pollutant spread. Possible end users are local authorities, weather prediction units and energy plants, among others.
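Below is a minimal sketch of checkpoint-based fault tolerance for a long-running simulation: the state is saved every few iterations and the run resumes from the last checkpoint after a failure. The file name, state layout and step function are illustrative, not the STEM-II implementation.

```python
# Checkpoint/restart sketch for a long-running simulation; illustrative only.
import os
import numpy as np

CHECKPOINT = "checkpoint.npz"   # placeholder file name

def advance(concentrations):
    # Placeholder for one simulation time step (transport/chemistry update).
    return concentrations * 0.999

def run(n_steps, every=10):
    if os.path.exists(CHECKPOINT):              # resume after a failure
        saved = np.load(CHECKPOINT)
        step, state = int(saved["step"]), saved["state"]
    else:
        step, state = 0, np.ones((50, 50, 20))  # toy 3D concentration field
    while step < n_steps:
        state = advance(state)
        step += 1
        if step % every == 0:                   # periodic checkpoint
            np.savez(CHECKPOINT, step=step, state=state)
    return state

run(100)
```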
Modern high energy physics experiments use multilevel trigger systems to select very rare (10^-13) events with interesting physics phenomena, which are the subject of further studies. The performance of the third level of the triggering system for the ATLAS experiment at the LHC collider at CERN, which will generate over 40 million interactions per second, depends on complex analysis programs executed on ordinary PCs. Current estimates call for 3500 processors to perform the selection process. Some of these processors can be sought in remote locations, provided a good connection is available. Using a custom-designed measurement system, based on programmable 1 Gbps Network Interface Cards (NICs) and the Global Positioning System (GPS), we demonstrated that the long-haul connection between CERN (Geneva, Switzerland) and Cyfronet (Krakow, Poland), using the pan-European GEANT network and the Polish National Research and Education Network (NREN) Pionier, can provide almost 1 Gbps throughput. The network QoS measurements were followed by an integration phase in which an initial merge of the CrossGrid services and the ATLAS High Level Trigger (HLT) system was implemented. Thus we demonstrated a possible solution to cater for the deficit of CPU power in the ATLAS HLT system. The first results, however, in terms of achieved throughput, were much below the expected limit, with a clear indication that the transmission bottleneck was the TCP/IP protocol and the request-response type of communication implemented in the HLT system; the latter is especially sensitive to the long latencies observed over long distances. We plan to continue our research using the CrossGrid testbed. The work done so far can become the foundation for follow-up studies leading to the design of mechanisms aimed at improving long-haul transfers of large data volumes in real time.
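One concrete reason request-response traffic suffers on such a link is the bandwidth-delay product. The back-of-the-envelope calculation below assumes an illustrative 30 ms round-trip time and a 1 MB request unit; these numbers are assumptions, not measured figures from the experiment.

```python
# Back-of-the-envelope: why strict request-response traffic cannot fill a
# long-haul 1 Gbps path. RTT and message size are assumed, illustrative values.
link_rate_bps = 1e9          # nominal 1 Gbps
rtt_s = 0.030                # assumed round-trip time for a Geneva-Krakow path

# TCP needs roughly bandwidth * RTT of data "in flight" to fill the pipe.
bdp_bytes = link_rate_bps / 8 * rtt_s
print(f"bandwidth-delay product: {bdp_bytes / 1e6:.1f} MB of in-flight data")

# A strict request-response exchange moving one 1 MB unit per round trip is
# limited by latency, not by the link rate:
event_bytes = 1_000_000
per_event_s = rtt_s + event_bytes * 8 / link_rate_bps
print(f"ping-pong throughput: {event_bytes / per_event_s / 1e6:.0f} MB/s "
      f"vs. link capacity {link_rate_bps / 8 / 1e6:.0f} MB/s")
```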
The international testbed is a key component of the CrossGrid Project, as it provides the framework to run the applications in a realistic GRID environment. In particular, organisational issues as well as performance and security aspects, including among others the network support, could only be evaluated thanks to the testbed, which, relying on a high-performance network (GEANT), assured the participation of an adequate number of computing and data resources distributed across Europe. The CrossGrid testbed has:
- provided a distributed resource facility where the different WP developments have been tested in a Grid framework,
- supported the construction of sites across Europe, integrating national facilities provided by the involved partners into the CrossGrid framework,
- monitored the network support required in the testbed set-up, and established the required links with the corresponding network providers,
- integrated the basic middleware software developed or required in the different WP tasks,
- assured the required level of interoperability with other Grid testbeds, first with the DataGrid testbed, later with LCG and EGEE,
- coordinated in practice the software releases, providing the appropriate documentation and support for installation across the distributed sites; in particular, it has assured that Grid applications from WP1 ran in the corresponding setup.
CrossGrid testbed sites were located in 16 different institutions distributed across 9 European countries, expanding the Grid community to these countries.
