
Performance Portability of OpenMP

Objective

In recent years, industry has been adopting OpenMP as a common shared-memory programming standard. The standard eases the task of implementing parallel programs for SMPs and makes parallelism more popular in industrial environments. However, a complete OpenMP implementation is currently available only for SMP machines; no efficient implementation exists for other architectures. Our objective is to build an environment able to generate efficient OpenMP code for any architecture, which will avoid the need to use different programming models for different architectures. These architectures will range from shared-memory and multi-threaded architectures to clusters of workstations/SMPs with software distributed shared memory (SDSM).

OBJECTIVES
- Design an environment able to generate portable and efficient OpenMP code for any parallel/distributed architecture, focusing on the following issues:
A) Extension of OpenMP:
1. Extend OpenMP expressiveness to exploit parallelism in irregular task graphs;
2. Improve work-distribution schemes among groups of processors to enforce data locality inside parallel tasks;
3. Add support for inspector/executor techniques in shared memory;
B) Dynamic adaptability:
1. Run the same binary file regardless of the underlying architecture, the input data, and the dynamic variation of available resources;
2. Use self-analysis to modify the behaviour of the application at run time;
C) Architectural modifications: Modify the semantics offered by SDSM and multi-threaded architectures for better support of OpenMP applications.
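
As an illustration of the inspector/executor technique named in item A.3, the following is a minimal sketch in C with OpenMP pragmas. It is not the project's design: the names `schedule_t`, `inspect`, `execute` and `free_schedule`, and the choice of grouping iterations by the block of `y` they write to, are assumptions made for this example. The inspector scans the indirection array `idx` once at run time; the executor then runs the irregular update `y[idx[i]] += x[i]` with one parallel task per block, so no two threads write to the same element.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical schedule produced by the inspector: iteration numbers
   grouped by the block of y[] that they update. */
typedef struct {
    int nblocks;
    int *start;  /* block b owns iters[start[b] .. start[b+1]-1] */
    int *iters;  /* iteration numbers, grouped by block */
} schedule_t;

/* Inspector: one sequential pass over idx[] (a stable counting sort). */
schedule_t inspect(const int *idx, int n, int ny, int nblocks) {
    assert(ny > 0 && nblocks > 0);
    schedule_t s;
    int block_size = (ny + nblocks - 1) / nblocks;
    s.nblocks = nblocks;
    s.start = calloc((size_t)nblocks + 1, sizeof(int));
    s.iters = malloc((size_t)(n > 0 ? n : 1) * sizeof(int));
    int *next = malloc((size_t)nblocks * sizeof(int));

    /* Count the iterations that fall in each block. */
    for (int i = 0; i < n; i++)
        s.start[idx[i] / block_size + 1]++;
    /* Prefix sum turns the counts into start offsets. */
    for (int b = 0; b < nblocks; b++)
        s.start[b + 1] += s.start[b];
    /* Place each iteration in its block, keeping the original order. */
    for (int b = 0; b < nblocks; b++)
        next[b] = s.start[b];
    for (int i = 0; i < n; i++)
        s.iters[next[idx[i] / block_size]++] = i;

    free(next);
    return s;
}

/* Executor: blocks write to disjoint parts of y[], so they can run
   in parallel without locks or atomics. */
void execute(const schedule_t *s, const int *idx,
             const double *x, double *y) {
    #pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < s->nblocks; b++) {
        for (int k = s->start[b]; k < s->start[b + 1]; k++) {
            int i = s->iters[k];
            y[idx[i]] += x[i];
        }
    }
}

void free_schedule(schedule_t *s) {
    free(s->start);
    free(s->iters);
}
```

The schedule depends only on `idx`, so in an iterative code it could be built once and reused for as long as the access pattern stays the same. Compiled without OpenMP support, the pragma is ignored and the executor runs sequentially with the same result.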

DESCRIPTION OF WORK
In Nanos [1], we developed an environment to run OpenMP applications on top of Origin2000 machines. From this project, we have learnt that issues such as knowledge about memory latency and architectural configuration, or self-adaptability of applications, are key factors in achieving high performance and portability. We will extend the current Nanos prototype to run on a wide range of different architectures (such as CC-NUMA, clusters of SMPs with SDSM, S-COMA, or multi-threaded architectures). The challenge is that applications will have to adapt their behaviour depending on the knowledge about memory latency and architectural configurations, which may be very different among the architectures considered.
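
To make the idea of run-time adaptation concrete, here is a minimal sketch in C with OpenMP pragmas, not a description of the Nanos mechanisms. It assumes a hypothetical kernel `sweep` whose loop chunk size is a run-time parameter, and a hypothetical helper `pick_chunk` that times one sweep per candidate chunk size and keeps the fastest. All names and the timing strategy are assumptions made for this example.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Timer: omp_get_wtime() when built with OpenMP, clock() otherwise
   (clock() measures CPU time, which is adequate for a serial build). */
static double now(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical kernel: sum of squares, scheduled dynamically with a
   chunk size that is chosen at run time instead of at compile time. */
double sweep(const double *a, int n, int chunk) {
    assert(chunk > 0);
    double sum = 0.0;
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];
    return sum;
}

/* Self-analysis step: run one sweep per candidate chunk size, measure
   it, and return the candidate with the shortest elapsed time. */
int pick_chunk(const double *a, int n, const int *candidates, int ncand) {
    assert(ncand > 0);
    int best = candidates[0];
    double best_time = -1.0;
    for (int c = 0; c < ncand; c++) {
        double t0 = now();
        volatile double sink = sweep(a, n, candidates[c]);
        (void)sink;  /* keep the compiler from discarding the sweep */
        double elapsed = now() - t0;
        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = candidates[c];
        }
    }
    return best;
}
```

An iterative application could call `pick_chunk` during its first few iterations and use the result afterwards, repeating the measurement if the input data or the number of available processors changes. Because the choice is made from measurements rather than fixed at compile time, the same binary can settle on different values on different machines.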
We will search for mechanisms that allow applications to adapt themselves and improve their performance by analysing their own behaviour. The objective is to allow the same binary to run efficiently regardless of the input data, dynamic variation of the available resources, etc. We have also learnt that the shared-memory semantics offered by some architectures are not always adequate for running OpenMP applications. Since SDSM is the only software-based architecture, we will also propose and implement modifications to its semantics. In the same line, we believe that multi-threaded architectures can be modified to offer better support for OpenMP applications. As architectures of this kind are currently being designed, it is a good time to make such modifications. The main risk and challenge of this project is the difficulty of running OpenMP applications efficiently on all these architectures. Should we find any limits, we will define the range of architectures on which portability is possible.
[1] Nanos Long Term Research Project E-21907.

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

UNIVERSITAT POLITECNICA DE CATALUNYA
Address
Jordi Girona 31
08034 Barcelona
Spain

Participants (3)

CONSIGLIO NAZIONALE DELLE RICERCHE
Italy
Address
Piazzale Aldo Moro 7
00185 Roma

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
France
Address
Domaine De Voluceau
78153 Le Chesnay

UNIVERSITY OF PATRAS
Greece
Address
Rion Patras
26500 Patras