CORDIS - EU research results

Performance Optimisation and Productivity 2

Periodic Reporting for period 2 - POP2 (Performance Optimisation and Productivity 2)

Reporting period: 2020-06-01 to 2022-05-31

High Performance Computing (HPC) deals with the use of extremely powerful computers, consisting of many thousands of processors, to address huge computational problems and solve them in a very short time. The computations required to design planes or cars, predict the weather next week or the climate next century, understand the properties of new materials, study the evolution of the universe or design new medicines all fall into this category of applications. From this point of view, HPC is a fundamental tool for the progress of science and engineering and, as such, for economic competitiveness.
The huge and growing complexity of the parallel computers used for this purpose leads to a situation where application developers and users are not always aware of the detailed issues affecting the performance of their applications. The result is often an inefficient use of these expensive and energy-consuming infrastructures. As evidenced by the current crises, maximizing efficiency in the use of all kinds of resources is an objective that should be targeted at all levels of our activities as a society, and this includes the use of our computing resources. Even where a need for further performance and efficiency from our HPC platforms is perceived, code developers may not have enough insight into the detailed internal performance of the codes and machines to address the problem properly. This may lead to blind attempts to restructure codes in ways that are not the most productive.
The POP2 Center of Excellence (CoE) aimed at promoting best practices in the performance analysis and optimization of applications across all domains of scientific research and industry. This was done through assessment services in which the application performance was analysed, efficiency losses were identified, and suggestions on how they could be avoided were provided to the application developers. This gave developers a useful external insight that they could use to steer their application refactoring efforts and usage practices. We also carried out Proof of Concept services to help with such refactoring efforts in cases where the code owners were not experienced in applying the proposed improvements.
POP2's target was to perform 180 services over a 3-year period for customers from both research and industry, with an important focus on supporting other CoEs. Additional planned activities included: efforts to identify and attract new users; further extension of the analysis methodology; improvement of the performance tools used in those analyses; production of training and dissemination material, as well as material and resources that could be used by other projects for the co-design of HPC platforms; and, finally, implementation of a quality control process for the services themselves.
These activities were oriented towards easing the analysis process, reducing the effort and cost of performing the assessments (ease of installation, portability, coverage of different platforms, programming models and languages) and, overall, promoting a new culture and best practices on how efficiency in the use of our computing resources can be understood and improved.
The project has successfully achieved its objectives and we feel proud of the broad appreciation we received from the HPC community in Europe.
The project completed 149 assessments and 33 proof of concept services for 157 customers, 33 of them SMEs. Some customers requested several assessments, either of different codes or of the same code after they had implemented some of the suggestions from a first assessment. POP2 carried out specific assessment campaigns with other domain-specific CoEs, in particular ChEESE (2 rounds), EoCoE, E-CAM, CoEC and NOMAD. We additionally performed studies for PerMedCoE, ESiWACE and CompBioMed, for a total of 49 studies performed for other CoEs.
The results of the project are gathered on our web site www.pop-coe.eu, which includes a blog, links to our YouTube channel posts, training material including explanatory videos and actual data for hands-on sessions, links to the recordings of the 25 webinars we organized, a 3-monthly newsletter, documentation describing the methodology, and co-design resource pages providing access to summarised insights as well as the individual reports for further analysis. Overall, a very exhaustive set of dissemination materials and events were produced, organized or attended, and the material on our web site keeps getting very good reference rates.
The analyses show that different codes can have very different causes of inefficiency, in many cases with intricate coupling effects between several of them. Very often the identified issues were not known to the customers, or at least they were not aware of the quantitative importance of the issue. This was identified by the internal customer follow-up (by our “customer advocate” partner) carried out after the services were completed, and it stresses the usefulness and importance of the external view provided by the project analysts.
The project has had a strong impact on the European HPC community. The services were rated as satisfactory in more than 93% of the cases, and our customers appreciated the external view and insight provided by our analysts, even though in most cases they themselves have a strong HPC background. The suggestions we made will help them steer their refactoring efforts. In many cases this process may take several years because, as owners of the codes and experts in their domains, they face many other constraints beyond the technical ones we discussed with them.
The project also homogenized the analysis methodology among the project partners, each of them strongly linked to the HPC community as tool developers and belonging to relevant supercomputing centres. This homogenized approach will act as glue across domains and geographical regions in the community, and the culture we promoted will continue permeating through many of the application domains. The broader impact of the project will ultimately derive from the resulting competitiveness of research and industry in these domains.
As reported in the blogs and success stories on our web site, we have been able to report actual performance gains in some cases where the refactoring proposals were implemented. These gains range from a few percent, through 5-10x acceleration over the original parallel code for the same amount of resources, to accelerations of several hundred times, depending heavily on the original code. How a performance improvement translates into better competitiveness for an end user depends largely on each specific customer context.
POP2 Kick off meeting - Barcelona, Dec.2019
“POP analysis methodology”, University of Queensland, Sep 16-18, 2019
POP2 poster presentation at EuroHPC Summit Week 2019 - Poznan, May 2019