Periodic Reporting for period 2 - POP2 (Performance Optimisation and Productivity 2)
Reporting period: 2020-06-01 to 2022-05-31
The huge and growing complexity of the parallel computers used for this purpose means that application developers and users are not always aware of the detailed issues affecting the performance of their applications. The result is often an inefficient use of these expensive and energy-consuming infrastructures. As evidenced by the current crises, maximizing efficiency in the use of all kinds of resources is an objective that should be targeted at all levels of our activities as a society, including the use of our computing resources. Even where a need for better performance and efficiency on our HPC platforms is perceived, code developers may not have enough insight into the detailed internal performance of the codes and machines to address the problem properly. This may lead to blind attempts to restructure codes in ways that are not the most productive.
The POP2 Center of Excellence (CoE) aimed to promote best practices in the performance analysis and optimization of applications across all domains of scientific research and industry. This was done through assessment services in which application performance was analysed, efficiency losses were identified, and suggestions on how to avoid them were provided to the application developers. This gave developers a useful external insight they could use to steer their application refactoring efforts and usage practices. We also carried out Proof of Concept services to help with such refactoring when the code owners were not experienced in applying the proposed improvements.
The POP2 target was to perform 180 services over a 3-year period for customers from both research and industry, with an important focus on supporting other CoEs. Additional planned activities included: efforts to identify and attract new users; further extension of the analysis methodology; improvement of the performance tools used in those analyses; production of training and dissemination material, as well as material and resources that could be used by other projects for the co-design of HPC platforms; and finally, implementation of a quality control process for the delivery of the services themselves.
These activities were intended to ease the analysis process, to reduce the effort and cost of performing the assessments (ease of installation, portability, coverage of different platforms, programming models and languages) and, overall, to promote a new culture and best practices on how efficiency in the use of our computing resources can be understood and improved.
The project has successfully achieved its objectives, and we are proud of the broad appreciation we have received from the HPC community in Europe.
The results of the project are gathered on our website www.pop-coe.eu, which includes a blog, links to our YouTube channel posts, training material including explanatory videos and actual data for hands-on sessions, links to the recordings of the 25 webinars we have organized, a three-monthly newsletter, documentation describing the methodology, and co-design resource pages providing access to summarised insights as well as the individual reports for further analysis. Overall, a very exhaustive set of dissemination materials and events were produced, organized or attended, and the material on our website keeps getting very good reference rates.
The analyses show that different codes can have very different causes of inefficiency, in many cases with intricate coupling effects between several of them. Very often the identified issues were not known to the customers, or at least they were not aware of the quantitative importance of the issue. This was identified in the internal customer follow-up (by our “customer advocate” partner) after the services were completed, and it underlines the usefulness and importance of the external view provided by the project analysts.
The project also homogenized the analysis methodology across the project partners, each of which is strongly linked to the HPC community as a tool developer and belongs to a relevant supercomputing centre. This homogenized approach will help bind the community together across domains and geographical regions, and the culture we promoted will continue to permeate through many of the application domains. The broader impact of the project will ultimately derive from the increased competitiveness of research and industry in these domains.
As reported in the blogs and success stories on our website, we have been able to report performance gains in some of the cases where the refactoring proposals have actually been implemented. These gains range from a few percent, through factors of 5-10x acceleration over the original parallel code for the same number of resources, up to accelerations of several hundred times, depending strongly on the original code. How a performance improvement translates into better competitiveness for an end user depends largely on each specific customer context.