BioExcel Centre of Excellence for Computational Biomolecular Research

Periodic Reporting for period 1 - BioExcel-2 (BioExcel-2 Centre of Excellence for Computational Biomolecular Research)

Reporting period: 2019-01-01 to 2020-06-30

Computing is a critical resource for high-impact research and commercial applications, in particular in Life Sciences. Exascale computer systems enable ever faster modelling and simulations, with paramount influence on health/medical applications, drug development, efficient drug delivery, biotechnology, environment, agriculture, food industry, and not least education.

BioExcel CoE was established to:
• Push the performance, efficiency, scalability, and usability of the selected software packages towards the Exascale in a co-design manner;
• Support convergence of HPC, High Throughput Computing (HTC), and High Performance Data Analytics (HPDA) with workflows combining HPC simulations with data management and analytics;
• Support and enlarge the user community (both academic and industrial) by providing workforce development, continued training, guidance, and best practices;
• Develop a sustainable and open community centre with user-driven governance and close global collaborations with US and Asian initiatives.

The need for extreme-scale computing in Life Sciences became apparent as the Covid-19 pandemic struck. Fast modelling and screening of potential drug candidates, along with antibody design, are needed to dramatically reduce the time to a successful medicine or vaccine discovery, and thus dampen the devastating effects of pandemics not only on individual lives but on society as a whole. Through its prompt response, BioExcel showed the need for large pan-European initiatives which, through concerted efforts, can positively affect the fight against diseases.
The performance of free-energy calculations in GROMACS has been improved significantly, and more algorithms have been ported to the GPU, allowing for further acceleration. Co-design projects are now underway with all major HPC CPU and GPU vendors, including work targeting next-generation hardware. Internal resources are now allocated every quarter based on public user requests. Strategic efforts target upcoming EuroHPC systems. HADDOCK released version 2.4 and version 3.0 with a more modular code-base allowing for custom workflows. The use of HADDOCK has more than doubled during the COVID pandemic and the server capacity has been extended. CP2K is now used in BioExcel as a main code to enable QM/MM calculations in combination with GROMACS. A set of relevant benchmarks has been set up and run to identify bottlenecks in performance and scaling. In PMX the efficiency of free-energy calculations has been improved and features have been extended. A massively parallel ligand affinity screening campaign was used to demonstrate readiness for the exascale.

Earlier workflow prototypes have been extended and optimized, and some are already mature enough for reliable production runs. They have been deployed on demand for several massively parallel use cases, including Covid-19. The first release of the BioBB library of application building blocks comes with a feature-rich set of modules, with particular attention given to interoperability between the components. The release generated considerable interest in the communities, including industry (AstraZeneca), and several collaborations have been started. It was also highlighted in the EU Innovation Radar. Portability and usability were further improved by containerization (Docker/Singularity) and packaging (CONDA), which, along with integrations with Jupyter notebooks, allow for smooth deployment and direct access on all major cloud infrastructures. Through collaboration with ELIXIR, we adopted FAIR principles for data management in all of our solutions. Combinations of the PyCOMPSs workflow manager, BioBB building blocks, and the core applications GROMACS, HADDOCK and PMX were used to scale up key techniques applied to solving our main scientific use cases.

Despite the impact of COVID-19, we organised a number of successful events that attracted significant attention from the community. We hosted 13 webinars attended live by 736 community members, recordings of which were viewed over 3500 times on our YouTube channel. We have made available 23 tutorials, an often requested training/support format which we identified as extremely valuable to users. We provided direct in-depth support to around 750 users of the core applications via various mechanisms, including the AskBioExcel forums, with over 450 user queries resolved. Our forum accumulated almost 140,000 pageviews, of which 50,000 came from registered and logged-in users and 90,000 from anonymous visitors (excluding web crawlers). The BioExcel Twitter account now has over 2000 followers, the unique views on our project are consistently between 2000 and 3000 per month, and the BioExcel mailing list has over 1800 subscribers. Our successful competency-based training programme was further extended with the integration of remote training. We assisted other organisations in making the switch to virtual training mandated by the pandemic.

We continued existing industrial collaborations and engaged in new ones through specific pilot projects with pharma companies. A BioExcel quality mark was developed in collaboration with SSI to increase trust in our offerings. A service catalogue has been developed and deployed. IPR issues were addressed. The form of the legal entity to support commercial operations of the centre in the long term has now been decided: an Economic Association is to be incorporated in Sweden, and initial legal discussions to deliver this are currently underway.

Quality assurance and KPIs were established; all targets were met, with many reaching the stretch targets. Many activities, such as consortium meetings, were adapted in light of the increasingly remote/virtual style of events in the last two quarters.

COVID-19 efforts: In the early days of the pandemic, BioExcel restructured its efforts to help address the crisis. In addition to working on specific COVID-19 related research, we focused on facilitating collaborations, extending community support, and providing access to HPC resources at partner centers. Some of the initiatives include the establishment of the Covid-19 Molecular Structure and Therapeutics Hub (http://covid.bioexcel.eu); partnership in the Exscalate4Cov Consortium (https://www.exscalate4cov.eu); the launch of a dedicated web-server interface (https://bioexcel-cv19.bsc.es); doubling the number of concurrent jobs on the HADDOCK server to meet demand; signing a community letter in support of initiatives to share biomolecular modelling and simulation data; and participation in numerous webinar series and presentations (including CECAM ones) covering methodologies, experience and results from our work.
BioExcel core applications continue to be at the forefront of HPC computing, with support for all existing major hardware systems. Co-design activities ensured portability, scalability and extreme performance. Workflow solutions were successfully used for pre-exascale production runs. All of our codes as well as the main workflow platforms are included in the EU Innovation Radar. Usage of our tools continues to grow. BioExcel is increasingly recognized as a provider of highly sought-after expertise. Our training events are regularly oversubscribed multi-fold. The training programme has been extended with very effective remote capabilities, and we have built substantial know-how on the topic. Efforts were restructured quickly and efficiently to address the Covid-19 pandemic.