IT resources are available in a flexible, reliable and easy-to-use manner.
In the context of IT hardware infrastructure, and to address the future challenges of data volumes and data analysis, several GPU platforms were benchmarked; the best results were obtained with the IBM Power9 architecture (Figure 6). A new GPU cluster of 15+1 Power9 computers, each with two high-end GPUs, has been procured, together with 11 Dell blade chassis servers, bringing an additional 1760 CPU cores to the computing cluster. The DGX-1 delivered in 2019 is being used to run the analysis programs for the cryo-electron microscope (CM01), where its performance boost has significantly shortened the running time of data analysis programs.
All ESRF beamlines are moving to the HDF5 data format when running with the new beamline control system, BLISS. The 10 new Dectris detectors also produce HDF5 files. Tools for visualising and manipulating HDF5 files, based on silx and web technologies, have been enhanced, and beamlines now use the same libraries for displaying data during (online) and after (offline) the experiment. Work also continued on a generic solution for data processing workflows. The new H2020 STREAMLINE project will provide additional resources for developing workflows for the ESRF beamlines, including artificial intelligence algorithms.
Collaborative work on data analysis software led to the PaNOSC (Photon and Neutron Open Science Cloud) proposal, coordinated by the ESRF, which aims to boost and harmonise data management and services across European photon and neutron sources. Topics include data policies, FAIR data principles, rich metadata, data stewardship, data analysis notebooks, remote services for data analysis, long-term archiving, trusted data repositories, and training for users. All of these topics are being addressed in the context of the European Open Science Cloud (EOSC), to connect data to computation as efficiently as possible, reduce the data bottleneck facing users and encourage citizen science.
A sister project for national photon and neutron sources, ExPaNDS, has also been financed. The two projects overlap strongly, and PaNOSC and ExPaNDS work closely together to ensure a common approach to scientific data management and the EOSC across all European photon and neutron institutes. Implementation of the ESRF Open Data Policy (http://www.esrf.fr/datapolicy) is progressing, with metadata systematically collected on nearly all beamlines, including the cryo-electron microscope (beamline CM01), and raw data archived for 10 years.
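The 10-year archiving rule for raw data reduces to simple date arithmetic, with one edge case: data archived on 29 February has no exact anniversary when the target year is not a leap year. The sketch below is only an illustration. The function names are hypothetical, and both the choice to count retention from the archiving date and the rule that 29 February falls back to 28 February are our assumptions, not part of the ESRF policy text.

```python
from datetime import date

# Retention period taken from the policy statement above; counting it from
# the archiving date is an assumption made for this sketch.
RETENTION_YEARS = 10


def archive_expiry(archived_on: date, years: int = RETENTION_YEARS) -> date:
    """Return the first day on which the archived raw data is no longer retained (hypothetical helper)."""
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February with no counterpart in the target year:
        # assume retention ends on 28 February instead.
        return archived_on.replace(year=archived_on.year + years, day=28)


def still_retained(archived_on: date, today: date) -> bool:
    """True while the data is inside its retention window (hypothetical helper)."""
    return today < archive_expiry(archived_on)
```

For example, data archived on 1 March 2020 would still be retained on 28 February 2030 but not on 1 March 2030.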
With the restart of the beamlines, the electronic notebook has been rolled out and has received positive feedback. The electronic logbook is also playing an important role in the commissioning of the EBS beamlines. Development continued during the long shutdown, adding new features requested by the beamlines and preparing for the first release of the e-logbook as open data. In this regard, a new data portal for ICAT, ICAT+, has been developed and released; it enhances the user experience and will provide support for online data analysis (developed as part of PaNOSC).
The CalipsoPLUS JRA2 (led by the ESRF and PSI) has developed a prototype of Data Analysis as a Service for local and remote users. A generic Jupyter notebook service is now in production, using the new computer cluster via the SLURM batch scheduler. The next step is to help beamlines develop standard notebooks for standard analysis procedures, which can then be provided to users for data processing. The ESRF continues to invest in developing optimised data analysis algorithms for parallel hardware (GPUs) to handle the expected increase in data volume.
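As a rough illustration of how a standard analysis notebook might be run non-interactively on the cluster through SLURM, the sketch below generates a batch script that executes a notebook with `jupyter nbconvert --execute`. It is not the ESRF service itself. The function name, the default partition name `gpu` and the default time limit are hypothetical and would need to match the actual cluster configuration.

```python
import shlex


def notebook_batch_script(notebook: str, output: str, *, partition: str = "gpu",
                          gpus: int = 1, time_limit: str = "01:00:00",
                          job_name: str = "nb-analysis") -> str:
    """Return the text of a SLURM batch script that executes a notebook headlessly.

    All defaults (partition name, time limit, job name) are hypothetical.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Request GPUs through SLURM's generic resource (gres) mechanism.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    # Execute every cell and save the executed copy under a new name.
    lines.append(
        "jupyter nbconvert --to notebook --execute "
        f"{shlex.quote(notebook)} --output {shlex.quote(output)}"
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(notebook_batch_script("analysis.ipynb", "analysis-run.ipynb"), end="")
```

The generated text could be saved as, for example, `job.sh` and submitted with `sbatch job.sh`. Quoting the paths with `shlex.quote` keeps file names containing spaces intact when the shell runs the script.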
J.-C. BIASCI, C. NEVO, J. JACOB, P. RAIMONDI, H. REICHERT AND J. SUSINI
Fig. 6: Upgrade work is ongoing at the Data Centre to increase computing clusters and storage capacity.