Machine-learning-based online data analysis enables autonomous closed-loop experiments

05-12-2023

Real-time data analysis based on machine learning (ML) offers an important opportunity: it makes it possible to establish closed-loop feedback systems, to monitor physical parameters beyond the directly measured observables, and to make decisions in real time during synchrotron experiments. Here, an artificial neural network capable of taking prior knowledge into account was used to extract physical thin-film parameters during an X-ray reflectometry (XRR) experiment at beamline ID10.

X-ray user facilities rank among the largest scientific data producers in the world, and recent advances in accelerator development and detector technology are steadily increasing the volume of data generated in experiments. This is catalysing the use of ML techniques to automate data analysis. To prepare beamlines for ML-driven experiments, dedicated solutions for data acquisition, analysis and storage have been developed at individual research facilities and within data-driven national and international collaboration frameworks such as DAPHNE4NFDI [1], PaNOSC and ExPaNDS.

This work uses an XRR experiment to demonstrate the seamless integration of user-developed ML code with beamline control infrastructure, enabling real-time data analysis and integrated archiving of the analysed results in accordance with the FAIR (findable, accessible, interoperable, reusable) principles. Moreover, it showcases the accuracy and robustness of ML methods in analysing XRR curves and Bragg reflections of thin-film structures [2], as exemplified by their ability to autonomously control a vacuum deposition setup.

User-developed ML code can be seamlessly integrated into beamline control and data acquisition software such as BLISS [3] via the underlying TANGO layer [4], a common platform in beamline environments. This approach makes the user-developed code highly portable between synchrotron sources and demonstrates that ML code interoperates with TANGO, so that complete ML models can be accessed through it. Unlike real-time beamline control processes, ML data analysis can run on resources in central computing facilities. VISA [5], a solution for remote access to IT infrastructure for data processing, allows users to prepare and use IT resources dedicated exclusively to the experimental team shortly before and during a specific experiment. These resources can be customised to meet the needs of that experiment.

In this case study, a hybrid ML model combining a one-dimensional convolutional neural network (CNN) with a subsequent multilayer perceptron (MLP) was developed to extract physical thin-film parameters (thickness, density, roughness) from XRR data. The model was then successfully applied to reconstruct scattering length density (SLD) profiles from XRR data obtained on beamline ID10.
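To make the CNN-plus-MLP architecture concrete, the following is a minimal, dependency-free sketch of a forward pass, assuming a single-channel log-reflectivity curve as input. It is not the published model: the layer sizes, the max-pooling step, the sigmoid output normalised to (0, 1) and all names (`CnnMlp`, `conv1d`, `max_pool`, ...) are illustrative assumptions, and the weights are random rather than trained. A real implementation would use a deep-learning framework and be trained on simulated curves.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def conv1d(signal: Vector, kernels: Matrix, biases: Vector) -> Matrix:
    """'Valid' 1D convolution of a single-channel signal; one output channel per kernel."""
    k = len(kernels[0])
    return [
        [sum(w[j] * signal[i + j] for j in range(k)) + b for i in range(len(signal) - k + 1)]
        for w, b in zip(kernels, biases)
    ]


def relu(xs: Vector) -> Vector:
    return [max(0.0, x) for x in xs]


def max_pool(xs: Vector, size: int = 2) -> Vector:
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def dense(x: Vector, weights: Matrix, biases: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(xs: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class CnnMlp:
    """Untrained toy model: Conv1D -> ReLU -> MaxPool -> flatten -> MLP -> sigmoid."""

    def __init__(self, n_points=64, n_kernels=4, kernel_size=5, hidden=16, n_params=3, seed=0):
        rng = random.Random(seed)
        self.kernels = random_matrix(n_kernels, kernel_size, rng)
        self.kernel_bias = [0.0] * n_kernels
        n_features = n_kernels * ((n_points - kernel_size + 1) // 2)
        self.w1, self.b1 = random_matrix(hidden, n_features, rng), [0.0] * hidden
        self.w2, self.b2 = random_matrix(n_params, hidden, rng), [0.0] * n_params

    def __call__(self, log_reflectivity: Vector) -> Vector:
        # CNN part: local features along the q axis, concatenated over all channels
        features: Vector = []
        for channel in conv1d(log_reflectivity, self.kernels, self.kernel_bias):
            features.extend(max_pool(relu(channel)))
        # MLP part: map the features to normalised parameters in (0, 1)
        hidden = relu(dense(features, self.w1, self.b1))
        return sigmoid(dense(hidden, self.w2, self.b2))


# Usage: a synthetic, decaying log-reflectivity curve with 64 points
curve = [-4.0 * math.log10(1.0 + i) for i in range(64)]
params = CnnMlp()(curve)  # three values, e.g. normalised thickness, density, roughness
```

The convolutional stage exploits the fact that features such as Kiessig fringes are local and repeat along q, while the MLP maps the extracted features onto a fixed number of physical parameters.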

Fig. 1: a) The machine learning pipeline, particularly focusing on the incorporation of prior knowledge during inference. b) Sketch of the autonomous acquisition and feedback scheme.

It is important to note that for a given SLD profile, the corresponding theoretical XRR curve can be calculated quickly. Reversing this operation, however, is challenging because of an inherent ambiguity: often several different SLD profiles correspond to the same curve within the bounds of measurement uncertainty. Fundamentally, this is related to the well-known phase problem of scattering, since the measured intensity is proportional to the squared modulus of the reflection amplitude and the phase information is lost. Consequently, it is crucial in reflectivity analysis to use physical understanding of the system under investigation to reduce the number of potential solutions and to identify the correct one. In this work, two methods are presented for integrating existing physical knowledge into the ML model at runtime: physics-based parameterisation, and the inclusion of bounds on the open (free) parameters as additional input to the neural network (Figure 1a).
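The "easy" forward direction can be illustrated with the standard Parratt recursion, which computes the specular reflectivity of a layer stack from its thicknesses, SLDs and roughnesses. The sketch below is a generic textbook implementation, not the authors' code; it assumes real SLDs (absorption neglected), Névot-Croce interfacial roughness and q in Å⁻¹. The function name `parratt_reflectivity` and the layer tuple layout are hypothetical. Because only |r|² is returned, the phase of r is discarded, which is exactly why the inverse problem is ambiguous.

```python
import cmath
import math
from typing import List, Tuple

Layer = Tuple[float, float, float]  # (thickness in Å, SLD in 1/Å², roughness of its top interface in Å)


def parratt_reflectivity(q: float, layers: List[Layer], substrate_sld: float,
                         substrate_roughness: float = 0.0, ambient_sld: float = 0.0) -> float:
    """Specular reflectivity |r|^2 at momentum transfer q (1/Å, q > 0) via Parratt's recursion.

    Layers are listed from the top (next to the ambient medium) to the bottom.
    SLDs are real, i.e. absorption is neglected; roughness uses the Nevot-Croce factor.
    """
    slds = [ambient_sld] + [sld for _, sld, _ in layers] + [substrate_sld]
    thicknesses = [0.0] + [d for d, _, _ in layers] + [0.0]
    roughness = [s for _, _, s in layers] + [substrate_roughness]  # interface j | j+1

    # z-component of the wavevector in each medium; for a negative real argument
    # cmath.sqrt returns +i|k|, i.e. an evanescent wave decaying into the sample
    kz0 = q / 2.0
    kz = [cmath.sqrt(complex(kz0 ** 2 - 4.0 * math.pi * (rho - ambient_sld), 0.0)) for rho in slds]

    r_below = 0j  # the semi-infinite substrate reflects nothing back from below
    for j in range(len(slds) - 2, -1, -1):
        # Fresnel coefficient of interface j | j+1, damped by Nevot-Croce roughness
        r_j = (kz[j] - kz[j + 1]) / (kz[j] + kz[j + 1])
        r_j *= cmath.exp(-2.0 * kz[j] * kz[j + 1] * roughness[j] ** 2)
        # phase acquired on a round trip through layer j+1
        phase = cmath.exp(2j * kz[j + 1] * thicknesses[j + 1])
        r_below = (r_j + r_below * phase) / (1.0 + r_j * r_below * phase)
    return abs(r_below) ** 2


# Usage: a hypothetical 200 Å film on a substrate with SLD 2.0e-5 1/Å²
# (roughly the X-ray SLD of silicon); all values are assumptions for illustration
film = [(200.0, 1.0e-5, 3.0)]
curve = [parratt_reflectivity(0.01 + 0.002 * i, film, 2.0e-5, 3.0) for i in range(100)]
```

Each q point costs only a loop over the layers, which is why simulating large training sets of curves is cheap, whereas inverting a measured curve requires either fitting or a trained network.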

Molecular thin films of AlQ3 were chosen for demonstration purposes. With the goal of growing molecular thin films of predefined thickness, an ML-based autonomous experiment controlled the growth process and terminated it once the target thickness was reached (Figures 1b and 2). Prior knowledge from preceding measurements, such as a plausible film-thickness range, was provided as input to the ML model to achieve robust fitting over a large number of consecutive scans. Figure 2b shows the results of the closed-loop deposition control for several target thicknesses between 80 Å and 640 Å. As expected for a well-functioning closed-loop control, the reached thicknesses closely matched the target thicknesses, except for one outlier. Overall, the chosen target thicknesses were reached with an average accuracy of ±2 Å. The control software BLISS was used to store the ML analysis results together with the original raw data in a single NeXus-compliant HDF5 file and to interface with the facility-provided electronic notebook.
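The feedback scheme of Figure 1b can be sketched as a simple loop: measure, run the ML analysis with prior bounds derived from the previous scan, and stop deposition once the estimated thickness reaches the target. This is a toy illustration under several assumptions: `measure`, `predict_thickness` and `stop_deposition` are hypothetical placeholders for the beamline and ML calls (not BLISS or TANGO APIs), the rule "the film can only have grown, by at most `bound_width`, since the last scan" is an assumed way of turning preceding measurements into bounds, and the network's normalised output is mapped into those bounds with a linear rescaling.

```python
from typing import Callable, List, Sequence, Tuple


def rescale_to_bounds(u: float, lower: float, upper: float) -> float:
    """Map a normalised network output u in [0, 1] into the prior interval [lower, upper]."""
    return lower + u * (upper - lower)


def run_closed_loop(
    measure: Callable[[], Sequence[float]],
    predict_thickness: Callable[[Sequence[float], float, float], float],
    stop_deposition: Callable[[], None],
    target: float,
    bound_width: float,
    max_scans: int = 10_000,
) -> List[float]:
    """Scan, analyse and stop deposition once the estimated thickness reaches the target.

    All three callables are hypothetical placeholders for beamline / ML calls.
    """
    estimates: List[float] = []
    previous = 0.0
    for _ in range(max_scans):
        curve = measure()
        # Prior knowledge from the preceding scan (assumed rule): the film can only
        # have grown, and by at most bound_width, since the last measurement.
        lower, upper = previous, previous + bound_width
        thickness = predict_thickness(curve, lower, upper)
        estimates.append(thickness)
        previous = thickness
        if thickness >= target:
            stop_deposition()
            break
    return estimates


def simulate(target: float, growth_per_scan: float = 3.0,
             bound_width: float = 50.0) -> Tuple[List[float], bool]:
    """Toy simulation: the 'curve' is just the true thickness, and the 'network'
    returns a normalised value in [0, 1] that is rescaled into the prior bounds."""
    state = {"thickness": 0.0, "stopped": False}

    def measure() -> Sequence[float]:
        state["thickness"] += growth_per_scan  # the film grows between scans
        return [state["thickness"]]

    def predict_thickness(curve: Sequence[float], lower: float, upper: float) -> float:
        u = (curve[0] - lower) / (upper - lower)
        u = min(max(u, 0.0), 1.0)  # a sigmoid output would also lie in [0, 1]
        return rescale_to_bounds(u, lower, upper)

    def stop_deposition() -> None:
        state["stopped"] = True

    estimates = run_closed_loop(measure, predict_thickness, stop_deposition, target, bound_width)
    return estimates, bool(state["stopped"])


estimates, stopped = simulate(target=80.0)
```

With a growth of 3 Å per scan, this toy loop stops at 81 Å for an 80 Å target: a "stop when reached" rule can overshoot by up to one growth increment per scan. Shortening the scan interval or extrapolating the growth rate would reduce this overshoot; which strategy the actual experiment used is not specified here.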

Fig. 2: a) X-ray reflectivity measurements and the corresponding fits performed on the fly. b) Target thicknesses (x-axis) versus the film thicknesses actually reached when deposition was terminated (y-axis). Data points on the diagonal indicate a well-functioning closed-loop experiment.

This use case demonstrates the key advantages of using ML in this context. The ML approach delivers reliable fit results within milliseconds, both for simple two- to three-layer models and for complex multilayer models. This combination of speed and reliability cannot be achieved with simple fitting scripts or by relying on human supervision. More broadly, ML-based online data analysis has considerable potential to make publicly available datasets FAIR by enriching the archived raw data with scientifically relevant, real-time analysis results (data + metadata).

Principal publication and authors
Closing the loop: autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments, L. Pithan (a,e), V. Starostin (a), D. Mareček (b), L. Petersdorf (c), C. Völter (a), V. Munteanu (a), M. Jankowski (d), O. Konovalov (d), A. Gerlach (a), A. Hinderhofer (a), B. Murphy (c), S. Kowarik (b), F. Schreiber (a), J. Synchrotron Rad. 30, 1064-1075 (2023); https://doi.org/10.1107/S160057752300749X
(a) University Tübingen, Tübingen (Germany)
(b) University Graz, Graz (Austria)
(c) University Kiel, Kiel (Germany)
(d) ESRF
(e) DESY

References
[1] DAPHNE4NFDI, https://www.daphne4nfdi.de
[2] A. Hinderhofer et al., J. Appl. Cryst. 56, 3-11 (2023).
[3] BLISS, https://bliss.gitlab-pages.esrf.fr/bliss
[4] TANGO Controls, https://www.tango-controls.org
[5] J.-F. Perrin, ESRF Highlights 2022, 158-159 (2022).


European Synchrotron Radiation Facility - 71, avenue des Martyrs, CS 40220, 38043 Grenoble Cedex 9, France.