Multi-stage Clustering Algorithm Enabling Intelligent Nano-ARPES Experiment
Recently, the HEPS (High Energy Photon Source) experimental control software system and the Nano-ARPES (angle-resolved photoemission spectroscopy) beamline have made significant progress in automatically categorizing ARPES spatial mapping datasets with unsupervised clustering. Researchers have developed a multi-stage clustering algorithm (MSCA) that performs clustering analysis in both real space and momentum space on spatially resolved ARPES datasets. It successfully distinguishes monolayer from multilayer MoS2, as well as MoS2 on different substrates (BN or Au). The work has been published in "Communications Physics" under the title "Automatic extraction of fine structural information in angle-resolved photoemission spectroscopy by multi-stage clustering algorithm". The research was led by BIAN Lingzhu and LIU Chen, with DONG Yuhui of IHEP and CHEN Zhesheng of Nanjing University of Science and Technology as corresponding authors.
The Nano-ARPES beamline at HEPS focuses the X-ray spot down to the nanoscale, which makes it possible to study how the electronic structure is distributed across a sample surface at the micro-nano scale. However, the complexity of the sample surface and the large volume of high-dimensional data make Nano-ARPES data analysis challenging. This is especially true for subtle band variations, such as the band splitting of two-dimensional materials caused by different substrates or layer numbers. These variations often reflect rich physical mechanisms and are exactly the information researchers are interested in. To address this issue, BIAN Lingzhu and LIU Chen collaborated to develop the MSCA, which applies K-means clustering in three stages of data analysis. The results and metrics of K-means clustering in real space, computed separately for different energy-momentum windows, serve as the input to a second round of K-means clustering in momentum space, which highlights the energy-momentum windows that exhibit subtle inhomogeneity in real space. Based on the highlighted windows, the spatial distributions of monolayer and multilayer MoS2, as well as of MoS2 on different substrates (BN or Au), are then identified, as shown in Figure 1. Compared with traditional unsupervised clustering algorithms, MSCA improves clustering accuracy by about 20%. In the future, the algorithm will be integrated into the HEPS data acquisition system (MAMBA) and applied at the Nano-ARPES beamline to achieve real-time fine clustering and band extraction during data acquisition, improving the efficiency of Nano-ARPES data collection and accelerating the output of basic scientific research.
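The staged use of K-means described above can be illustrated with a small toy sketch. This is not the published MSCA implementation. The data layout (one integrated intensity per pixel and energy-momentum window), the centroid-gap metric, the function names `kmeans` and `multi_stage_clustering`, and the synthetic data are all assumptions made for illustration. Only the Python standard library is used. In practice a library implementation such as scikit-learn's `KMeans` would be a natural replacement for the hand-written routine.

```python
import random


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means on a list of equal-length float lists.

    Uses a deterministic farthest-point initialisation (an illustrative
    choice). Returns (labels, centroids).
    """
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(d2(p, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(iters):
        labels = [min(range(k), key=lambda j: d2(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new

    labels = [min(range(k), key=lambda j: d2(p, centroids[j])) for p in points]
    return labels, centroids


def multi_stage_clustering(data, k_space=2):
    """data[pixel][window]: intensity of each energy-momentum window per pixel."""
    n_windows = len(data[0])

    # Stage 1: cluster the pixels (real space) separately in every window and
    # record how far apart the cluster centroids are (hypothetical metric).
    gaps = []
    for w in range(n_windows):
        _, cents = kmeans([[row[w]] for row in data], k_space)
        values = [c[0] for c in cents]
        gaps.append([max(values) - min(values)])

    # Stage 2: cluster the windows by that metric and keep the group with the
    # larger centroid gap, i.e. the windows that look inhomogeneous in space.
    win_labels, win_cents = kmeans(gaps, 2)
    hot = max(range(2), key=lambda j: win_cents[j][0])
    highlighted = [w for w, lab in enumerate(win_labels) if lab == hot]

    # Stage 3: cluster the pixels again, using only the highlighted windows.
    features = [[row[w] for w in highlighted] for row in data]
    labels, _ = kmeans(features, k_space)
    return highlighted, labels


# Synthetic example: 40 pixels, 6 windows. Pixels 20-39 form a second region
# that differs from the first only in windows 2 and 4.
rng = random.Random(0)
data = []
for pixel in range(40):
    row = [1.0 + rng.gauss(0, 0.05) for _ in range(6)]
    if pixel >= 20:
        row[2] += 1.0
        row[4] += 1.0
    data.append(row)

highlighted, labels = multi_stage_clustering(data)
print("highlighted windows:", highlighted)
print("pixel labels:", labels)
```

In this toy setting the second stage acts as a feature selector: windows where the real-space clusters barely differ are dropped, so the final real-space clustering is driven only by windows that carry contrast between regions.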
The HEPS beamline software system is actively promoting research on "large-scale scientific software framework + AI for Science" and has achieved a series of results in big data processing for various synchrotron radiation methodologies, as shown in Figure 2.
Figure 1: MSCA results. The capture of energy bands in momentum space (c-e, h-j) and the partitioning of different electronic structure regions in real space (k, n).
Figure 2: Applications of AI to big data processing for various synchrotron radiation methodologies.