Open XDMoD with Cloud Support

Federated Open XDMoD is a tool to monitor affiliated computing resources. The Aristotle team at UB has added cloud metrics to its functionality. Metrics are now available for average cores reserved, average memory reserved, average root volume storage reserved, average wall hours per session, total core hours, number of active sessions, number of sessions ended, and number of sessions started. These cloud metrics can be grouped or filtered by instance type, project resource, and VM size (core/memory).

Application Kernels (AK) Containerization in the Cloud

Within the XDMoD performance monitoring module, AK containers are run on all Aristotle OpenStack instances.

The majority of scientific and engineering applications take advantage of the advances in modern CPU architectures, vector instruction sets in particular. To use that computing power efficiently, an application must be compiled with a specific CPU target in mind. Because a universal Docker container may run on any CPU, the target is unknown when the container is built, which makes generating compute-efficient containers challenging.

In the early version of AK containers, the University at Buffalo team manually compiled programs for four generations of vector instruction sets (SSE2, AVX, AVX2, and AVX-512). In the final version of AK containers, they switched to Spack to create CPU-specific executables. Spack is a package manager for HPC systems that has multiple advantages over manual installation. It builds software and its dependencies automatically from provided recipes and compiles them for a specific CPU architecture, which allows automated container generation for multiple CPU targets and significantly simplifies software updates.
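As a rough illustration of building for multiple CPU targets, and assuming the package manager in question is Spack, the sketch below generates one install command per instruction-set generation. The mapping from each generation to a Spack target name, the choice of `hpcg` as the example package, and the helper names are assumptions made for illustration; they are not the team's actual build recipes.

```python
# Assumed mapping from each instruction-set generation named in the text
# to a Spack microarchitecture target. The actual targets used for the
# AK containers are not stated in the source.
SPACK_TARGETS = {
    "sse2": "x86_64",
    "avx": "sandybridge",
    "avx2": "haswell",
    "avx512": "skylake_avx512",
}


def spack_specs(package):
    """Return one Spack spec string per CPU target for the given package."""
    return [f"{package} target={target}" for target in SPACK_TARGETS.values()]


def install_commands(package):
    """Return the 'spack install' command (as an argv list) for each target."""
    return [["spack", "install", *spec.split()] for spec in spack_specs(package)]


# Print the commands instead of running them; a container build script
# could pass each list to subprocess.run().
for command in install_commands("hpcg"):
    print(" ".join(command))
```

Keeping the targets in a single table means that adding a new CPU generation or rebuilding after a software update is a one-line change, which is the kind of simplification the paragraph above attributes to recipe-driven builds.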

Seven AK containers have been created to date: HPCC, HPCG, IOR, MDtest, NAMD, NWChem, and Enzo. Each AK container consists of the application compiled for the four most common vector instruction sets, together with input for that application. At run time, the container automatically detects the most suitable executable, detects the number of cores to use, executes the application with the provided input parameters, and outputs the results.
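The run-time selection described above might look roughly like the following sketch. It assumes a Linux host, where CPU feature flags are listed in /proc/cpuinfo, and a hypothetical layout in which each build is stored as `<app>.<variant>` under a hypothetical directory; the real containers' entry-point logic and file layout are not given in the source.

```python
import os

# Ordered from most to least capable. The first element is the feature
# flag as Linux reports it in /proc/cpuinfo; the second is a hypothetical
# suffix naming the matching build inside the container.
ISA_PREFERENCE = [
    ("avx512f", "avx512"),
    ("avx2", "avx2"),
    ("avx", "avx"),
    ("sse2", "sse2"),
]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def select_variant(flags):
    """Pick the most capable build that the CPU flags support."""
    for flag, variant in ISA_PREFERENCE:
        if flag in flags:
            return variant
    raise RuntimeError("no supported vector instruction set found")


def core_count():
    """Number of cores available to this process, with a portable fallback."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1


def executable_path(app, variant, root="/opt/appker"):
    """Hypothetical location of the build for the chosen variant."""
    return os.path.join(root, app, f"{app}.{variant}")
```

An entry-point script could read /proc/cpuinfo, pass its contents to `parse_cpu_flags`, and then launch the path returned by `executable_path` with `core_count()` processes or threads. Checking the flags in order of capability ensures that a CPU supporting AVX-512 does not fall back to a slower SSE2 build.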

The Aristotle team plans to deploy the containerized application kernels on one or more public clouds, benchmark them there, and compare the results with measurements from Aristotle, a UB HPC cluster, and several XSEDE resources.