One of the main objectives of the TANGO Project is to optimise the energy usage of applications in an HPC environment, including handling heterogeneous hardware such as CPUs, GPUs and FPGAs.
A key first step towards optimisation is monitoring such infrastructures and ensuring that the data obtained is accurate and consistent. Accurate data is particularly important when constructing power models that estimate future power consumption based upon expected utilisation.
We can explore the relationship between utilisation and power with a simple experiment. We induce load on the physical host at fixed levels, i.e. from 0% to 100% usage in steps of 10%, hold each level for a given period of time, and take measurements.
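As a minimal sketch of how such an experiment could be laid out, the following builds a timetable of load levels. The names `LoadStep` and `build_load_schedule` and the 60 second hold time are illustrative assumptions, not part of the TANGO tooling, and the load generator itself is left out.

```python
from typing import List, NamedTuple


class LoadStep(NamedTuple):
    load_percent: int  # target CPU utilisation for this step
    start_s: float     # seconds since the start of the experiment
    end_s: float       # seconds since the start of the experiment


def build_load_schedule(hold_s: float = 60.0, step_percent: int = 10) -> List[LoadStep]:
    """Return one step per load level, starting at 0% and rising to at most 100%.

    Each level is held for `hold_s` seconds. The 60 s default is an assumed
    value; the right hold time depends on how quickly the metrics settle.
    """
    if hold_s <= 0:
        raise ValueError("hold_s must be positive")
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in the range 1..100")
    levels = range(0, 101, step_percent)
    return [
        LoadStep(level, index * hold_s, (index + 1) * hold_s)
        for index, level in enumerate(levels)
    ]
```

With the defaults this gives eleven steps (0%, 10%, ..., 100%), each of which can be handed to whatever load generator is in use.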
Once we have done this we can determine the relationship between load and power consumption. We can also explore the quality of the data available. We found that not all of the data points gathered are equally useful. In particular, we noticed errors in the values during transitions between applying load and removing it. Therefore, instead of looking at all of the data (shown as a on the diagram), we select a smaller range (shown as b). We skip the initial data in each load period (shown as c), as it is likely to contain values that are still in transition between the unloaded and loaded states.
It turns out that taking measurements only when the host is in a consistent state is very important when gathering calibration data to determine the trend between utilisation and power consumption. We make the following recommendations:
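The selection of range b can be sketched as a simple filter over timestamped samples. The function name `stable_window` and the `(timestamp, watts)` tuple layout are assumptions made for illustration; `settle_s` plays the role of the distance c.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def stable_window(
    samples: Sequence[Sample],
    load_start_s: float,
    load_end_s: float,
    settle_s: float,
) -> List[Sample]:
    """Keep only the samples in range b of one load period.

    Range a runs from load_start_s to load_end_s. The first settle_s
    seconds (range c) are dropped because the values may still be in
    transition between the unloaded and loaded states.
    """
    window_start = load_start_s + settle_s
    return [(t, p) for t, p in samples if window_start <= t < load_end_s]
```

For example, with a load period from 0 s to 25 s and `settle_s=10`, only samples with timestamps from 10 s up to, but not including, 25 s are kept.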
- Use metrics that represent the physical host in its most recent state, and avoid metrics that average over long periods of time. The choice of metric changes how long you have to wait (the distance c) before the measurements represent only the loaded state.
- Induce load, wait a set period of time for the values to stabilise, and then take measurements. A further refinement is to detect plateaus in the measured values and use only mutually consistent data points, which also gives a mechanism for determining how long to wait before accepting measurements as valid.
- Take measurements locally, avoiding monitoring system overheads such as network delays.
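The plateau detection mentioned in the second recommendation could be sketched as follows. This is one simple heuristic under stated assumptions, not the method used in the project: a plateau is taken to be the first run of `window` consecutive readings whose spread stays within `tolerance_w` watts, and both parameter values are illustrative.

```python
from typing import Optional, Sequence


def plateau_start(
    values: Sequence[float],
    window: int = 5,
    tolerance_w: float = 2.0,
) -> Optional[int]:
    """Return the index where the first plateau begins, or None.

    A plateau is a run of `window` consecutive readings whose spread
    (max - min) is no larger than `tolerance_w`.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        if max(chunk) - min(chunk) <= tolerance_w:
            return start
    return None
```

Readings before the returned index would be discarded as transitional, which gives a data-driven estimate of the wait time rather than a fixed one.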
Further to these recommendations, it’s always worth looking at other qualities of the measurements taken. In this blog we focus on RAPL and IPMI. RAPL provides operating system access to energy consumption information based on a software model driven by hardware counters. This model tracks the energy consumption of the CPUs, integrated GPU and DRAM. The exact coverage depends on whether the CPU was intended for the server or the desktop market. Desktop CPUs focus on the package, core and GPU domains, while server processors focus on the package, core and DRAM domains.
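Because RAPL reports cumulative energy rather than instantaneous power, power has to be derived from two counter readings. The sketch below assumes a Linux host that exposes RAPL through the powercap sysfs interface (the `energy_uj` and `max_energy_range_uj` files under `/sys/class/powercap/intel-rapl:0`); the helper names are illustrative, and the wraparound handling is an approximation.

```python
from pathlib import Path

# Assumed location of the first CPU package domain on a Linux powercap host.
RAPL_PACKAGE_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name: str, base: Path = RAPL_PACKAGE_DIR) -> int:
    """Read an integer value such as energy_uj or max_energy_range_uj."""
    return int((base / name).read_text().strip())


def average_power_watts(
    start_uj: int,
    end_uj: int,
    elapsed_s: float,
    max_range_uj: int,
) -> float:
    """Average power between two cumulative energy readings in microjoules.

    If the counter wrapped once between the readings, the counter range is
    added back. More than one wrap in the interval cannot be detected.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1e6 / elapsed_s
```

In use, `read_counter("energy_uj")` would be called twice with a known delay in between. Reading these files may require elevated privileges, depending on the kernel version and system configuration.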
IPMI on the other hand is a message-based, hardware-level interface specification which operates independently of the operating system. It utilises a baseboard management controller (BMC), which is a micro-controller embedded on a computer's motherboard that is used to collect data from various sensors. It is used by system administrators principally for recovery procedures or monitoring platform status (such as temperatures, voltages, fans, power consumption, etc.).
RAPL has high resolution in comparison to IPMI but does not monitor all of the power consumption of the physical host. IPMI has full coverage of the system’s power usage but reports values at a low resolution, because it uses only 1 byte to represent power values. Using a single byte means the values are likely to be quantised, so reported power increases in very distinct steps, such as every 7 W. These differences have several consequences, which we list below:
- RAPL underestimates total power consumption, but represents the change in power well when considering CPU-only jobs (i.e. jobs that do not use GPUs, FPGAs, etc.).
- Finding the level of CPU utilisation at which the reported power value increments helps avoid issues with the quantisation of power values (i.e. the limited range of expressible values).
- When experiments are run with a fixed CPU utilisation, taking the maximum power consumption during a run rather than the average helps avoid calibration issues. Selecting the highest value seen favours higher power values that might not otherwise appear when averaging, because IPMI's limited range of expressible values causes a rounding-down effect. Maximum values also help avoid ill effects from averaging windows: under a fixed CPU utilisation, IPMI power values that use inbuilt averaging gradually approach the true value.
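The effect described in the last point can be illustrated with a small simulation. Everything here is an assumption made for illustration: the sensor is modelled as exponential smoothing followed by rounding down to 7 W steps, which is not taken from the IPMI specification or from any particular BMC.

```python
import math
from typing import List


def quantise(power_w: float, step_w: float = 7.0) -> float:
    """Round a power value down to the nearest multiple of step_w."""
    return math.floor(power_w / step_w) * step_w


def simulated_readings(
    idle_w: float,
    loaded_w: float,
    samples: int,
    smoothing: float = 0.5,
    step_w: float = 7.0,
) -> List[float]:
    """Simulate a sensor with inbuilt averaging and coarse quantisation.

    The internal value moves a fraction `smoothing` of the way towards
    loaded_w on every sample, and each reported value is quantised.
    """
    readings = []
    smoothed = idle_w
    for _ in range(samples):
        smoothed += smoothing * (loaded_w - smoothed)
        readings.append(quantise(smoothed, step_w))
    return readings
```

With `simulated_readings(50.0, 100.0, 8)`, the maximum reading lies closer to the 100 W true value than the mean of the readings does, because the early readings, taken while the averaging is still catching up, pull the mean down.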
Once these considerations have been taken into account, the power values reported by IPMI and RAPL closely track the same change in power consumption, which helps validate their use in our models. Overall, our experience has shown that great care is needed when taking measurements, especially where averaging or delays in receiving measurements are involved. Neither RAPL nor IPMI was necessarily designed specifically for measuring power consumption, or for doing so at a fine level of granularity, but together they can give a good estimate of both current and future power consumption.
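As a hedged illustration of how calibration data might feed a power model, the sketch below fits a straight line to (utilisation, power) pairs by ordinary least squares. The linear form is an assumption made for simplicity and is not claimed to be the model used in TANGO; the function names are illustrative.

```python
from typing import Sequence, Tuple


def fit_linear_power_model(
    utilisation_percent: Sequence[float],
    power_w: Sequence[float],
) -> Tuple[float, float]:
    """Fit power = intercept + slope * utilisation by least squares.

    Returns (intercept, slope). The intercept approximates idle power.
    """
    n = len(utilisation_percent)
    if n != len(power_w) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_u = sum(utilisation_percent) / n
    mean_p = sum(power_w) / n
    var_u = sum((u - mean_u) ** 2 for u in utilisation_percent)
    if var_u == 0:
        raise ValueError("utilisation values must not all be equal")
    cov = sum(
        (u - mean_u) * (p - mean_p)
        for u, p in zip(utilisation_percent, power_w)
    )
    slope = cov / var_u
    return mean_p - slope * mean_u, slope


def predict_power(model: Tuple[float, float], utilisation_percent: float) -> float:
    """Estimate power in watts at the given utilisation."""
    intercept, slope = model
    return intercept + slope * utilisation_percent
```

The calibration points passed in would be the per-level values chosen using the recommendations above, for example the maximum stable reading at each load level.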