Variorum API

Variorum supports vendor-neutral power and energy management through its rich API. Please refer to the top-level API, as well as the specific descriptions of the JSON API and the Best Effort Power Capping API. The JSON API allows system software interacting with Variorum to obtain data in a portable, vendor-neutral manner.

Top-level API

The top-level API for Variorum is in the variorum.h header file. Function-level descriptions as well as the architectures that have implementations in Variorum are described in the following sections:

Variorum Wrappers

As of v0.6.0, Variorum also provides Fortran and Python APIs, which can be found in the src/wrappers directory. These wrappers are enabled by default. The Fortran wrapper is built and installed if Fortran is found and enabled. The Python module (called pyVariorum) requires either a pip-based install or setting PYTHONPATH. Please refer to the README in the src/wrappers/python directory for details. Examples of how to use these wrappers can be found in the src/examples/fortran-examples and src/examples/python-examples directories, respectively.

JSON API

The current JSON API depends on the JANSSON-C library and has a vendor-neutral format. The API has been tested on Intel, IBM and ARM architectures, and can be used to easily integrate with Variorum (see Integrating with Variorum).

Obtaining Power Consumption

The API to obtain node power has the following format. It takes a string (char**) by reference as input, and populates this string with a JSON object containing CPU, memory, GPU (when available), and total node power. If the underlying architecture does not report total node power directly (as is the case on Intel), the total is estimated as the sum of the available domains.

The variorum_get_node_power_json(char**) function populates a string-type JSON object with the following keys:

  • hostname (string value)

  • timestamp (integer value)

  • power_node (real value)

  • power_cpu_watts_socket* (real value)

  • power_mem_watts_socket* (real value)

  • power_gpu_watts_socket* (real value)

The “*” here refers to Socket ID. While more than one socket is supported, our test systems had only 2 sockets. Note that on the IBM Power9 platform, only the first socket (Chip-0) has the PWRSYS sensor, which directly reports total node power. Additionally, both sockets here report CPU, Memory and GPU power.

On Intel microarchitectures, total node power is not reported by hardware. As a result, total node power is estimated by adding CPU and DRAM power on both sockets.
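The estimate described above can be sketched from a consumer's point of view. This is only an illustration of the summation, not Variorum's implementation: the payload values, the host name, and the exact socket suffix ("_0", "_1") are assumptions chosen for the example, and estimate_node_power is a hypothetical helper name.

```python
import json

# Hypothetical payload: values and the "_0"/"_1" socket suffixes are
# assumptions for illustration, not captured Variorum output.
sample = json.dumps({
    "hostname": "node01",
    "timestamp": 1700000000,
    "power_node": 209.0,
    "power_cpu_watts_socket_0": 95.5,
    "power_cpu_watts_socket_1": 90.0,
    "power_mem_watts_socket_0": 12.25,
    "power_mem_watts_socket_1": 11.25,
    "power_gpu_watts_socket_0": -1.0,
    "power_gpu_watts_socket_1": -1.0,
})


def estimate_node_power(json_str):
    """Sum CPU and memory power over every socket present in the object."""
    obj = json.loads(json_str)
    prefixes = ("power_cpu_watts_socket", "power_mem_watts_socket")
    return sum(value for key, value in obj.items() if key.startswith(prefixes))


estimated_watts = estimate_node_power(sample)
print(estimated_watts)
```

Matching on key prefixes rather than fixed names keeps the helper independent of the socket count.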

For GPU power, IBM Power9 reports a single value, which is the sum of power consumed by all the GPUs on a particular socket. Our JSON object captures this in the power_gpu_watts_socket* keys and does not report individual GPU power (this data is, however, available separately without JSON).

On systems without GPUs, or systems without memory power information, the corresponding JSON fields are currently set to -1.0 to indicate that the GPU power or memory power cannot be measured directly. This ensures that the JSON object itself stays vendor-neutral from a tools perspective. A future extension through NVML integration will allow the JSON object to report individual GPU power as well as total GPU power per socket with a cross-architectural build, similar to Variorum’s variorum_get_node_power() API.
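A tool consuming this object will usually want to drop the -1.0 placeholders before aggregating or plotting. The following is a minimal sketch under stated assumptions: the payload is hypothetical (including the "_0"/"_1" socket suffixes), and measured_domains is an illustrative helper, not part of Variorum.

```python
import json

UNAVAILABLE = -1.0  # sentinel described above for domains that cannot be measured


def measured_domains(json_str):
    """Return only the power fields that carry a real measurement."""
    obj = json.loads(json_str)
    return {
        key: value
        for key, value in obj.items()
        if key.startswith("power_") and value != UNAVAILABLE
    }


# Hypothetical payload for a node without GPUs; values are made up.
sample = json.dumps({
    "hostname": "node01",
    "timestamp": 1700000000,
    "power_node": 209.0,
    "power_cpu_watts_socket_0": 95.5,
    "power_cpu_watts_socket_1": 90.0,
    "power_mem_watts_socket_0": 12.25,
    "power_mem_watts_socket_1": 11.25,
    "power_gpu_watts_socket_0": -1.0,
    "power_gpu_watts_socket_1": -1.0,
})

available = measured_domains(sample)
print(sorted(available))
```

Filtering on the sentinel instead of on the architecture name lets the same code run unchanged on systems that do report GPU power.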

Querying Power Domains

The API for querying power domains allows users to query Variorum to obtain information about domains that can be measured and controlled on a certain architecture. It also includes information on the units of measurement and control, as well as information on the minimum and maximum values for setting the controls (control_range). If a certain domain is unsupported, it is marked as such.

The query API, variorum_get_node_power_domain_info_json(char**), accepts a string by reference and populates it with a JSON object containing the following vendor-neutral keys:

  • hostname (string value)

  • timestamp (integer value)

  • measurement (comma-separated string value)

  • control (comma-separated string value)

  • unsupported (comma-separated string value)

  • measurement_units (comma-separated string value)

  • control_units (comma-separated string value)

  • control_range (comma-separated string value)
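As a rough sketch of how a tool might consume these fields, the example below pairs each measurable domain with its unit. It assumes the values are plain comma-separated lists in matching order; the domain names, units, and host name in the sample are hypothetical, and control_range is left out because its exact encoding is not spelled out here.

```python
import json

# Hypothetical payload: domain names, units, and list formatting are
# assumptions for illustration only.
sample = json.dumps({
    "hostname": "node01",
    "timestamp": 1700000000,
    "measurement": "power_cpu, power_mem",
    "control": "power_cpu",
    "unsupported": "power_node, power_gpu",
    "measurement_units": "Watts, Watts",
    "control_units": "Watts",
})


def split_list(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def measurement_units(json_str):
    """Map each measurable domain to its unit, assuming matching order."""
    obj = json.loads(json_str)
    names = split_list(obj["measurement"])
    units = split_list(obj["measurement_units"])
    return dict(zip(names, units))


units_by_domain = measurement_units(sample)
print(units_by_domain)
```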

Obtaining Node Utilization

The API to obtain node utilization has the following format. It takes a string (char**) by reference as input, and populates this string with a JSON object containing total CPU, system CPU, user CPU, total memory, and GPU (when available) utilization. It reports the utilization of each available GPU. GPU utilization is obtained using the NVML and RSMI APIs. Total memory utilization is computed using /proc/meminfo, and CPU utilization is computed using /proc/stat.

The variorum_get_utilization_json(char **get_util_obj_str) function returns a string type nested JSON object. An example is provided below:

{
    "hostname": {
        "CPU": {
            "total_util%": (Real),
            "user_util%": (Real),
            "system_util%": (Real)
        },
        "memory_util%": (Real),
        "timestamp": (Integer),
        "GPU": {
            "Socket_*": {
                "GPUn*#_util%": (Integer)
            }
        }
    }
}

The * here refers to socket ID, and the # refers to GPU ID.
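Because the object is nested, a consumer typically walks it level by level. The sketch below flattens the GPU section into one entry per GPU. The sample is hypothetical: it assumes the top-level key is the node's host name, and the socket and GPU key names are illustrative stand-ins for the pattern shown above. The code reads whatever keys are present instead of interpreting the names.

```python
import json

# Hypothetical payload: host name, key spellings, and values are assumptions.
sample = json.dumps({
    "node01": {
        "CPU": {"total_util%": 42.5, "user_util%": 30.0, "system_util%": 12.5},
        "memory_util%": 61.0,
        "timestamp": 1700000000,
        "GPU": {
            "Socket_0": {"GPUn0_util%": 80, "GPUn1_util%": 75},
            "Socket_1": {"GPUn2_util%": 10},
        },
    }
})


def gpu_utilization(json_str):
    """Flatten the nested object into {(host, socket, gpu): utilization}."""
    obj = json.loads(json_str)
    result = {}
    for hostname, node in obj.items():
        # The GPU section is absent on systems without GPUs.
        for socket, gpus in node.get("GPU", {}).items():
            for gpu, util in gpus.items():
                result[(hostname, socket, gpu)] = util
    return result


per_gpu = gpu_utilization(sample)
for key, util in sorted(per_gpu.items()):
    print(key, util)
```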

A GPU-only form of the utilization object has the same nesting but carries just the timestamp and per-GPU fields. An example is provided below:

{
    "hostname": {
        "timestamp": (Integer),
        "GPU": {
            "Socket_*": {
                "GPUn*#_util%": (Integer)
            }
        }
    }
}

The * here refers to socket ID, and the # refers to GPU ID.

Best Effort Power Capping

We support setting best-effort node power limits in a vendor-neutral manner. This interface has been developed from the point of view of higher-level tools that utilize Variorum on diverse architectures and need to make node-level decisions. When the underlying hardware does not directly support a node-level power cap, a best-effort power cap is determined in software to provide an easier interface for higher-level tools (e.g., Flux and Kokkos).

For example, while IBM Witherspoon inherently provides the ability to set a node-level power cap in watts in hardware through its OPAL infrastructure, Intel architectures currently do not support a direct node-level power cap through MSRs. Instead, on Intel architectures, fine-grained CPU- and DRAM-level power caps can be set using MSRs. Note that IBM Witherspoon does not provide fine-grained capping at the CPU and DRAM level, but allows for a power-shifting ratio between the CPU and GPU components on a socket (see IBM documentation).

Our API, variorum_cap_best_effort_node_power_limit(), allows us to set a best effort power cap on Intel architectures by taking the input power cap value, and uniformly distributing it across sockets as CPU power caps. Currently, we do not set memory power caps, but we plan to develop better techniques for best-effort software capping in the future.
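The uniform distribution described above amounts to a simple division. The sketch below illustrates the arithmetic only and is not Variorum's implementation: split_node_cap is a hypothetical helper, and the 400 W cap and two-socket count are made-up inputs.

```python
def split_node_cap(node_cap_watts, num_sockets):
    """Divide a node-level cap evenly into per-socket CPU caps (in watts)."""
    if num_sockets <= 0:
        raise ValueError("num_sockets must be positive")
    per_socket = node_cap_watts / num_sockets
    return [per_socket] * num_sockets


# Hypothetical request: a 400 W node cap on a two-socket system.
caps = split_node_cap(400.0, 2)
print(caps)
```

Since memory power is not capped under this scheme, actual node power can exceed the requested value by whatever the uncapped domains draw, which is why the cap is described as best effort.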