Containers in HPC

Over the last 10 years, the IT world has gone through the virtualization and Cloud revolution. It started with Virtual Machines (VMs), and nowadays the tendency is to move to container-based execution of applications. The former, VMs, had no impact on the High-Performance Computing (HPC) world, but with containers we can see a significant effort to start using them on supercomputers. To see why, let’s first look at the differences between VMs and containers.

Virtual Machines:

In the following figure, we can see the typical layer-schema of a server running several applications and Virtual Machines:

We have the “Physical Hardware” layer, composed of things such as CPU, RAM, GPU, and so on. Over it we have the host kernel; in this case we are representing the typical scenario for a Linux OS. Over the OS kernel we have the filesystem, where the different applications reside and are executed, together with what is known as the hypervisor.

A hypervisor, which in the case of something like KVM also forms part of the Linux kernel, creates the view of a Virtual Hardware. On top of that Virtual Hardware we can install a typical OS: again, in the case of Linux, a kernel, a filesystem and several apps running on it. Everything from the Virtual Hardware up to the apps is what we know as a Virtual Machine.

Although the hypervisor will, in some cases, give direct access to the hardware layer, it typically adds a lot of overhead. This overhead translates into a performance penalty for the applications running on the VMs, which is not an ideal scenario in HPC and is the main reason virtualization was basically ignored in this world, despite the other advantages it offers, such as better security than simply running applications from different users on the same system.

Containers:

Containers were created to try to solve part of the problems of VMs: the overhead and the replication of layers such as the OS kernel or the filesystem. A typical layer architecture for a host running containers could be the following:

As you can see, the containerized applications are executed directly over the OS kernel. A Container Daemon also runs as an app to orchestrate the different containerized applications. Security between application domains is maintained via cgroups and namespaces.
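To make the cgroups and namespaces mechanism a bit more concrete, here is a minimal Python sketch that inspects both for the current process on a Linux host. It relies only on the standard /proc/<pid>/cgroup and /proc/<pid>/ns interfaces of the Linux kernel; the function names parse_cgroup and list_namespaces are hypothetical helpers written for this illustration and do not belong to any container runtime.

```python
import os
from pathlib import Path


def parse_cgroup(text):
    """Parse the text of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    Returns a list of (hierarchy_id, controllers, path) tuples.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        # The controller list is empty for the cgroup v2 unified hierarchy.
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


def list_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace type (e.g. "net", "pid") to its kernel identifier.

    Returns an empty dict when the directory does not exist
    (for instance on a non-Linux system).
    """
    directory = Path(ns_dir)
    if not directory.is_dir():
        return {}
    namespaces = {}
    for link in sorted(directory.iterdir()):
        try:
            namespaces[link.name] = os.readlink(link)
        except OSError:
            # Some entries may not be readable; skip them.
            continue
    return namespaces


if __name__ == "__main__":
    cgroup_file = Path("/proc/self/cgroup")
    if cgroup_file.exists():
        for entry in parse_cgroup(cgroup_file.read_text()):
            print("cgroup:", entry)
    for name, ident in list_namespaces().items():
        print("namespace:", name, ident)
```

Running the script on the host and again inside a container should typically show different cgroup paths and different namespace identifiers, which is the separation that keeps the application domains apart.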

Containers offer several advantages:

  • For starters, a container is much smaller than a typical VM. It is only necessary to package the application and the minimal libraries needed to execute it. The kernel is shared with the host, as are any OS libraries and filesystems that the container does not bundle itself. This allows containers to be shipped and booted orders of magnitude faster than VMs.
  • There is much less overhead. The application is executed at the same level as the rest of the applications on the host machine, so there is practically no performance penalty.

Containers and HPC:

From the previous explanation, it would look like containers are ideal for an HPC environment, since they have practically no overhead. But what is the advantage of using them in HPC? Portability. Since containers carry all the libraries necessary to execute an application, the supercomputer only needs to provide its users with a minimal container runtime. Applications can easily be moved from one supercomputer to another, which is not the typical scenario today, where applications are usually recompiled on each supercomputer against the libraries and schedulers available there.
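As an illustration of this portability, the following Python sketch builds the command line that would run the same containerized application on two different machines. It assumes the Singularity runtime and its "singularity exec <image> <command>" subcommand; the image name app.sif, the binary ./solver and the scheduler launchers are hypothetical values chosen for the example, not something any particular site prescribes.

```python
import shlex


def singularity_exec_argv(image, command, launcher=None):
    """Build the argv for running `command` inside the container `image`.

    `launcher` is an optional site-specific prefix (for example the
    scheduler's parallel launcher); it is the only part that changes
    from one supercomputer to another in this sketch.
    """
    argv = list(launcher or [])
    argv += ["singularity", "exec", image]
    argv += list(command)
    return argv


if __name__ == "__main__":
    app = ["./solver", "--steps", "10"]  # hypothetical application
    # Site A: assumed to launch jobs with Slurm's srun.
    site_a = singularity_exec_argv("app.sif", app, launcher=["srun", "-n", "4"])
    # Site B: assumed to launch jobs with mpirun.
    site_b = singularity_exec_argv("app.sif", app, launcher=["mpirun", "-n", "4"])
    print(shlex.join(site_a))
    print(shlex.join(site_b))
```

The image and the application command stay identical on both sites; only the launcher prefix is adapted, instead of recompiling the application on each machine.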

Docker is probably the most used container system, but it has a problem for HPC environments. Apart from allowing portability of applications, Docker is also focused on virtualizing the whole system for the container. This creates a security risk in an HPC environment, where the application could gain control over the network, something that will scare any HPC administrator. This virtualization feature, very interesting in Cloud environments, is a no-go for HPC.

To try to avoid these problems, Gregory M. Kurtzer created Singularity, a container runtime focused on application portability that leaves aside part of the virtualization, and that is based on the Docker image format and on the way RPM packages applications.
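The relationship with the Docker image format can be sketched as follows: Singularity accepts docker:// URIs, so an existing Docker image can be fetched and converted into a local image file with "singularity pull". The helper below only builds that command line; its name docker_pull_argv, the image reference and the output file name are hypothetical and used purely for illustration.

```python
def docker_pull_argv(docker_ref, output):
    """Build the argv that pulls a Docker image into a Singularity image file.

    `docker_ref` may be given with or without the docker:// prefix.
    """
    prefix = "docker://"
    uri = docker_ref if docker_ref.startswith(prefix) else prefix + docker_ref
    return ["singularity", "pull", output, uri]


if __name__ == "__main__":
    # Hypothetical example: convert a public Docker image to a local file.
    print(" ".join(docker_pull_argv("ubuntu:20.04", "ubuntu.sif")))
```

The resulting file can then be copied to the supercomputer and executed there by a regular user, without a Docker daemon running on the compute nodes.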

Containers and TANGO:

Due to this interest in containers in the HPC world, and also due to the low overhead of containers for developing more portable applications, the Application Lifecycle Deployment Engine in TANGO fully supports the creation of containers to develop heterogeneous applications. At this stage only Singularity is supported, but other container formats are expected to be added, even though they are not of interest for HPC.