Two decades ago, parallel programming was a technology restricted to large-scale specialised applications running in computing centres. Today the picture is very different: multi-core processors have become widespread, not only in desktop computers but also in embedded and mobile devices.
Image processing, the Internet of Things (IoT) and cloud computing are examples of domains in which concurrent cooperative programming, that is to say parallel programming, is spreading widely.
Massively parallel computing has entered the general-purpose computing market through consumer-oriented GPUs in desktop computers. It is now a standard technology for applications such as gaming and imaging. This technology is now spreading to embedded and mobile devices through System-on-Chip (SOC) devices that include not only multiple CPU cores but also GPU accelerators.
A more recent trend is the introduction of FPGA accelerators, not only in the high-performance computing market but also in the embedded market, through the integration of such programmable logic into SOC devices.
Tooling for parallel programming is therefore now of the highest importance for all developers, and implementing parallel processes must become as natural as sequential programming. This holds for all markets, from embedded to high-performance computing, including office, leisure, industry and communications. An absolute requirement for new parallel programming tools is their ability to manage hardware heterogeneity: they must provide transparent ways of using the best device for each job.
Moreover, raw computing performance is no longer the only focus. In the embedded world, as mobile devices become more and more powerful, their autonomy and small size become key competitive assets. As a result, the “best device” today is not only the one that provides the best computing performance but also the one that offers the best energy efficiency.
As this blog focuses on the embedded market, let’s first define what we mean by “embedded device”. It does not mean “small” or “mobile”, but “specialised” and “autonomous”.
The following properties and constraints characterise embedded applications. They relate not directly to the computing power of the platform but to the nature of the application.
- The application architecture is “communication”, “service request” or “event” driven.
- The application is either real-time or responsive or both.
- Computing tasks are smaller in the embedded world than in the high-performance computing world, as they often relate to “instantaneous” processing.
- Resources in embedded systems are limited.
- Systems are usually deployed statically and run in a continuous way.
Another key element that distinguishes embedded computing from high-performance computing (HPC), and which it shares with general-purpose computing, is the multiplicity of applications and the need for fast time-to-market development cycles.
In the embedded world, the following key features are expected from a parallel programming framework: small footprint, low overhead, programming simplicity, versatility, scalability and the ability to extend over networks.
To these must be added the need to manage heterogeneous platforms, including CPUs, GPUs, FPGAs and even dedicated accelerators.
The TANGO framework is a toolbox that supports parallel programming on heterogeneous platforms, built around a programming model and an underlying run-time abstraction layer.
As a complete framework, TANGO includes a large set of components, not all of which are relevant to embedded applications. For instance, some components are built around a workload manager dedicated to deploying applications on a large number of computing nodes. While such a component is mandatory in the HPC world, it is not relevant when only a few computing devices execute the same code all the time.
The implementation of some components may also vary depending on the target market.
The TANGO tools that will bring value to the embedded market are:
- The TANGO programming model: It helps developers write code for parallel execution on heterogeneous hardware architectures, through an efficient directive-driven extension of conventional C/C++. It also includes mechanisms that simplify the integration of device-specific code that must be written using dedicated language extensions, such as CUDA or OpenCL. It supports the distribution of code over multiple independent platforms. Moreover, this programming model provides asynchronous task execution managed by an underlying run-time abstraction layer.
- The Requirement & Design Modelling Tool: A design-time tool that simulates the execution of the computational process, taking into account the characteristics of the available devices and the communication channels between them.
- The Code Profiler Tool: It determines the energy footprint of the code. This tool should enable developers to identify energy hot spots in the different sections of their code and thus determine which parts need optimising to reduce energy consumption.
- The Application Life-Cycle Deployment Engine: It will provide package-building functions that ease the creation of alternative configurations during the development cycle.
- The Self-Adaptation Manager: It handles implementations of the same computing tasks on different devices, dynamically selecting the actual execution target at run time according to parameters including Quality of Service (QoS) and potential energy savings.
The TANGO framework will therefore offer the embedded programming community a set of valuable tools, in particular an advanced programming model that supports hardware heterogeneity, multi-node processing and asynchronous execution, with a strong focus on energy management. As product time-to-market and project development life-cycles are increasingly critical for many companies in competitive markets, a major expected benefit of TANGO is the unification of tools that expose abstracted, easy-to-use layers, accelerating the overall development process compared with the current complexity of heterogeneous systems.
Bruno Wéry, Sébastien Magdelyns - DELTATEC