Approach for Optimizing Heterogeneity

“In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done.
The free lunch is over. Now welcome to the hardware jungle.”

Herb Sutter, “Welcome to the Jungle” (2011)

Heterogeneous parallel architectures have received considerable attention as an efficient approach to running applications and delivering services: they combine different processor types in one system to improve absolute performance, minimise power consumption and/or lower cost. New platforms are appearing that incorporate multi-core CPUs, many-core GPUs and a range of additional devices in a single solution, in environments spanning supercomputers to personal smartphones. As the range of applications continues to grow (e.g. Cyber-Physical Systems (CPS), the Internet of Things (IoT), connected smart objects, High Performance Computing (HPC), mobile and wearable computing), there is an urgent need to design more flexible software abstractions and improved system architectures that fully exploit the benefits of these heterogeneous platforms.

Heterogeneous parallel architectures can, in principle, be applied to any sizable workload. Adopting heterogeneous systems for HPC as well as non-HPC workloads promises higher performance on extreme-scale applications, which is particularly valuable where homogeneous servers fall short. As the HPC community heads toward the era of exascale machines, these systems are expected to exhibit an unprecedented level of complexity and size. The biggest challenges to future application performance lie not only in efficient node-level execution but in power consumption as well.
Although complex engineering simulations are the applications that most obviously benefit from customised low-power heterogeneous architectures, in the upcoming era of IoT and Big Data new families of applications will also have a strong incentive to exploit customised heterogeneous hardware: FPGAs, application-specific instruction-set processors (ASIPs), multi-processor systems-on-chip (MPSoCs), heterogeneous CPU+GPU chips and heterogeneous multi-processor clusters, all with differing memory hierarchies, sizes and access-performance properties. In fact, online Big Data processing with near-instantaneous results will demand massive parallelism and well-devised divide-and-conquer approaches to exploit heterogeneous hardware, on both the client and server sides, to its fullest extent.

One of the major challenges to exploiting the benefits of such heterogeneous architectures, in data-centric and emerging domains, is the complexity of using them, or more precisely of designing and maintaining software that can deliver such benefits. Developers need to fully understand the nuances of different hardware configurations and software systems (both rapidly evolving), and must also contend with the additional difficulties in performance, security, mixed-criticality and power consumption that a heterogeneous system introduces.
The most important step in software design for low power, above all other power optimizations, is fitting the software correctly to the capabilities of the underlying hardware. Exploiting parallelism is of growing significance, as parallelization has become the dominant method of delivering higher performance and improved energy efficiency. A key element of the value chain lies in software and programming methodologies.

However, traditional programming approaches for parallel algorithms, programming environments and tools are designed for legacy homogeneous multiprocessors. They will at best achieve a small fraction of the efficiency and potential performance that we should expect from parallel computing in tomorrow's computing systems, which are 1) highly diversified, 2) operating in mixed environments, and 3) based on heterogeneous parallel architectures.

As heterogeneity emerges as one of the most profound and challenging characteristics of these parallel environments, two levels can be identified: 1) the macro level, where networks of distributed computers (clouds, Grids, clusters), composed of diverse node architectures (single-core, multi-core), are interconnected by potentially heterogeneous networks; and 2) the micro level, with deeper memory hierarchies (main memory, cache, disk storage, tertiary storage) and various accelerator architectures (fixed; programmable, e.g. GPUs; and reconfigurable, e.g. FPGAs). Advances in system integration include Heterogeneous System Architectures (HSAs) that combine CPUs and GPUs on the same piece of silicon (often called accelerated processing units, or APUs). This reduces typical energy consumption by eliminating power-robbing interfaces between chips and enables on-chip management tools to efficiently allocate power among the integrated components.
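The two levels above can be pictured with a minimal sketch, in which the macro level is a collection of nodes and each node exposes a micro-level mix of devices with their own local memory. All class names, device kinds and capacities here are hypothetical, for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    """Micro level: one CPU or accelerator with its device-local memory."""
    kind: str       # e.g. "cpu", "gpu", "fpga"
    mem_gib: float  # local memory capacity in GiB

@dataclass
class Node:
    """Macro level: one machine in a cluster, Grid or cloud."""
    name: str
    devices: list = field(default_factory=list)

# A toy heterogeneous cluster: two nodes with different device mixes.
cluster = [
    Node("node0", [Device("cpu", 64.0), Device("gpu", 16.0)]),
    Node("node1", [Device("cpu", 32.0), Device("fpga", 8.0)]),
]

def device_kinds(cluster):
    """Enumerate the distinct device kinds available cluster-wide."""
    return sorted({d.kind for node in cluster for d in node.devices})

print(device_kinds(cluster))  # ['cpu', 'fpga', 'gpu']
```

A real runtime would discover these devices through platform APIs rather than a static list, but the two-level structure (nodes containing devices) is the same.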

Because the impact of heterogeneity on all computing tasks is rapidly increasing, innovative architectures, algorithms and specialized programming environments and tools are needed to use these new, diversified parallel architectures efficiently. As the market for heterogeneous architectures and multicore processors in embedded applications moves into the product-deployment stage, the need for suitable software and programming methodologies is growing in step.

The approach is therefore to complement developments in low-power multi/many-core computing systems by addressing the power consumption and efficiency of the software that runs on these infrastructures. Most of the energy attributable to software is consumed during its operation. The primary aim is thus to relate software design to power-consumption awareness: the software to be developed must not only be as power-aware as it can be, but must also take into account trade-offs with other key requirements of its runtime environment, such as performance, time-criticality, dependability, data movement, security and cost-effectiveness.
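One way to make such a trade-off concrete is to treat task placement as a scoring problem: estimate energy per candidate device as average power times execution time, discard candidates that miss a time-criticality deadline, and pick the lowest-energy survivor. A minimal sketch, with entirely hypothetical runtime and power figures:

```python
# Hypothetical per-device estimates for one task: (runtime in s, avg power in W).
candidates = {
    "cpu":  (10.0, 65.0),   # slow, moderate power
    "gpu":  (2.0, 150.0),   # fast, power-hungry
    "fpga": (4.0, 25.0),    # slower than the GPU, very low power
}

def pick_device(candidates, deadline_s):
    """Choose the device minimizing energy (J = W x s) among those meeting the deadline."""
    feasible = {dev: t * p for dev, (t, p) in candidates.items() if t <= deadline_s}
    if not feasible:
        return None  # no placement satisfies the time-criticality requirement
    return min(feasible, key=feasible.get)

print(pick_device(candidates, deadline_s=5.0))  # fpga (100 J beats the GPU's 300 J)
print(pick_device(candidates, deadline_s=3.0))  # gpu (only device fast enough)
```

Real placement decisions would of course weigh further requirements (dependability, data movement, security, cost), but the same pattern of filtering by hard constraints and then optimizing for energy applies.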

Understanding the factors that affect power consumption during the development and operation of software for heterogeneous parallel environments is important. The way forward is to combine the principles of requirements engineering and design modelling for self-adaptive software systems with power-consumption awareness for these environments. Energy efficiency and application quality factors should be integrated across the application life-cycle (design, implementation, operation). A key element is a novel reference architecture, and its implementation, to support such requirements. A further clear requirement is a programming model with built-in support for various hardware architectures, including heterogeneous clusters, heterogeneous chips (multi-processor systems-on-chip, MPSoC; application-specific instruction-set processors, ASIP) and programmable logic devices (FPGAs).
In summary, the exploitation of heterogeneous parallel architectures should be facilitated by a complete methodology that enables software designers to easily implement and verify applications on such platforms, including general-purpose processors and acceleration modules implemented in the latest reconfigurable technology. The methodology will treat low power consumption as a key factor for applications, alongside other requirements such as performance, dependability, security or other qualities of service.

For current research on this topic see:
Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (TANGO).