TANGO Development-time tooling: Leveraging a Framework from Operations Research for optimal software placement and task scheduling on heterogeneous hardware platforms

In the emerging era of Internet of Things and Big Data processing, the paradigm of software development is shifting. In particular, many IoT and Big Data problems can be solved using vector and matrix operations and these operations can be drastically accelerated by using massively parallel processing units such as GPU and FPGA, which are now integrated in many recent heterogeneous platforms found on the market. Importantly, these heterogeneous platforms from different vendors, different models and model families come not only with different parallelisation capabilities from their processing units but also with memory hierarchies of different sizes with different latency and access time. Therefore contrary to the world of homogeneous computing where developers learnt to abstract away these memory aspects by relying on caching layers provided in computing systems, in the heterogeneous world, the variety of memory-related properties is so diverse that developers cannot fully ignore these properties anymore. If they intend to exploit capabilities of heterogeneous platform to a significant extend, they must understand well the memory hierarchies found in heterogeneous hardware platforms.

In other words, in the last three decades, developers have been asked to design and implement applications where the core of the work was devoted to elaborating efficient algorithms to solve a given problem. The emphasis was put on finding an efficient sequence of operations to solve the problem. Now, developers can in many cases, find many parallelized implementations of different algorithms. Furthermore, applying artificial intelligence classification techniques are also frequently used to derive efficient ways to solve a series of complex IoT and Big Data problems thanks to the acceleration provided by heterogeneous hardware platforms. Thus, effective developers of the post 2020 era must not focus on yet writing another version of existing algorithms but rather on being capable to answer the question: “Which algorithm provide the most suitable computing task break down to be distributed on the given processing elements of an heterogeneous platform taking into account data characteristics of the problem at hand and memory properties (size, latency, and access time) in the hierarchy provided by the given platform?” Developers’ job is no more to design and implement algorithms but instead to focus on data related aspects, notably, properties of the various datasets to process in an application such as size and degree of completeness, and properties of the various memories distributed throughout a heterogeneous massively parallel computing system.

Answering the question above is a complex combinatorial problem where once again computing power and existing software framework from operations research can help. In particular, solvers of combinatorial problems such as OscaR [https://bitbucket.org/oscarlib/oscar/wiki/Home] can facilitate searching the solution space to find the optimal solution among many possible combinations of variables with discreet values. For instance, using the constraint programming solver of OscaR, only a model of the problem space expressed as a set of variables and constrains on these variables must be provided. Developing the search algorithm then consists of mapping these variables to lower level Integer variables, which are built-in primitives used by the constraint programming solvers in its search for an optimal solution on a given objective function. In TANGO, a programme for searching an optimal distribution of software components and scheduling of tasks executed by these components is developed for the OscaR constraint programming solver. Several objectives functions have been investigated. First, the objective to achieve is the lowest energy footprint while maintaining a minimal overall time performance on a heterogeneous massively parallel platform. Moreover, it is also possible to minimise runtime as well as explore a Pareto optimisation space between energy and runtime, leaving the final trade-off decision to the developer.

Relying on an existing solvers of combinatorial problems such as OscaR brings several advantages over developing ad-hoc algorithms specific to an optimisation problem:

First, relying on built-in OscaR primitives makes implementing a search for the optimal solution to combinatorial problems much shorter than coding ad-hoc algorithms using imperative programming languages such as C, C++, or Java. The heavy lifting of applying the search strategy is done by the OscaR constraint programming engine. In TANGO, the size of the implementation based on OscaR, currently contains around 2K lines of code (LoC) including the problem space modelling and the OscaR specific optimisation related code. Each consists of around 1K LoC.

Second, unless extremely well designed, the implementation of ad-hoc algorithms often have a tendency to mix aspects related to problem space modelling with the search algorithms. In other words, variables are embedded in data structures alongside ad-hoc search methods that explore the solution space for the optimal solution. Conversely, using a solver from operations research such as OscaR forces to think on how to model the problem space as a set of variables representing the relevant concepts to consider during optimisation. Subsequently, an OscaR developer must find a way to express optimisation constrains on problem space variables using OscaR primitives understood by the OscaR search engine. Thus, instead of developing algorithms to search for optimal solution, the OscaR developer focuses on modelling constraints using OscaR primitives. Once a developers has learned the OscaR primitives, modelling such constrains can become quite fast. In general much faster than developing an ad-hoc search algorithm. Thus, in addition to the enhanced readability of OscaR code resulting from the clear separation between the problem-space model and model of optimisation criteria, OscaR also makes the trained OscaR developer highly productive.

Third, each new optimisation criteria considered in an ad-hoc approach might require implementing a whole new search method. On the other hand, when using a constraint programming solver such as provided by OscaR only requires finding a way to model these new criteria using OscaR primitives. For instance, in TANGO, implementing a new constraint on power capping of certain hardware elements only took two hours of development. On the other hand, ad-hoc search algorithms must include new types of variables and code a whole new algorithm for optimising on new valid values for these variables.

Fourth, using OscaR simplifies the verification and validation process. First, the separation of concerns between the problem-space model and the optimisation model allows for an efficient validation process. In particular, validation can be achieve by exposing problem-space model directly from the developed OscaR code. For instance, in TANGO, models related to hardware processing, storing and transferring were merely presented to expert developers on heterogeneous platforms directly from the OscaR code and a few presentation slides. Fruitful discussions resulted and provided specific points for elaborating new improved problem space model with a more general and realistic hardware model. Besides, OscaR implements search techniques from published research over a couple of decades and has undergone significant testing not only relying on laboratory test cases but also based on its successful applications on several Industry projects and inclusion in commercial products. Thus, the level of confidence in quality of the resulting solutions and the fact that they are optimal is quite high. In general, such level of confidence can rarely be reached for ad-hoc optimisation search algorithms.

Finally, another important fifth benefit results from combining the advantages mentioned above. Notably, the OscaR-based approach is well suited to iterative and incremental improvements. This benefit could be observed between the first and second annual iteration of TANGO. Initially, the TANGO component performing design and development-time optimisation, named Placer, was implemented to place software components on heterogeneous hardware processing elements in order to minimise energy consumption. In this initial implementation, a significant constraint was considered by assuming usage in a continuous data streaming context. In other words, once placed, all software components were assumed to process data continuously. This restriction was removed in the second iteration. A new problem-space modelling is currently being verified and validated where Placer can now take into account timing and scheduling aspects of tasks, in particular when such data processing is not a continuous data streaming context. Thus, at the end of iteration 2, Placer is still capable of placing optimally software components in continuous dataflow situations but it can also schedule tasks processing where incoming data are one shot process. In this context, Placer optimises the energy consumption while also attempting to reduce the time span for executing all tasks of a software application run. In the next iteration, Placer will further investigate repetitive-task scheduling.

In conclusion, the only inconvenient aspect of developing optimisation solutions to discreet combinatorial problems by relying on an existing operations research framework like OscaR lies in the necessity to learn how to transpose a set of constraints from the problem space into a model using OscaR primitives. Once learnt, a developer will then be able to reuse similar transposition logics for many other combinatorial optimisation problem. From the resulting transposed model, the search algorithm can fully rely on OscaR’s constraint programming solver to find automatically the optimal combination sought. While the learning effort requires a bit of time, it will greatly benefit developers due to the huge increase in productivity when developing optimisation algorithms. Furthermore, the concise code resulting from a constraint programming approach will reduce the amount of bugs, ease following iterative and incremental improvement approaches often used in agile development methods. Also, the separation of concerns encouraged by constraint programming makes the problem-space model clear hence facilitates an iterative validation effort with domain experts. Importantly, a constraint programming approach makes it quick to consider new optimisation criteria. All these advantages have been observed in the context of TANGO, in particular during the first two annual iterations where Placer has been developed, verified and validated iteratively and efficiently.

As argued in the opening of this article, developers’ job is shifting from algorithm development to fast design, prototyping and integration of existing components. Learning framework such as OscaR that facilitate producing optimisation solutions is therefore perfectly in line with the new function demanded to expert developers.