In the development process of a software application targeting heterogeneous architectures, Design Time Characterisation is a step that consists of rapidly prototyping selected parts of the computation and evaluating their performance early, which helps in making better and more objective design choices for the implementation.
This blog post is a quick introduction to Poroto, a DTC (Design Time Characterisation) tool from the TANGO toolchain.
The objective is to show how such a tool benefits developers and to illustrate the added value they may get from it. Poroto is relevant when the target hardware architecture includes an FPGA as a potential accelerator for part of the computations that make up the application.
The main objective of this DTC tool is to enable, at design time, quick prototyping and performance evaluation of FPGA kernels that implement selected functions or portions of the application source code. Specifically, Poroto handles the generation of VHDL code for interfaces, memories and FPGA glue, the provision of test set-ups for the FPGA design and the CPU design, the automation of the compilation process for the FPGA and, optionally, system power measurement controlled from the test application. This last feature, however, requires a specific external hardware device. Here is an overview of the tool workflow.
Poroto lets the user tune the kernel generation process by experimenting with different configurations. The generated kernels are benchmarked by comparing the execution time of the function in two settings: on the CPU only, and with the CPU offloading the computation to the FPGA over the PCIe interface.
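To make the comparison concrete, here is a minimal sketch of such a timing harness. It is written in Python for brevity and is not Poroto's own testbench: the names `best_time`, `cpu_kernel`, `offloaded_kernel` and `compare` are hypothetical, and the offloaded path is only a placeholder that runs on the CPU, since the real wrapper generated for an FPGA kernel is not shown in this post.

```python
import time


def best_time(fn, data, repeats=5):
    """Return the best (smallest) wall-clock time of fn(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def cpu_kernel(data):
    """Reference CPU implementation of the function under study (illustrative)."""
    return sum(x * x for x in data)


def offloaded_kernel(data):
    """Hypothetical stand-in for the wrapper that would call the FPGA kernel.

    Assumption: a real wrapper would also pay for the PCIe transfers,
    so its measured time would include that overhead.
    """
    return sum(x * x for x in data)


def compare(cpu_fn, offload_fn, data, repeats=5):
    """Time both variants on the same input and report the speedup."""
    t_cpu = best_time(cpu_fn, data, repeats)
    t_off = best_time(offload_fn, data, repeats)
    # speedup > 1: offloaded variant is faster; speedup < 1: it is slower.
    speedup = t_cpu / t_off if t_off > 0 else float("inf")
    return {"cpu_s": t_cpu, "offload_s": t_off, "speedup": speedup}


if __name__ == "__main__":
    print(compare(cpu_kernel, offloaded_kernel, list(range(100_000))))
```

Timing the whole wrapper call, rather than the kernel alone, is a deliberate choice in this sketch: the figure that matters to the application is the end-to-end cost, including data movement.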
For illustration purposes, let's consider a trivial example of an application that performs a large number of iterations of a given computation.
The developer wants to evaluate the potential of the FPGA present on the target platform and take advantage of it by offloading part of the computations to it. Rather than starting a lengthy process of developing specific FPGA code (either HDL or specific C for HLS) and interfacing it in hardware and software, the developer, who may not be an FPGA guru, first needs a quick baseline for the performance improvement the FPGA may bring. The developer can use Poroto to lightly annotate the C code of the function (with Poroto annotations) and quickly compile it for FPGA execution. This step generates a kernel that the user can characterise through a CPU testbench that executes the wrapper function for that FPGA kernel. At this step, the user can also compare the performance of the offloaded function with that of its initial version on the CPU.
This will show, for example, an FPGA execution that is "x times" slower or "y times" faster than the CPU execution, depending on various aspects such as the original code structure, the annotations used, the performance of the C to VHDL compiler, etc. Either way, the developer quickly gets a first baseline, which they may choose to optimise (or not) in light of the original performance requirements of the targeted application. Through the kind of quick evaluation that Poroto enables, the developer can therefore make design time choices to parallelise the execution of the application with a more sensible split of the computation between the CPUs and FPGAs of the target hardware architecture.
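As a rough illustration of how such a baseline can feed into choosing that split, the sketch below computes a balanced share of iterations from two measured per-iteration times. It rests on strong assumptions that are not part of Poroto: iterations are independent, the CPU and the FPGA can work concurrently without contention, and the measured FPGA time already includes the transfer cost. The names `balanced_split` and `combined_time` are hypothetical. Under these assumptions, both sides finish together when the FPGA takes the fraction f = t_cpu / (t_cpu + t_fpga) of the iterations.

```python
def balanced_split(t_cpu, t_fpga):
    """Fraction of iterations to give the FPGA so both sides finish together.

    t_cpu and t_fpga are measured times per iteration (same unit).
    Derived from f * t_fpga == (1 - f) * t_cpu.
    """
    if t_cpu <= 0 or t_fpga <= 0:
        raise ValueError("per-iteration times must be positive")
    return t_cpu / (t_cpu + t_fpga)


def combined_time(n, t_cpu, t_fpga, f):
    """Total time when a fraction f of n iterations runs on the FPGA
    while the remainder runs on the CPU at the same time."""
    return max(f * n * t_fpga, (1 - f) * n * t_cpu)


if __name__ == "__main__":
    # Illustrative numbers only: the FPGA path is 3 times faster per iteration.
    n, t_cpu, t_fpga = 1000, 3.0, 1.0
    f = balanced_split(t_cpu, t_fpga)
    print(f)                                   # 0.75
    print(combined_time(n, t_cpu, t_fpga, f))  # 750.0
```

With these illustrative numbers, the CPU alone would need 3000 time units and the FPGA alone 1000, while the balanced split needs 750. On a real platform, shared memory bandwidth and transfer overheads would move the best split away from this idealised value, which is why measuring on the effective hardware matters.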
Combining Poroto with a parallel programming model, such as OmpSs, could also be a sound approach to further support the user in "Design Time" explorations. This would address a number of open questions at design time, such as: how should computations be distributed over the various CPU cores and the FPGA kernels just generated? How can the developer be sure that the theoretically achievable performance, estimated from atomic measurements, holds on the actual hardware architecture they are working on?
A future blog post will focus on these issues.
For more insights into Poroto's features, scope and examples, see the official Poroto page on the TANGO website.