VxWorks with OpenCL Hardware Acceleration on Altera Cyclone V SoC
As devices get smaller, lighter, and smarter, some embedded devices need to execute high-performance computations at hard real-time speeds. Small, lightweight devices that push large volumes of data through fixed algorithms include self-parking cars sensing their surroundings, unmanned aerial drones running flight control systems, robot arms with tight control loops, and even mobile devices running search algorithms, calculating mapping information, or crunching financial data.
Wind River has collaborated with Altera to deliver configurable algorithm hardware acceleration alongside the VxWorks real-time operating system. Altera’s ARM-based Cyclone V SoC contains an FPGA that can be configured to run data-processing algorithms as multiple parallel hardware computations. This accelerates the computations to meet real-time requirements and offloads the processing from the main processor, which frees the CPU to run other real-time applications.
I tried out the latest VxWorks on the Altera Cyclone V SoC Development Kit with FPGA support to compare hardware computation against software computation. Altera provided me with an FPGA configuration set up as an OpenCL engine. For a computation-intensive algorithm, I used the Mandelbrot set calculated at various magnifications. The result of the algorithm is displayed via the SDI output of the Altera SoC Development Kit. My experiment calculates the map pixels over an area of 1210 x 860 pixels.
In hardware mode, the calculations are executed in the FPGA, and the results are passed back to the CPU for display. In software mode, the CPU performs the calculations itself.
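To give a feel for the work involved per frame, here is a minimal sketch in C of the standard escape-time Mandelbrot calculation, the kind of loop a software mode would run on the CPU. It is not the code used in the demo: the function names, the iteration cap, and the mapping from magnification to pixel spacing are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Escape-time count for one point c = (cr, ci): iterate z = z*z + c from
 * z = 0 and return how many steps ran before |z| exceeded 2 (or max_iter). */
static uint32_t mandel_iterations(double cr, double ci, uint32_t max_iter)
{
    double zr = 0.0, zi = 0.0;
    uint32_t n = 0;

    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        n++;
    }
    return n;
}

/* Fill out[width * height] with iteration counts for a view centered on
 * (center_r, center_i).
 * Assumption: at magnification 1 the view spans 4.0 units horizontally. */
static void mandel_frame(uint32_t *out, size_t width, size_t height,
                         double center_r, double center_i,
                         double magnification, uint32_t max_iter)
{
    double step = 4.0 / (magnification * (double)width);

    for (size_t y = 0; y < height; y++) {
        double ci = center_i + ((double)y - (double)height / 2.0) * step;
        for (size_t x = 0; x < width; x++) {
            double cr = center_r + ((double)x - (double)width / 2.0) * step;
            out[y * width + x] = mandel_iterations(cr, ci, max_iter);
        }
    }
}
```

For the display area described above, a caller would pass a buffer of 1210 * 860 entries. Each pixel is computed independently of every other pixel, which is what makes this workload a natural candidate for parallel hardware.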
There is a striking difference between the two modes of operation. You can see the results in this video.
To make the comparison easier, I added a graphical user interface for switching between hardware calculation mode and software calculation mode. VxWorks provides graphical widgets such as buttons and scroll bars as part of its user interface offering for devices that require some form of graphical interaction with humans. You can see the resulting interface in the featured image.
In hardware mode, at 50K magnification, the device can calculate about 3.3 frames per second. In software mode, the rate drops to 0.10 frames per second. The hardware acceleration provides a 20 to 30 times increase in computation speed. This increase shows the advantage of pushing calculations through the FPGA as parallel hardware processes.
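The frame rates can be turned into a speedup ratio and a pixel throughput with a little arithmetic. The sketch below is only a worked calculation on the numbers quoted in this post; the helper names are hypothetical.

```c
/* Ratio of hardware frame rate to software frame rate. */
static double speedup(double hw_fps, double sw_fps)
{
    return hw_fps / sw_fps;
}

/* Pixels computed per second for a given frame size and frame rate. */
static double pixels_per_second(int width, int height, double fps)
{
    return (double)width * (double)height * fps;
}
```

With the 50K magnification figures, speedup(3.3, 0.10) comes to about 33, a little above the quoted 20 to 30 range, which may reflect measurements at other magnifications. A 1210 x 860 frame holds 1,040,600 pixels, so pixels_per_second(1210, 860, 3.3) is roughly 3.4 million pixels per second in hardware mode, against roughly 104,000 in software mode.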