Standards-Driven Heterogeneous Programming with oneAPI

Getting the maximum achievable performance out of today’s hardware requires developers to strike a delicate balance between unlocking the hardware’s full potential and writing code that is portable and power-efficient. Developers need an open, unified environment that lets them program across various architectures without knowing the details of each one. Heterogeneity brings better performance and efficiency, but it also brings increased complexity. To minimize that complexity, oneAPI simplifies programming across CPUs, GPUs, FPGAs, and other accelerators.

oneAPI is an open, standards-based, cross-architecture programming model to simplify development for a wide range of data-centric workloads across a variety of architectures.

oneAPI consists of a language and libraries for creating parallel applications, including the oneAPI Deep Neural Network Library (oneDNN), which offers high-performance implementations of primitives for deep learning frameworks.

oneAPI for Deep Learning

One developer making full use of oneAPI is Sheng Zha, Senior Applied Scientist at Amazon, who works with the Apache MXNet PPMC. While oneAPI provides highly optimized and performant cross-architecture programming, Sheng focuses on its deep learning side, because the MXNet framework integrates seamlessly with oneDNN. oneDNN provides efficient implementations of the kernels commonly used in deep learning workloads. It also provides kernel fusion and quantization functionality, and the combination of the two improves the efficiency of common computer vision workloads by 4x to 16x. This improved efficiency helps MXNet achieve state-of-the-art performance on Intel CPUs and serve many latency-critical workloads.
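To make the two terms concrete, here is a minimal sketch in plain C++. It does not use the oneDNN API, and the function names (quantize, scale_relu_quant_unfused, scale_relu_quant_fused) are hypothetical, chosen only for illustration. It assumes a toy "layer" that computes w * x + b, applies ReLU, and stores the result as 8-bit integers with a symmetric scale. The unfused version walks over the data three times and keeps a full-size temporary buffer; the fused version does the same arithmetic in a single pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: symmetric int8 quantization.
// q = round(x / scale), clamped to [-127, 127].
inline std::int8_t quantize(float x, float scale) {
    assert(scale > 0.0f);
    float q = std::round(x / scale);
    q = std::max(-127.0f, std::min(127.0f, q));
    return static_cast<std::int8_t>(q);
}

// Unfused: three separate passes over memory and a float temporary.
std::vector<std::int8_t> scale_relu_quant_unfused(const std::vector<float>& x,
                                                  float w, float b,
                                                  float out_scale) {
    std::vector<float> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = w * x[i] + b;           // pass 1: affine
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = std::max(0.0f, tmp[i]); // pass 2: ReLU
    std::vector<std::int8_t> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = quantize(tmp[i], out_scale); // pass 3
    return out;
}

// Fused: one pass, no temporary; each element is read once and written once.
std::vector<std::int8_t> scale_relu_quant_fused(const std::vector<float>& x,
                                                float w, float b,
                                                float out_scale) {
    std::vector<std::int8_t> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = quantize(std::max(0.0f, w * x[i] + b), out_scale);
    }
    return out;
}
```

Both functions return the same values; the fused one simply touches memory less, and the int8 output takes a quarter of the space of 32-bit floats. A production library applies the same ideas to far more complex kernels, so this sketch says nothing about where the 4x to 16x figure comes from.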

Looking toward the future, Sheng envisions that deep learning workloads will need more efficiency for dynamic network structures, such as those in Natural Language Processing (NLP), in particular when using reinforcement learning in NLP with variable input and output lengths. Sheng also sees a need for control flow in dynamic graph structures in graph neural networks. As accelerator hardware improves and offers better ways to handle dynamic structures, oneAPI’s unified approach and direct programming capability should provide a unique opportunity to make those hardware improvements accessible to developers.

Supporting the Future of Coding

The overall support for heterogeneous code development continues to surge. Ronan Keryell, principal software engineer at Xilinx Research Labs, works on SYCL, the C++-based programming model for heterogeneous systems such as FPGAs and CGRAs, which is a foundational element of oneAPI. SYCL is a higher-level programming model that improves programming productivity on various hardware accelerators. Recently, Ronan and team introduced SYCL 2020, which integrates more than 40 new features, including updates for streamlined coding and smaller code size. The most important feature of SYCL 2020 is its expanded interoperability, which enables efficient acceleration through diverse backend acceleration APIs and lets users target a diverse set of hardware. This expansion allows developers to build code for today and tomorrow, accelerating adoption and deployment of their High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications across multiple platforms.

Fulfilling the Need for Heterogeneous Programming

Someone who saw the potential of heterogeneous programming early on was Andrew Richards, CEO and Co-Founder of Codeplay. Technology is advancing every day, and benchmarks continue to rise for supercomputers that are handling increasingly complex code. As the demand for unified programming continues to grow, Richards notes that supercomputer labs are the ones really driving the need. With ambitious customers, Codeplay continues to support oneAPI and SYCL to help users write code once and have it perform well across a wide range of hardware.

One trend Richards has seen is a need not only to design chips and programming models for those chips, but also to design efficient chips for real-world software. Supercomputer labs have become a huge driving force behind the need to future-proof code and to optimize it quickly and efficiently across diverse architectures. The beauty of oneAPI and SYCL is that they save developers time and effort, especially when it comes to more complex coding. Developers no longer need to rewrite code from scratch for each individual accelerator. oneAPI lets these users tune their code and apply those optimizations to get the best performance from their application across all of those different architectures.

Expanding the Reach of Standards-based, Unified Programming

As the community around oneAPI grows, further innovation continues to propel heterogeneous programming into the future. Aksel Alpay, a researcher and software engineer from Heidelberg University, works on high performance computing topics. In particular, he is the creator and lead developer of the hipSYCL SYCL implementation, and also engages within the Khronos SYCL working group to advance the language. 

Currently, hipSYCL runs on AMD GPUs, NVIDIA GPUs, and almost every CPU. The team recently added experimental support for Intel GPUs through a Level Zero backend, which is also used by the DPC++ compiler. Aksel has said that adding this support was a great experience for the team because there was an open specification and they were able to reuse certain components from the DPC++ compiler thanks to its open source nature. Now, they can deliver a complete SYCL implementation that runs on pretty much every GPU.

Another project Aksel’s team is currently working on involves the oneAPI libraries. They believe that having the language is only the first step and that a portable library ecosystem is also needed. His team is working on getting support for hipSYCL into the oneAPI libraries, which would then bring support for AMD hardware to those libraries as well. This means developers could, for example, use these libraries on the Frontier or El Capitan supercomputers.

There is a lot more exciting innovation on the horizon as developers continue to break down code barriers and simplify programming across diverse compute architectures. What does the future hold for heterogeneous programming? Only time will tell: we are really only at the beginning of this era, and the future of computing will require heterogeneous hardware to maximize computing power.

