DevSummit SE Asia 2022

Using HBM2-enabled FPGAs for 2D FFT Acceleration

FPGAs that pair HBM2 memory with reconfigurable pipeline logic can efficiently handle memory access patterns that other parallel architectures might struggle with. This is demonstrated here using a 2D FFT, which requires a matrix transpose. We can overlap the transpose operation with computation, using minimal buffering, which makes it almost free from a throughput perspective. Heavily optimized FPGA FFTs can benefit significantly from reduced precision, but for easier comparison with other technologies we use floating point in this example.
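To illustrate why a 2D FFT involves a matrix transpose, here is a minimal Python sketch of the standard row-column decomposition: transform every row, transpose, transform every row again, then transpose back. It is not the BittWare kernel. The function names are hypothetical, a naive O(n^2) DFT stands in for an optimized FFT pipeline, and the transpose here is a separate step rather than being overlapped with computation as described above.

```python
import cmath


def dft(row):
    """Naive 1D DFT (assumption: stands in for an optimized FFT core)."""
    n = len(row)
    return [
        sum(row[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def transpose(m):
    """Swap rows and columns of a rectangular matrix (list of lists)."""
    return [list(col) for col in zip(*m)]


def fft2d_row_column(m):
    """2D transform built from 1D row transforms and two transposes."""
    rows_done = [dft(r) for r in m]              # pass 1: transform each row
    cols_as_rows = transpose(rows_done)          # columns become rows
    cols_done = [dft(r) for r in cols_as_rows]   # pass 2: transform each column
    return transpose(cols_done)                  # restore original orientation


def dft2d_direct(m):
    """Direct 2D DFT from the definition, used only as a reference check."""
    rows, cols = len(m), len(m[0])
    return [
        [
            sum(
                m[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]
```

Calling `fft2d_row_column` and `dft2d_direct` on the same small matrix should give matching results up to floating-point rounding. The transpose between the two passes is the step whose memory access pattern the abstract refers to.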

BittWare previously created a 2D FFT kernel for FPGAs using Intel’s OpenCL compiler. We have now rewritten that code for the same 520N-MX card to leverage Intel’s oneAPI programming model, specifically its SYCL programming language. We achieved similar performance to the OpenCL version.

On the Stratix 10 MX, the peak HBM2 bandwidth for a batch-1 implementation, with two independent 2D FFT kernels in the same device, is 291 GB/s. With pipelining/batching, a peak bandwidth of 337 GB/s is possible.

The key benefit we find in using high-level tools is the significant reduction in development time.

Speaker(s)

Richard Chamberlain - Principal Systems Engineer, BittWare

Richard started his career at MBDA UK before joining Nallatech in 2001. For the last 20 years he has pioneered the use of FPGAs for HPC and is a trusted industry expert in the field of heterogeneous acceleration. Richard currently works as a Principal Systems Engineer in the applications team at BittWare, part of the Molex group.
