Using HBM2-enabled FPGAs for 2D FFT Acceleration

FPGAs that combine HBM2 memory with reconfigurable pipeline logic can efficiently handle memory access patterns that other parallel architectures might struggle with. We demonstrate this here using a 2D FFT, which requires a matrix transpose between its row and column passes. On the FPGA, the transpose can be overlapped with computation using minimal buffering, making it almost free from a throughput perspective. Heavily optimized FPGA FFTs can benefit significantly from reduced precision, but for easier comparison with other technologies we use floating point in this example.
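To show where the transpose enters a 2D FFT, here is a minimal host-side reference sketch in plain C++. It is not the BittWare kernel. It runs the row FFTs and the transposes as separate sequential passes, whereas the FPGA design described above overlaps the transpose with computation. The function names (fft1d, transpose_square, fft2d) are hypothetical, and the sketch assumes a square, row-major matrix whose side is a power of two, holding single-precision complex values.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cfloat = std::complex<float>;

// In-place iterative radix-2 FFT over n contiguous elements.
// Assumption: n is a power of two.
void fft1d(cfloat* x, std::size_t n) {
    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    const float pi = std::acos(-1.0f);
    // Butterfly stages: combine sub-transforms of length len/2 into length len.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        for (std::size_t i = 0; i < n; i += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cfloat w = std::polar(1.0f, -2.0f * pi * float(k) / float(len));
                const cfloat u = x[i + k];
                const cfloat v = x[i + k + len / 2] * w;
                x[i + k] = u + v;
                x[i + k + len / 2] = u - v;
            }
        }
    }
}

// In-place transpose of an n x n row-major matrix.
void transpose_square(std::vector<cfloat>& m, std::size_t n) {
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = r + 1; c < n; ++c)
            std::swap(m[r * n + c], m[c * n + r]);
}

// 2D FFT as: row FFTs, transpose, row FFTs, transpose back.
// The first transpose turns columns into contiguous rows so the same
// 1D routine can be reused; the second restores the original orientation.
void fft2d(std::vector<cfloat>& m, std::size_t n) {
    for (std::size_t r = 0; r < n; ++r) fft1d(&m[r * n], n);
    transpose_square(m, n);
    for (std::size_t r = 0; r < n; ++r) fft1d(&m[r * n], n);
    transpose_square(m, n);
}
```

Calling fft2d(m, n) on a vector of n*n values transforms it in place. The strided access in transpose_square is the memory pattern that is costly on cache-based architectures, which is why hiding it behind computation matters for throughput.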

BittWare previously created a 2D FFT kernel for FPGAs using Intel’s OpenCL compiler. We have now rewritten that code for the same 520N-MX card to leverage Intel’s oneAPI programming model, specifically its SYCL programming language. We achieved similar performance to the OpenCL version.

The peak HBM2 bandwidth (Stratix 10 MX) for a batch-1 implementation, with two independent 2D FFT kernels in the same device, is 291 GBytes/sec. With pipelining/batching, a peak bandwidth of 337 GBytes/sec is possible.

The key benefit we find in using high-level tools is a significant reduction in development time.


The oneAPI Specification joins the UXL Foundation. For more info, see the press release.