Porting oneAPI DPC++ on Xilinx FPGA & Versal ACAP CGRA

Many accelerators come with programming environments that suit electrical engineers or that work with machine-learning frameworks, but they remain difficult to use in an HPC context. Fortunately, SYCL 2020 can bring direct programming to various accelerators through the concept of generic back-ends. We are porting the open-source oneAPI DPC++ implementation to Xilinx Alveo FPGA cards and are also targeting our Versal AIE CGRA with 400 VLIW vector processors. We extend SYCL with collaborative operations that use the distributed memory shared within the 2D processor neighborhood, which is useful for stencil codes.

