Many workloads have inherent data parallelism that can be exploited to achieve high performance. However, designing data-parallel programs and mapping them to different hardware targets is challenging. Intel's Data Parallel C++ (DPC++) is an open alternative for cross-architecture development that aims to address this challenge. In this talk, we cover MapReduce, a widely used programming model for processing large datasets in a distributed fashion, and show how to adopt it to process a large dataset on an FPGA. We first introduce MapReduce and use the word-count problem to explain how to design a data-parallel algorithm with the MapReduce paradigm. We then describe a variant of MapReduce for data-parallel programming on FPGAs, adapted to their unique characteristics. Finally, we demonstrate the design flow of DPC++ programming on Intel DevCloud and the relevant tools for performance analysis and optimization.
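To make the word-count example concrete, here is a minimal sketch of the MapReduce pattern in standard C++. It runs sequentially on the host and is not DPC++ kernel code, nor the FPGA variant described in the talk. The function names (`map_chunk`, `shuffle`, `reduce_counts`, `word_count`) and the choice to split the input into string chunks are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustrative sketch only.
using KeyValue = std::pair<std::string, int>;

// Map phase: emit (word, 1) for every whitespace-separated word in one chunk.
// Chunks are independent, which is where the data parallelism comes from.
std::vector<KeyValue> map_chunk(const std::string& chunk) {
    std::vector<KeyValue> emitted;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) {
        emitted.emplace_back(word, 1);
    }
    return emitted;
}

// Shuffle phase: group the emitted values by key across all mapped chunks.
std::map<std::string, std::vector<int>> shuffle(
    const std::vector<std::vector<KeyValue>>& mapped) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& part : mapped) {
        for (const auto& kv : part) {
            groups[kv.first].push_back(kv.second);
        }
    }
    return groups;
}

// Reduce phase: sum the values collected for a single key.
int reduce_counts(const std::vector<int>& values) {
    assert(!values.empty());  // shuffle only creates keys that have a value
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Driver: run map over every chunk, shuffle, then reduce each key.
std::map<std::string, int> word_count(const std::vector<std::string>& chunks) {
    std::vector<std::vector<KeyValue>> mapped;
    for (const auto& chunk : chunks) {
        mapped.push_back(map_chunk(chunk));  // each call could run in parallel
    }
    std::map<std::string, int> counts;
    for (const auto& group : shuffle(mapped)) {
        counts[group.first] = reduce_counts(group.second);
    }
    return counts;
}
```

Calling `word_count` on the chunks "the quick brown fox", "the lazy dog", and "the fox" yields a count of 3 for "the" and 2 for "fox". Because each `map_chunk` call touches only its own chunk and each `reduce_counts` call touches only one key, both phases are candidates for parallel execution on an accelerator.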