Fujitsu and RIKEN Optimized oneDNN for Improved Performance on ARM

Fujitsu and RIKEN have taken first place in the MLPerf HPC benchmark with the supercomputer Fugaku, claiming the world's fastest performance for CosmoFlow as measured by the number of deep learning models trained per unit of time. By applying technology that reduces mutual interference among communication between CPUs in the programs used on the system, they trained at a rate of 1.29 deep learning models per minute – approximately 1.77 times faster than other systems. In developing the TensorFlow and Mesh TensorFlow implementations for this cosmological parameter prediction benchmark, they customized TensorFlow and optimized oneDNN as its backend. oneDNN uses the JIT assembler Xbyak_aarch64 to exploit the performance of the A64FX processor.


Read the full press release: Fujitsu and RIKEN Claim 1st Place for MLPerf HPC Benchmark with Supercomputer Fugaku

Read the developer story: How We Ported oneDNN to Fugaku with ARM

Learn more about the TensorFlow and oneDNN partnership


The oneAPI Specification has joined the UXL Foundation. For more information, read the press release.