Fujitsu and RIKEN Optimized oneDNN for Improved Performance on ARM

Fujitsu and RIKEN took first place on the CosmoFlow benchmark of MLPerf HPC with the supercomputer Fugaku, achieving the world's fastest performance as measured by the number of deep learning models trained per unit of time. They applied a technique that reduces mutual interference between inter-CPU communications to the programs running on the system. This let them train deep learning models at a rate of 1.29 models per minute, approximately 1.77 times faster than other systems. To build TensorFlow and Mesh TensorFlow implementations of this cosmological parameter prediction benchmark, they customized TensorFlow and optimized oneDNN as its backend. The optimized oneDNN uses the JIT assembler Xbyak_aarch64 to exploit the performance of Fugaku's A64FX processors.
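The result is a throughput (models per minute), so the 1.77x figure can be read backwards to estimate the rate it was compared against. The Python sketch below assumes that throughput is simply completed models divided by elapsed wall-clock minutes. The official MLPerf HPC rules define the exact timing window, and the helper names are illustrative only. With the article's rounded figures, the implied comparison rate is roughly 0.73 models per minute.

```python
def models_per_minute(models_trained: int, minutes: float) -> float:
    """Throughput as completed models / elapsed minutes.

    Assumed, simplified definition; MLPerf HPC rules specify the exact timing.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return models_trained / minutes


def implied_comparison_rate(rate: float, speedup: float) -> float:
    """Throughput of a comparison system, given a rate and a speedup over it."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return rate / speedup


# Hypothetical run: 129 models finishing in 100 minutes yields the same rate.
print(f"{models_per_minute(129, 100.0):.2f}")  # 1.29

# Figures quoted in the article (both already rounded by the source).
fugaku_rate = 1.29  # models per minute
speedup = 1.77      # approximate ratio over other systems
print(f"{implied_comparison_rate(fugaku_rate, speedup):.2f}")  # 0.73
```

Because both inputs are rounded, the derived 0.73 is only an estimate of the comparison system's rate, not a reported result.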


