Leverage Intel Deep Learning Optimizations in TensorFlow

TensorFlow is a widely used deep learning (DL) framework. Intel has been collaborating with Google to optimize its performance on Intel Xeon processor-based platforms using the Intel oneAPI Deep Neural Network Library (oneDNN), an open-source, cross-platform performance library for DL applications. These optimizations use oneDNN to accelerate key performance-intensive operations such as convolution, matrix multiplication, and batch normalization.

We are happy to announce that the oneDNN optimizations are now available in the official TensorFlow 2.5 release, so developers can benefit from the Intel optimizations without installing a separate build. Additional TensorFlow-based applications, including TensorFlow Extended, TensorFlow Hub, and TensorFlow Serving, will also include the oneDNN optimizations.

Enabling oneDNN Optimizations in TensorFlow 2.5

  1. Install the latest TensorFlow pip package, i.e.: pip install tensorflow
  2. By default, the oneDNN optimizations are turned off. To enable them, set the environment variable TF_ENABLE_ONEDNN_OPTS to 1. On Linux systems, for example: export TF_ENABLE_ONEDNN_OPTS=1
  3. Run your TensorFlow application.
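When the application is started from a Python script or notebook rather than a shell, the same variable can be set from Python. The following is a minimal sketch, assuming the flag is read when TensorFlow initializes, which is why it is set before the import. The try/except guard is not part of the procedure above; it is only there so the snippet also runs on a machine without TensorFlow installed.

```python
import os

# Set the flag before importing TensorFlow. This sketch assumes the
# variable is read when TensorFlow initializes, so setting it first is
# the safe ordering.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

try:
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
except ImportError:
    # Guard added for illustration only.
    tf = None
    print("TensorFlow is not installed; run: pip install tensorflow")

print("TF_ENABLE_ONEDNN_OPTS =", os.environ["TF_ENABLE_ONEDNN_OPTS"])
```

A variable set this way applies only to the current process and any child processes it starts; it does not change the shell environment.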

Performance Benefits of TensorFlow 2.5 with oneDNN Optimizations

We benchmarked several popular TensorFlow models on DL inference and training, comparing results with and without oneDNN optimizations enabled on a 2nd Generation Intel Xeon Scalable processor platform.

Inference was benchmarked using four cores on a single socket for latency measurements and all 28 cores for throughput tests. Figures 1 and 2 show the relative performance improvement for inference across a range of models. For offline throughput measurements (using large batches), performance improvements of up to 3x are possible (Figure 1). For real-time server inference (batch size = 1), the oneDNN-enabled TensorFlow took 29% to 77% of the time of the unoptimized version for 10 out of 11 models (Figure 2).

Figure 1. Inference throughput improvements
Figure 2. Inference latency improvements
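The latency results are quoted as a fraction of the baseline time, while the throughput results are quoted as a multiplier. To compare the two, a time fraction can be converted to a speedup factor by taking its reciprocal. The short sketch below does that for the 29% and 77% endpoints; the function name is hypothetical and chosen only for illustration.

```python
def speedup_from_time_fraction(fraction):
    """Convert 'takes this fraction of the baseline time' into a speedup factor."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return 1.0 / fraction


# Endpoints of the latency range quoted in the text above.
for fraction in (0.29, 0.77):
    speedup = speedup_from_time_fraction(fraction)
    print(f"{fraction:.0%} of baseline time -> {speedup:.2f}x speedup")
```

In other words, the quoted latency range corresponds to speedups of roughly 1.30x to 3.45x.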

For training, performance gains of up to 2.4x were observed across several popular models (Figure 3). Gains were observed with both the earlier TensorFlow 1.x graph models and the newer TensorFlow 2.x eager execution-based models.

Figure 3. Training performance improvements

You can reproduce these benchmarks by getting the same models from Model Zoo for Intel Architecture:

git clone https://github.com/IntelAI/models.git

The README.md files contain instructions to perform model training and inference. For example, the instructions for the inceptionv3 model are in models/benchmarks/image_recognition/tensorflow/inceptionv3/README.md.

Based on the benchmarking and results above, we encourage data scientists and developers to download the latest official TensorFlow release and enable the oneDNN optimizations to get immediate performance improvements on Intel Xeon processor-based platforms.
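For a quick check on your own workload, the same script can be run twice, once with the flag off and once with it on. The sketch below is not the benchmarking harness used for the figures above. It is a minimal illustration that assumes TF_ENABLE_ONEDNN_OPTS=0 leaves the optimizations off, and the helper name run_with_onednn_flag and the placeholder command are hypothetical.

```python
import os
import subprocess
import sys
import time


def run_with_onednn_flag(command, enabled):
    """Run command with TF_ENABLE_ONEDNN_OPTS set to 1 or 0.

    Returns a (wall_clock_seconds, stdout) tuple.
    """
    env = dict(os.environ)
    env["TF_ENABLE_ONEDNN_OPTS"] = "1" if enabled else "0"
    start = time.perf_counter()
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return time.perf_counter() - start, result.stdout


if __name__ == "__main__":
    # Placeholder workload that only echoes the flag; replace it with
    # the command that launches your own TensorFlow script.
    command = [
        sys.executable,
        "-c",
        "import os; print(os.environ['TF_ENABLE_ONEDNN_OPTS'])",
    ]
    for enabled in (False, True):
        seconds, output = run_with_onednn_flag(command, enabled)
        print(f"oneDNN enabled={enabled}: {seconds:.3f} s, saw flag={output.strip()}")
```

The measured wall-clock time includes process startup and framework import, so for short workloads it is better to time the training or inference loop inside the script itself.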

Low Precision Data Type

oneDNN also enables the int8 and bfloat16 data types to improve compute-intensive training and inference performance on the latest 2nd- and 3rd-Generation Intel Xeon Scalable processors. These optimizations can improve model execution time by up to 4x for int8 and 2x for bfloat16. The int8 data type is not currently supported in the official TensorFlow 2.5 release, but this limitation will be addressed in a later version. In the meantime, to use the int8 data type, you can download the Intel Optimization for TensorFlow.
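One way to request bfloat16 computation from a Keras model is the Keras mixed precision API. The text above does not say this is the route used for the quoted numbers, so treat the sketch below as an assumption. The helper name use_bfloat16_policy is hypothetical, and the import guard is only there so the snippet runs without TensorFlow installed.

```python
def use_bfloat16_policy():
    """Request the Keras 'mixed_bfloat16' policy.

    Returns the active policy name, or None when TensorFlow is not
    installed. Layers created after this call compute in bfloat16 and
    keep their variables in float32.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")
    return tf.keras.mixed_precision.global_policy().name


print("Active policy:", use_bfloat16_policy())
```

Whether bfloat16 actually speeds up a model depends on hardware support, so it is worth measuring on the target processor before adopting the policy.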

TensorFlow Resources and Support

TensorFlow 2.5 can be found here:

For help with technical questions, visit the following communities and forums to find answers and get support:

Benchmarking System Configuration

Two-socket Intel Xeon Platinum 8280L Processor, 28 cores, HT On, Turbo On, total memory 256GB.

System BIOS: SE5C620.86B.02.01.0012.070720200218.

TensorFlow version: https://github.com/tensorflow/tensorflow.git (2.5RC3)

Compiler and libraries: gcc 7.5.0, oneDNN v2.2.0

Datatype: FP32

Data collection date: May 9, 2021


Mahmoud Abuzaina, Deep Learning Software Engineer; Ramesh AG, Principal Engineer; Jason Chow, Marketing Manager; Xiaoming Cui, Deep Learning Software Engineer; Rama Ketineni, Deep Learning Software Engineer; and Guozhong Zhuang, Deep Learning Software Engineer; Intel Corporation.

Notices and Disclaimers

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available options. Learn more at www.Intel.com/PerformanceIndex.

Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

