MLM Fine-Tuning & Optimization with oneAPI and 4th Generation Intel® Xeon® Scalable Processor

In this talk, we present our experience using oneAPI with 4th Gen Intel® Xeon® Scalable (Sapphire Rapids) processors, showing an almost 50% reduction in training time when fine-tuning an MLM model on domain data. With oneAPI-enabled optimizations, we trained an MLM model on finance-domain data containing ~1.1 M paragraphs on a 4th Gen Intel® Xeon® system (limited to 32 CPUs and 64 GB of memory) in ~80 hours, which is within an acceptable operational range given the data size.

This makes Intel® Xeon® processors with the oneAPI toolkit a good alternative for training LLMs. We optimized training using oneAPI with the following techniques:

  • Using the oneAPI-based Intel® Extension for PyTorch (IPEX)
  • Using the BF16 data type (16-bit precision training instead of 32-bit)
