In this talk, we share our experience using oneAPI with Intel® 4th Gen Xeon Sapphire Rapids processors and show how we achieved an almost 50% reduction in training time when fine-tuning an MLM model on domain data. We trained an MLM model on finance-domain data containing ~1.1 M paragraphs on Intel® 4th Gen Xeon Sapphire Rapids (with a resource limit of 32 CPUs and 64 GB) in ~80 hours with the oneAPI-enabled optimizations, which is within an acceptable operating range considering the data size.
This makes Intel® Xeon processors with the oneAPI toolkit a good alternative for training LLMs. We optimized training with oneAPI using the following techniques:
Using oneAPI-based IPEX (Intel® Extension for PyTorch)
Using the BF16 datatype (16-bit precision training instead of 32-bit training)
Tuning OpenMP environment variables such as KMP_BLOCKTIME, KMP_AFFINITY, KMP_SETTINGS, and OMP_NUM_THREADS
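As a rough illustration of how the three techniques above might fit together, here is a minimal Python sketch. It is not the configuration used in the talk. The environment variable values are commonly used starting points and are assumptions, as are the helper names `openmp_env`, `prepare_bf16`, and `bf16_train_step`. The training step also assumes a Hugging Face-style model whose forward pass returns an object with a `.loss` attribute. The OpenMP variables are set before PyTorch is imported, since the OpenMP runtime reads them when it initializes.

```python
import os


def openmp_env(num_threads):
    """Build illustrative OpenMP settings (values are assumptions, not from the talk)."""
    return {
        "OMP_NUM_THREADS": str(num_threads),             # typically one thread per physical core
        "KMP_BLOCKTIME": "1",                            # ms a thread spins before sleeping
        "KMP_AFFINITY": "granularity=fine,compact,1,0",  # pin threads to cores
        "KMP_SETTINGS": "1",                             # print the runtime settings at startup
    }


# Apply the settings before any framework import so the OpenMP runtime sees them.
os.environ.update(openmp_env(num_threads=32))


def prepare_bf16(model, optimizer):
    """Let IPEX optimize the model and optimizer for BF16 training on CPU."""
    import torch
    import intel_extension_for_pytorch as ipex

    model.train()
    # With an optimizer supplied, ipex.optimize returns a (model, optimizer) pair.
    return ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)


def bf16_train_step(model, optimizer, batch):
    """Run one training step with the forward pass under BF16 autocast."""
    import torch

    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = model(**batch).loss  # assumes a Hugging Face-style output object
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a training script, `prepare_bf16` would be called once after the model and optimizer are constructed, and `bf16_train_step` once per batch. The thread count of 32 simply mirrors the 32-CPU limit mentioned above. The best values for the `KMP_*` variables depend on the workload and are worth measuring rather than copying.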