Efficient Inference and Training of Large Neural Network Models

The memory consumption and computational cost of SOTA NN models are dramatically increasing. Efficient deep learning can be applied to both inference and training. In this talk, we present our progress regarding quantization and sparsification. First, we show our solutions to efficiently implementing large recommendation models, where we systematically apply quantization in DQRM with the assistance of oneAPI. Then we talk about our efforts in compressing large language models (LLMs) and diffusion models. Finally, we introduce TASC, which is designed to accelerate distributed training. Our methods achieve excellent performance and obtain decent generalization ability.

Download Presentation


Learn about joining the UXL Foundation:

Join now