2024-09-09

Model Training Optimizations

Optimize model training using OpenMP and PyTorch


OpenMP is an API for shared-memory multithreading, used to get better performance out of parallel computation tasks

PyTorch uses OpenMP as its default backend for intra-op parallelism on CPU, but you can still tune a few settings to extract more performance

Run lscpu to get info about the available physical and logical cores

Run torch.__config__.parallel_info() to see how many threads PyTorch can use and how many cores it detected
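
You can also check this from Python directly; a minimal sketch (note that os.cpu_count() reports logical cores, so lscpu is still the place to look for the physical count):

    import os
    import torch

    # Logical core count as seen by the OS (includes hyper-threaded siblings).
    print("logical cores:", os.cpu_count())

    # Number of threads PyTorch will use for intra-op parallelism.
    print("intra-op threads:", torch.get_num_threads())

    # Full report: OpenMP runtime, thread counts, detected CPU capabilities.
    print(torch.__config__.parallel_info())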

Let's see why we need it in the first place: threads vs. processes. Threads communicate easily with each other (they just read and write shared memory), whereas processes have to exchange data through queues, sockets, or messages

OpenMP is the framework that lets us manage these threads easily
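
A minimal sketch of that difference in plain Python (the worker names are just illustrative): the thread writes straight into a list owned by the parent, while the process has to push the same value through a multiprocessing.Queue.

    import threading
    import multiprocessing as mp

    def thread_worker(shared):
        # Threads share memory: a plain write is immediately visible to the parent.
        shared.append("from thread")

    def process_worker(queue):
        # Processes have separate memory: data travels through a queue, pipe, or socket.
        queue.put("from process")

    if __name__ == "__main__":
        shared = []
        t = threading.Thread(target=thread_worker, args=(shared,))
        t.start(); t.join()
        print(shared)    # ['from thread']

        q = mp.Queue()
        p = mp.Process(target=process_worker, args=(q,))
        p.start()
        print(q.get())   # 'from process', serialized and shipped between processes
        p.join()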

Here's how to use OpenMP with PyTorch to speed up training

By default OpenMP allocates threads across both physical and logical (hyper-threaded) cores, but we can set a few environment variables to restrict the allocation to physical cores only, since one thread per physical core is usually faster for compute-bound work (see the sketch after this list)

OMP_PROC_BIND - prevents threads from migrating between cores. Set it to TRUE.

OMP_SCHEDULE - controls how loop iterations are divided among threads. Set it to STATIC so each thread gets a fixed, contiguous chunk.

GOMP_CPU_AFFINITY - binds threads to specific cores (GNU OpenMP only). e.g. 0-15
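
Putting it together, a minimal sketch; the values assume a machine with 16 physical cores numbered 0-15 and the GNU OpenMP runtime (libgomp), so adjust them to your lscpu output. The variables have to be set before torch is imported, because the OpenMP runtime reads them once at startup.

    import os

    # OpenMP reads these once at startup, so set them before importing torch.
    os.environ["OMP_NUM_THREADS"] = "16"      # one thread per physical core (assumed 16 here)
    os.environ["OMP_PROC_BIND"] = "TRUE"      # keep threads from migrating between cores
    os.environ["OMP_SCHEDULE"] = "STATIC"     # fixed, contiguous chunks of loop iterations
    os.environ["GOMP_CPU_AFFINITY"] = "0-15"  # pin threads to cores 0-15 (GNU OpenMP only)

    import torch

    # Keep PyTorch's intra-op thread pool consistent with the OpenMP settings.
    torch.set_num_threads(16)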

To optimize for Intel CPUs specifically, you can use IPEX (Intel Extension for PyTorch), which is built as a native extension to PyTorch
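
A minimal sketch of the usual IPEX flow, assuming the intel_extension_for_pytorch package is installed; the Linear model here is just a stand-in for a real one:

    import torch
    import intel_extension_for_pytorch as ipex

    # Toy model and optimizer, placeholders for your real training setup.
    model = torch.nn.Linear(128, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # ipex.optimize returns the pair rewritten with Intel-specific
    # optimizations (operator fusion, better memory layout, etc.).
    model.train()
    model, optimizer = ipex.optimize(model, optimizer=optimizer)

    # Training then proceeds as usual.
    x = torch.randn(32, 128)
    loss = model(x).sum()
    loss.backward()
    optimizer.step()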

For more, see the performance tuning guide by PyTorch