2024-09-09
Model Training Optimizations
Optimize model training using OpenMP and PyTorch
OpenMP is an API for shared-memory multithreading, used to speed up parallel computation on CPUs
PyTorch already uses OpenMP as its default backend for intra-op parallelism, but you can still tune a few settings to extract more performance
run lscpu to see the available physical and logical cores
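If you want the same information from Python, here is a minimal sketch using the third-party psutil package (an assumption on my part; lscpu itself needs no extra dependencies):

```python
import os

import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # logical cores, hyper-threads included
physical = psutil.cpu_count(logical=False)  # physical cores only
print(f"physical: {physical}, logical: {logical}")
```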
run torch.__config__.parallel_info() to see how many threads PyTorch can use and the total number of logical cores it detects
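For example:

```python
import torch

# Prints the ATen/Parallel configuration: OpenMP and MKL thread counts
# plus the parallel backend in use.
print(torch.__config__.parallel_info())

# Number of threads PyTorch uses for intra-op parallelism.
print(torch.get_num_threads())
```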
let's see why we need it in the first place: threads vs. processes. Threads share an address space, so they communicate simply by reading and writing memory, whereas processes have to exchange data through queues, sockets, or messages
so to manage these threads easily we use the OpenMP runtime
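A toy illustration of the difference (a hypothetical example, not part of a training setup):

```python
import multiprocessing as mp
import threading

shared = []

def thread_work():
    shared.append("from thread")  # threads share memory: a plain write is visible

def process_work(q):
    q.put("from process")  # processes don't share memory: data travels via a queue

if __name__ == "__main__":
    t = threading.Thread(target=thread_work)
    t.start(); t.join()
    print(shared[0])

    q = mp.Queue()
    p = mp.Process(target=process_work, args=(q,))
    p.start()
    print(q.get())
    p.join()
```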
How to use OpenMP with PyTorch to speed up training
OpenMP by default allocates threads across both physical and logical (hyper-threaded) cores, but we can set a few environment variables to restrict threads to physical cores, which are faster for compute-bound work (see the sketch after this list)
OMP_PROC_BIND
- prevents threads from migrating between cores. set to TRUE.
OMP_SCHEDULE
- controls how loop iterations are divided among threads. set to STATIC for an even, up-front split.
GOMP_CPU_AFFINITY
- pins threads to specific cores (GNU OpenMP only). e.g. 0-15
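A minimal sketch of putting these together; the core range and thread count are assumptions for a 16-core machine, so adjust them to what lscpu reports:

```python
import os

# OpenMP reads these when its runtime initializes, so set them before
# importing torch (or export them in the shell before launching Python).
os.environ["OMP_PROC_BIND"] = "TRUE"      # keep threads from migrating between cores
os.environ["OMP_SCHEDULE"] = "STATIC"     # split loop iterations evenly, up front
os.environ["GOMP_CPU_AFFINITY"] = "0-15"  # pin threads to cores 0-15 (adjust to your CPU)

import torch

torch.set_num_threads(16)  # e.g. one thread per physical core
print(torch.__config__.parallel_info())
```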
to optimize on Intel CPUs you can use Intel Extension for PyTorch (IPEX), which integrates natively with PyTorch
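A minimal training-side sketch, assuming IPEX is installed (pip install intel-extension-for-pytorch); the model and optimizer here are placeholders for illustration:

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(128, 64)  # stand-in model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# ipex.optimize applies CPU-specific optimizations (operator fusion,
# memory-layout changes); pass the optimizer when optimizing for training.
model, optimizer = ipex.optimize(model, optimizer=optimizer)
```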
see the performance tuning guide by PyTorch for more details