2024-09-05

Introduction to torch.compile

A super amazing intro to the torch.compile API


pytorch has two execution modes:

  1. eager mode - each op runs immediately as it appears in the code (simple to understand and hack on).
  2. graph mode - captures all the ops first and looks for optimization opportunities (generally performs better, but it's more complicated and takes time to compile the code).

pytorch runs in eager mode by default, but pytorch 2.0 natively supports graph mode through the torch.compile API.

to see if the code was actually compiled, you can either use the pytorch profiler or set os.environ["TORCH_COMPILE_DEBUG"] = "1".

once you have a model, just do:

compiled_model = torch.compile(resnet, mode="reduce-overhead")

there are three compile modes in the API:

  1. default - balances compile time against model performance. works best in most cases.
  2. reduce-overhead - reduces the per-call python/framework overhead, which helps most with small batches.
  3. max-autotune - generates the most optimized code but takes the longest to compile.
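the three modes above share one call signature; only the mode string changes (a sketch, assuming a toy nn.Linear model -- the mode names are the real API values):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 4)

# same API, different compile-time/performance trade-offs
default_model = torch.compile(model)                         # mode="default"
low_overhead = torch.compile(model, mode="reduce-overhead")
most_tuned = torch.compile(model, mode="max-autotune")

# compilation is lazy: nothing is compiled until the first call
x = torch.randn(2, 8)
out = default_model(x)
```

(reduce-overhead and max-autotune mostly pay off on GPU; on CPU the gains are usually smaller.)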

there's a catch: as per the pytorch docs, compile works best on GPUs with compute capability 8.0 or higher, and the gains show up most on larger, deeper architectures with a high number of params.

this is what torch.compile is and what it does. we'll see what happens under the hood later.

refer to this tutorial for a nice intro to torch.compile.