2024-09-15
Simplify neural networks with pruning and compression
Pruning and compression using Neural Network Intelligence (NNI) by Microsoft
Pruning: cut off parameters from the weights, biases, and kernels by setting their values to zero, effectively dropping those connections from the network. However, the underlying data structures stay the same shape, so memory use and compute cost don't change.
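To make that concrete, here's a tiny PyTorch-only sketch (no NNI involved; the layer and the random mask are made up purely for illustration):

```python
import torch
import torch.nn as nn

# A toy linear layer: 4 inputs -> 4 outputs, 16 weights + 4 biases.
layer = nn.Linear(4, 4)

# "Prune" roughly half the weights by zeroing them with a binary mask.
mask = (torch.rand_like(layer.weight) > 0.5).float()
with torch.no_grad():
    layer.weight.mul_(mask)

# The tensor shape and parameter count are unchanged -- only some values are
# now zero, so the memory footprint and the FLOPs per forward pass stay the same.
print(layer.weight.shape)                           # torch.Size([4, 4])
print(sum(p.numel() for p in layer.parameters()))   # still 20
```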
Compression: takes the pruned model and physically removes the pruned parameters from it.
Let's use NNI (Neural Network Intelligence) for these steps.
NNI has modules for pruning and model speedup.
The pruner takes a config list (a list of dicts) with options such as which modules/layer types to compress and the sparse ratio for the pruning targets.
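Here's a rough sketch of what that can look like. The model is hypothetical, and the import path and config keys changed across NNI versions (this follows the NNI 2.x quickstart style), so check the docs for the release you're on:

```python
import torch.nn as nn
# Import path for NNI 2.x; newer releases moved/renamed the pruning classes.
from nni.compression.pytorch.pruning import L1NormPruner

# A hypothetical model to prune.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Config list: which layer types to prune and how sparse to make them.
# (NNI 2.x calls the ratio 'sparsity_per_layer'; newer versions renamed this key.)
config_list = [{
    'op_types': ['Linear'],
    'sparsity_per_layer': 0.5,
}]

pruner = L1NormPruner(model, config_list)
```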
To prune the parameters, we use the pruner's compress() method, which returns a data structure called masks that records which parameters were pruned.
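Continuing the sketch above, still assuming the NNI 2.x-style API, where the masks come back as a dict keyed by layer name:

```python
# compress() wraps the selected modules and returns the (still full-size) model
# plus the masks: per layer, a 0/1 tensor for each pruning target.
_, masks = pruner.compress()

# Inspect which parameters were masked out.
for layer_name, layer_masks in masks.items():
    print(layer_name, {target: m.shape for target, m in layer_masks.items()})
```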
Now, to actually remove these pruned parameters, we use the model speedup module. We need to provide the model, sample inputs (for tracing), and the masks.
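A sketch of that step, again assuming the NNI 2.x-style API and the hypothetical model and input shape from above:

```python
import torch
# Import path for NNI 2.x; check your version's docs.
from nni.compression.pytorch.speedup import ModelSpeedup

# The pruner wraps modules in place, so detach it before running speedup.
pruner._unwrap_model()

# Speedup traces the model with the sample input, then rewrites the layers so
# the masked-out channels are physically removed from the weight tensors.
dummy_input = torch.rand(8, 128)
ModelSpeedup(model, dummy_input, masks).speedup_model()
```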
This makes the model genuinely smaller and faster to execute.
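A quick, informal way to see the effect yourself (nothing NNI-specific, just counting parameters and timing forward passes on the sketch model):

```python
import time

# The pruned Linear layers should now have fewer in/out features than before.
print(model)
print('params after speedup:', sum(p.numel() for p in model.parameters()))

start = time.time()
with torch.no_grad():
    for _ in range(1000):
        model(dummy_input)
print('1000 forward passes took %.3f s' % (time.time() - start))
```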
Here's a quickstart guide: https://lnkd.in/gZPqi658