2024-09-15
Simplify neural networks with pruning and compression
Pruning and compression using Neural Network Intelligence (NNI) by Microsoft
Pruning: cut off parameters from the weights, biases, and kernels by setting their values to zero, effectively dropping those connections from the network. However, the underlying data structures stay the same shape, so memory use and compute cost don't change.
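To make that concrete, here's a tiny PyTorch-only sketch (no NNI involved; the layer and the random mask are made up purely for illustration):

```python
import torch
import torch.nn as nn

# A toy linear layer: 4 inputs -> 4 outputs, 16 weights + 4 biases.
layer = nn.Linear(4, 4)

# "Prune" roughly half the weights by zeroing them with a binary mask.
mask = (torch.rand_like(layer.weight) > 0.5).float()
with torch.no_grad():
    layer.weight.mul_(mask)

# The tensor shape and parameter count are unchanged -- only some values are
# now zero, so the memory footprint and the FLOPs per forward pass stay the same.
print(layer.weight.shape)                           # torch.Size([4, 4])
print(sum(p.numel() for p in layer.parameters()))   # still 20
```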
Compression: takes the pruned model and physically removes the pruned parameters from it.
Let's use NNI (Neural Network Intelligence) for these steps.
NNI has modules for pruning and model speedup.
The pruner takes a config list (a list of dicts) with options such as which modules/layer types to compress and the sparse ratio for the pruning targets.
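Here's a rough sketch of what that can look like. The model is hypothetical, and the import path and config keys changed across NNI versions (this follows the NNI 2.x quickstart style), so check the docs for the release you're on:

```python
import torch.nn as nn
# Import path for NNI 2.x; newer releases moved/renamed the pruning classes.
from nni.compression.pytorch.pruning import L1NormPruner

# A hypothetical model to prune.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Config list: which layer types to prune and how sparse to make them.
# (NNI 2.x calls the ratio 'sparsity_per_layer'; newer versions renamed this key.)
config_list = [{
    'op_types': ['Linear'],
    'sparsity_per_layer': 0.5,
}]

pruner = L1NormPruner(model, config_list)
```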
To prune the parameters, we use the pruner's compress() method, which returns a data structure called masks that records which parameters were pruned.
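Continuing the sketch above, still assuming the NNI 2.x-style API, where the masks come back as a dict keyed by layer name:

```python
# compress() wraps the selected modules and returns the (still full-size) model
# plus the masks: per layer, a 0/1 tensor for each pruning target.
_, masks = pruner.compress()

# Inspect which parameters were masked out.
for layer_name, layer_masks in masks.items():
    print(layer_name, {target: m.shape for target, m in layer_masks.items()})
```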
Now, to actually remove these pruned parameters, we use the model speedup module. We need to provide the model, sample inputs (for tracing), and the masks.
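A sketch of that step, again assuming the NNI 2.x-style API and the hypothetical model and input shape from above:

```python
import torch
# Import path for NNI 2.x; check your version's docs.
from nni.compression.pytorch.speedup import ModelSpeedup

# The pruner wraps modules in place, so detach it before running speedup.
pruner._unwrap_model()

# Speedup traces the model with the sample input, then rewrites the layers so
# the masked-out channels are physically removed from the weight tensors.
dummy_input = torch.rand(8, 128)
ModelSpeedup(model, dummy_input, masks).speedup_model()
```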
This makes the model genuinely smaller and faster to execute.
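A quick, informal way to see the effect yourself (nothing NNI-specific, just counting parameters and timing forward passes on the sketch model):

```python
import time

# The pruned Linear layers should now have fewer in/out features than before.
print(model)
print('params after speedup:', sum(p.numel() for p in model.parameters()))

start = time.time()
with torch.no_grad():
    for _ in range(1000):
        model(dummy_input)
print('1000 forward passes took %.3f s' % (time.time() - start))
```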
Here's a quickstart guide: https://lnkd.in/gZPqi658