Structural (filter) pruning for convolutional layers #732

Open

@marcelroed

Description

System information

  • TensorFlow version (you are using): 2.5.0
  • Are you willing to contribute it (Yes/No): Yes

Motivation

Deciding where to use high filter/channel counts in convnets can be difficult, and smarter reductions to these counts can lead to faster inference across all devices.

Pruning is currently not very useful on GPU, since sparse operations are much slower than dense operations, so it would be useful to have a pruning method that results in a reduced dense representation.

My current (unfinished) implementation doesn't require many additional components, since it works similarly to block sparsity and can reuse much of that code.

Describe the feature
Add an option to prune_low_magnitude for "filter pruning" (alternatively "structural pruning") that restricts pruning of supported layers to whole blocks of weights at a time. For convolutional layers, these blocks correspond to the layer's output channels.
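To illustrate, here is a minimal NumPy sketch of what filter-level magnitude pruning could look like. This is not the tfmot API; the function name and threshold scheme are hypothetical, and a real implementation inside prune_low_magnitude would apply the mask during training via a pruning schedule. It uses the Keras Conv2D kernel layout (kh, kw, in_channels, out_channels) and zeroes whole output channels by L2 norm, so the surviving weights stay dense:

```python
import numpy as np

def filter_pruning_mask(kernel, sparsity):
    """Zero out whole output channels (filters) of a conv kernel.

    kernel: shape (kh, kw, in_channels, out_channels), Keras Conv2D layout.
    sparsity: fraction of filters to prune, ranked by L2 norm.
    Illustrative sketch only; not the tfmot implementation.
    """
    out_channels = kernel.shape[-1]
    # One magnitude score per filter, as in magnitude pruning.
    norms = np.linalg.norm(kernel.reshape(-1, out_channels), axis=0)
    n_prune = int(sparsity * out_channels)
    # Zero the n_prune filters with the smallest norms.
    prune_idx = np.argsort(norms)[:n_prune]
    mask = np.ones(out_channels, dtype=kernel.dtype)
    mask[prune_idx] = 0.0
    # Broadcasting applies the per-filter mask across the whole kernel.
    return kernel * mask

kernel = np.random.randn(3, 3, 16, 32).astype(np.float32)
pruned = filter_pruning_mask(kernel, sparsity=0.5)
# Half of the 32 output channels are now entirely zero.
```

Because entire channels are zeroed together, the pruned layer can later be rebuilt as a smaller dense layer, which is what makes this useful on GPU.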

In addition, add an option to strip_pruning that restructures layers pruned in this manner so they have fewer output channels than the original layers. The change in shape then needs to be propagated forward to subsequent layers.
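A sketch of that restructuring step, again in NumPy with hypothetical names (the real strip_pruning would rebuild Keras layers rather than operate on raw arrays): all-zero output channels are dropped from the pruned layer's kernel and bias, and the matching input channels are dropped from the next conv layer's kernel.

```python
import numpy as np

def strip_zero_filters(conv_kernel, conv_bias, next_kernel):
    """Drop all-zero filters from a conv layer and propagate the shape change.

    Kernels use the Keras layout (kh, kw, in_channels, out_channels).
    Hypothetical sketch of the proposed strip_pruning behavior.
    """
    out_channels = conv_kernel.shape[-1]
    flat = conv_kernel.reshape(-1, out_channels)
    # Keep filters that have at least one nonzero weight.
    keep = np.any(flat != 0, axis=0)
    # Shrink this layer: fewer output channels, matching bias entries.
    new_kernel = conv_kernel[..., keep]
    new_bias = conv_bias[keep]
    # Propagate forward: the next layer loses the corresponding input channels.
    new_next = next_kernel[:, :, keep, :]
    return new_kernel, new_bias, new_next

k1 = np.random.randn(3, 3, 8, 16).astype(np.float32)
k1[..., ::2] = 0.0  # pretend half the filters were pruned
b1 = np.random.randn(16).astype(np.float32)
k2 = np.random.randn(3, 3, 16, 32).astype(np.float32)
nk1, nb1, nk2 = strip_zero_filters(k1, b1, k2)
# nk1 has 8 output channels; nk2 now expects 8 input channels.
```

Layers with channel-wise parameters between the two convolutions (e.g. BatchNormalization) would need the same index selection applied, which is why forward propagation of the shape change is the harder part of the feature.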

Describe how the feature helps achieve the use case
With these two additions, models can be pruned in a way that is meaningful when running on GPU, saving memory and compute. It is also possible to find a reasonable layout for the number of output channels in each layer without hyperparameter tuning.

This feature makes pruning useful on GPU, where it currently is not so useful.

Describe how existing APIs don't satisfy your use case
Using tfmot.python.core.sparsity.keras.prune.prune_low_magnitude on a convolutional layer considers each element of the weights variable independently, which very rarely produces pruning that reduces inference time on the GPU.

In addition, tfmot.python.core.sparsity.keras.prune.strip_pruning will always leave weights with zeros in them, even when shrinking the layer would be beneficial. If every weight in a filter of a convolutional kernel is zero, strip_pruning still leaves restructuring to the runtime.

Activity

teijeong commented on Jun 15, 2021
Contributor

Thanks for your interest in contributing!

Please read the contribution instructions to take further steps. As this looks like a whole new feature, you also might want to file an RFC.

marcelroed commented on Jul 19, 2021
Author

> Thanks for your interest in contribution!
>
> Please read contribution instructions to take further steps. As this looks like a whole new feature, you also might want to file an RFC

Okay, I'm currently creating an RFC and finishing up my proposal. Can I use you or @Xhark as sponsor for the RFC?

yongyongdown commented on Dec 4, 2021

Hello,
I want structured pruning, but tfmot.python.core.sparsity.keras.prune.prune_low_magnitude currently seems to use the unstructured pruning method.
When will structured pruning be supported?

Assia17 commented on Jun 2, 2022

Hello, any updates on this topic?
Thank you

fPecc commented on Jun 28, 2022

Hello, any updates on this topic?
