Structural (filter) pruning for convolutional layers #732

Open

@marcelroed

Description

System information

  • TensorFlow version (you are using): 2.5.0
  • Are you willing to contribute it (Yes/No): Yes

Motivation

Deciding where to use high filter/channel counts in convnets can be difficult, and smarter reductions to these counts can lead to faster inference across all devices.

Pruning is currently not very useful on GPU, since sparse operations are much slower than dense operations, so it would be useful to have a pruning method that results in a reduced dense representation.

My current (unfinished) implementation doesn't require many additional components, since it works similarly to block sparsity and can reuse much of that code.

Describe the feature
Add an option to prune_low_magnitude for "filter pruning" (alternatively "structural pruning") that restricts pruning of supported layers to whole blocks of weights at a time. For convolutional layers, these blocks correspond to the layer's output channels.
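To illustrate, here is a minimal NumPy sketch of what filter-level magnitude pruning could look like. This is not the tfmot API; the function name and threshold scheme are hypothetical, and a real implementation inside prune_low_magnitude would apply the mask during training via a pruning schedule. It uses the Keras Conv2D kernel layout (kh, kw, in_channels, out_channels) and zeroes whole output channels by L2 norm, so the surviving weights stay dense:

```python
import numpy as np

def filter_pruning_mask(kernel, sparsity):
    """Zero out whole output channels (filters) of a conv kernel.

    kernel: shape (kh, kw, in_channels, out_channels), Keras Conv2D layout.
    sparsity: fraction of filters to prune, ranked by L2 norm.
    Illustrative sketch only; not the tfmot implementation.
    """
    out_channels = kernel.shape[-1]
    # One magnitude score per filter, as in magnitude pruning.
    norms = np.linalg.norm(kernel.reshape(-1, out_channels), axis=0)
    n_prune = int(sparsity * out_channels)
    # Zero the n_prune filters with the smallest norms.
    prune_idx = np.argsort(norms)[:n_prune]
    mask = np.ones(out_channels, dtype=kernel.dtype)
    mask[prune_idx] = 0.0
    # Broadcasting applies the per-filter mask across the whole kernel.
    return kernel * mask

kernel = np.random.randn(3, 3, 16, 32).astype(np.float32)
pruned = filter_pruning_mask(kernel, sparsity=0.5)
# Half of the 32 output channels are now entirely zero.
```

Because entire channels are zeroed together, the pruned layer can later be rebuilt as a smaller dense layer, which is what makes this useful on GPU.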

In addition, add an option to strip_pruning that restructures layers pruned in this manner so they have fewer output channels than the original layers. The change in shape then needs to be propagated forward to subsequent layers.
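A sketch of that restructuring step, again in NumPy with hypothetical names (the real strip_pruning would rebuild Keras layers rather than operate on raw arrays): all-zero output channels are dropped from the pruned layer's kernel and bias, and the matching input channels are dropped from the next conv layer's kernel.

```python
import numpy as np

def strip_zero_filters(conv_kernel, conv_bias, next_kernel):
    """Drop all-zero filters from a conv layer and propagate the shape change.

    Kernels use the Keras layout (kh, kw, in_channels, out_channels).
    Hypothetical sketch of the proposed strip_pruning behavior.
    """
    out_channels = conv_kernel.shape[-1]
    flat = conv_kernel.reshape(-1, out_channels)
    # Keep filters that have at least one nonzero weight.
    keep = np.any(flat != 0, axis=0)
    # Shrink this layer: fewer output channels, matching bias entries.
    new_kernel = conv_kernel[..., keep]
    new_bias = conv_bias[keep]
    # Propagate forward: the next layer loses the corresponding input channels.
    new_next = next_kernel[:, :, keep, :]
    return new_kernel, new_bias, new_next

k1 = np.random.randn(3, 3, 8, 16).astype(np.float32)
k1[..., ::2] = 0.0  # pretend half the filters were pruned
b1 = np.random.randn(16).astype(np.float32)
k2 = np.random.randn(3, 3, 16, 32).astype(np.float32)
nk1, nb1, nk2 = strip_zero_filters(k1, b1, k2)
# nk1 has 8 output channels; nk2 now expects 8 input channels.
```

Layers with channel-wise parameters between the two convolutions (e.g. BatchNormalization) would need the same index selection applied, which is why forward propagation of the shape change is the harder part of the feature.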

Describe how the feature helps achieve the use case
With these two additions, models can be pruned in a way that is meaningful when running on GPU, saving memory and compute. It is also possible to find a reasonable layout for the number of output channels in each layer without hyperparameter tuning.

This feature makes pruning useful on GPU, where it currently is not so useful.

Describe how existing APIs don't satisfy your use case
Using tfmot.python.core.sparsity.keras.prune.prune_low_magnitude on a convolutional layer considers each element of the weights variable independently, which very rarely produces pruning that reduces inference time on the GPU.

In addition, tfmot.python.core.sparsity.keras.prune.strip_pruning will always leave weights with zeros in them, even when shrinking the layer would be beneficial. If every weight in a filter of a convolutional kernel is zero, strip_pruning still leaves restructuring to the runtime.

Activity

teijeong commented on Jun 15, 2021
Contributor

Thanks for your interest in contributing!

Please read the contribution instructions to take further steps. As this looks like a whole new feature, you also might want to file an RFC.

marcelroed commented on Jul 19, 2021
Author

> Thanks for your interest in contribution!
>
> Please read contribution instructions to take further steps. As this looks like a whole new feature, you also might want to file an RFC

Okay, I'm currently creating an RFC and finishing up my proposal. Can I use you or @Xhark as sponsor for the RFC?

yongyongdown commented on Dec 4, 2021

Hello,
I want structured pruning, but tfmot.python.core.sparsity.keras.prune.prune_low_magnitude currently seems to use the unstructured pruning method.
When will structured pruning be supported?

Assia17 commented on Jun 2, 2022

Hello, any updates on this topic?
Thank you

fPecc commented on Jun 28, 2022

Hello, any updates on this topic?
