Open
Description
We could add a method to Ops, similar to infer_shape, that infers the number of flops and the memory usage of an Op given its inputs and input shapes.
This could be useful for a meta-optimization that tries different rewrite orderings or subsets (or non-eager rewrites) to arrive at a more compact graph.
For instance, a simple Elemwise addition would cost output_size flops and output_size memory, but once made inplace its memory cost would be 0.
A Dot instead of a sum of mul would have slightly fewer flops (due to fused multiply-add) and a much smaller memory cost, since the intermediate elementwise product is never materialized.
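
A rough sketch of what such cost estimates could look like, written in plain Python rather than against the real Op API. Everything here is hypothetical: the `Cost` container, the function names, the choice to measure memory in newly allocated output elements, and the convention that a fused multiply-add counts as a single flop. The exact size of the flop gap between the two formulations depends on that counting convention.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Cost:
    """Hypothetical cost record: flops and newly allocated elements."""
    flops: int
    memory: int


def elemwise_add_cost(input_shapes, inplace=False):
    # Assumes the inputs were already broadcast to a common shape.
    out_size = prod(input_shapes[0])
    # An inplace add reuses an input buffer, so it allocates nothing.
    return Cost(flops=out_size, memory=0 if inplace else out_size)


def dot_cost(input_shapes):
    (m, k), (k2, n) = input_shapes
    assert k == k2, "inner dimensions must match"
    # Assumption: each multiply-accumulate is one fused operation.
    return Cost(flops=m * n * k, memory=m * n)


def sum_of_mul_cost(input_shapes):
    (m, k), (k2, n) = input_shapes
    assert k == k2, "inner dimensions must match"
    # The broadcasted multiply materializes an (m, k, n) intermediate.
    mul = Cost(flops=m * k * n, memory=m * k * n)
    # Reducing over k takes k - 1 additions per output element.
    red = Cost(flops=m * n * (k - 1), memory=m * n)
    return Cost(mul.flops + red.flops, mul.memory + red.memory)


shapes = [(2, 3), (3, 4)]
print(elemwise_add_cost([(2, 3), (2, 3)]))
print(elemwise_add_cost([(2, 3), (2, 3)], inplace=True))
print(dot_cost(shapes))
print(sum_of_mul_cost(shapes))
```

A meta-optimization could sum such per-node estimates over a candidate graph and keep the rewrite ordering with the lowest total.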