Perf: Reduce StridedMemoryView construction time #449

Open
@leofang

Description

Currently it takes 3.4 to 3.45 μs (depending on whether stream ordering is requested) to create a memory view object:

In [4]: x = cp.empty((23, 4))

In [7]: %timeit s = StridedMemoryView(x, -1)
3.4 μs ± 8.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [8]: %timeit s = StridedMemoryView(x, 1)
3.45 μs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

which can be expensive in a tight loop. We should try to reduce it to about 1 μs, or to O(100) ns if possible.
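For tracking progress outside IPython, a small helper along these lines could reproduce the `%timeit`-style numbers. This is only a sketch: `per_call_ns` is a hypothetical name, and the `memoryview` over a `bytearray` is a stand-in so the snippet runs without CuPy or a GPU.

```python
import statistics
import timeit


def per_call_ns(factory, number=100_000, repeat=7):
    """Return (mean, stdev) of the per-call time of `factory`, in nanoseconds.

    Mirrors the shape of %timeit output: `repeat` runs of `number` calls each.
    """
    totals = timeit.repeat(factory, number=number, repeat=repeat)
    per_call = [total / number * 1e9 for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Stand-in target (an assumption, not the real object under test):
# a host buffer with the same byte size as a (23, 4) float64 array.
buf = bytearray(23 * 4 * 8)
mean_ns, std_ns = per_call_ns(lambda: memoryview(buf), number=10_000, repeat=3)
print(f"{mean_ns:.1f} ns ± {std_ns:.1f} ns per call")
```

With `x` and `StridedMemoryView` available as in the session above, the same helper would be called as `per_call_ns(lambda: StridedMemoryView(x, -1))`. The `lambda` wrapper adds a small amount of call overhead of its own, so the absolute numbers may read slightly higher than `%timeit` on the bare expression.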

cc @shwina for vis

Metadata

    Labels

    P1: Medium priority - Should do
    cuda.core: Everything related to the cuda.core module
    enhancement: Any code-related improvements
    triage: Needs the team's attention