`normalize_total` with numba #3571

Intron7 · 2025-04-08T15:24:20Z

This also removes the deprecated functions.

selmanozleyen · 2025-04-09T12:19:12Z

src/scanpy/preprocessing/_normalization.py

+    for i in numba.prange(rows):
+        count = counts_per_cell[i] / target_sum
+        for j in range(indptr[i], indptr[i + 1]):
+            data[j] /= count
+    return counts_per_cell, target_sum, counts_per_cols
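
The hunk above shows only the tail of the kernel. For readers without the full diff, here is a minimal, self-contained sketch of the same idea: sum each CSR row, then divide the row's stored values by `count / target_sum`. The function name `scale_csr_rows`, the zero-row guard, and the Numba fallback shim are assumptions made for illustration, not the code in this PR.

```python
import numpy as np

try:  # Numba is optional here; without it the loops run as plain Python.
    from numba import njit, prange
except ImportError:
    prange = range

    def njit(*args, **kwargs):
        # Hypothetical stand-in that supports both @njit and @njit(...).
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def scale_csr_rows(data, indptr, target_sum):
    """Scale each CSR row in place so that it sums to target_sum."""
    n_rows = indptr.shape[0] - 1
    counts_per_cell = np.zeros(n_rows, dtype=data.dtype)
    for i in prange(n_rows):
        # Row total over the stored (non-zero) entries of row i.
        total = 0.0
        for j in range(indptr[i], indptr[i + 1]):
            total += data[j]
        counts_per_cell[i] = total
        if total > 0:  # assumption: leave all-zero rows untouched
            count = total / target_sum
            for j in range(indptr[i], indptr[i + 1]):
                data[j] /= count
    return counts_per_cell


# CSR arrays for [[1, 0, 3], [0, 0, 0], [2, 2, 0]]
data = np.array([1.0, 3.0, 2.0, 2.0])
indptr = np.array([0, 2, 2, 4])
counts = scale_csr_rows(data, indptr, 10.0)
```

Rows are independent of each other, which is what makes the outer loop a natural fit for `prange`.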


So this part could live in fast-array-utils under the name `elem_mult`, right @flying-sheep? That is what I understood from our discussions. Should we implement `elem_mult` first and then come back to this issue?

flying-sheep · 2025-04-11

Generally I wouldn’t hold up scanpy improvements for fast_array_utils.

But if part of this can be replaced with a mult helper, we could do that now!
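
To make the "mult helper" idea concrete: dividing every value in a row by `count / target_sum` is the same as multiplying it by `target_sum / count`, so the inner loop can be written as one element-wise multiply. The sketch below uses only NumPy. The names `row_scale_factors` and `scale_csr_rows_by_multiply` are hypothetical and are not the fast-array-utils API, whose `elem_mult` signature is not shown in this thread.

```python
import numpy as np


def row_scale_factors(counts_per_cell, target_sum):
    """Per-row multipliers equivalent to dividing by counts / target_sum."""
    counts = np.asarray(counts_per_cell, dtype=np.float64)
    factors = np.ones_like(counts)
    nonzero = counts > 0  # assumption: all-zero rows keep a factor of 1
    factors[nonzero] = target_sum / counts[nonzero]
    return factors


def scale_csr_rows_by_multiply(data, indptr, factors):
    """Multiply every stored value by the factor of the row it belongs to."""
    # np.diff(indptr) is the number of stored values per row, so np.repeat
    # expands one factor per row into one factor per stored value.
    per_value = np.repeat(factors, np.diff(indptr))
    return data * per_value


# CSR arrays for [[1, 0, 3], [0, 0, 0], [2, 2, 0]]
data = np.array([1.0, 3.0, 2.0, 2.0])
indptr = np.array([0, 2, 2, 4])
counts_per_cell = np.array([4.0, 0.0, 4.0])

factors = row_scale_factors(counts_per_cell, target_sum=10.0)
scaled = scale_csr_rows_by_multiply(data, indptr, factors)
```

This version allocates a temporary array as long as `data`, which the explicit loop avoids. That is one possible reason to keep a compiled kernel.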

codecov · 2025-04-15T13:45:33Z

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 1962 | 1 | 1961 | 93 |

View the top 1 failed test(s) by shortest run time

tests/test_scrublet.py::test_scrublet_data

Stack Traces | 0.274s run time

cache = Cache()

    def test_scrublet_data(cache: pytest.Cache):
        """Test that Scrublet processing is arranged correctly.
    
        Check that simulations run on raw data.
        """
        random_state = 1234
    
        # Run Scrublet and let the main function run simulations
        adata_scrublet_auto_sim = sc.pp.scrublet(
            pbmc200(),
            use_approx_neighbors=False,
            copy=True,
            random_state=random_state,
        )
    
        # Now make our own simulated data so we can check the result from function
        # is the same, and by inference that the processing steps have not been
        # broken
    
        # Replicate the preprocessing steps used by the main function
        adata_obs = _preprocess_for_scrublet(pbmc200())
        # Simulate doublets using the same parents
        adata_sim = _create_sim_from_parents(
            adata_obs, adata_scrublet_auto_sim.uns["scrublet"]["doublet_parents"]
        )
    
        # Apply the same post-normalisation the Scrublet function would
        sc.pp.normalize_total(adata_obs, target_sum=1e6)
        sc.pp.normalize_total(adata_sim, target_sum=1e6)
    
        adata_scrublet_manual_sim = sc.pp.scrublet(
            adata_obs,
            adata_sim=adata_sim,
            use_approx_neighbors=False,
            copy=True,
            random_state=random_state,
        )
    
        try:
            # Require that the doublet scores are the same whether simulation is via
            # the main function or manually provided
>           assert_allclose(
                adata_scrublet_manual_sim.obs["doublet_score"],
                adata_scrublet_auto_sim.obs["doublet_score"],
                atol=1e-15,
                rtol=1e-15,
            )
E           AssertionError: 
E           Not equal to tolerance rtol=1e-15, atol=1e-15
E           
E           Mismatched elements: 2 / 200 (1%)
E           Max absolute difference among violations: 0.0097156451
E           Max relative difference among violations: 0.2066193853
E            ACTUAL: array([0.033079, 0.039326, 0.047022, 0.069388, 0.047022, 0.033079,
E                  0.069388, 0.019841, 0.069388, 0.069388, 0.069388, 0.039326,
E                  0.033079, 0.149254, 0.047022, 0.056738, 0.047022, 0.023555,...
E            DESIRED: array([0.033079, 0.039326, 0.047022, 0.069388, 0.047022, 0.033079,
E                  0.069388, 0.019841, 0.069388, 0.069388, 0.069388, 0.039326,
E                  0.033079, 0.149254, 0.047022, 0.056738, 0.047022, 0.023555,...

tests/test_scrublet.py:161: AssertionError
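
For context on the tolerance in that assertion: `assert_allclose` accepts a pair when `|actual - desired| <= atol + rtol * |desired|`. The sketch below re-implements that check in a hypothetical helper, `within_tolerance`, and shows that a rounding-level difference from reordering a floating-point sum still passes at `1e-15`, while a difference of the size reported above does not. The values `0.05` and `0.06` are made up for illustration and do not come from the test data.

```python
import numpy as np
from numpy.testing import assert_allclose


def within_tolerance(actual, desired, rtol=1e-15, atol=1e-15):
    """Same criterion assert_allclose applies element-wise."""
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)
    return bool(np.all(np.abs(actual - desired) <= atol + rtol * np.abs(desired)))


# Floating-point addition is not associative: the grouping changes the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
sums_differ = a != b

# The difference is about one unit in the last place, so it passes.
rounding_ok = within_tolerance(a, b)
assert_allclose([a], [b], rtol=1e-15, atol=1e-15)

# An illustrative pair with a relative difference of 0.2 does not pass.
large_ok = within_tolerance(0.06, 0.05)
```

A rounding-level change in the normalized values would therefore not trip this assertion on its own. The two mismatched scores differ by many orders of magnitude more than that.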


Co-authored-by: Philipp A. <flying-sheep@web.de>

flying-sheep · 2025-04-15T15:47:42Z

@Intron7 you have to write `Fixes #3135` to link the issue.

flying-sheep · 2025-04-17T12:45:18Z

Also this needs a release note!

Intron7 · 2025-04-17T17:35:01Z

@flying-sheep of course

update normalize_total & remove dep

9e68906

Intron7 requested review from flying-sheep and selmanozleyen April 8, 2025 15:24

Intron7 changed the title ~~normalize_total for numba~~ normalize_total with numba Apr 8, 2025

selmanozleyen reviewed Apr 9, 2025


This was referenced Apr 14, 2025

Update/normalize_total_rebased #3589

Closed

refactor: normalize_total with Numba #3593

Merged

Merge branch 'main' into normalize_total_update

26c57d6

refactor: normalize_total with Numba (#3593)

b27d218

Co-authored-by: Philipp A. <flying-sheep@web.de>

flying-sheep added this to the 1.11.2 milestone Apr 15, 2025

fix doctest

10ab5e7

flying-sheep linked an issue Apr 17, 2025 that may be closed by this pull request

normalize_total with numba #3135

Open

flying-sheep enabled auto-merge (squash) April 17, 2025 12:43

flying-sheep disabled auto-merge April 17, 2025 12:44
