Skip to content

Added comprehensive testing for visualization components #2737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

Ya-shh
Copy link

@Ya-shh Ya-shh commented Mar 27, 2025

This PR adds a testing infrastructure for Mesa's visualization components (SolaraViz) to ensure they render correctly and function properly across all example models.

Browser-less Unit Tests (test_visualization_components.py)

  • Tests for both Matplotlib and Altair visualization backends
  • Component-level tests for space and plot visualizations
  • Cross-model tests to verify visualization functionality in all example models
  • Interactive testing of model controllers (step, play/pause, reset)
  • Performance benchmarks to measure rendering efficiency

Browser-based UI Tests (tests/ui/test_browser_viz.py)

  • Integration tests for end-to-end visualization rendering
  • Visual regression testing via screenshot comparisons
  • Tests for parameter controls and interactive features
  • Compatibility with Solara's recommended testing approach

Additional Compatibility Improvements

  • Fixed Agent initialization in visualization tests to follow the correct API
  • Improved Python version compatibility for Python 3.10-3.13
  • Enhanced error handling in Altair visualization components
  • Ensured tests run properly across different environments

CI Integration

  • Added viz-tests.yml for fast browser-less tests on all PRs
  • Added ui-viz-tests.yml for periodic browser-based testing (weekly + on visualization changes)
  • Coverage reporting for visualization components

Developer Tools

  • Added run_viz_tests.sh helper script for local testing
  • Created VISUALIZATION_TESTING.md documentation
  • Updated pyproject.toml with testing dependencies

How It Works

The testing approach follows Solara's recommended practices for testing without a browser:

def test_example():
    # 1. Set up test model
    model = Schelling(seed=42)
    
    # 2. Create and render visualization component
    component = make_altair_space(agent_portrayal)
    box, rc = solara.render(component(model), handle_error=False)
    
    # 3. Find and verify UI elements
    assert rc.find("div").widget is not None
    
    # 4. Test model interaction
    step_button = rc.find(v.Btn, children=["Step"]).widget
    step_button.click()
    assert model.schedule.steps > 0
    
    # 5. Re-render and verify updates
    box, rc = solara.render(component(model), handle_error=False)
    assert rc.find(steps_text).widget is not None

This approach allows us to thoroughly test visualization components without requiring a browser in CI, while still providing browser-based tests for more comprehensive validation when needed.

Running Tests Locally

# Run browser-less tests
./tests/run_viz_tests.sh

# Run with browser-based UI tests
./tests/run_viz_tests.sh --ui

# Run performance benchmarks
./tests/run_viz_tests.sh --benchmarks  

# Update reference screenshots
./tests/run_viz_tests.sh --ui --update-snapshots

These tests address the current gap where basic model execution is tested but visualization functionality is not systematically verified.

Summary by CodeRabbit

  • New Features

    • Automated workflows now ensure that visualization components render consistently, with improved error handling to deliver a more reliable visual experience.
  • Tests

    • Expanded testing suites, including both browser-less and browser-based tests, help verify rendering accuracy, responsiveness, and performance.
  • Documentation

    • New guides provide detailed insights into visualization testing strategies and setup, supporting ongoing improvements in visual integrity.

Copy link

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 +1.5% [+0.7%, +2.3%] 🔵 +0.5% [+0.3%, +0.7%]
BoltzmannWealth large 🔵 +21.8% [+2.4%, +58.6%] 🔴 +9.4% [+5.9%, +12.6%]
Schelling small 🔵 +0.4% [+0.0%, +0.7%] 🔵 +0.1% [-0.6%, +0.7%]
Schelling large 🔵 +0.2% [-0.4%, +0.9%] 🔵 -1.2% [-4.9%, +3.0%]
WolfSheep small 🔵 -0.1% [-0.4%, +0.3%] 🔵 +0.3% [+0.0%, +0.5%]
WolfSheep large 🔵 +2.9% [+0.7%, +5.1%] 🔵 +5.7% [+2.6%, +9.6%]
BoidFlockers small 🔵 -0.3% [-1.1%, +0.5%] 🔵 +0.0% [-0.3%, +0.4%]
BoidFlockers large 🔵 -1.1% [-1.7%, -0.3%] 🔵 -0.3% [-0.6%, +0.2%]

@Ya-shh
Copy link
Author

Ya-shh commented Mar 27, 2025

Fixes : #2734

@Ya-shh
Copy link
Author

Ya-shh commented Mar 27, 2025

@EwoutH could you please review it ?

@EwoutH EwoutH requested review from Sahil-Chhoker and EwoutH March 27, 2025 16:54
@EwoutH EwoutH added testing Release notes label ci Release notes label visualisation labels Mar 27, 2025
@EwoutH
Copy link
Member

EwoutH commented Mar 27, 2025

Thanks a lot for this PR!

@Sahil-Chhoker could you take a first look?

I will try to dive into it somewhere this weekend.

@EwoutH
Copy link
Member

EwoutH commented Mar 27, 2025

@Ya-shh over a thousand lines of codes added spread along 16 files will add a significant maintenance burden. Could you A) try to reduce it to the bare essential and then B) motivate why each remaining file / part of code is necessary?

@Ya-shh
Copy link
Author

Ya-shh commented Mar 27, 2025

@EwoutH Hey, thanks for pointing this out. Actually, the files that increased the lines of code are docs.md files, which I thought might be useful for new contributors or even the community. However, if they seem like too much of a burden on the codebase, I will remove them.

@Sahil-Chhoker
Copy link
Collaborator

@Sahil-Chhoker could you take a first look?

Sorry for my inactivity recently, I am trying to focus on my proposal. But I think I can take a look at this tomorrow.

@EwoutH
Copy link
Member

EwoutH commented Mar 27, 2025

@Sahil-Chhoker no worries, only if you have the time. Your own proposal is important!

@Ya-shh fair, a lot of it is docs indeed. I wil try to take a look later.

@EwoutH
Copy link
Member

EwoutH commented Mar 30, 2025

@Ya-shh did you notice that all tests in CI are being skipped?

image

mesa/__init__.py Outdated
@@ -26,5 +26,5 @@
__title__ = "mesa"
__version__ = "3.2.0.dev"
__license__ = "Apache 2.0"
_this_year = datetime.datetime.now(tz=datetime.UTC).date().year
_this_year = datetime.datetime.now(tz=datetime.timezone.utc).date().year
Copy link
Member

@EwoutH EwoutH Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mesa currently supports Python 3.11 and above. That's enforced by pre-commit, which is why it's changed back every time.

See

requires-python = ">=3.11"

if post_process is not None:
chart = post_process(chart)
except Exception as e:
# If chart creation fails but we have a post_process, call it with a minimal chart
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its interesting idea, but not sure this is really what users want. Maybe yes, maybe no, but I would like to leave this change out of this PR

@@ -178,7 +187,7 @@ def _draw_grid(space, agent_portrayal, propertylayer_portrayal):
# no y-axis label
"y": alt.Y("y", axis=None, type=x_y_type),
"tooltip": [
alt.Tooltip(key, type=alt.utils.infer_vegalite_type_for_pandas([value]))
alt.Tooltip(key, type="nominal")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this change intentional? Doesn't seem related to testing

@@ -0,0 +1,78 @@
# Add Comprehensive Testing for Mesa's Visualization Components
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this file and summary.md is not supposed to be part of this PR

step_button = page_session.locator("button:has-text('Step')")
step_button.wait_for()

page_session.locator("svg").wait_for()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"svg" selector seems a bit brittle. If possible this should target something more specific and less implementation specific (here we render a svg, but this might change to a canvas for example). Maybe its possible to target the whole output components container?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about the svg selector being brittle. If the visualization implementation changes from SVG to canvas or something else, this test would break.I have made the changes , you can now review them

component = make_mpl_space_component(self.agent_portrayal)
box, rc = solara.render(component(mock_model), handle_error=False)

assert rc.find("div").widget is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "not None" assertion is a good first step, but preferably it should test something a bit more specific (but can be done in later PRs)

@Corvince
Copy link
Contributor

Corvince commented Apr 1, 2025

Great initiative @Ya-shh ! Lots of things I like about this PR. I especially like the CI integration. Left a few smaller comments, but the biggest question is why all component tests are currently being skipped?

This comment was marked as off-topic.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (18)
mesa/visualization/components/altair_components.py (1)

82-94: Improved error handling for chart creation is a good addition.

Adding a try-except block around chart creation provides a more graceful approach to handling exceptions when post-processing is involved. This ensures the post-process function gets called with a minimal chart even when the main chart creation fails.

Consider capturing the result of post_process(minimal_chart) even though you're going to re-raise the exception. For consistency with line 86, you might want to use:

-            post_process(minimal_chart)
+            minimal_chart = post_process(minimal_chart)
tests/ui/README.md (1)

1-85: Well-structured and comprehensive documentation!

This README provides excellent guidance for setting up and running browser-based UI tests. The clear sections, code examples, and troubleshooting tips make it very accessible.

Consider adding:

  1. A brief explanation of what's included in the viz-test extra
  2. More context about when/why to update reference screenshots
  3. Instructions on how to manually trigger the CI workflow
.github/workflows/ui-viz-tests.yml (4)

11-11: Update checkout action version

The current version of the checkout action is outdated. GitHub recommends using the latest version for security and performance improvements.

-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
🧰 Tools
🪛 actionlint (1.7.4)

11-11: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


14-16: Update setup-python action version

The setup-python action is using an outdated version. Update to the latest version for better compatibility and security.

-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: '3.11'
🧰 Tools
🪛 actionlint (1.7.4)

14-14: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


29-36: Update upload-artifact action and add missing newline

The upload-artifact action is outdated, and the file is missing a newline at the end.

     - name: Upload test screenshots (on failure)
       if: failure()
-      uses: actions/upload-artifact@v3
+      uses: actions/upload-artifact@v4
       with:
         name: test-screenshots
         path: |
           tests/ui/screenshots
           tests/ui/diff-*.png
-        if-no-files-found: ignore
+        if-no-files-found: ignore
+
🧰 Tools
🪛 actionlint (1.7.4)

30-30: the runner of "actions/upload-artifact@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.35.1)

[error] 36-36: no new line character at the end of file

(new-line-at-end-of-file)


6-9: Consider adding timeout and dependency caching

To prevent hung workflows and improve performance, consider adding a timeout limit and dependency caching.

 jobs:
   browser-tests:
     runs-on: ubuntu-latest
+    timeout-minutes: 10

And for the Python setup step:

     - name: Set up Python 3.11
       uses: actions/setup-python@v5
       with:
         python-version: '3.11'
+        cache: 'pip'
.github/workflows/viz-tests.yml (4)

17-17: Upgrade Actions/Checkout Version
The workflow currently uses actions/checkout@v3. Static analysis suggests that this version may be outdated. Please verify whether a newer release (if available) would provide added security or features and consider updating accordingly.

🧰 Tools
🪛 actionlint (1.7.4)

17-17: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


19-23: Upgrade Setup-Python Version
The workflow uses actions/setup-python@v4. Similar to the checkout action, please verify if a more recent version exists and update if applicable to ensure the job runs with the latest improvements.

🧰 Tools
🪛 actionlint (1.7.4)

20-20: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


43-43: Upgrade Codecov Action Version
The step using codecov/codecov-action@v3 might also be based on an older version. It is advisable to check for a newer release of this action and upgrade if available to benefit from the latest fixes and features.

🧰 Tools
🪛 actionlint (1.7.4)

43-43: the runner of "codecov/codecov-action@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


48-48: Add Newline at End of File
YAMLlint reports that there is no newline character at the end of the file. Please add a newline at the end to adhere to best practices and avoid potential issues in some tooling.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

tests/run_viz_tests.sh (2)

11-34: Improve Argument Parsing
The script’s argument parsing uses a for loop with embedded shift calls, which can be slightly confusing because $@ is expanded only once. Consider refactoring this section to use a while loop (e.g., while [ "$#" -gt 0 ]; do ...; shift; done) for clearer and more robust flag handling. Additionally, the documentation later references an --update-snapshots flag that isn’t currently handled in the script. Please either add support for this flag or update the documentation for consistency.


86-89: Exit Logic Verification
The exit logic correctly combines the results of the browser-less and browser-based tests. As an additional improvement, consider logging the failure codes explicitly to aid debugging in CI if tests fail.

docs/VISUALIZATION_TESTING.md (1)

1-4: Content Clarity
The introduction and testing approach sections are well organized. A brief proofreading pass to address any minor punctuation inconsistencies would further improve clarity.

tests/VISUALIZATION_TESTING.md (1)

78-78: Punctuation Suggestion
In the sentence detailing test coverage (line 78), consider adding a comma before “but” to improve readability as suggested by LanguageTool.

tests/ui/test_browser_viz.py (2)

53-73: Consider adding more robust assertions for chart verification.

While checking for the presence of SVG paths is a good start, consider adding more specific assertions to verify that the charts render with expected data. For example, checking the number of data points or specific path attributes.

-    chart_paths = page_session.locator("svg path")
-    assert chart_paths.count() > 0
+    chart_paths = page_session.locator("svg path[d]")
+    # Ensure we have a reasonable number of paths for our chart
+    path_count = chart_paths.count()
+    assert path_count > 5, f"Expected multiple chart paths, got {path_count}"
+    
+    # Verify chart title or labels if possible
+    assert page_session.locator("text.chart-title").is_visible()

99-125: Well-designed reference screenshot generation function.

The skipped test for generating reference screenshots is well-documented and properly organized. The approach of generating screenshots for multiple models provides a solid foundation for visual regression testing.

Consider parametrizing this test to reduce code duplication:

@pytest.mark.skip(reason="Only run manually to generate reference screenshots")
@pytest.mark.parametrize(
    "model_name,page_function,screenshot_filename",
    [
        ("schelling", "mesa.examples.basic.schelling.app.page", "schelling_reference.png"),
        ("conway", "mesa.examples.basic.conways_game_of_life.app.page", "conway_reference.png"),
        ("boltzmann", "mesa.examples.basic.boltzmann_wealth_model.app.page", "boltzmann_reference.png"),
        ("virus", "mesa.examples.basic.virus_on_network.app.page", "virus_reference.png"),
    ],
)
def test_generate_reference_screenshots(solara_test, page_session: Page, model_name, page_function, screenshot_filename):
    """Generate reference screenshots for all example models."""
    import importlib
    
    module_name, attr_name = page_function.rsplit(".", 1)
    module = importlib.import_module(module_name)
    page = getattr(module, attr_name)
    
    page()
    page_session.wait_for_timeout(1000)
    page_session.screenshot(path=screenshot_filename)
tests/test_visualization_components.py (2)

272-298: Consider parameterizing performance benchmarks with different model sizes.

The performance benchmark test is a good addition, but it only tests with one model size (50x50). Consider parameterizing it to test with different model sizes to better understand performance characteristics.

@pytest.mark.skip(reason="Benchmark tests are optional")
@pytest.mark.parametrize("grid_size", [(10, 10), (50, 50), (100, 100)])
def test_performance_benchmarks(self, grid_size):
    """Test the rendering performance of visualization components with different model sizes."""
    import time
    
    width, height = grid_size
    
    def agent_portrayal(agent):
        return {"color": "red", "marker": "o", "size": 5}
    
    # Create a model with the specified grid size
    model = Schelling(width=width, height=height, seed=42)
    
    # Measure rendering time
    start_time = time.time()
    component = make_altair_space(agent_portrayal)
    box, rc = solara.render(component(model), handle_error=False)
    render_time = time.time() - start_time
    
    # Scale threshold based on grid size
    max_time = 0.5 if width <= 10 else (5.0 if width <= 50 else 15.0)
    assert render_time < max_time, f"Rendering {width}x{height} grid took too long: {render_time:.2f}s (max: {max_time}s)"

301-309: Consider adding inline documentation to explain the relationship between class-based and module-level tests.

The presence of both a class-based benchmark and a module-level benchmark function is confusing without more explanation. The comment on line 308 indicates they are related, but more detail would be helpful.

 @pytest.mark.skip(reason="Benchmark tests are optional")
 def test_performance_benchmarks():
-    """Module-level test function for visualization performance benchmarks.
+    """Module-level test function for visualization performance benchmarks.
 
     This is a module-level version of the TestPerformanceBenchmarks.test_performance_benchmarks
     test to ensure compatibility with CI test runners.
+
+    Some CI systems and test discovery mechanisms work better with module-level
+    test functions than with class-based tests. This function ensures that
+    benchmark tests can be run in both contexts. The actual implementation
+    is in the class-based version to maintain consistency with other test cases.
     """
     # The actual benchmark is in TestPerformanceBenchmarks.test_performance_benchmarks
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5166670 and 8451b72.

📒 Files selected for processing (15)
  • .github/workflows/build_lint.yml (1 hunks)
  • .github/workflows/ui-viz-tests.yml (1 hunks)
  • .github/workflows/viz-tests.yml (1 hunks)
  • PR_DESCRIPTION.md (1 hunks)
  • docs/VISUALIZATION_TESTING.md (1 hunks)
  • mesa/discrete_space/property_layer.py (2 hunks)
  • mesa/visualization/components/altair_components.py (2 hunks)
  • pyproject.toml (1 hunks)
  • tests/VISUALIZATION_TESTING.md (1 hunks)
  • tests/run_viz_tests.sh (1 hunks)
  • tests/test_visualization_components.py (1 hunks)
  • tests/ui/README.md (1 hunks)
  • tests/ui/__init__.py (1 hunks)
  • tests/ui/conftest.py (1 hunks)
  • tests/ui/test_browser_viz.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
mesa/visualization/components/altair_components.py (3)
tests/test_visualization_components.py (6)
  • agent_portrayal (83-85)
  • agent_portrayal (258-259)
  • agent_portrayal (284-285)
  • propertylayer_portrayal (87-97)
  • post_process (129-130)
  • post_process (170-172)
mesa/examples/basic/boltzmann_wealth_model/app.py (2)
  • agent_portrayal (12-14)
  • post_process (36-37)
tests/test_components_matplotlib.py (1)
  • agent_portrayal (31-42)
tests/test_visualization_components.py (2)
mesa/discrete_space/property_layer.py (2)
  • PropertyLayer (30-192)
  • add_property_layer (232-258)
mesa/visualization/components/altair_components.py (2)
  • make_altair_space (26-60)
  • agent_portrayal (49-50)
🪛 actionlint (1.7.4)
.github/workflows/viz-tests.yml

17-17: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


20-20: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


25-25: shellcheck reported issue in this script: SC2102:info:3:27: Ranges can only match single chars (mentioned due to duplicates)

(shellcheck)


43-43: the runner of "codecov/codecov-action@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

.github/workflows/ui-viz-tests.yml

11-11: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


14-14: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


19-19: shellcheck reported issue in this script: SC2102:info:2:27: Ranges can only match single chars (mentioned due to duplicates)

(shellcheck)


30-30: the runner of "actions/upload-artifact@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.35.1)
.github/workflows/viz-tests.yml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

.github/workflows/ui-viz-tests.yml

[error] 36-36: no new line character at the end of file

(new-line-at-end-of-file)

🪛 LanguageTool
tests/VISUALIZATION_TESTING.md

[uncategorized] ~15-~15: Loose punctuation mark.
Context: ...est Organization - test_solara_viz.py: Tests for the SolaraViz component's cor...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~50-~50: Loose punctuation mark.
Context: ...nto the CI pipeline: 1. viz-tests.yml: Runs browser-less tests on every PR and...

(UNLIKELY_OPENING_PUNCTUATION)

PR_DESCRIPTION.md

[uncategorized] ~78-~78: Use a comma before ‘but’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ap where basic model execution is tested but visualization functionality is not syst...

(COMMA_COMPOUND_SENTENCE_2)

🔇 Additional comments (16)
mesa/visualization/components/altair_components.py (1)

190-190: Was this tooltip type change intentional?

Changing from dynamically inferred types to a static "nominal" type for tooltips doesn't seem directly related to testing improvements.

This change was previously questioned by Corvince as not being related to testing. If this is an intentional improvement, please clarify its purpose in the PR description.

tests/ui/__init__.py (1)

1-2: LGTM! Proper package initialization.

Adding a docstring to the __init__.py file clearly communicates the purpose of this package.

tests/ui/conftest.py (1)

8-11: Well-implemented fixture for snapshot testing.

The solara_snapshots_directory fixture provides a consistent location for storing and comparing UI test snapshots, which is essential for visual regression testing.

pyproject.toml (1)

67-72:

✅ Verification successful

Good addition of visualization testing dependencies.

Adding a separate optional dependency group for visualization testing is a clean approach. This keeps these specialized testing dependencies separate from regular development dependencies and makes it clear what's needed specifically for visualization testing.


🏁 Script executed:

#!/bin/bash
# Check that these packages can be installed together without conflicts
pip install pytest-playwright pytest-ipywidgets[solara] mesa[viz] --dry-run

Length of output: 16265


Visualization testing dependencies verified

The dry-run installation confirmed that mesa[viz], pytest-playwright, and pytest-ipywidgets[solara] can be installed together without conflicts. The code snippet in pyproject.toml (lines 67–72) is correct as is, and the approach to separate visualization testing dependencies is clean and effective.

.github/workflows/build_lint.yml (1)

57-57: Effective test separation approach

Adding --ignore=tests/ui/ to the pytest command is a good strategy to separate browser-based tests from the regular build workflow, making the CI process more efficient and reliable.

mesa/discrete_space/property_layer.py (2)

109-109: Improved NumPy indexing approach

Replacing data[*[0 for _ in range(len(data.shape))]] with data[tuple(0 for _ in range(len(data.shape)))] is a good change as it uses the more idiomatic tuple-based indexing method for NumPy arrays.


324-324: Consistent NumPy indexing pattern

Changing from mask[*indices] to mask[tuple(indices)] maintains consistency with the earlier indexing change and follows NumPy's recommended practices for array indexing.

docs/VISUALIZATION_TESTING.md (1)

1-62: Excellent Detailed Instructions
This documentation file provides a very clear and comprehensive overview of Mesa's visualization testing strategy. The step‐by‐step commands for setup and test execution (including installation of dependencies and usage of the helper script) are especially helpful for new contributors.

tests/VISUALIZATION_TESTING.md (1)

1-99: Comprehensive Testing Strategy Documentation
This file thoroughly outlines the testing strategy, the organization of tests, and commands for running them locally and via CI. It’s a valuable resource that aligns nicely with the new CI workflows and helper scripts.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~15-~15: Loose punctuation mark.
Context: ...est Organization - test_solara_viz.py: Tests for the SolaraViz component's cor...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~50-~50: Loose punctuation mark.
Context: ...nto the CI pipeline: 1. viz-tests.yml: Runs browser-less tests on every PR and...

(UNLIKELY_OPENING_PUNCTUATION)

PR_DESCRIPTION.md (1)

1-1: Clarification on File Inclusion
There is a past review comment suggesting that PR_DESCRIPTION.md (and similar summary files) might not belong in the PR. If these documentation files are intentionally included to provide context and clarity, please disregard past comments.

tests/ui/test_browser_viz.py (4)

1-6: Well-structured docstring and purpose explanation.

The docstring clearly explains the purpose of these browser-based tests and their relationship to the browser-less tests. This helps maintain good documentation standards.


12-32: Use more specific selectors instead of generic "svg" selector.

On line 25, using a generic "svg" selector is brittle and implementation-specific. If the visualization technology changes from SVG to canvas in the future, these tests will break.

Consider targeting a more specific container or element with a dedicated ID or class name instead. For example:

-    page_session.locator("svg").wait_for()
+    page_session.locator("[data-testid='visualization-container'] svg").wait_for()

34-51: Good approach for comparing visualization state changes.

The technique of capturing screenshots before and after clicking the step button to verify state changes is excellent. This ensures that the visualization is actually updating in response to user actions.

However, the same concern about the generic "svg" selector applies to lines 43, 45, and 48.


75-97: Good use of JavaScript evaluation for slider interaction.

The approach to manipulate the slider using JavaScript evaluation is robust and ensures proper event triggering. This is a good practice for testing slider components.

tests/test_visualization_components.py (2)

100-139: All matplotlib component tests are skipped - consider enabling at least some tests.

While it's understandable that some tests need to be skipped for compatibility reasons, having all tests skipped means there's no actual test coverage for these components.

Consider enabling at least some basic tests, or provide more specific information about what compatibility issues need to be addressed to make these tests runnable.

Additionally, the assertions are very basic (just checking if widgets are not None). As noted in previous reviews, more specific assertions would be valuable.

#!/bin/bash
# Check if there are any active (non-skipped) visualization tests in the codebase

echo "Checking for active visualization tests..."
active_tests=$(rg -n "def test_" --no-filename tests/test_visualization_components.py | grep -v "@pytest.mark.skip" | wc -l)
echo "Number of active (non-skipped) visualization tests: $active_tests"

echo "Checking skip reasons..."
rg -A 1 "@pytest.mark.skip" --no-filename tests/test_visualization_components.py

181-247:

❓ Verification inconclusive

Clarify timeline for updating skipped tests to match new API.

All example model visualization tests are skipped due to API changes. The PR summary suggests these tests are important for ensuring visualization components work across all example models.

Please clarify:

  1. What specific API changes have occurred?
  2. When will these tests be updated to match the new API?
  3. Is there an issue tracker item to follow up on this?

It's important to have a plan to enable these tests to ensure proper coverage of visualization functionality.


🏁 Script executed:

#!/bin/bash
# Investigate API changes mentioned in skip reasons

echo "Looking for API change mentions in the codebase..."
rg "API has changed" --include "*.py"

echo "Checking for related issues in GitHub..."
gh issue list --search "API changes visualization" --limit 5

Length of output: 874


Clarify and Track Updates for Visualization API Changes

The tests in tests/test_visualization_components.py are marked as skipped due to changes in the model API, but it's unclear which API changes have impacted these tests and how they affect the visualization components. Please clarify the following:

  • Specific API Changes: What are the specific modifications in the model API that necessitated skipping these tests?
  • Timeline for Updates: When are you planning to update these tests to align with the new API?
  • Issue Tracking: Is there an existing issue or ticket to track these changes? If not, could one be created to ensure that the visualization components are properly validated once updated?

Comment on lines 1 to 78
# Add Comprehensive Testing for Mesa's Visualization Components

## Overview
This PR adds a robust testing infrastructure for Mesa's visualization components (SolaraViz) to ensure they render correctly and function properly across all example models.

## Key Additions

### Browser-less Unit Tests (`test_visualization_components.py`)
- Tests for both Matplotlib and Altair visualization backends
- Component-level tests for space and plot visualizations
- Cross-model tests to verify visualization functionality in all example models
- Interactive testing of model controllers (step, play/pause, reset)
- Performance benchmarks to measure rendering efficiency

### Browser-based UI Tests (`tests/ui/test_browser_viz.py`) - Manual Execution Only
- Integration tests for end-to-end visualization rendering
- Visual regression testing via screenshot comparisons
- Tests for parameter controls and interactive features
- Compatibility with Solara's recommended testing approach

**Note**: Browser-based tests require additional dependencies (`playwright`) and are configured to run only manually or on a weekly schedule, not on every PR.

### CI Integration
- Added `viz-tests.yml` for fast browser-less tests on all PRs
- Added `ui-viz-tests.yml` for periodic browser-based testing (weekly + manual triggers)
- Coverage reporting for visualization components

### Developer Tools
- Added `run_viz_tests.sh` helper script for local testing
- Created `VISUALIZATION_TESTING.md` documentation
- Updated `pyproject.toml` with testing dependencies

## How It Works

The testing approach follows Solara's recommended practices for testing without a browser:

```python
def test_example():
# 1. Set up test model
model = Schelling(seed=42)

# 2. Create and render visualization component
component = make_altair_space(agent_portrayal)
box, rc = solara.render(component(model), handle_error=False)

# 3. Find and verify UI elements
assert rc.find("div").widget is not None

# 4. Test model interaction
step_button = rc.find(v.Btn, children=["Step"]).widget
step_button.click()
assert model.schedule.steps > 0
```

This approach allows us to thoroughly test visualization components without requiring a browser in CI, while still providing browser-based tests for more comprehensive validation when needed.

## Running Tests Locally

```bash
# Run browser-less tests
./tests/run_viz_tests.sh

# Run with browser-based UI tests (requires additional dependencies)
pip install -e .[viz-test]
playwright install chromium
./tests/run_viz_tests.sh --ui

# Run performance benchmarks
./tests/run_viz_tests.sh --benchmarks
```

## Benefits
- Ensures visualization components render correctly across all examples
- Provides performance metrics for visualization rendering
- Prevents regressions in visualization functionality
- Follows Solara's recommended testing practices

These tests address the current gap where basic model execution is tested but visualization functionality is not systematically verified.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

PR Description Clarity and Consistency
The PR description is detailed and clearly outlines the new tests, CI integrations, and documentation updates. One point to note: the description mentions an --update-snapshots flag for the helper script, but the current implementation of run_viz_tests.sh does not handle this flag. Please update either the script or the documentation to maintain consistency.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~78-~78: Use a comma before ‘but’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ap where basic model execution is tested but visualization functionality is not syst...

(COMMA_COMPOUND_SENTENCE_2)

Comment on lines 141 to 179
class TestAltairComponents(TestBaseVisualizationComponents):
"""Tests for Altair visualization components."""

@pytest.mark.skip(
reason="Test needs updating for compatibility with local environment"
)
def test_altair_space_component(self, mock_model):
"""Test that Altair space component renders correctly."""
component = make_altair_space(self.agent_portrayal)
box, rc = solara.render(component(mock_model), handle_error=False)

assert rc.find("div").widget is not None

component_with_layers = make_altair_space(
self.agent_portrayal, propertylayer_portrayal=self.propertylayer_portrayal()
)
box, rc = solara.render(component_with_layers(mock_model), handle_error=False)
assert rc.find("div").widget is not None

@pytest.mark.skip(
reason="Test needs updating for compatibility with local environment"
)
def test_altair_plot_component(self, mock_model):
"""Test that Altair plot component renders correctly."""
component = make_plot_component({"data1": "red", "data2": "blue"})
box, rc = solara.render(component(mock_model), handle_error=False)

assert rc.find("div").widget is not None

def post_process(chart):
chart = chart.properties(title="Test Plot")
return chart

component_with_custom = make_plot_component(
{"data1": "red"}, post_process=post_process
)
box, rc = solara.render(component_with_custom(mock_model), handle_error=False)
assert rc.find("div").widget is not None

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

❓ Verification inconclusive

Consider adding test coverage for error handling in Altair components.

All Altair component tests are skipped, but the PR summary mentions "improving error handling in Altair components." These tests should validate the improved error handling.

Add tests that specifically verify error handling, for example:

  • Testing with invalid agent portrayal functions
  • Testing with incompatible property layer data
  • Testing with missing or malformed model data
@pytest.mark.skip(reason="Test needs updating for compatibility with local environment")
def test_altair_component_error_handling(self, mock_model):
    """Test that Altair component handles errors gracefully."""
    # Test with invalid portrayal function
    def invalid_portrayal(agent):
        return "not_a_dict"  # Should be a dict
    
    component = make_altair_space(invalid_portrayal)
    # Should not raise an exception, but handle the error gracefully
    box, rc = solara.render(component(mock_model), handle_error=True)
    # Verify error is displayed to user
    assert rc.find("div").text.contains("Error")

Enhance Error Handling Test Coverage for Altair Components

The current tests in tests/test_visualization_components.py only validate that a rendered div exists and are skipped, so they do not cover error handling in the Altair components as referenced in the PR summary. Please add tests that specifically validate error handling. For example, include tests that:

  • Confirm the component gracefully handles an invalid portrayal function (e.g., one that returns an incorrect type).
  • Verify behavior when provided with incompatible property layer data.
  • Ensure appropriate error feedback when model data is missing or malformed.

An example test could be:

@pytest.mark.skip(reason="Test needs updating for compatibility with local environment")
def test_altair_component_error_handling(self, mock_model):
    """Test that Altair component handles errors gracefully."""
    # Test with an invalid portrayal function
    def invalid_portrayal(agent):
        return "not_a_dict"  # Expected to return a dict

    component = make_altair_space(invalid_portrayal)
    # Render with handle_error=True so that errors are caught and displayed gracefully
    box, rc = solara.render(component(mock_model), handle_error=True)
    # Verify that an error message is displayed to the user
    assert "Error" in rc.find("div").text

Please add similar tests to cover all edge cases regarding error handling.

Comment on lines 33 to 98
class TestBaseVisualizationComponents:
"""Base test class for visualization components."""

@pytest.fixture
def mock_model(self):
"""Create a mock model with a grid."""

class MockAgent(Agent):
def __init__(self, model, agent_type=0):
super().__init__(model)
self.type = agent_type

class MockModel(Model):
def __init__(self, width=10, height=10, seed=42):
super().__init__(seed=seed)
self.width = width
self.height = height
self.grid = MultiGrid(width, height, torus=True)

# Create and add a property layer
test_layer = PropertyLayer(
name="test_layer", width=width, height=height, default_value=1.0
)
self.grid.add_property_layer(test_layer)

a1 = MockAgent(self, agent_type=0)
a2 = MockAgent(self, agent_type=1)
self.grid.place_agent(a1, (2, 2))
self.grid.place_agent(a2, (5, 5))

self.schedule = []
self.running = True
self.time_series_data = {"data1": [1, 2, 3], "data2": [3, 2, 1]}

# Add a mock datacollector
from unittest.mock import MagicMock

import pandas as pd

self.datacollector = MagicMock()
# Create a dataframe with time series data
self.datacollector.get_model_vars_dataframe.return_value = pd.DataFrame(
self.time_series_data
)

def step(self):
"""Step the model."""

return MockModel()

def agent_portrayal(self, agent):
"""Basic agent portrayal for testing."""
return {"color": "red" if agent.type == 0 else "blue", "marker": "o", "size": 5}

def propertylayer_portrayal(self):
"""Basic property layer portrayal for testing."""
return {
"test_layer": {
"colormap": "viridis",
"alpha": 0.75,
"colorbar": True,
"vmin": 0,
"vmax": 10,
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Well-structured base test class with comprehensive mock setup.

The TestBaseVisualizationComponents class provides an excellent foundation for visualization testing with a well-designed mock model and portrayal methods.

However, there's a potential issue with how the property layer is initialized:

-                test_layer = PropertyLayer(
-                    name="test_layer", width=width, height=height, default_value=1.0
-                )
+                test_layer = PropertyLayer(
+                    name="test_layer", dimensions=(width, height), default_value=1.0
+                )

Based on the PropertyLayer definition in mesa/discrete_space/property_layer.py, the constructor expects dimensions as a parameter, not width and height separately.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class TestBaseVisualizationComponents:
"""Base test class for visualization components."""
@pytest.fixture
def mock_model(self):
"""Create a mock model with a grid."""
class MockAgent(Agent):
def __init__(self, model, agent_type=0):
super().__init__(model)
self.type = agent_type
class MockModel(Model):
def __init__(self, width=10, height=10, seed=42):
super().__init__(seed=seed)
self.width = width
self.height = height
self.grid = MultiGrid(width, height, torus=True)
# Create and add a property layer
test_layer = PropertyLayer(
name="test_layer", width=width, height=height, default_value=1.0
)
self.grid.add_property_layer(test_layer)
a1 = MockAgent(self, agent_type=0)
a2 = MockAgent(self, agent_type=1)
self.grid.place_agent(a1, (2, 2))
self.grid.place_agent(a2, (5, 5))
self.schedule = []
self.running = True
self.time_series_data = {"data1": [1, 2, 3], "data2": [3, 2, 1]}
# Add a mock datacollector
from unittest.mock import MagicMock
import pandas as pd
self.datacollector = MagicMock()
# Create a dataframe with time series data
self.datacollector.get_model_vars_dataframe.return_value = pd.DataFrame(
self.time_series_data
)
def step(self):
"""Step the model."""
return MockModel()
def agent_portrayal(self, agent):
"""Basic agent portrayal for testing."""
return {"color": "red" if agent.type == 0 else "blue", "marker": "o", "size": 5}
def propertylayer_portrayal(self):
"""Basic property layer portrayal for testing."""
return {
"test_layer": {
"colormap": "viridis",
"alpha": 0.75,
"colorbar": True,
"vmin": 0,
"vmax": 10,
}
}
class TestBaseVisualizationComponents:
"""Base test class for visualization components."""
@pytest.fixture
def mock_model(self):
"""Create a mock model with a grid."""
class MockAgent(Agent):
def __init__(self, model, agent_type=0):
super().__init__(model)
self.type = agent_type
class MockModel(Model):
def __init__(self, width=10, height=10, seed=42):
super().__init__(seed=seed)
self.width = width
self.height = height
self.grid = MultiGrid(width, height, torus=True)
# Create and add a property layer
test_layer = PropertyLayer(
name="test_layer", dimensions=(width, height), default_value=1.0
)
self.grid.add_property_layer(test_layer)
a1 = MockAgent(self, agent_type=0)
a2 = MockAgent(self, agent_type=1)
self.grid.place_agent(a1, (2, 2))
self.grid.place_agent(a2, (5, 5))
self.schedule = []
self.running = True
self.time_series_data = {"data1": [1, 2, 3], "data2": [3, 2, 1]}
# Add a mock datacollector
from unittest.mock import MagicMock
import pandas as pd
self.datacollector = MagicMock()
# Create a dataframe with time series data
self.datacollector.get_model_vars_dataframe.return_value = pd.DataFrame(
self.time_series_data
)
def step(self):
"""Step the model."""
return MockModel()
def agent_portrayal(self, agent):
"""Basic agent portrayal for testing."""
return {"color": "red" if agent.type == 0 else "blue", "marker": "o", "size": 5}
def propertylayer_portrayal(self):
"""Basic property layer portrayal for testing."""
return {
"test_layer": {
"colormap": "viridis",
"alpha": 0.75,
"colorbar": True,
"vmin": 0,
"vmax": 10,
}
}

@Ya-shh Ya-shh closed this Apr 1, 2025
@Ya-shh Ya-shh reopened this Apr 1, 2025
@Ya-shh Ya-shh force-pushed the add-visualization-tests branch from 724159a to e8a4727 Compare April 1, 2025 19:05
@Ya-shh
Copy link
Author

Ya-shh commented Apr 2, 2025

Great initiative @Ya-shh ! Lots of things I like about this PR. I especially like the CI integration. Left a few smaller comments, but the biggest question is why all component tests are currently being skipped?

Thanks for your detailed review, @Corvince! I have fixed the minor issues you mentioned, and all tests are now passing without being skipped.

@Ya-shh Ya-shh requested review from Corvince and EwoutH April 2, 2025 18:21
@Ya-shh
Copy link
Author

Ya-shh commented Apr 3, 2025

@EwoutH & @Corvince

If any more changes need to be applied to specific files, please let me know

@Ya-shh
Copy link
Author

Ya-shh commented Apr 4, 2025

hey any updates ?

@EwoutH & @Corvince

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Release notes label testing Release notes label visualisation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants