CI is not testing evals without logfire #1401

Open
@DouweM


Description

Pydantic AI is tested like this:

```yaml
- run: uv run --package pydantic-ai-slim coverage run -m pytest
  env:
    COVERAGE_FILE: coverage/.coverage.${{ runner.os }}-py${{ matrix.python-version }}-slim
- run: uv run coverage run -m pytest
  env:
    COVERAGE_FILE: coverage/.coverage.${{ runner.os }}-py${{ matrix.python-version }}-standard
- run: uv run --all-extras coverage run -m pytest
  env:
    COVERAGE_FILE: coverage/.coverage.${{ runner.os }}-py${{ matrix.python-version }}-all-extras
```

Presumably, that's:

  1. Slim
  2. With Evals (and others)
  3. With Evals and Logfire (and others)
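
A quick way to see what each mode can actually import is a snippet like this (standard library only; run it under the same uv invocation as the corresponding CI step):

```python
# Sketch: report which eval-related modules import in the current environment.
# opentelemetry.sdk is the transitive dependency that turns out to matter below.
import importlib

for mod in ('pydantic_evals', 'logfire', 'opentelemetry.sdk'):
    try:
        importlib.import_module(mod)
        print(f'{mod}: importable')
    except ImportError as exc:
        print(f'{mod}: not importable ({exc})')
```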

As I noticed in #1400 (comment), though, some of the Evals tests are being skipped in mode 2, so we're not actually testing Evals without Logfire:

https://github.com/pydantic/pydantic-ai/actions/runs/14304998822/job/40086857591

```
tests/evals/test_dataset.py ssssssssssssssssssssssssssssss                                                                                     [  3%]
tests/evals/test_evaluator_base.py ........                                                                                                    [  3%]
tests/evals/test_evaluator_common.py ssssssssssssss                                                                                            [  5%]
tests/evals/test_evaluator_context.py ...                                                                                                      [  5%]
tests/evals/test_evaluator_spec.py .....                                                                                                       [  6%]
tests/evals/test_evaluators.py sssssssssssssss                                                                                                 [  7%]
tests/evals/test_llm_as_a_judge.py ....                                                                                                        [  8%]
tests/evals/test_otel.py sssssssssssssssssssssssss                                                                                             [ 10%]
tests/evals/test_render_numbers.py sssssssssssssssssssssssssssssssssssssssssssssssssssssssssss                                                 [ 16%]
tests/evals/test_reporting.py ssssss                                                                                                           [ 17%]
tests/evals/test_reports.py ssssss                                                                                                             [ 17%]
tests/evals/test_utils.py ..........                                                                                                           [ 18%]
```

This is why #1375 and #1399 weren't caught in CI.

The underlying issue is that the tests check whether `pydantic_evals` can be imported, and if not, they assume we're in mode 1 and skip entirely. But in this case the reason `pydantic_evals` couldn't be imported is not that it was intentionally left uninstalled, but rather #1375: opentelemetry-sdk was missing as a dependency, and was only available if logfire happened to be installed.
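
For reference, the skip mechanism looks roughly like this (a simplified sketch of the `try_import` helper in the test suite's conftest; details may differ):

```python
# Simplified sketch of try_import(): it yields a callable that reports whether
# the imports guarded by the with block succeeded, and swallows ImportError so
# pytest can still collect the module.
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def try_import() -> Iterator[Callable[[], bool]]:
    import_success = False

    def check_import() -> bool:
        return import_success

    try:
        yield check_import  # the guarded imports run here, inside the with block
    except ImportError:
        pass  # any ImportError raised in the block lands here and is suppressed
    else:
        import_success = True
```

Any ImportError raised inside the `with` block, whether from `pydantic_evals` itself or from a transitive dependency like opentelemetry-sdk, flips the same flag, so the skip can't tell "intentionally not installed" apart from "installed but broken".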

That's why the evals-without-logfire tests didn't start running until I fixed that issue in #1400.

And the reason the tests are failing on that PR is that they have a hard dependency on logfire, even though it should be optional.

I can look at fixing the tests to make logfire optional in that same PR, but I thought this higher-level issue was worth tracking separately.
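
One possible shape for that fix, assuming the `try_import` helper sketched above (the names here are illustrative, not the actual test code):

```python
import pytest

from ..conftest import try_import  # the helper sketched above; path may differ

# Guard the evals imports and the logfire import separately, so that a missing
# logfire only skips the logfire-specific tests instead of the whole module.
with try_import() as evals_imports_successful:
    from pydantic_evals import Dataset

with try_import() as logfire_imports_successful:
    import logfire

pytestmark = pytest.mark.skipif(not evals_imports_successful(), reason='pydantic-evals not installed')


@pytest.mark.skipif(not logfire_imports_successful(), reason='logfire not installed')
def test_needs_logfire():
    ...
```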


Note also that in tests/evals/test_dataset.py, "is logfire installed" is only implicitly checked as part of the "is pydantic-evals installed" check, whereas in tests/evals/test_evaluators.py and tests/evals/test_evaluator_common.py, `import logfire` is explicitly stated inside the `with try_import() as imports_successful` context manager. I assume this was written before it was decided to make logfire optional for evals.
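
Concretely, the two patterns look roughly like this (sketched from the description above, not copied from the test files):

```python
# tests/evals/test_evaluators.py / test_evaluator_common.py (sketch):
# logfire sits inside the same guard as the pydantic_evals imports, so a
# missing logfire skips the entire module, not just the logfire-specific tests.
with try_import() as imports_successful:
    import logfire
    from pydantic_evals.evaluators import Evaluator

# tests/evals/test_dataset.py (sketch): no explicit logfire import, but the
# guarded import still failed without logfire before #1400, because
# opentelemetry-sdk was only pulled in transitively via logfire (#1375).
with try_import() as dataset_imports_successful:
    from pydantic_evals import Dataset
```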

Python, Pydantic AI & LLM client version

Python 3.13, pydantic-ai 0.0.53
