
tests: start adding e2e tests #55


Open

sd2k wants to merge 15 commits into main


Conversation

Collaborator

@sd2k sd2k commented Mar 21, 2025

This PR adds end-to-end tests for the Loki integration, along with test documentation.
This is the first iteration: for now we just want to establish a basic structure for e2e testing, and we will iterate on it further.

Note: prompts need to be specific when using LLM-as-a-judge. I've noticed some flakiness in the LLM responses, so tests sometimes fail, especially test_loki_logs_tool.

Once we are confident that the tests pass consistently, we can make them part of the ruleset.
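For context, the LLM-as-a-judge check boils down to asking a second model whether the tool-using model's answer satisfies the prompt. A minimal sketch of that pattern (the `ask_judge` callable and the grading prompt below are illustrative, not the code in this PR):

```python
# Hypothetical sketch of an LLM-as-a-judge assertion helper; the real test
# suite may wire this up differently. `ask_judge` stands in for whatever LLM
# client the tests use and is assumed to return the judge model's raw reply.
from typing import Callable


def judge_answer(
    ask_judge: Callable[[str], str],
    question: str,
    answer: str,
    criteria: str,
) -> bool:
    """Ask a second model whether `answer` satisfies `criteria` for `question`."""
    verdict = ask_judge(
        "You are grading another model's answer.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Criteria: {criteria}\n"
        "Reply with exactly PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")


# Usage inside a test (names are illustrative):
# assert judge_answer(ask_judge, prompt, response_text,
#                     "The answer contains raw Loki log lines from containers.")
```

Keeping the judge's criteria narrow and forcing a PASS/FAIL reply is one way to reduce grading flakiness, since free-form verdicts are harder to assert on.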

@ioanarm ioanarm marked this pull request as ready for review April 3, 2025 09:42
@ioanarm ioanarm requested a review from a team as a code owner April 3, 2025 09:42
Collaborator Author

@sd2k sd2k left a comment


Nice, thanks for getting this working properly! I can't approve since I'm the OG author so we'll need someone else to take a look too.

@ioanarm ioanarm self-assigned this Apr 4, 2025
# `models` and the `mcp_client` fixture are presumably provided elsewhere in the test suite.
@pytest.mark.parametrize("model", models)
async def test_loki_logs_tool(model: str, mcp_client: ClientSession):
    tools = await mcp_client.list_tools()
    # Prompt asking the model for raw container log lines via a Loki datasource.
    prompt = "Can you list the last 10 log lines from all containers using any available Loki datasource? Give me the raw log lines. Please use only the necessary tools to get this information."
Contributor


This test is failing for me at least half the time, generally because the model tries to use a non-container label matcher, anything from {job=~".+"} to {job="varlog"}. I wonder if we could at least tweak the prompt or the tool description to get the test to pass more consistently.
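One possible mitigation, purely as an illustration (the exact wording and the expected selector are assumptions that would need testing against the real environment), is to make the prompt prescribe the label selector and add a deterministic check on the query the model produced, rather than relying on the judge alone:

```python
# Illustrative only: a more prescriptive prompt plus a deterministic check on
# the LogQL selector the model chose. The {container=~".+"} selector is an
# assumption about what the test environment expects, not part of this PR.
STRICT_PROMPT = (
    "List the last 10 raw log lines from containers using any available Loki "
    'datasource. Query with the label selector {container=~".+"} and do not '
    "use job-based selectors. Use only the tools required for this."
)


def used_container_selector(logql_query: str) -> bool:
    """Return True if the query filters on a container label rather than job."""
    return "container" in logql_query and "job=" not in logql_query
```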
