tests: start adding e2e tests #55
base: main
Nice, thanks for getting this working properly! I can't approve since I'm the OG author, so we'll need someone else to take a look too.
@pytest.mark.parametrize("model", models)
async def test_loki_logs_tool(model: str, mcp_client: ClientSession):
    tools = await mcp_client.list_tools()
    prompt = "Can you list the last 10 log lines from all containers using any available Loki datasource? Give me the raw log lines. Please use only the necessary tools to get this information."
This test is failing for me at least half the time, generally because the model tries to use some non-container label matcher, anything from {job=~".+"} to {job="varlog"}. I wonder if we could at least tweak the prompt or the tool description to get the test to pass more consistently.
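The failure mode in this comment (a selector like {job="varlog"} where a container selector was expected) could also be caught deterministically, instead of relying only on the judge, by inspecting the LogQL stream selector the model passed to the Loki tool. A minimal sketch, assuming the selector string can be pulled out of the recorded tool call (how it is captured is not shown here) and assuming the test expects a matcher on the container label only; `selector_label_names` and `uses_only_container_label` are hypothetical helpers, and the regex handles only bare stream selectors, not full LogQL pipelines.

```python
import re

# One label matcher inside a LogQL stream selector, e.g. job=~".+"
# Covers the four LogQL matcher operators: =, !=, =~, !~
_MATCHER = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"((?:[^"\\]|\\.)*)"')


def selector_label_names(selector: str) -> set[str]:
    """Return the label names used in a bare stream selector such as '{job="varlog"}'."""
    selector = selector.strip()
    if not (selector.startswith("{") and selector.endswith("}")):
        raise ValueError(f"not a bare stream selector: {selector!r}")
    return {m.group(1) for m in _MATCHER.finditer(selector[1:-1])}


def uses_only_container_label(selector: str) -> bool:
    """Hypothetical expectation: the model should select streams by container only."""
    return selector_label_names(selector) == {"container"}


print(uses_only_container_label('{container=~".+"}'))  # True
print(uses_only_container_label('{job="varlog"}'))     # False
```

A check like this makes the failure message say exactly which label the model reached for, which may help when tuning the prompt or tool description.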
This PR adds end-to-end tests for the Loki integration, along with test documentation.
This is a first iteration: for now we want to add a basic structure for e2e testing, and we will need to iterate on it further.
Note: prompts need to be specific when using LLM-as-a-judge. I've noticed some flakiness in the LLM responses, so tests sometimes fail, especially test_loki_logs_tool. Once we are confident that the tests pass consistently, we can make them part of the ruleset.
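One way to judge when an LLM-backed test is "consistently passing" is to rerun it several times and look at its pass rate rather than a single result. A minimal sketch, assuming each run can be wrapped as a synchronous zero-argument callable that returns True on pass (the real tests are async and would need an adapter); `pass_rate`, `is_stable`, and the 0.9 threshold are hypothetical choices, not something this PR defines.

```python
import itertools
from typing import Callable


def pass_rate(check: Callable[[], bool], runs: int) -> float:
    """Run a possibly flaky check `runs` times and return the fraction that passed.

    Any exception raised by the check counts as a failed run.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    passes = 0
    for _ in range(runs):
        try:
            if check():
                passes += 1
        except Exception:
            pass  # an error is treated as a failure, not propagated
    return passes / runs


def is_stable(check: Callable[[], bool], runs: int = 20, threshold: float = 0.9) -> bool:
    """Hypothetical gate before adding a test to the ruleset: pass often enough."""
    return pass_rate(check, runs) >= threshold


# Usage: a fake check that fails every other run, like a test passing about half the time.
outcomes = itertools.cycle([True, False])
print(pass_rate(lambda: next(outcomes), 10))  # 0.5
```

Measuring a rate keeps the decision about promoting a test separate from any single lucky or unlucky run.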