Description
I'm using pydantic-ai with an MCPServerHTTP for a smart home assistant. I want the agent to perform a two-step process for control tasks:
- First, call a tool to query/list available smart home devices.
- Then, use the information from step 1 (specifically the entity_id) to call another tool to control the target device.
I've set retries=3 on the Agent, expecting that if any part of this process (especially the tool calls via the MCP server) fails, it would be retried.
However, when I give a command like "Turn off the monitor light bar," the agent seems to skip the device listing/identification step and immediately asks me for the entity_id. This suggests it's not attempting the desired two-step process or retrying the initial discovery phase.
If the user's query was just about device status (e.g., "Is the light on?"), I'd expect it to perform the query step but not necessarily proceed to a control step. The current issue is about the control scenario where discovery should precede action.
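To make the intended two-step control flow concrete, here is a minimal plain-Python sketch of it, independent of pydantic-ai. The DEVICES mapping, resolve_entity_id, and turn_off are hypothetical stand-ins for whatever the MCP server's tools actually return and accept; only the entity_id value is taken from the example in this issue.

```python
import difflib

# Hypothetical result of the "list devices" tool; the real MCP tool output will differ.
DEVICES = {
    "Monitor Light Bar Switch": "switch.monitor_light_bar_switch",
    "Living Room Lamp": "light.living_room_lamp",
    "Bedroom Fan": "fan.bedroom_fan",
}


def resolve_entity_id(spoken_name, devices, cutoff=0.6):
    """Step 1: map a user-facing name to an entity_id by name similarity."""
    by_lower = {name.lower(): entity_id for name, entity_id in devices.items()}
    matches = difflib.get_close_matches(
        spoken_name.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


def turn_off(entity_id):
    """Step 2: stand-in for the control tool; returns the arguments it would send."""
    domain = entity_id.split(".", 1)[0]
    return {"domain": [domain], "entity_id": entity_id}


entity_id = resolve_entity_id("monitor light bar", DEVICES)
if entity_id is None:
    # Only at this point would asking the user for an entity_id be appropriate.
    print("Could not identify the device; ask the user for an entity_id.")
else:
    print(turn_off(entity_id))
```

For a status-only query, only the first step would run; the control step is reached only when the user asks for an action.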
Code (Python):
import asyncio
import os
import sys
from pydantic_ai.mcp import MCPServerHTTP
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from typing import List # Removed Union as it wasn't used
# Dummy config values for reproducibility
class DummyConfig:
    def get(self, key):
        if key == 'ha_auth_token':
            return "DUMMY_HA_AUTH_TOKEN"
        if key == 'ha_server_url':
            return "http://localhost:8123/api/"  # Replace with your HA server URL
        if key == 'sf_api_key':
            return "DUMMY_SF_API_KEY"
        if key == 'sf_base_url':
            return "DUMMY_SF_BASE_URL"  # Replace with your LLM provider base URL
        return None
config = DummyConfig()
headers = {
    "Authorization": f"Bearer {config.get('ha_auth_token')}",
}
server = MCPServerHTTP(url=config.get('ha_server_url'), headers=headers)
system_prompt_en = """You are a smart home assistant designed to help users control smart home devices.
When a user asks you to turn a device on or off:
1. First, call the tool to query the device list and entities to determine which device the user intends to control. You can judge based on name similarity.
2. When calling the interface to switch the device and passing arguments:
Prioritize using domain, device_class, and entity_id to locate the device, rather than name.
Correct example, using entity_id:
<arguments>{
"domain": ["switch"],
"device_class": ["switch"],
"entity_id": "switch.monitor_light_bar_switch"
}</arguments>
3. When the device switching interface returns "isError": false, it means the switch was successful.
"""
agent: Agent = Agent(
    model=OpenAIModel(
        model_name="THUDM/GLM-4-9B-0414",  # Example model
        provider=OpenAIProvider(
            api_key=config.get('sf_api_key'),
            base_url=config.get('sf_base_url'),
        ),
    ),
    mcp_servers=[server],
    system_prompt=system_prompt_en,
    retries=3,
    output_retries=3,
)
async def main():
    # Keep the MCP server connection open for the duration of the run
    async with agent.run_mcp_servers():
        result = await agent.run('Turn off the monitor light bar.')
        print(result)
        print(f"Agent Output: {result.output}")


if __name__ == '__main__':
    asyncio.run(main())
Agent Output (Translated):
AgentRunResult(output='\nSorry, it seems there was a technical issue while trying to turn off the monitor light bar. To better assist you, please tell me the exact name or entity ID of the monitor light bar so I can assist you directly. If possible, you can find the relevant name in your smart home device list and provide it to me.')
Agent Output:
Sorry, it seems there was a technical issue while trying to turn off the monitor light bar. To better assist you, please tell me the exact name or entity ID of the monitor light bar so I can assist you directly. If possible, you can find the relevant name in your smart home device list and provide it to me.
Expected Behavior:
For a command like "Turn off the monitor light bar":
- Agent attempts to call a tool (via MCPServerHTTP) to list/query devices to find the "monitor light bar".
- If this initial tool call fails or doesn't yield enough info, it should be retried (due to retries=3).
- Once the device is identified (e.g., entity_id: "switch.monitor_light_bar_switch" is found), the agent attempts a second tool call to control this specific entity_id.
- This second tool call should also be retried upon failure.
- The agent should only ask the user for an entity_id if the entire multi-step process (including retries for each tool call) genuinely fails to identify or control the device.
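The retry behaviour expected above can be sketched in plain Python as follows. This only illustrates the desired control flow; it is not a description of how pydantic-ai's retries parameter works internally. ToolError, call_with_retries, flaky_list_devices, and control_device are hypothetical names, and the simulated failures are made up for the example.

```python
class ToolError(Exception):
    """Hypothetical error raised when a tool call fails."""


def call_with_retries(tool, *args, retries=3):
    """Call tool(*args), retrying up to `retries` more times after a failure."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return tool(*args)
        except ToolError as exc:
            last_error = exc
    raise last_error


attempts = {"list": 0}


def flaky_list_devices():
    """Simulated discovery tool that fails twice before succeeding."""
    attempts["list"] += 1
    if attempts["list"] < 3:
        raise ToolError("temporary failure")
    return {"monitor light bar": "switch.monitor_light_bar_switch"}


def control_device(entity_id):
    """Simulated control tool; mirrors the "isError": false success signal."""
    return {"isError": False, "entity_id": entity_id}


def turn_off(name):
    """Discover, then control; ask the user only when both stages are exhausted."""
    try:
        devices = call_with_retries(flaky_list_devices)
        entity_id = devices.get(name)
        if entity_id is None:
            return "Please provide the entity_id."
        result = call_with_retries(control_device, entity_id)
    except ToolError:
        return "Please provide the entity_id."
    return "ok" if not result["isError"] else "failed"


print(turn_off("monitor light bar"))
```

In this sketch each tool call gets its own retry budget, and the request for an entity_id is the last resort rather than the first response.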
Here is a screenshot of the agent using tools twice in Cherry Studio.
Another example, asking for the temperature, uses a tool only once.
Question:
How can I configure or guide the Agent to reliably attempt this multi-step tool usage (e.g., first list/query devices, then control a specific device by its ID)? Specifically, how do I ensure the retries parameter applies to each tool call within such a sequence made via the MCPServerHTTP before the agent gives up and asks the user for direct input like an entity_id?
Thanks!