How to ensure Agent retries MCP server tool calls with `retries` (#1679)

I'm using `pydantic-ai` with an `MCPServerHTTP` for a smart home assistant. I want the agent to perform a two-step process for control tasks:
1.  First, call a tool to query/list available smart home devices.
2.  Then, use the information from step 1 (specifically the `entity_id`) to call another tool to control the target device.

I've set `retries=3` on the `Agent`, expecting that if any part of this process (especially the tool calls via MCP server) fails, it would be retried.

However, when I give a command like "Turn off the monitor light bar," the agent seems to skip the device listing/identification step and immediately asks me for the `entity_id`. This suggests it is either not attempting the two-step process at all or not retrying the initial discovery phase.

If the user's query was just about device status (e.g., "Is the light on?"), I'd expect it to perform the query step but not necessarily proceed to a control step. The current issue is about the control scenario where discovery should precede action.
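
To make the discovery step concrete: what I want the agent to do in step 1 is map a spoken name onto an `entity_id` by name similarity. Below is a minimal sketch of that matching in plain Python, without pydantic-ai or a real MCP server. The `DEVICES` list, the `resolve_entity_id` helper, and the `0.6` threshold are all made up for illustration; the real tool output from Home Assistant may be shaped differently.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical device list, standing in for what a "list devices" tool might return.
DEVICES = [
    {"entity_id": "switch.monitor_light_bar_switch", "name": "Monitor Light Bar Switch"},
    {"entity_id": "light.bedroom_ceiling", "name": "Bedroom Ceiling Light"},
    {"entity_id": "sensor.living_room_temperature", "name": "Living Room Temperature"},
]


def resolve_entity_id(query: str, devices: list[dict], threshold: float = 0.6) -> Optional[str]:
    """Return the entity_id whose name is most similar to the query, or None."""
    best_id, best_score = None, 0.0
    for device in devices:
        # Case-insensitive similarity ratio between 0.0 and 1.0.
        score = SequenceMatcher(None, query.lower(), device["name"].lower()).ratio()
        if score > best_score:
            best_id, best_score = device["entity_id"], score
    # Only accept the best match if it is similar enough (threshold is an assumption).
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    print(resolve_entity_id("monitor light bar", DEVICES))
```

Only when this kind of lookup returns nothing would I expect the agent to fall back to asking me for the `entity_id`.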

**Code (Python):**
```python
import asyncio

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerHTTP
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Dummy config values for reproducibility
class DummyConfig:
    def get(self, key):
        if key == 'ha_auth_token':
            return "DUMMY_HA_AUTH_TOKEN"
        if key == 'ha_server_url':
            return "http://localhost:8123/api/" # Replace with your HA server URL
        if key == 'sf_api_key':
            return "DUMMY_SF_API_KEY"
        if key == 'sf_base_url':
            return "DUMMY_SF_BASE_URL" # Replace with your LLM provider base URL
        return None

config = DummyConfig()

headers = {
    "Authorization": f"Bearer {config.get('ha_auth_token')}",
}
server = MCPServerHTTP(url=config.get('ha_server_url'), headers=headers)

system_prompt_en = """You are a smart home assistant designed to help users control smart home devices.
When a user asks you to turn a device on or off:
1. First, call the tool to query the device list and entities to determine which device the user intends to control. You can judge based on name similarity.
2. When calling the interface to switch the device and passing arguments:
   Prioritize using domain, device_class, and entity_id to locate the device, rather than name.
   Correct example, using entity_id:
     <arguments>{
       "domain": ["switch"],
       "device_class": ["switch"],
       "entity_id": "switch.monitor_light_bar_switch"
     }</arguments>
3. When the device switching interface returns "isError": false, it means the switch was successful.
"""

agent: Agent = Agent(
    model=OpenAIModel(
        model_name="THUDM/GLM-4-9B-0414", # Example model
        provider=OpenAIProvider(api_key=config.get('sf_api_key'), base_url=config.get('sf_base_url'))
    ),
    mcp_servers=[server],
    system_prompt=system_prompt_en,
    retries=3,
    output_retries=3
)

async def main():
    async with agent.run_mcp_servers():
        result = await agent.run('Turn off the monitor light bar.')
        print(result)
        print(f"Agent Output: {result.output}")

if __name__ == '__main__':
    asyncio.run(main())
```

**Agent Output (Translated):**
```
AgentRunResult(output='\nSorry, it seems there was a technical issue while trying to turn off the monitor light bar. To better assist you, please tell me the exact name or entity ID of the monitor light bar so I can assist you directly. If possible, you can find the relevant name in your smart home device list and provide it to me.')

Agent Output: 
Sorry, it seems there was a technical issue while trying to turn off the monitor light bar. To better assist you, please tell me the exact name or entity ID of the monitor light bar so I can assist you directly. If possible, you can find the relevant name in your smart home device list and provide it to me.
```

**Expected Behavior:**
For a command like "Turn off the monitor light bar":
1.  Agent attempts to call a tool (via `MCPServerHTTP`) to list/query devices to find the "monitor light bar".
2.  If this initial tool call fails or doesn't yield enough info, it should be retried (due to `retries=3`).
3.  Once the device is identified (e.g., `entity_id: "switch.monitor_light_bar_switch"` is found), the agent attempts a second tool call to control this specific `entity_id`.
4.  This second tool call should also be retried upon failure.
5.  The agent should only ask the user for an `entity_id` if the entire multi-step process (including retries for each tool call) genuinely fails to identify or control the device.
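
The steps above can be sketched in plain Python as follows. This only illustrates the control flow I expected, with each tool call retried independently; it is not a description of how pydantic-ai implements `retries`. `ToolError`, `call_with_retries`, `make_flaky`, `turn_off`, and the fake tools are all hypothetical names, and "3 retries" is read here as one initial attempt plus up to three more.

```python
from typing import Callable

MAX_RETRIES = 3  # mirrors retries=3 in the Agent config (my interpretation)


class ToolError(Exception):
    """Raised by a (hypothetical) tool call that failed."""


def call_with_retries(tool: Callable[[], dict], retries: int = MAX_RETRIES) -> dict:
    """Call `tool` once, then retry up to `retries` more times on ToolError."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return tool()
        except ToolError as exc:
            last_error = exc
    raise ToolError(f"gave up after {retries + 1} attempts") from last_error


def make_flaky(fn: Callable[..., dict], failures: int) -> Callable[..., dict]:
    """Wrap `fn` so that its first `failures` calls raise ToolError."""
    state = {"left": failures}

    def wrapper(*args):
        if state["left"] > 0:
            state["left"] -= 1
            raise ToolError("simulated failure")
        return fn(*args)

    return wrapper


def turn_off(name: str, list_devices: Callable[[], dict], control_device: Callable[[str], dict]) -> str:
    """Discovery first, then control; fall back to asking the user only at the end."""
    try:
        # Steps 1-2: query the device list, retrying on failure.
        devices = call_with_retries(list_devices)["devices"]
        entity_id = next(
            (d["entity_id"] for d in devices if name.lower() in d["name"].lower()), None
        )
        if entity_id is None:
            return "ask_user"  # discovery succeeded but found no matching device
        # Steps 3-4: control the identified entity, again with retries.
        result = call_with_retries(lambda: control_device(entity_id))
    except ToolError:
        return "ask_user"  # step 5: every retry of one of the tool calls failed
    return "ask_user" if result["isError"] else "ok"


def fake_list_devices() -> dict:
    return {"devices": [{"entity_id": "switch.monitor_light_bar_switch",
                         "name": "Monitor Light Bar Switch"}]}


def fake_control_device(entity_id: str) -> dict:
    return {"isError": False, "entity_id": entity_id}


if __name__ == "__main__":
    outcome = turn_off(
        "monitor light bar",
        make_flaky(fake_list_devices, failures=2),
        make_flaky(fake_control_device, failures=1),
    )
    print(outcome)
```

In this sketch the discovery call fails twice and the control call fails once, yet the overall request still succeeds because each call has its own retry budget. That is the behavior I was hoping `retries=3` would give me.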

---

For comparison, here is a screenshot of the agent using tools twice in Cherry Studio:

![Image](https://github.com/user-attachments/assets/b1ec4f2b-3a3b-44ce-9316-b8d908018609)

Another example, asking for the temperature, uses a tool only once:

![Image](https://github.com/user-attachments/assets/79b1ad8a-96cc-41e9-a1a0-7e84af6f2ca8)

---

**Question:**
How can I configure or guide the `Agent` to reliably attempt this multi-step tool usage (e.g., first list/query devices, then control a specific device by its ID)? Specifically, how do I ensure the `retries` parameter applies to each tool call within such a sequence made via the `MCPServerHTTP` before the agent gives up and asks the user for direct input like an `entity_id`?

Thanks!

