Skip to content

How to ensure Agent retries MCP server tool calls with retries #1679

Open
@xy3xy3

Description

@xy3xy3

I'm using pydantic-ai with an MCPServerHTTP for a smart home assistant. I want the agent to perform a two-step process for control tasks:

  1. First, call a tool to query/list available smart home devices.
  2. Then, use the information from step 1 (specifically the entity_id) to call another tool to control the target device.

I've set retries=3 on the Agent, expecting that if any part of this process (especially the tool calls via MCP server) fails, it would be retried.

However, when I give a command like "Turn off the monitor light bar," the agent seems to skip the device listing/identification step and immediately asks me for the entity_ID. This suggests it's not attempting the desired two-step process or retrying the initial discovery phase.

If the user's query was just about device status (e.g., "Is the light on?"), I'd expect it to perform the query step but not necessarily proceed to a control step. The current issue is about the control scenario where discovery should precede action.

Code (Python):

import asyncio
import os
import sys
from pydantic_ai.mcp import MCPServerHTTP
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from typing import List # Removed Union as it wasn't used

# Dummy config values for reproducibility
class DummyConfig:
    def get(self, key):
        if key == 'ha_auth_token':
            return "DUMMY_HA_AUTH_TOKEN"
        if key == 'ha_server_url':
            return "http://localhost:8123/api/" # Replace with your HA server URL
        if key == 'sf_api_key':
            return "DUMMY_SF_API_KEY"
        if key == 'sf_base_url':
            return "DUMMY_SF_BASE_URL" # Replace with your LLM provider base URL
        return None

config = DummyConfig()

headers = {
    "Authorization": f"Bearer {config.get('ha_auth_token')}",
}
server = MCPServerHTTP(url=config.get('ha_server_url'), headers=headers)

system_prompt_en = """You are a smart home assistant designed to help users control smart home devices.
When a user asks you to turn a device on or off:
1. First, call the tool to query the device list and entities to determine which device the user intends to control. You can judge based on name similarity.
2. When calling the interface to switch the device and passing arguments:
   Prioritize using domain, device_class, and entity_id to locate the device, rather than name.
   Correct example, using entity_id:
     <arguments>{
       "domain": ["switch"],
       "device_class": ["switch"],
       "entity_id": "switch.monitor_light_bar_switch"
     }</arguments>
3. When the device switching interface returns "isError": false, it means the switch was successful.
"""

agent: Agent = Agent(
    model=OpenAIModel(
        model_name="THUDM/GLM-4-9B-0414", # Example model
        provider=OpenAIProvider(api_key=config.get('sf_api_key'), base_url=config.get('sf_base_url'))
    ),
    mcp_servers=[server],
    system_prompt=system_prompt_en,
    retries=3,
    output_retries=3
)

async def main():
    async with agent.run_mcp_servers():
        result = await agent.run('Turn off the monitor light bar.')
        print(result)
        print(f"Agent Output: {result.output}")

if __name__ == '__main__':
    asyncio.run(main())

Agent Output (Translated):

AgentRunResult(output='\nSorry, it seems there was a technical issue while trying to turn off the monitor light bar. To better assist you, please tell me the exact name or entity ID of the monitor light bar so I can assist you directly. If possible, you can find the relevant name in your smart home device list and provide it to me.')

Agent Output: 
Sorry, it seems there was a technical issue while trying to turn off the monitor light bar. To better assist you, please tell me the exact name or entity ID of the monitor light bar so I can assist you directly. If possible, you can find the relevant name in your smart home device list and provide it to me.

Expected Behavior:
For a command like "Turn off the monitor light bar":

  1. Agent attempts to call a tool (via MCPServerHTTP) to list/query devices to find the "monitor light bar".
  2. If this initial tool call fails or doesn't yield enough info, it should be retried (due to retries=3).
  3. Once the device is identified (e.g., entity_id: "switch.monitor_light_bar_switch" is found), the agent attempts a second tool call to control this specific entity_id.
  4. This second tool call should also be retried upon failure.
  5. The agent should only ask the user for an entity_id if the entire multi-step process (including retries for each tool call) genuinely fails to identify or control the device.

Here is an image for Agent to use tools twice in Cherry Studio.

Image

Another example to ask the tempature,only use tool once.

Image


Question:
How can I configure or guide the Agent to reliably attempt this multi-step tool usage (e.g., first list/query devices, then control a specific device by its ID)? Specifically, how do I ensure the retries parameter applies to each tool call within such a sequence made via the MCPServerHTTP before the agent gives up and asks the user for direct input like an entity_id?

Thanks!

Metadata

Metadata

Assignees

Labels

questionFurther information is requested

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions