
Tool calling with vLLM is broken #1414

Open
@suresh-now

Description

Question

I am following the RAG example at https://ai.pydantic.dev/examples/rag/#example-code.

I am also using the latest version of vLLM (0.8.3) with tool calling support enabled via --guided_decoding_backend.
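
For reference, a minimal sketch of how such a server might be launched, following the vLLM tool-calling docs; the model name, chat template path (from the vLLM repository's examples/ directory), and backend value are illustrative and may vary by vLLM version:

```bash
# Sketch: serve a Llama 3.1 model with tool calling and guided decoding.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --chat-template examples/tool_chat_template_llama3.1_json.jinja \
  --guided-decoding-backend xgrammar
```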

The RAG example doesn't work with any of the models listed in https://docs.vllm.ai/en/stable/features/tool_calling.html#llama-models-llama3-json. Some models (Llama 3.2) correctly return the tool call, but the agent is unable to parse it and invoke the tool.

With Llama 3.1 (https://docs.vllm.ai/en/latest/features/tool_calling.html#llama-models-llama3-json), the tool call repeats endlessly, so the token limit is exceeded and the run crashes.

However, using an OpenAI model works perfectly. Even Ollama works (without streaming).
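
To reproduce, a minimal sketch of the agent setup against the vLLM endpoint; the base URL, model name, and `retrieve` tool are placeholders standing in for the RAG example, and the provider-style constructor assumes a pydantic-ai version where `OpenAIProvider` is available (older releases passed `base_url` to `OpenAIModel` directly):

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Point the agent at vLLM's OpenAI-compatible server (default port 8000).
# The API key is unused by vLLM but required by the client.
model = OpenAIModel(
    'meta-llama/Llama-3.1-8B-Instruct',
    provider=OpenAIProvider(base_url='http://localhost:8000/v1', api_key='EMPTY'),
)
agent = Agent(model)

@agent.tool_plain
def retrieve(query: str) -> str:
    """Placeholder for the RAG example's retrieval tool."""
    return f'documentation sections matching {query!r}'

# With vLLM this loops on tool calls until the token limit is hit;
# with an OpenAI model the same code completes normally.
result = agent.run_sync('How do I configure logfire to work with FastAPI?')
print(result.data)
```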

Additional Context

Pydantic AI version - pydantic-ai==0.0.43
Python - 3.11.0
vLLM - 0.8.3 - vllm-project/vllm#13483 (this doesn't fix the issue I am facing)
