Description
Question
I am following the example in https://ai.pydantic.dev/examples/rag/#example-code
I am also using the latest version of vLLM (0.8.3) with tool-calling support enabled via --guided_decoding_backend.
The RAG example doesn't work with any of the models listed in https://docs.vllm.ai/en/stable/features/tool_calling.html#llama-models-llama3-json. Some models (llama3.2) correctly return the tool call, but the agent is unable to parse it and invoke the tool.
With llama3.1 (https://docs.vllm.ai/en/latest/features/tool_calling.html#llama-models-llama3-json), the tool call happens repeatedly (never ending), so the token limit is exceeded and the run crashes.
However, using an OpenAI model works perfectly. Even Ollama works (without streaming).
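
For reference, a minimal sketch of how I am wiring pydantic-ai to the vLLM server. The model name, port, API key, and launch flags below are placeholders, not exact values from my setup, and on the pinned 0.0.43 the base URL may need to be passed to OpenAIModel directly rather than via a provider:

```python
# vLLM is assumed to be launched roughly as the tool-calling docs describe, e.g.:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct \
#     --enable-auto-tool-choice --tool-call-parser llama3_json
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Point pydantic-ai at the vLLM OpenAI-compatible endpoint.
# vLLM ignores the API key, but the OpenAI client requires one.
model = OpenAIModel(
    'meta-llama/Llama-3.1-8B-Instruct',  # assumed model name
    provider=OpenAIProvider(base_url='http://localhost:8000/v1', api_key='dummy'),
)

agent = Agent(model)


@agent.tool_plain
def retrieve(search_query: str) -> str:
    """Stand-in for the retrieval tool from the RAG example."""
    return f'stub results for {search_query!r}'


result = agent.run_sync('What is pydantic-ai?')
print(result.data)  # `.data` on 0.0.43; renamed `.output` in later releases
```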
Additional Context
Pydantic AI version - pydantic-ai==0.0.43
Python - 3.11.0
vllm - 0.8.3 - vllm-project/vllm#13483 (this doesn't fix the issue I am facing)