Best practices for managing tokens in context with tool calling #1495

Open
@jpmasud

Description

Question

Hello,

I'm implementing a chatbot that relies on tool calling to retrieve data when the user asks a relevant question. My challenge is that since a chat conversation can span multiple turns, the tool outputs add up really quickly.

E.g.

  1. User: "Tell me what invoices are due this week"
  2. [Tool call retrieves raw data of invoices, stripped to essential info]
  3. [LLM converts the tool call output into an answer]
  4. User: "How about this month?"
  5. [Repeat tool call / re-structuring]
  6. User: "How about last 3 months?"
    (etc.)

So initially I tried a naive approach of removing stale tool call information. E.g. in the example above, by step 6 the raw tool call data from step 2 is no longer required, so I can purge it from the message history. But this requires some extra "re-processing" of the Pydantic output (e.g. if I remove the tool call data, I also need to remove the tool call ID reference, restructure the ModelMessage, etc.).
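Setting Pydantic AI's message classes aside, the bookkeeping involved (dropping a stale tool result together with every reference to its tool call ID, so no dangling references remain) can be sketched framework-agnostically. A minimal sketch, assuming OpenAI-style message dicts; `prune_stale_tool_data`, the message shapes, and the sample history are illustrative, not part of any library:

```python
def prune_stale_tool_data(messages, keep_last_n=1):
    """Drop tool-result messages except the most recent keep_last_n,
    also removing the matching assistant tool_calls entries so no
    dangling tool_call_id references remain."""
    # Tool call IDs in conversation order; everything but the last
    # keep_last_n is considered stale.
    tool_ids = [m["tool_call_id"] for m in messages if m["role"] == "tool"]
    stale = set(tool_ids[:-keep_last_n]) if keep_last_n else set(tool_ids)

    pruned = []
    for m in messages:
        if m["role"] == "tool" and m["tool_call_id"] in stale:
            continue  # drop the stale raw tool output entirely
        if m["role"] == "assistant" and m.get("tool_calls"):
            kept = [c for c in m["tool_calls"] if c["id"] not in stale]
            if kept:
                m = {**m, "tool_calls": kept}
            else:
                m = {k: v for k, v in m.items() if k != "tool_calls"}
                if not m.get("content"):
                    continue  # assistant turn carried only stale tool calls
        pruned.append(m)
    return pruned


# Illustrative two-round history mirroring the invoice example above.
history = [
    {"role": "user", "content": "Tell me what invoices are due this week"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "raw invoice data..."},
    {"role": "assistant", "content": "Two invoices are due this week."},
    {"role": "user", "content": "How about this month?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_2", "type": "function"}]},
    {"role": "tool", "tool_call_id": "call_2", "content": "more raw data..."},
    {"role": "assistant", "content": "Five invoices are due this month."},
]
pruned = prune_stale_tool_data(history, keep_last_n=1)
```

The key design point is that the tool result and the assistant message that requested it are pruned as a pair, which is exactly the "restructuring" step described above.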

But I'm not sure if that's the right way to go about it, since it feels like I'm fighting "against" Pydantic?

Additional Context

No response


Labels: question (further information is requested)
