Replies: 2 comments
-
I'm not 100% sure I understand the situation, sorry if any of this misses the mark! Normally you'll only be in the RequiresInference state if you've prompted a token but not yet run inference. If you don't prompt that final token before deciding to cancel, you should be in the state you want (I think). If you're doing …
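Roughly, something like this (an untested sketch assuming the usual executor/conversation/sampler setup, not your exact code):

```csharp
while (true)
{
    // Run any queued inference.
    await executor.Infer();
    if (conversation.RequiresInference)
        continue;

    var token = sampler.Sample(executor.Context.NativeHandle, conversation.GetSampleIndex());

    // Decide about cancellation *before* prompting the sampled token.
    // Nothing has been prompted since the last Infer(), so the conversation
    // is not left in RequiresInference and can take a fresh prompt later.
    if (cancellationToken.IsCancellationRequested)
        break;

    conversation.Prompt(token);
}
```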
-
I'm sorry, maybe it's unclear without some code. The BatchedExecutor is in an async inference loop, which I want to be able to cancel. Here is the loop:

```csharp
while (true)
{
    // Run inference
    var decodeResult = await executor.Infer(cancellationToken).ConfigureAwait(false);
    if (decodeResult == DecodeResult.NoKvSlot)
    {
        throw new Exception("Out of memory");
    }

    // Check if inference needs to continue
    if (conversation.RequiresInference) continue;

    var token = sampler.Sample(executor.Context.NativeHandle, conversation.GetSampleIndex());
    if (token.IsEndOfGeneration(model.Vocab))
    {
        break;
    }

    decoder.Add(token);
    string decoded = decoder.Read();
    response.Content += decoded;
    yield return new InferenceResult(decoded, AuthorRole.Assistant);

    conversation.Prompt(token);
}
```

I cancel the loop using the cancellation token that is passed to the Infer function. However, even if I try to continue inference after cancellation, the BatchedExecutor never leaves the RequiresInference state:

```csharp
while (conversation.RequiresInference)
{
    await executor.Infer().ConfigureAwait(false);
}
```

If I understand the first part of your response correctly, you're suggesting I should cancel the loop like this?

```csharp
while (true)
{
    // Run inference
    var decodeResult = await executor.Infer().ConfigureAwait(false);
    if (decodeResult == DecodeResult.NoKvSlot)
    {
        throw new Exception("Out of memory");
    }

    // Check if inference needs to continue
    if (conversation.RequiresInference) continue;

    var token = sampler.Sample(executor.Context.NativeHandle, conversation.GetSampleIndex());
    if (token.IsEndOfGeneration(model.Vocab))
    {
        break;
    }

    decoder.Add(token);
    string decoded = decoder.Read();
    response.Content += decoded;
    yield return new InferenceResult(decoded, AuthorRole.Assistant);

    // Stop before prompting the sampled token, so the conversation stays promptable.
    if (cancellationToken.IsCancellationRequested) break;

    conversation.Prompt(token);
}
```

And then continue with the next prompt? I wouldn't be adding the last generated token to the context, and I'm unsure what sort of internal state the BatchedExecutor is in. It feels icky, and it is also unclear why I can't use the cancellation token with the Infer function.

And with forking the conversation: looking at the code, it will simply create a copy of the current conversation, right? It will also copy the _requiredEpoch variable, so the fork will fail when trying to prompt, since RequiresInference => _requiredEpoch > Executor.Epoch;
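To make the forking concern concrete, this is my reading of it (untested):

```csharp
// As I read it, the fork copies the parent's _requiredEpoch, so a fork taken
// while the parent is stuck in RequiresInference is stuck in it as well:
var fork = conversation.Fork();
if (fork.RequiresInference)
{
    // fork.Prompt(...) would fail here until another Infer() completes
}
```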
-
I am using the BatchedExecutor, and the workflow I'm trying to accomplish is: send prompt A, stream response A, possibly cancel response A partway through, then send prompt B.
The problem is that the state after cancellation is "Requires Inference [to finish Response A]" rather than "Ready for Prompt B".
I could dispose everything and rebuild the entire context before prompting B, but that would be slow and wasteful.
Another way would be to save the state after each generation. Then, after a cancellation, dispose the conversation and reload the saved state, so that only the previous prompt, its cancelled partial answer, and the new prompt have to be added before starting inference. That is better than the first approach, but it requires saving the state after every generation, which could potentially become slow and take up a lot of space.
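Roughly what I have in mind (the Save/Load method names here are from memory of the save-and-load example and may not match the current API exactly; statePath, partialAnswer and newPrompt are just placeholders):

```csharp
// After each *completed* generation, snapshot the conversation to disk.
conversation.Save(statePath);

// When a generation gets cancelled mid-response:
conversation.Dispose();
var restored = executor.Load(statePath);

// Re-prompt only what happened after the snapshot: the cancelled partial
// answer plus the new user prompt, then run inference as usual.
restored.Prompt(executor.Context.Tokenize(partialAnswer + newPrompt));
await executor.Infer();
```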
So my question is: is there a better way to get out of this RequiresInference state after cancellation that still allows me to add a new prompt?