Purging queue in hard-reset mode does not cope with already-running tasks #52
Conversation
The concern that the 'naive "sleep till it deletes" approach described above is too naive' turns out to be correct. Sleeping for a fixed period (of any length) still does not guarantee tasks are soft-deleted before we remove them from the list: if a task is currently executing when `PurgeQueue` is received, it is not deleted until that task gets a response, and the naive implementation consistently panics in this scenario.
Instead of hoping that tasks will have cleared after a fixed period, use a try-sleep-retry loop to wait for them to clear. If a long-running task has already been dispatched, `PurgeQueue` may still fail with a DEADLINE_EXCEEDED error; calling code can retry the method on a schedule to suit the application.
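For readers following along, the loop being described has roughly this shape (a sketch, not the exact code in this PR; `runningTasks` is a placeholder for however the queue counts in-flight tasks, and the 3s timeout / 5ms interval values come from the diff discussed below):

```go
import (
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// waitForPurge polls until every task has been soft-deleted, or gives up
// after a fixed timeout and reports DEADLINE_EXCEEDED to the caller.
func waitForPurge(runningTasks func() int) error {
	timeout := time.After(3 * time.Second)
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-timeout:
			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
		case <-ticker.C:
			if runningTasks() == 0 {
				// Every dispatched task has responded and been soft-deleted,
				// so it is now safe to hard-delete them from the queue.
				return nil
			}
		}
	}
}
```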
Cheers for the fix; a few comments, keen to hear what you think.
// on a schedule to suit the application.
timeout := time.After(3 * time.Second)

go tryDeleteTasks()
Why is there a need for a goroutine here? Since there is no asynchronous operation, and the intent is for the caller to wait anyway, I believe a simple loop that records a start time and checks `time.Since(start) >= 3*time.Second` would suffice.
Probably just my inexperience with Go threading/async. I suspect you're right; I'll give it a go with a simpler loop.
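For illustration, the simpler synchronous version might look something like this (a sketch only, reusing the grpc `status`/`codes` imports from the earlier sketch; `pendingTasks` is a stand-in for however the emulator counts tasks that have not yet been soft-deleted):

```go
// purgeSynchronously polls in the calling goroutine: no extra goroutine,
// no channels, just a deadline check against a recorded start time.
func purgeSynchronously(pendingTasks func() int) error {
	start := time.Now()
	for pendingTasks() > 0 {
		if time.Since(start) >= 3*time.Second {
			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
		}
		time.Sleep(5 * time.Millisecond)
	}
	return nil
}
```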
// It is intentionally relatively short, because the internal retry interval is rapid.
// If calling code expects some task requests to last longer, it should handle the DEADLINE_EXCEEDED error and retry
// on a schedule to suit the application.
timeout := time.After(3 * time.Second)
Should this be made configurable to avoid the complexity of having to handle this - especially since this is non-standard / undocumented behaviour?
The only thing is that even if you want to wait longer, you probably don't want the emulator checking every 5ms for that whole period. But it needs to be that fast at first, because with no tasks running I was finding that updating the `cancel` channel etc. could take 0-10ms, so a longer retry interval would cause unnecessary delays.
I guess the better solution would be a basic exponential backoff. That could probably be implemented easily enough with the simpler sync loop you suggested, and then it would be fine to make the timeout configurable, presumably as an extra CLI option.
Although that does mean more parameters to validate (e.g. is it valid to set this option without enabling hard-reset mode?), so it makes the emulator and its interface a bit more complex...
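A basic exponential backoff inside that same synchronous loop might look roughly like this (illustrative only; the 5ms floor comes from the discussion above, while the 500ms cap and the configurable `timeout` parameter are assumptions):

```go
// purgeWithBackoff polls rapidly at first, then backs off, so a longer
// configured timeout does not keep the emulator checking every 5ms.
func purgeWithBackoff(pendingTasks func() int, timeout time.Duration) error {
	interval := 5 * time.Millisecond
	start := time.Now()
	for pendingTasks() > 0 {
		if time.Since(start) >= timeout {
			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
		}
		time.Sleep(interval)
		if interval < 500*time.Millisecond {
			interval *= 2 // double the interval up to the cap
		}
	}
	return nil
}
```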
Yeah, I agree; tempted to say just keep it as you have it for now, as it's non-core behaviour anyway. If there is a need, I expect people will raise an issue and we can look at it then.
}
case <-timeout:
	log.Println("HardReset timed out waiting for tasks to clear")
	return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
I feel this should be a custom error, and let the handler worry about the grpc response - what do you reckon?
Just to confirm, do you mean return a (non-grpc) error from the queue method and then let the emulator func convert that to the DEADLINE_EXCEEDED grpc response? That seems reasonable.
If you mean a custom grpc response code, I'm not sure how to do that.
Ah sorry, yes I meant the former (non-grpc from the queue method).
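Concretely, that split might look something like this (using the standard `errors` package plus the grpc imports from the first sketch; `ErrPurgeTimeout` and `toGrpcError` are illustrative names, not identifiers from the actual codebase):

```go
// In the queue code: a plain sentinel error with no grpc dependency.
var ErrPurgeTimeout = errors.New("timed out waiting for tasks to be purged")

// In the grpc handler: translate the sentinel into the DEADLINE_EXCEEDED
// status, passing any other error through unchanged.
func toGrpcError(err error) error {
	if errors.Is(err, ErrPurgeTimeout) {
		return status.Error(codes.DeadlineExceeded, err.Error())
	}
	return err
}
```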
The naive sleep-based implementation I added to hard-delete tasks after they're removed turns out (as I worried it might) to be too naive. The `panic` I added as a sanity check triggers if a task is currently executing (dispatched but no response received) concurrently with the PurgeQueue operation. This PR proves the issue & refactors the implementation to correctly wait for all tasks (running or not), or to return a timeout error.