Purging queue in hard-reset mode does not cope with already-running tasks #52
Conversation
The concern that the 'naive "sleep till it deletes" approach described above is too naive' turns out to be correct. Sleeping for a fixed period (of any length) still does not guarantee tasks are soft-deleted before we remove them from the list: if a task is currently executing when `PurgeQueue` is received, it is not deleted until that task gets a response, and the naive implementation consistently panics in this scenario.
Instead of hoping that tasks will have cleared after a fixed period, use a try-sleep-retry loop to wait for them to clear. If a long-running task has already been dispatched, `PurgeQueue` may still fail with a DEADLINE_EXCEEDED error; calling code can retry the method on a schedule to suit the application.
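For readers following along, the loop being described has roughly this shape (a sketch, not the exact code in this PR; `runningTasks` is a placeholder for however the queue counts in-flight tasks, and the 3s timeout / 5ms interval values come from the diff discussed below):

```go
import (
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// waitForPurge polls until every task has been soft-deleted, or gives up
// after a fixed timeout and reports DEADLINE_EXCEEDED to the caller.
func waitForPurge(runningTasks func() int) error {
	timeout := time.After(3 * time.Second)
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-timeout:
			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
		case <-ticker.C:
			if runningTasks() == 0 {
				// Every dispatched task has responded and been soft-deleted,
				// so it is now safe to hard-delete them from the queue.
				return nil
			}
		}
	}
}
```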
Cheers for the fix; a few comments, keen to hear what you think.
// on a schedule to suit the application.
timeout := time.After(3 * time.Second)

go tryDeleteTasks()
Why is there a need for a goroutine here? Since there is no asynchronous operation, and the intent is for the caller to wait anyway, I believe a simple loop that records a start time and checks `time.Since(start) >= 3*time.Second` would suffice.
Probably just my inexperience with Go threading/async. I suspect you're right; I'll give it a go with a simpler loop.
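For illustration, the simpler synchronous version might look something like this (a sketch only, reusing the grpc `status`/`codes` imports from the earlier sketch; `pendingTasks` is a stand-in for however the emulator counts tasks that have not yet been soft-deleted):

```go
// purgeSynchronously polls in the calling goroutine: no extra goroutine,
// no channels, just a deadline check against a recorded start time.
func purgeSynchronously(pendingTasks func() int) error {
	start := time.Now()
	for pendingTasks() > 0 {
		if time.Since(start) >= 3*time.Second {
			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
		}
		time.Sleep(5 * time.Millisecond)
	}
	return nil
}
```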
// It is intentionally relatively short, because the internal retry interval is rapid.
// If calling code expects some task requests to last longer, it should handle the DEADLINE_EXCEEDED error and retry
// on a schedule to suit the application.
timeout := time.After(3 * time.Second)
Should this be made configurable to avoid the complexity of having to handle this - especially since this is non-standard / undocumented behaviour?
The only thing is that even if you want to wait longer, you probably don't want the emulator checking every 5ms for that whole period. But it needs to be that fast at first, because with no tasks running I was finding that updating the `cancel` channel etc. could take 0-10ms, so a longer retry interval would cause unnecessary delays.
I guess the better solution would be a basic exponential backoff. That could probably be implemented easily enough with the simpler sync loop you suggested, and then it would be fine to make the timeout configurable, presumably as an extra CLI option.
Although that does mean more parameters to validate (e.g. is it valid to set this option without enabling hard-reset mode?), so it makes the emulator and its interface a bit more complex...
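A basic exponential backoff inside that same synchronous loop might look roughly like this (illustrative only; the 5ms floor comes from the discussion above, while the 500ms cap and the configurable `timeout` parameter are assumptions):

```go
// purgeWithBackoff polls rapidly at first, then backs off, so a longer
// configured timeout does not keep the emulator checking every 5ms.
func purgeWithBackoff(pendingTasks func() int, timeout time.Duration) error {
	interval := 5 * time.Millisecond
	start := time.Now()
	for pendingTasks() > 0 {
		if time.Since(start) >= timeout {
			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
		}
		time.Sleep(interval)
		if interval < 500*time.Millisecond {
			interval *= 2 // double the interval up to the cap
		}
	}
	return nil
}
```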
Yeah, I agree; tempted to say just keep it as you have it for now, as it's non-core behaviour anyway. If there is a need, I expect people will raise an issue and we can look at it then.
}
case <-timeout:
	log.Println("HardReset timed out waiting for tasks to clear")
	return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
I feel this should be a custom error, and let the handler worry about the grpc response - what do you reckon?
Just to confirm, do you mean return a (non-grpc) error from the queue method and then let the emulator func convert that to the DEADLINE_EXCEEDED grpc response? That seems reasonable.
If you mean a custom grpc response code, I'm not sure how to do that.
Ah sorry, yes I meant the former (non-grpc from the queue method).
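Concretely, that split might look something like this (using the standard `errors` package plus the grpc imports from the first sketch; `ErrPurgeTimeout` and `toGrpcError` are illustrative names, not identifiers from the actual codebase):

```go
// In the queue code: a plain sentinel error with no grpc dependency.
var ErrPurgeTimeout = errors.New("timed out waiting for tasks to be purged")

// In the grpc handler: translate the sentinel into the DEADLINE_EXCEEDED
// status, passing any other error through unchanged.
func toGrpcError(err error) error {
	if errors.Is(err, ErrPurgeTimeout) {
		return status.Error(codes.DeadlineExceeded, err.Error())
	}
	return err
}
```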
The naive sleep-based implementation I added to hard-delete tasks after they're removed turns out (as I worried it might) to be too naive. The `panic` I added as a sanity check triggers if a task is currently executing (dispatched but no response received) concurrently with the PurgeQueue operation. This PR proves the issue & refactors the implementation to correctly wait for all tasks (running or not), or to return a timeout error.