Description
After updating https://github.com/spring-projects/spring-lifecycle-smoke-tests to run tests against Spring Boot 3.4.x, I noticed that framework:webflux-undertow:checkpointRestoreAppTest is broken with Boot 3.4.x while still green with Boot 3.3.x, even though both use the same Undertow version. It fails with the following error:
Error (criu/libnetlink.c:54): -95 reported by netlink: Operation not supported
Error (criu/net.c:3744): Unable to create a veth pair: -95
While I was discussing with @snicoll what could have caused this, he mentioned that Spring Boot 3.4.x enables graceful shutdown by default. I tried server.shutdown=immediate and found that it fixes the test.
Could the Spring Boot team check whether this regression can be avoided, so that WebFlux + Undertow CRaC support keeps working out of the box? I suspect that when graceful shutdown is enabled, it has not finished when the JVM checkpoint is invoked, leaving the socket in a bad state, hence the error above.
wilkinsona commented on Jan 3, 2025
This doesn't look like a regression to me as it also fails (although perhaps differently) with Boot 3.3.x when graceful shutdown is enabled:
Title changed from "Regression on WebFlux + Undertow with Project CRaC" to "Checkpoint-restore with WebFlux and Undertow does not work when graceful shutdown is enabled"
wilkinsona commented on Jan 3, 2025
With Boot 3.4.1, I'm seeing the same behavior as Boot 3.3.x when graceful shutdown is enabled. The checkpoint works, the app starts successfully upon restore, and then rejects requests with a 503. This happens because Undertow's GracefulShutdownHandler is only single-use. Once it has been shut down (as happens when taking the checkpoint), the shutdown bit is set in its state field. The bit isn't cleared upon restore, so the handler still believes that Undertow has been shut down. There's no API to clear it, so we may have to resort to reflection if this is something that we want to support. Alternatively, it might be possible to ignore the handler somehow when taking a checkpoint so that it isn't shut down.
sdeleuze commented on Jan 6, 2025
For the automatic checkpoint/restore at startup use case, where -Dspring.context.checkpoint=onRefresh is set, graceful shutdown is IMO not needed (for any web server) since no request is expected to have been received. So if you can disable it for that use case specifically (for Undertow or for all servers), that would make sense. Spring Boot can leverage DefaultLifecycleProcessor#CHECKPOINT_PROPERTY_NAME and DefaultLifecycleProcessor#ON_REFRESH_VALUE.
For the on-demand checkpoint/restore of a running application, I think graceful shutdown makes more sense. Maybe I could create a related GracefulShutdownHandler feature request on the Undertow bug tracker, and for now we just document in https://github.com/spring-projects/spring-lifecycle-smoke-tests that people using Undertow + CRaC + on-demand checkpoint/restore should disable graceful shutdown?
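A toy Java model may help illustrate how the single-use handler and the onRefresh proposal interact. It assumes a hypothetical ToyGracefulShutdownHandler that, like the behavior wilkinsona describes, keeps a shutdown bit in a state field that is never cleared. It is not Undertow's GracefulShutdownHandler. The two constants copy the values of DefaultLifecycleProcessor#CHECKPOINT_PROPERTY_NAME ("spring.context.checkpoint") and DefaultLifecycleProcessor#ON_REFRESH_VALUE ("onRefresh"), and for simplicity only JVM system properties are consulted. The policy of skipping graceful shutdown when the checkpoint is taken on refresh is a hypothetical sketch of the proposal above, not Spring Boot code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: a hypothetical stand-in for a single-use graceful shutdown
// handler. It is NOT Undertow's GracefulShutdownHandler.
public class CheckpointShutdownSketch {

    // Same values as DefaultLifecycleProcessor.CHECKPOINT_PROPERTY_NAME and
    // DefaultLifecycleProcessor.ON_REFRESH_VALUE in Spring Framework.
    static final String CHECKPOINT_PROPERTY_NAME = "spring.context.checkpoint";
    static final String ON_REFRESH_VALUE = "onRefresh";

    static final class ToyGracefulShutdownHandler {
        private static final long SHUTDOWN_BIT = 1L << 63;
        // High bit marks "shut down"; nothing ever clears it again.
        private final AtomicLong state = new AtomicLong();

        void shutdown() {
            state.getAndUpdate(s -> s | SHUTDOWN_BIT);
        }

        // Returns an HTTP status code: 503 once the shutdown bit is set.
        int handle() {
            return (state.get() & SHUTDOWN_BIT) != 0 ? 503 : 200;
        }
    }

    // Hypothetical policy: skip graceful shutdown when the checkpoint is
    // taken automatically on refresh (reads JVM system properties only).
    static boolean shouldShutDownGracefullyForCheckpoint() {
        return !ON_REFRESH_VALUE.equals(System.getProperty(CHECKPOINT_PROPERTY_NAME));
    }

    // Simulates checkpoint + restore, then serves one request.
    static int statusAfterRestore(ToyGracefulShutdownHandler handler) {
        if (shouldShutDownGracefullyForCheckpoint()) {
            handler.shutdown(); // graceful shutdown before the checkpoint
        }
        // Checkpoint and restore would happen here; the handler's state survives.
        return handler.handle();
    }

    public static void main(String[] args) {
        System.clearProperty(CHECKPOINT_PROPERTY_NAME);
        System.out.println("on-demand: " + statusAfterRestore(new ToyGracefulShutdownHandler()));
        System.setProperty(CHECKPOINT_PROPERTY_NAME, ON_REFRESH_VALUE);
        System.out.println("onRefresh: " + statusAfterRestore(new ToyGracefulShutdownHandler()));
    }
}
```

Running main prints "on-demand: 503" and then "onRefresh: 200": the handler that went through graceful shutdown before the checkpoint keeps rejecting requests after restore, while skipping that step leaves it accepting them. In a real server the stop/start around the checkpoint still happens either way; the model only captures the handler's sticky state bit.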