Description
What version of gRPC-Java are you using?
1.26.0
What is your environment?
The problem is 100% reproducible on CentOS and macOS.
What did you expect to see?
On the server side it would be good to see a more specific reason for the cancelled call in the `StatusRuntimeException`,
e.g. "CANCELLED - client reset stream", or even a different Status entirely that makes it easy to differentiate this case.
Additionally, either the client should send no `DATA` frame after a `RST_STREAM`, or the `NettyServerHandler` should ignore a `DATA` frame that arrives directly after the `RST_STREAM`.
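
For illustration, a hedged sketch of what server-side handling looks like today: the only way to recognize this case is to string-match the status description, which is fragile. The `Request` message type below is a placeholder, not part of the report:

```java
import io.grpc.Status;
import io.grpc.stub.StreamObserver;

// Hypothetical bidi request observer: with the current behavior, the server
// can only distinguish a client-reset cancellation by inspecting the
// human-readable description of the CANCELLED status.
StreamObserver<Request> requestObserver = new StreamObserver<Request>() {
  @Override public void onNext(Request req) { /* ... */ }

  @Override
  public void onError(Throwable t) {
    Status status = Status.fromThrowable(t);
    if (status.getCode() == Status.Code.CANCELLED
        && "cancelled before receiving half close".equals(status.getDescription())) {
      // Fragile: matching on the description text is the only available
      // signal; a dedicated description (or a distinct status) would help.
    }
  }

  @Override public void onCompleted() { /* ... */ }
};
```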
What did you see instead?
A `StatusRuntimeException` with the message `CANCELLED: cancelled before receiving half close` is delivered to the server's `StreamObserver`, followed by:

```
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA frame for an unknown stream 3
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:147) ~[grpc-netty-shaded-1.26.0.jar:1.26.0]
```
Steps to reproduce the bug
Both client and server use `grpc-java`. We have a bidirectional call; after the stream is established, the client's `onNext` throws an exception. As a result, `grpc-java` sends a `RST_STREAM` followed by a `DATA` frame to the server.
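
A minimal sketch of the client side that triggers this, assuming an existing `ManagedChannel`; the service, stub, and message names (`EchoServiceGrpc`, `echo`, `EchoRequest`, `EchoResponse`) are hypothetical placeholders for our actual bidi service. The key point is the response observer's `onNext` throwing:

```java
import io.grpc.stub.StreamObserver;

// `channel` is an already-built ManagedChannel; EchoServiceGrpc is a
// placeholder for the generated stub of our real service.
EchoServiceGrpc.EchoServiceStub stub = EchoServiceGrpc.newStub(channel);

StreamObserver<EchoRequest> requestObserver =
    stub.echo(new StreamObserver<EchoResponse>() {
      @Override
      public void onNext(EchoResponse response) {
        // Throwing here makes grpc-java cancel the call, which is what
        // produces the RST_STREAM (followed by the DATA frame) on the wire.
        throw new RuntimeException("boom");
      }

      @Override public void onError(Throwable t) { /* ... */ }

      @Override public void onCompleted() { /* ... */ }
    });

// Send one request so the server responds and the observer above throws.
requestObserver.onNext(EchoRequest.getDefaultInstance());
```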
In the server logs this can be observed with:

```
2019-12-19 15:45:17,819 DEBUG | grpc-default-worker-ELG-3-1 | io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler | [id: 0x39a99083, L:/<REDACTED>- R:/<REDACTED>] INBOUND RST_STREAM: streamId=3 errorCode=8
2019-12-19 15:45:17,822 DEBUG | grpc-default-worker-ELG-3-1 | io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler | [id: 0x39a99083, L:/<REDACTED> - R:/<REDACTED>] INBOUND DATA: streamId=3 padding=0 endStream=true length=0 bytes=
```
and then the exception:

```
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA frame for an unknown stream 3
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:147) ~[grpc-netty-shaded-1.26.0.jar:1.26.0]
...
```
What makes the problem hard to diagnose on the server side:
The way the problem presents itself, it could be mistaken for a server problem: one could argue that the server closed the stream and that the client's data is therefore just a racy frame. For this reason we spent a few hours debugging on the server side, even though the problem originated on the client side (which my team has no control over).
Another thing is that we had to deactivate WARN logging for grpc on the server side, because these exception traces flooded the log files. We generally like to keep exceptions logged on WARN, because we have detected bugs in our application that way.
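
For reference, a hedged sketch of that workaround, assuming logging goes through `java.util.logging` and that the traces are emitted under the shaded gRPC/Netty namespace; the exact logger name may differ in an SLF4J/Logback setup:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GrpcLogQuiet {
  // Keep a strong reference so java.util.logging doesn't garbage-collect
  // the logger and drop the configured level.
  private static final Logger GRPC_NETTY =
      Logger.getLogger("io.grpc.netty.shaded.io.grpc.netty");

  static void apply() {
    // Raise the threshold above WARNING so the repeated StreamException
    // traces no longer flood the log files.
    GRPC_NETTY.setLevel(Level.SEVERE);
  }
}
```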