Open
Description
What version of gRPC-Java are you using?
1.44.0
What is your environment?
Ubuntu 22.04
What did you expect to see?
An UNAVAILABLE status code or something similar
What did you see instead?
We saw things fail with this exception and stacktrace:
io.grpc.StatusRuntimeException: UNKNOWN: channel closed
at io.grpc.Status.asRuntimeException(Status.java:535)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479)
at io.opentelemetry.instrumentation.grpc.v1_6.TracingClientInterceptor$TracingClientCall$TracingClientCallListener.onClose(TracingClientInterceptor.java:161)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at com.daml.executors.QueueAwareExecutorService$TrackingRunnable.run(QueueAwareExecutorService.scala:98)
at com.daml.metrics.InstrumentedExecutorServiceMetrics$InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorServiceMetrics.scala:202)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.nio.channels.ClosedChannelException: null
at io.grpc.netty.Utils.statusFromThrowable(Utils.java:271)
at io.grpc.netty.NettyClientHandler.onConnectionError(NettyClientHandler.java:500)
at io.netty.handler.codec.http2.Http2ConnectionHandler.onError(Http2ConnectionHandler.java:641)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionEncoder.writeHeaders0(DefaultHttp2ConnectionEncoder.java:251)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionEncoder.writeHeaders(DefaultHttp2ConnectionEncoder.java:167)
at io.netty.handler.codec.http2.DecoratingHttp2FrameWriter.writeHeaders(DecoratingHttp2FrameWriter.java:53)
at io.netty.handler.codec.http2.StreamBufferingEncoder.writeHeaders(StreamBufferingEncoder.java:170)
at io.netty.handler.codec.http2.StreamBufferingEncoder.writeHeaders(StreamBufferingEncoder.java:158)
at io.grpc.netty.NettyClientHandler.createStreamTraced(NettyClientHandler.java:609)
at io.grpc.netty.NettyClientHandler.createStream(NettyClientHandler.java:592)
at io.grpc.netty.NettyClientHandler.write(NettyClientHandler.java:326)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856)
at io.netty.channel.DefaultChannelPipeline.write(DefaultChannelPipeline.java:1015)
at io.netty.channel.AbstractChannel.write(AbstractChannel.java:301)
at io.grpc.netty.WriteQueue$AbstractQueuedCommand.run(WriteQueue.java:213)
at io.grpc.netty.WriteQueue.flush(WriteQueue.java:128)
at io.grpc.netty.WriteQueue.access$000(WriteQueue.java:34)
at io.grpc.netty.WriteQueue$1.run(WriteQueue.java:46)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.netty.channel.StacklessClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source)
Interestingly, it does look like the channel recovered from this after the connection established again.
Steps to reproduce the bug
In our test setup, we kill the connection with toxiproxy and then see this failure but only relatively rarely. I don't have a reliable reproduce unfortunately (nor one that I can make public).
Is that expected? Given that it recovers should we just retry on UNKNOWN: channel closed
like we do on an UNAVAILABLE?