Description
While working on netty/netty@b88a980, I noticed that when using the blocking stub with the Netty transport, we allocate from Netty's allocator within the shared executor threads. Netty's allocator uses thread-local caches that serve the majority of allocations. These caches are accessed via Netty's own thread-local implementation, FastThreadLocal, which is only "fast" when used from a FastThreadLocalThread.
We should provide our own thread factory to the cached thread pool so that it creates FastThreadLocalThreads. Additionally, we should override the thread's run() method to do
    public void run() {
      try {
        super.run();
      } finally {
        FastThreadLocal.removeAll();
      }
    }
This releases all memory held in a thread's cache when the thread terminates. That should happen quite frequently, since the pool sizes its threads dynamically.
However, I don't know how best to implement this. The shared channel executor should be shared by all transport, client, and server combinations running in a JVM. Also, I assume we can't rely on the Netty dependency being present on Android?
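One possible shape, as a rough sketch only: keep the shared executor free of any Netty types by giving it a thread factory that takes the cleanup action as a plain Runnable. The class name CleanupThreadFactory and its constructor parameters are hypothetical, not existing API. The sketch uses a plain Thread so that it compiles without Netty; with Netty on the classpath, the thread would presumably extend FastThreadLocalThread and the hook would be FastThreadLocal::removeAll.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical factory: every thread it creates runs a cleanup hook
 * once the thread's run() method exits, whether normally or by exception.
 */
final class CleanupThreadFactory implements ThreadFactory {
  private final String namePrefix;
  private final Runnable cleanup;
  private final AtomicInteger counter = new AtomicInteger();

  CleanupThreadFactory(String namePrefix, Runnable cleanup) {
    this.namePrefix = namePrefix;
    this.cleanup = cleanup;
  }

  @Override
  public Thread newThread(Runnable task) {
    // Assumption: a Netty-aware variant would subclass FastThreadLocalThread here.
    Thread t = new Thread(task, namePrefix + "-" + counter.getAndIncrement()) {
      @Override
      public void run() {
        try {
          super.run();
        } finally {
          // Stand-in for FastThreadLocal.removeAll().
          cleanup.run();
        }
      }
    };
    t.setDaemon(true);
    return t;
  }
}
```

The factory could then be passed to Executors.newCachedThreadPool(ThreadFactory). On platforms without Netty, such as Android, the hook could simply be a no-op; how the Netty transport would register its hook with the shared executor is left open here.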