Description
There has been a fair amount of feedback on pyroscope-rs that may be worth investigating, specifically around memory leaks and signal handler overhead.
- pprof-rs has high overhead because its profiling is based on signal handlers (and pyroscope unregisters and re-registers the signal handlers every 10 seconds).
- The libunwind-based unwinding is buggy and causes SIGABRTs: SIGABORT when profiling with pyroscope-rs tikv/pprof-rs#219
- The pyroscope-rs repo does not seem to be under much active maintenance. I also discovered a bug there: fix spurious exit when epoll_wait is interrupted by a signal grafana/pyroscope-rs#125
- Even with frame pointer unwinding and the above fix, after a couple of hours of running the validator with pyroscope enabled, most of the node's tasks are killed, including the pyroscope agent (I assume a memory leak somewhere triggers an OOM kill), which leaves the node in an inconsistent but still running state.
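
For context on the epoll_wait bug mentioned above: a blocking syscall that is interrupted by a signal (such as the profiler's sampling signal) returns EINTR, which Rust surfaces as `io::ErrorKind::Interrupted`. A common way to tolerate this is to retry instead of treating it as fatal. The following is a minimal sketch of that pattern using only the standard library; it is not the code from the linked PR, and `retry_on_interrupt` and `simulated_wait` are hypothetical names used for illustration only.

```rust
use std::io;

/// Calls `op` repeatedly until it either succeeds or fails with an error
/// other than `ErrorKind::Interrupted` (EINTR on Unix).
/// Hypothetical helper, not part of pyroscope-rs or pprof-rs.
fn retry_on_interrupt<T, F>(mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    loop {
        match op() {
            // A signal arrived while blocked: not a real failure, try again.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Success or a genuine error: hand it back to the caller.
            other => return other,
        }
    }
}

/// Hypothetical stand-in for a blocking wait such as epoll_wait: it reports
/// an interruption `*remaining` times, then returns `ready` events.
fn simulated_wait(remaining: &mut u32, ready: usize) -> io::Result<usize> {
    if *remaining > 0 {
        *remaining -= 1;
        Err(io::Error::from(io::ErrorKind::Interrupted))
    } else {
        Ok(ready)
    }
}
```

With a signal-based profiler these interruptions are expected to be frequent, so any event loop in the process that exits on `Interrupted` would be likely to hit this path.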