gRPC Keepalive Behavior in Serverless Environments #2664

Open
@pratik151192

Description

(This ticket is mainly for documentation and educational purposes; we aren't requesting a bug fix or a new feature per se.)

Background:

When deploying gRPC services on serverless platforms such as AWS Lambda (and potentially Google Cloud Functions or Azure Functions), we have observed a specific behavior of gRPC keepalives that can lead to unexpected timeouts. Serverless platforms often reuse a container/instance for multiple invocations of the same function, and the invocations handled by a given container are guaranteed to be sequential. The pause between two invocations varies with the incoming traffic routed to that container.

Issue:

We have noticed that gRPC keepalive pings can time out in these serverless environments. Specifically, the issue manifests as follows:

  • The first invocation of a function sends a keepalive ping and completes execution.
  • If the second invocation starts only after the keepalive timeout interval has elapsed, the client declares the keepalive ping to have timed out, resulting in errors such as:
keepalive | (4) 54.xxx.xxx.xx:443 Ping timeout passed without response
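
The sequence above can be sketched as a small timing model. This is a simplified illustration, not the gRPC library's actual implementation: the `KeepaliveScenario` type, the `keepaliveOutcome` function, and the rule that an expired timer is observed before a pending ping acknowledgement on resume are all assumptions made for this example. The model only captures the idea that a paused container processes neither timers nor network events until the next invocation resumes it.

```typescript
// Simplified model (an assumption for illustration, not real gRPC internals):
// a process that is paused between invocations handles no events until it resumes.

interface KeepaliveScenario {
  pingSentAtMs: number;       // when the keepalive ping was sent
  keepaliveTimeoutMs: number; // how long the client waits for the ping ack
  ackDelayMs: number;         // network round trip until the ack reaches the host
  pausedAtMs: number;         // first invocation ends, container is paused
  resumedAtMs: number;        // second invocation starts, container resumes
}

type Outcome = "ack-received" | "ping-timeout";

// An event that becomes ready while the process is paused is only
// handled once the process resumes.
function handledAt(readyAtMs: number, s: KeepaliveScenario): number {
  return readyAtMs < s.pausedAtMs ? readyAtMs : Math.max(readyAtMs, s.resumedAtMs);
}

function keepaliveOutcome(s: KeepaliveScenario): Outcome {
  const ackHandledAt = handledAt(s.pingSentAtMs + s.ackDelayMs, s);
  const timerHandledAt = handledAt(s.pingSentAtMs + s.keepaliveTimeoutMs, s);
  // Assumption: if both are pending at resume, the expired timer is seen first.
  return ackHandledAt < timerHandledAt ? "ack-received" : "ping-timeout";
}

// Ping sent just before the invocation ends; next invocation arrives 60 s later.
const longPause = keepaliveOutcome({
  pingSentAtMs: 0, keepaliveTimeoutMs: 20_000, ackDelayMs: 100,
  pausedAtMs: 50, resumedAtMs: 60_000,
});

// Same ping, but the next invocation arrives within the timeout window.
const shortPause = keepaliveOutcome({
  pingSentAtMs: 0, keepaliveTimeoutMs: 20_000, ackDelayMs: 100,
  pausedAtMs: 50, resumedAtMs: 5_000,
});

console.log(longPause, shortPause);
```

In this model the server may well have answered the ping promptly; the timeout is reported only because the client could not process the answer before its own timer expired.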

The timeout appears to be linked to the operational dynamics of serverless platforms, where the idle time between function invocations does not align with the expected keepalive intervals. The primary goal of this ticket is to have the relevant documentation updated so that it gives clear guidance to developers deploying gRPC services in serverless environments.
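
For readers looking for the settings involved, here is a minimal sketch of the client-side keepalive channel options, assuming the client is `@grpc/grpc-js` (which the log format above suggests). The option names are the ones that library documents; the values and the `keepaliveOptions` name are purely illustrative and are not a recommendation from this ticket.

```typescript
// Keepalive-related channel options, written as a plain object.
// Values are illustrative assumptions, not recommended settings.
const keepaliveOptions: Record<string, number> = {
  // Interval between keepalive pings sent by the client.
  "grpc.keepalive_time_ms": 5000,
  // How long the client waits for a ping acknowledgement before
  // treating the connection as dead.
  "grpc.keepalive_timeout_ms": 1000,
  // 1 allows pings while no calls are in flight; 0 disallows them.
  "grpc.keepalive_permit_without_calls": 1,
};

// Hypothetical usage, assuming @grpc/grpc-js is installed and that
// `address` and `credentials` are defined elsewhere:
//   const client = new grpc.Client(address, credentials, keepaliveOptions);
```

Whichever values are chosen, the point raised in this ticket still applies: the gap between invocations is controlled by the platform, not by these options.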
