gRPC Keepalive Behavior in Serverless Environments

(This ticket is mainly for documentation and educational purposes; we aren't requesting a bug fix or a new feature per se.)

### Background:
When deploying gRPC services on serverless platforms such as AWS Lambda (and potentially Google Cloud Functions or Azure Functions), we have observed a specific behavior related to gRPC keepalives that can lead to unexpected timeouts. Serverless platforms often reuse containers/instances for multiple invocations of the same function, with invocations guaranteed to be sequential. The pause between two invocations on the same container varies with the incoming traffic routed to that container.

### Issue:
We have noticed that gRPC keepalive pings can time out in these serverless environments. Specifically, the issue manifests as follows:

- The first invocation of a function sends a keepalive ping and completes execution.
- If the second invocation starts only after the keepalive timeout has elapsed, the keepalive ping is declared to have timed out, resulting in errors such as:

```
keepalive | (4) 54.xxx.xxx.xx:443 Ping timeout passed without response
```
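
The sequence above can be sketched as a small timing model. This is only an illustration, not gRPC's actual implementation: it assumes the instance handles no events while it is paused between invocations, and every name in it (`KeepaliveScenario`, `keepaliveOutcome`, the field names) is hypothetical.

```typescript
// Hypothetical timing model of one keepalive ping on a reused serverless
// instance. Assumption: while the instance is paused between invocations,
// the process handles no network events, so a ping response that arrives
// during the pause is only seen once the next invocation resumes it.
interface KeepaliveScenario {
  pingSentAtMs: number;       // when the first invocation sends the ping
  keepaliveTimeoutMs: number; // how long the client waits for the response
  roundTripMs: number;        // network time until the response arrives
  pausedAtMs: number;         // first invocation ends, instance is paused
  resumedAtMs: number;        // second invocation starts, instance resumes
}

type Outcome = "ping acknowledged" | "ping timeout";

function keepaliveOutcome(s: KeepaliveScenario): Outcome {
  const deadlineMs = s.pingSentAtMs + s.keepaliveTimeoutMs;
  const arrivesMs = s.pingSentAtMs + s.roundTripMs;

  // A response that arrives during the pause is handled at resume time.
  const arrivesWhilePaused =
    arrivesMs >= s.pausedAtMs && arrivesMs < s.resumedAtMs;
  const handledAtMs = arrivesWhilePaused ? s.resumedAtMs : arrivesMs;

  // The ping counts as answered only if it is handled before the deadline.
  return handledAtMs < deadlineMs ? "ping acknowledged" : "ping timeout";
}

// Ping sent just before the invocation ends; next invocation 60 s later.
const longPause: KeepaliveScenario = {
  pingSentAtMs: 0,
  keepaliveTimeoutMs: 20_000,
  roundTripMs: 50,
  pausedAtMs: 10,
  resumedAtMs: 60_000,
};

// Same ping, but the next invocation arrives within the timeout.
const shortPause: KeepaliveScenario = { ...longPause, resumedAtMs: 5_000 };

console.log(keepaliveOutcome(longPause));  // "ping timeout"
console.log(keepaliveOutcome(shortPause)); // "ping acknowledged"
```

In this model the network itself is healthy in both cases; the outcome depends only on whether the pause between invocations outlasts the keepalive timeout.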

This behavior appears to stem from the way serverless platforms operate: the idle time between function invocations is unrelated to the configured keepalive timeout and can easily exceed it. A likely explanation, which we have not confirmed, is that the instance is paused between invocations, so the ping response cannot be processed until the next invocation begins. The primary goal of this ticket is to have the relevant documentation updated so that it gives clear guidance to developers deploying gRPC services in serverless environments.

gRPC Keepalive Behavior in Serverless Environments #2664
