Description
- Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.0
- Where do you run it - cloud or metal? Bare Metal K8s
- Are you running Postgres Operator in production? yes
- Type of issue? question
Hi,
we use the Postgres Operator (v1.8.0) to manage 50 to 200 Postgres clusters per Kubernetes cluster, and it’s working great. Thank you.
In our larger Kubernetes clusters, however, it can take quite some time (5–20 minutes) until a change to a PostgreSQL CR is picked up and applied by the operator. This also applies to the creation of new databases.
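To give a concrete example, the kind of change we are talking about is as small as adding an entry to the `databases` map of a cluster manifest, roughly like the sketch below (cluster name, team, users and sizes are made up for illustration, not taken from our real setup):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-example-cluster   # illustrative name
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 10Gi
  postgresql:
    version: "14"
  users:
    app_owner:
      - createdb
  databases:
    app_db: app_owner   # adding a line like this can take 5-20 minutes to be applied
```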
I don’t know if this kind of behaviour is expected for one operator handling so many Postgres clusters or if it can be improved.
There are some things we already tried that didn’t have much effect on performance (sketched below):
- Adding more resources to the Postgres Operator container (currently we use a limit of 2 CPUs and 500MiB of memory, and the Prometheus graphs don’t show either CPU or memory being fully utilized).
- Doubling the number of workers in the OperatorConfiguration from 8 to 16.
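For reference, this is roughly what those two changes look like (a minimal sketch; the metadata names are illustrative and most other configuration fields are omitted):

```yaml
# OperatorConfiguration: doubled the number of cluster workers
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgres-operator
configuration:
  workers: 16   # previously 8
---
# Resource limits on the operator Deployment itself
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-operator
spec:
  selector:
    matchLabels:
      name: postgres-operator
  template:
    metadata:
      labels:
        name: postgres-operator
    spec:
      containers:
        - name: postgres-operator
          image: registry.opensource.zalan.do/acid/postgres-operator:v1.8.0
          resources:
            limits:
              cpu: "2"
              memory: 500Mi
```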
We also see some log messages in the operator that might indicate a performance problem on the Kubernetes API side, but we can’t interpret them properly at the moment. They all look more or less like this:
I0907 13:40:40.505552 1 request.go:665] Waited for 1.197988032s due to client-side throttling, not priority and fairness, request: GET:https://xxx:443/api/v1/namespaces/yyy/serviceaccounts/postgres-pod
But this could also be a red herring.
If you have any pointers you can share, that would be really helpful.
If not, that’s totally fine too.
We really appreciate all the work you have put into this operator so far. Thanks.