helm chart does not flush wal on scaling down singleBinary #17087

Open
@taoyouh

Description

Describe the bug
The helm chart does not define a preStop lifecycle hook to flush the WAL on shutdown for singleBinary, so singleBinary pods do not flush their data to the object store when the number of replicas is reduced. Unlike the microservice pods, the singleBinary pods have no lifecycle defined in either the helm templates or values.yaml.

Given that enableStatefulSetAutoDeletePVC is set to true, it is quite dangerous to have neither a preStop flush hook nor flush-on-shutdown enabled. If one manually scales down singleBinary using helm, the unflushed WAL is deleted along with its PVC. This is confusing and error-prone.
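
As a partial mitigation, disabling automatic PVC deletion at least keeps the unflushed WAL around on the orphaned volume instead of deleting it (a sketch assuming the flag is exposed under singleBinary.persistence in this chart version; it does not flush anything by itself):

singleBinary:
  persistence:
    # do not auto-delete PVCs when the StatefulSet is scaled down or removed
    enableStatefulSetAutoDeletePVC: false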

For the scalable/microservice pods, the lifecycle hooks are by default only enabled when autoscaling is enabled. This configuration could also lead to data loss when one manually scales down the deployment.

The following is in write-statefulset.yaml but not in single-binary/statefulset.yaml:

          {{- if .Values.write.lifecycle }}
          lifecycle:
            {{- toYaml .Values.write.lifecycle | nindent 12 }}
          {{- else if .Values.write.autoscaling.enabled }}
          lifecycle:
            preStop:
              httpGet:
                path: "/ingester/shutdown?terminate=false"
                port: http-metrics
          {{- end }}
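
A sketch of what an equivalent block could look like in single-binary/statefulset.yaml, mirroring the write StatefulSet above (the singleBinary.lifecycle value and the http-metrics port name are assumptions, not something the chart currently defines):

          {{- /* Hypothetical addition: flush the WAL to object storage before the pod is terminated */}}
          {{- if .Values.singleBinary.lifecycle }}
          lifecycle:
            {{- toYaml .Values.singleBinary.lifecycle | nindent 12 }}
          {{- else }}
          lifecycle:
            preStop:
              httpGet:
                path: "/ingester/shutdown?terminate=false"
                port: http-metrics
          {{- end }}

Since singleBinary has no autoscaling block, the default preStop hook could arguably be unconditional rather than gated on autoscaling as it is for the write pods.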

To Reproduce
Steps to reproduce the behavior:

  1. Install helm chart 6.29.0 with singleBinary replicas set to 3
  2. Push some logs to Loki
  3. Upgrade the helm release to reduce replicas to 1 (example commands below)
  4. Query for the logs; some of them are gone
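
For reference, steps 1 and 3 look roughly like this (a sketch assuming the chart is installed from the grafana repo as grafana/loki, a release named loki, and the values.yaml shown further below):

  helm install loki grafana/loki --version 6.29.0 -f values.yaml --set singleBinary.replicas=3
  # ... push some logs and wait for them to be ingested, then scale down:
  helm upgrade loki grafana/loki --version 6.29.0 -f values.yaml --set singleBinary.replicas=1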

Expected behavior
All pushed logs should be persisted and remain queryable after the scale-down.

Environment:

  • Infrastructure: kubernetes (k3s v1.31.6+k3s1 (6ab750f9))
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
My values.yaml:

deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  extraArgs:
  - -config.expand-env
  extraEnvFrom:
  - secretRef:
      name: loki
loki:
  commonConfig:
    replication_factor: 1
  storage:
    bucketNames:
      ...
    use_thanos_objstore: true
    object_store:
      type: s3
      s3:
        ...
  compactor:
    retention_enabled: true
    delete_request_store: s3
  ingester:
    wal:
      flush_on_shutdown: true
  limits_config:
    retention_period: 200d
  schemaConfig:
    configs:
    - from: 2022-01-11
      store: boltdb-shipper
      object_store: s3
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
    - from: 2024-06-17
      store: boltdb-shipper
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
    - from: 2024-06-18
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
  storage_config:
    use_thanos_objstore: true
lokiCanary:
  extraArgs:
  - -interval
  - 10s
  - -spot-check-query-rate
  - 10m
test:
  enabled: false
gateway:
  affinity:
    podAntiAffinity: null
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
chunksCache:
  enabled: false
resultsCache:
  enabled: false
sidecar:
  rules:
    enabled: false
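
As a manual workaround until the chart adds a hook, one can hit the same endpoint the write StatefulSet's preStop hook uses on each pod that is about to be removed, before running helm upgrade (a sketch; the pod name loki-2 and local port 3100 are assumptions for my deployment):

  # flush the highest-ordinal pod(s) that the scale-down will remove
  kubectl port-forward pod/loki-2 3100:3100 &
  curl "http://localhost:3100/ingester/shutdown?terminate=false"
  # wait for the flush to finish (watch the pod logs), then scale down with helm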

Metadata

Labels: area/helm, type/bug (Something is not working as expected)
