Enable configuring containerd plugin runtime options #387

kate-goldenring · 2025-05-01T21:55:54Z

Describe your changes

Fixes #386 by enabling users to define runtime options for their shim in the Shim CRD. Ultimately, the node installer will inject them into the containerd config like the following to configure the SystemdCgroup.

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.spin.options]
  SystemdCgroup = true

The Spin shim currently accepts a cgroup configuration option like above. Other shims may accept other options. Therefore, this implementation supports setting a generic map of strings as options.

Draft as I am a first time contributor and the maintainers may want to take a different approach. I also have not tested this locally on a cluster yet. Want to see if I am on track first.

Issue ticket number and link

Fixes #386

Checklist before requesting a review

vdice

Allowing users/operators to configure shim-specific options on the Shim CRD seems like a great fit to me. The generic map[string]string approach seems fitting, since these are intended to be runtime-specific options.

config/samples/test_shim_spin.yaml

internal/controller/shim_controller.go

cmd/node-installer/uninstall.go

kate-goldenring · 2025-05-06T18:32:24Z

This PR leverages some of @vdice's changes from #389 to use busybox image and nsenter with commands

vdice

Quick re-review to also note the failing smoke test. I believe this is due to the RCM Helm chart under deploy/helm being managed manually -- so we'll need to update the chart's crd yaml with the new runtimeOptions config. That should be the only Helm chart update needed, right?

(side note: spotted this #390)

Programs/temp/rcm-kubeconfig

vdice · 2025-05-06T20:02:35Z

Is there anything we'd want to explicitly check in the case of the e2e helm install test regarding this feature? I'm aware of one higher-level piece of functionality that relies on this -- pod metrics for eg scaling -- but if there's some quick check we can add, that'd be great. Otherwise, if an automated check/test would be non-trivial and manual verification is the best option for now, fine to discuss in a follow-up.

kate-goldenring · 2025-05-07T19:53:58Z

Is there anything we'd want to explicitly check in the case of the e2e helm install test regarding this feature? I'm aware of one higher-level piece of functionality that relies on this -- pod metrics for eg scaling -- but if there's some quick check we can add, that'd be great. Otherwise, if an automated check/test would be non-trivial and manual verification is the best option for now, fine to discuss in a follow-up.

@vdice can you point me to where e2e testing is happening in CI? Checking that the containerd config is updated to contain a specific line is an option

vdice · 2025-05-07T20:00:14Z

@kate-goldenring here's the current e2e smoke test: https://github.com/spinframework/runtime-class-manager/blob/main/.github/workflows/helm-chart-smoketest.yml

kate-goldenring · 2025-05-07T20:04:24Z

Quick re-review to also note the failing smoke test... the new runtimeOptions config. That should be the only Helm chart update needed, right?

controller-gen handles updating the CRD for us when you run make manifests; however, the helm chart has another unmanaged CRD. We should find a way to point to the base CRD from the chart or add a Make directive to copy the newly minted CRD to the helm chart directory

vdice · 2025-05-09T02:06:14Z

internal/containerd/install_dbus.go

+// NOTE: this limits support to systems using systemctl to manage systemd
+func UsesSystemd() bool {
+	// Check if is a systemd system
+	cmd := nsenterCmd("systemctl", "is-system-running", "--quiet")


We might consider using a different command for this.

On the one hand, this current logic can be considered appropriate per the is-system-running cmd which will only exit 0 when systemd is fully up and running. On the other hand, systemd may be in other states (and this cmd will exit non-zero and fail) -- but systemd could still be considered as in use on the host. (I discovered this by pilfering the logic here for my PR and then testing on an AKS cluster, where systemd happened to be in a 'degraded' state, thus this function returned 'false'... I didn't dig into why my cluster's systemd was in this state, but it seemed to function fine otherwise.)

I ended up just using list-units, assuming systemctl itself would not be installed on a host that doesn't use systemd, though that might not be foolproof either...

vdice · 2025-05-09T16:45:18Z

It looks like the minikube variant in the smoke test is failing on the dbus install. Seeing these logs from the installer pod (from the debug step):

2025/05/07 20:09:20 INFO restarting containerd
2025/05/07 20:09:20 INFO installing D-Bus
2025/05/07 20:09:24 ERROR failed to install error="failed to restart containerd: failed to install D-Bus: failed to update apt: exit status 100"

kate-goldenring · 2025-05-09T19:42:24Z

It looks like the minikube variant in the smoke test is failing on the dbus install. Seeing these logs from the installer pod (from the debug step):
2025/05/07 20:09:20 INFO restarting containerd
2025/05/07 20:09:20 INFO installing D-Bus
2025/05/07 20:09:24 ERROR failed to install error="failed to restart containerd: failed to install D-Bus: failed to update apt: exit status 100"

@vdice we had this same issue in the spin shim due to expired SUSE keys. The workaround is to use a custom minikube image: spinframework/containerd-shim-spin@cc2e3de. Pushed up the fix

voigt · 2025-05-09T22:53:09Z

config/samples/test_shim_spin.yaml

@@ -17,6 +17,9 @@ spec:
    anonHttp:
      location: "https://github.com/spinframework/containerd-shim-spin/releases/download/v0.19.0/containerd-shim-spin-v2-linux-aarch64.tar.gz"

+  runtimeOptions:
+    SystemdCgroup: "true"


So far, RCM only supports containerD-based clusters. This is because the nodeinstaller's implementation is limited (for now). However, technically, the nodeinstaller could handle, e.g., OpenShift, which is based on cri-o.

Up until now, I think the shim spec was very container runtime agnostic. However, with the introduction of runtimeOption, I'm not sure.

I'm not a CRI-O expert, but I can imagine that it requires different configurations or a different configuration format.

My question: Do we need to introduce a type to specify which runtime this configuration is for?

I'm aware that this is very hypothetical, especially since the CRD is in an alpha state and we have never talked in depth about supporting open shift until now. But I'm nevertheless curious about your thoughts.

thats a good point. I could see a few options:

update this to containerdRuntimeOptions or something else containerd specific

in the future if we support a cri-o shim, we could add a field called runtimeType that has the option containerd or crio. This informs how to use runtimeOptions

We make a breaking change when the time comes

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

vdice · 2025-05-13T18:20:46Z

.github/workflows/helm-chart-smoketest.yml

-        with:
-          container-runtime: containerd
-          kubernetes-version: ${{ env.K8S_VERSION }}
+        run: |


Is there an upstream (minikube?) issue we'd monitor in anticipation of returning to the GH action in the future? (Or is this probably a long-term fix?)

kate-goldenring · 2025-05-14T17:59:40Z

@vdice it looks like the issue of the rcm pod restarting and erroring is popping up again:

rcm                  kind-control-plane-spin-v2-install-2ptm7     0/1     Completed           0          3m6s
rcm                  kind-control-plane-spin-v2-install-nnkvq     0/1     Error               0

vdice · 2025-05-14T18:32:59Z

@kate-goldenring can you capture logs (from the Error pod) and create an issue? I'm assuming the pod finishing with Error was first and then came the Completed pod? This may be a different issue from the Unknown behavior. Is the downloader container or the provisioner container failing?

vdice · 2025-05-14T18:39:26Z

@kate-goldenring oh sorry, I now realize that was from CI on this PR. Check out the debug step that runs when the test fails, it dumps relevant resource yaml and pod logs. We're seeing the following from the node-installer (provisioner container) pod:

2025/05/13 18:56:27 INFO shim installed shim=spin-v2 path=/opt/rcm/bin/containerd-shim-spin-v2 new-version=true
2025/05/13 18:56:27 INFO shim configured shim=spin-v2 path=/etc/containerd/config.toml
2025/05/13 18:56:27 INFO installing D-Bus
2025/05/13 18:56:27 ERROR failed to install error="failed to install D-Bus: failed to install D-Bus with apt-get: exit status 100"

I'm not sure why/how one of the pods finished with Completed, though 🤔

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

vdice reviewed May 1, 2025

View reviewed changes

config/samples/test_shim_spin.yaml Show resolved Hide resolved

internal/controller/shim_controller.go Show resolved Hide resolved

cmd/node-installer/uninstall.go Outdated Show resolved Hide resolved

kate-goldenring marked this pull request as ready for review May 6, 2025 18:31

vdice reviewed May 6, 2025

View reviewed changes

Programs/temp/rcm-kubeconfig Outdated Show resolved Hide resolved

kate-goldenring force-pushed the configure-runtime-options branch from 764326c to cfe6290 Compare May 7, 2025 19:52

kate-goldenring force-pushed the configure-runtime-options branch from b6d0fd7 to 500d68f Compare May 7, 2025 20:05

vdice reviewed May 9, 2025

View reviewed changes

voigt reviewed May 9, 2025

View reviewed changes

kate-goldenring added 6 commits May 13, 2025 11:18

Enable configuring containerd plugin runtime options

deabe3c

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

Install and start D-Bus after configuring runtime options

37de5d3

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

Add D-Bus installation to restarter logic

fea4225

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

Move latest shim CRD to helm chart

0e6a30e

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

Use custom minikube image

7bc0439

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

Reduce complexity of install D-Bus function

4ddb0a1

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

kate-goldenring force-pushed the configure-runtime-options branch from fe6d153 to 4ddb0a1 Compare May 13, 2025 18:21

vdice reviewed May 13, 2025

View reviewed changes

kate-goldenring force-pushed the configure-runtime-options branch 2 times, most recently from 0480f80 to ac78c23 Compare May 13, 2025 18:53

Cleanup post rebase to remove redundant methods and install D-Bus

51e1176

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

kate-goldenring force-pushed the configure-runtime-options branch from ac78c23 to 51e1176 Compare May 14, 2025 22:28

Update Helm Shim CRD to latest

cfae831

Signed-off-by: Kate Goldenring <kate.goldenring@fermyon.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable configuring containerd plugin runtime options #387

Enable configuring containerd plugin runtime options #387

kate-goldenring commented May 1, 2025 •

edited

Loading

vdice left a comment

kate-goldenring commented May 6, 2025

vdice left a comment •

edited

Loading

vdice commented May 6, 2025

kate-goldenring commented May 7, 2025

vdice commented May 7, 2025

kate-goldenring commented May 7, 2025

vdice May 9, 2025 •

edited

Loading

vdice commented May 9, 2025

kate-goldenring commented May 9, 2025

voigt May 9, 2025

kate-goldenring May 12, 2025

vdice May 13, 2025

kate-goldenring commented May 14, 2025

vdice commented May 14, 2025

vdice commented May 14, 2025 •

edited

Loading

Enable configuring containerd plugin runtime options #387

Are you sure you want to change the base?

Enable configuring containerd plugin runtime options #387

Conversation

kate-goldenring commented May 1, 2025 • edited Loading

Describe your changes

Issue ticket number and link

Checklist before requesting a review

vdice left a comment

Choose a reason for hiding this comment

kate-goldenring commented May 6, 2025

vdice left a comment • edited Loading

Choose a reason for hiding this comment

vdice commented May 6, 2025

kate-goldenring commented May 7, 2025

vdice commented May 7, 2025

kate-goldenring commented May 7, 2025

vdice May 9, 2025 • edited Loading

Choose a reason for hiding this comment

vdice commented May 9, 2025

kate-goldenring commented May 9, 2025

voigt May 9, 2025

Choose a reason for hiding this comment

kate-goldenring May 12, 2025

Choose a reason for hiding this comment

vdice May 13, 2025

Choose a reason for hiding this comment

kate-goldenring commented May 14, 2025

vdice commented May 14, 2025

vdice commented May 14, 2025 • edited Loading

kate-goldenring commented May 1, 2025 •

edited

Loading

vdice left a comment •

edited

Loading

vdice May 9, 2025 •

edited

Loading

vdice commented May 14, 2025 •

edited

Loading