Closed
Description
Although #3491 and #3522 have fixed a lot of cases where CNI would fail because of concurrent access, there are still cases where this happens.
Here it happens on container create, but it very likely happens everywhere else we manipulate CNIEnv.
We can keep playing whack-a-mole and fix every occurrence piecemeal, but at this point rewriting CNIEnv in a safe way seems like the better approach.
The fundamental problems are:
- we rely on the CNI implementation, which:
  - is not safe wrt concurrency, as it walks dirs without a lock
  - does not write atomically, leaving systems in broken / inconsistent states
- we do lock in some places, but not everywhere, as locking was an afterthought and not part of the design
- we have complicated code, with private methods calling public ones, which makes enforcing locking even harder
- we unnecessarily walk the directory repeatedly during the same flow
- some of the logic currently in pkg/cmd should really be part of the methods of CNIEnv
Steps to reproduce the issue
FAIL: cmd/nerdctl/network TestNetworkCreate/with_MTU (0.17s)
network_create_linux_test.go:108: ======================== Pre-test cleanup ========================
command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01
network_create_linux_test.go:108: ======================== Test setup ========================
command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network create testnetworkcreate-with-mtu-1b256b01 --driver bridge --opt com.docker.network.driver.mtu=9216
network_create_linux_test.go:108: ======================== Test Run ========================
command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
command.go:112: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0
Command: /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
ExitCode: 1
Error: exit status 1
Stdout:
Stderr: time="2024-10-16T18:45:12Z" level=fatal msg="failed to verify networking settings: failed to check for default network: error reading /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: open /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: no such file or directory"
Env:
HOSTNAME=dc5da5d26f5d
MEMORY_PRESSURE_WRITE=c29tZSAyMDAwMDAgMjAwMDAwMAA=
SYSTEMD_EXEC_PID=80
container=docker
HOME=/root
LANG=C.UTF-8
MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/system.slice/docker-entrypoint.service/memory.pressure
INVOCATION_ID=3d1d502413d2454da8a8a340e78b0311
TERM=xterm
USER=root
SHLVL=3
CGO_ENABLED=0
_=/usr/local/bin/gotestsum
PATH=/usr/local/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
***
DOCKER_CONFIG=/tmp/TestNetworkCreatewith_MTU2150649351/001
NERDCTL_TOML=/tmp/TestNetworkCreatewith_MTU2150649351/001/nerdctl.toml
case.go:164: ======================== Post-test cleanup ========================
command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01
Describe the results you received and expected
https://github.com/containerd/nerdctl/actions/runs/11371804119/job/31634685012?pr=3555#step:6:1674
What version of nerdctl are you using?
main
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
No response