
CNIEnv concurrency issues #3556

Closed
@apostasie

Description

Although #3491 and #3522 fixed many of the cases where CNI operations failed because of concurrent access, some cases remain.

This one happens on container create, but the same problem very likely exists everywhere else we manipulate CNIEnv.

We could keep playing whack-a-mole and fix every occurrence piecemeal, but at this point rewriting CNIEnv to be safe under concurrent use seems like a better approach.

The fundamental problems are:

  • we rely on the CNI implementation, which:
    • is not safe with respect to concurrency, as it walks directories without holding a lock
    • does not write atomically, which can leave the system in a broken or inconsistent state
  • we do lock in some places, but not everywhere, because locking was an afterthought rather than part of the design
  • the code is complicated, with private methods calling public ones, which makes enforcing locking even harder
  • we walk the directory repeatedly, and unnecessarily, during a single flow
  • some of the logic currently in pkg/cmd should really be part of the CNIEnv methods

Steps to reproduce the issue

FAIL: cmd/nerdctl/network TestNetworkCreate/with_MTU (0.17s)
    network_create_linux_test.go:108: ======================== Pre-test cleanup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01
    network_create_linux_test.go:108: ======================== Test setup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network create testnetworkcreate-with-mtu-1b256b01 --driver bridge --opt com.docker.network.driver.mtu=9216
    network_create_linux_test.go:108: ======================== Test Run ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
    command.go:112: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0
        
        Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test run --rm --net testnetworkcreate-with-mtu-1b256b01 ghcr.io/stargz-containers/alpine:3.13-org ifconfig eth0
        ExitCode: 1
        Error:    exit status 1
        Stdout:   
        Stderr:   time="2024-10-16T18:45:12Z" level=fatal msg="failed to verify networking settings: failed to check for default network: error reading /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: open /etc/cni/net.d/nerdctl-test/nerdctl-testnetworklsfilter-1-d946011b.conflist: no such file or directory"
        
        Env:
        HOSTNAME=dc5da5d26f5d
        MEMORY_PRESSURE_WRITE=c29tZSAyMDAwMDAgMjAwMDAwMAA=
        SYSTEMD_EXEC_PID=80
        container=docker
        HOME=/root
        LANG=C.UTF-8
        MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/system.slice/docker-entrypoint.service/memory.pressure
        INVOCATION_ID=3d1d502413d2454da8a8a340e78b0311
        TERM=xterm
        USER=root
        SHLVL=3
        CGO_ENABLED=0
        _=/usr/local/bin/gotestsum
        PATH=/usr/local/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        ***
        DOCKER_CONFIG=/tmp/TestNetworkCreatewith_MTU2150649351/001
        NERDCTL_TOML=/tmp/TestNetworkCreatewith_MTU2150649351/001/nerdctl.toml
    case.go:164: ======================== Post-test cleanup ========================
    command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test network rm testnetworkcreate-with-mtu-1b256b01

Describe the results you received and expected

https://github.com/containerd/nerdctl/actions/runs/11371804119/job/31634685012?pr=3555#step:6:1674

What version of nerdctl are you using?

main

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response
