Skip to content

[FIX] CNIEnv concurrency issue #4202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented May 6, 2025

This is (one of) the remaining issues in #3556

TL;DR: like in other places, we have been assuming that a filesystem is ACID and/or safe to use concurrently.
This has been solved generally by introducing the Store abstraction, providing a (somewhat (*)) ACID storage solution, and also ad-hoc fixed a bunch of CNIEnv methods.

CNIEnv should be rewritten generally, as it is baroque at this point, but this is a lot more work.
For now, this PR just isolates all methods touching the filesystem in an ad-hoc store file - hopefully to a similar result.

This should address some of the issues we have been seeing (eg: missing .conflist file).

(*) atomic writes are not guaranteed in case of operating system crash, and data is not guaranteed to be saved as we do not sync (yet).

@apostasie apostasie closed this May 6, 2025
@apostasie apostasie reopened this May 6, 2025
@apostasie apostasie changed the title [FIX] CNIEnv concurrency issue on NetworkByNameOrID [WIP] [FIX] CNIEnv concurrency issue on NetworkByNameOrID May 6, 2025
@apostasie apostasie force-pushed the 2025-05-cni-concurrency-issue branch from b5744e6 to 87fa976 Compare May 7, 2025 15:31
@apostasie apostasie changed the title [WIP] [FIX] CNIEnv concurrency issue on NetworkByNameOrID [WIP] [FIX] CNIEnv concurrency issue May 7, 2025
@apostasie
Copy link
Contributor Author

Damn it.

@apostasie apostasie force-pushed the 2025-05-cni-concurrency-issue branch from 87fa976 to a061618 Compare May 9, 2025 21:34
Signed-off-by: apostasie <spam_blackhole@farcloser.world>
@apostasie apostasie force-pushed the 2025-05-cni-concurrency-issue branch from a061618 to 38f07ca Compare May 9, 2025 21:40
@apostasie apostasie marked this pull request as ready for review May 9, 2025 22:27
@apostasie apostasie changed the title [WIP] [FIX] CNIEnv concurrency issue [FIX] CNIEnv concurrency issue May 9, 2025
@apostasie
Copy link
Contributor Author

apostasie commented May 9, 2025

@AkihiroSuda I am not exactly in love with any of this... at least it should fix #3556 which has been increasingly a PITA.

Touching CNIEnv gives me the creeps, as a mistake there would be very consequential for users, but I still feel we have to act on this.

Suggesting we merge this after 2.1.

@AkihiroSuda AkihiroSuda added this to the v2.1.0 milestone May 10, 2025
err = os.MkdirAll(cniNetconfPath, 0o700)
if err != nil {
return err
}
lock, err := lockutil.Lock(filepath.Join(cniNetconfPath, ".nerdctl.lock"))
lock, err := lockutil.Lock(filepath.Join(cniNetconfPath, ".cni-concurrency.lock"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.nerdctl.lock is used to lock cni conf file manipulation. This lock here is a global locking mechanism meant to prevent cni plugins from acting concurrently.

This lock is enforced for almost the entirety of the ocihook flow.
It was previously using the same lock, as we were not locking the file operations in cnienv that are used in oci-hooks.

Since we do now, it can no longer use the same lock.

The note around line 100 explains some of that.

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit a5547bc into containerd:main May 10, 2025
56 of 57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants