Skip to content

Fix the issue in getOrQueueForIdleConn where connectionPerished conn.… #2051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

linfeip
Copy link

@linfeip linfeip commented May 14, 2025

…isAlive involves network requests, which can cause the p.idleMu.Lock() when obtaining conn from the pool to be blocked for too long.

Summary

the lock in getOrQueueForIdleConn will cause connection acquisition to block under high concurrency due to the IO request in c.nc.Read(b[:]) inside isAlive.

func (p *pool) getOrQueueForIdleConn(w *wantConn) bool {
        p.idleMu.Lock()
	defer p.idleMu.Unlock()

	// Try to deliver an idle connection from the idleConns stack first.
	for len(p.idleConns) > 0 {
		conn := p.idleConns[len(p.idleConns)-1]
		p.idleConns = p.idleConns[:len(p.idleConns)-1]

		if conn == nil {
			continue
		}

		if reason, perished := connectionPerished(conn); perished {
     ...
}

func (c *connection) isAlive() bool {
	if c.nc == nil {
		return false
	}
      ...
	err := c.nc.SetReadDeadline(time.Now().Add(1 * time.Millisecond))
	if err != nil {
		return false
	}
	var b [1]byte
	_, err = c.nc.Read(b[:])
	return errors.Is(err, os.ErrDeadlineExceeded)
}

Background & Motivation

Before modifying isAlive

image
image
756dcec288ac87b30a40a38327d9c1b4

After

image
image

…isAlive involves network requests, which can cause the p.idleMu.Lock() when obtaining conn from the pool to be blocked for too long.
@linfeip linfeip requested a review from a team as a code owner May 14, 2025 08:35
@linfeip linfeip requested a review from prestonvasquez May 14, 2025 08:35
@linfeip
Copy link
Author

linfeip commented May 14, 2025

I think it is unnecessary to verify whether this conn is alive here, because even if the connection is alive when it is obtained, there is no guarantee that the conn will still be alive when it is passed to the caller. The connection may have been disconnected between these two steps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant