Files
coredns/plugin/pkg/proxy/proxy.go
Ville Vesilehto f3983c1111 perf(proxy): use mutex-based connection pool (#7790)
* perf(proxy): use mutex-based connection pool

The proxy package (used, for example, by the forward plugin) used
an actor model in which a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through one goroutine, creating a bottleneck under high
concurrency. It showed up as a performance degradation when using a
single upstream backend compared to multiple backends (which
sharded the bottleneck).

Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
  connection slice, allowing fast concurrent access without
  context switching (see the sketch after this list).
- Downgraded connManager to a simple background cleanup loop that
  only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
  instead of channel sends.
- Updated tests to reflect the removal of internal channels.
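
A minimal sketch of the resulting pattern (illustrative, not the
exact CoreDNS code: the real Transport tracks connections per
transport type, collapsed here to one slice; imports of net, sync,
and time are elided):

    type persistConn struct {
        c    net.Conn
        used time.Time // last time this connection was handed back
    }

    type Transport struct {
        mu    sync.Mutex // named field; see the persistent.go fixes below
        conns []*persistConn
    }

    // Yield returns a connection to the pool.
    func (t *Transport) Yield(pc *persistConn) {
        t.mu.Lock()
        defer t.mu.Unlock()
        pc.used = time.Now()
        t.conns = append(t.conns, pc)
    }

    // Dial pops a pooled connection, or reports that the caller must
    // dial a fresh one.
    func (t *Transport) Dial() (*persistConn, bool) {
        t.mu.Lock()
        defer t.mu.Unlock()
        if n := len(t.conns); n > 0 {
            pc := t.conns[n-1] // LIFO at this stage; a later commit moves to FIFO
            t.conns = t.conns[:n-1]
            return pc, true
        }
        return nil, false
    }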

Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.

The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix: address PR review for persistent.go

- Use a named mutex field instead of embedding it, so Lock() and
  Unlock() are not exposed on the type (contrast sketched below)
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change the stop channel to chan struct{}
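
To illustrate the first point: embedding sync.Mutex promotes Lock
and Unlock into the struct's exported method set, while a named
field keeps locking an internal detail (type names here are
illustrative):

    // Embedded: t.Lock()/t.Unlock() become part of the type's API.
    type embeddedTransport struct {
        sync.Mutex
        conns []*persistConn
    }

    // Named unexported field: callers outside the package cannot
    // lock the pool at all.
    type namedTransport struct {
        mu    sync.Mutex
        conns []*persistConn
    }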

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix: address code review feedback for conn pool

- Switch from LIFO to FIFO connection selection for source port
  diversity, reducing DNS cache poisoning risk (RFC 5452); see the
  sketch after this list.
- Remove the "clear entire cache" optimization, as it was
  LIFO-specific. FIFO naturally iterates past and skips expired
  connections.
- Remove all goroutines for closing connections; collect connections
  while holding lock, close synchronously after releasing lock.
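
Roughly, continuing the earlier sketch (expire is the configured
idle lifetime; names remain illustrative):

    func (t *Transport) Dial() (*persistConn, bool) {
        t.mu.Lock()
        var pc *persistConn
        var toClose []*persistConn
        for len(t.conns) > 0 {
            c := t.conns[0] // FIFO: hand out the oldest connection first
            t.conns = t.conns[1:]
            if time.Since(c.used) < t.expire {
                pc = c
                break
            }
            toClose = append(toClose, c) // expired; close after unlocking
        }
        t.mu.Unlock()
        for _, c := range toClose {
            c.c.Close()
        }
        return pc, pc != nil
    }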

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix: remove unused error consts

No longer used after refactoring away the channel-based approach.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* feat(forward): add max_idle_conns option

Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.

Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup (Corefile
  example after this list)
- Apply setting to each proxy during configuration
- Update forward plugin README with new option
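
For illustration, a Corefile enabling the option on a forward block
(the limit of 64 is arbitrary):

    . {
        forward . 8.8.8.8 {
            max_idle_conns 64
        }
    }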

By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.
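
In terms of the earlier pool sketch, the cap check presumably
amounts to something like:

    func (t *Transport) Yield(pc *persistConn) {
        t.mu.Lock()
        defer t.mu.Unlock()
        if t.maxIdleConns > 0 && len(t.conns) >= t.maxIdleConns {
            pc.c.Close() // pool full: close instead of caching
            return
        }
        pc.used = time.Now()
        t.conns = append(t.conns, pc)
    }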

Also add a Yield-related test.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* chore(proxy): simple Dial by closing conns inline

Remove the toClose slice collection. Instead, close expired
connections directly while iterating. This reduces complexity with
negligible impact on lock hold time.
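
With that change, the expiry handling in the earlier FIFO sketch
reduces to roughly:

    for len(t.conns) > 0 {
        c := t.conns[0]
        t.conns = t.conns[1:]
        if time.Since(c.used) < t.expire {
            pc = c
            break
        }
        c.c.Close() // expired: close inline while holding the lock
    }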

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* chore: fewer explicit Unlock calls

Cleaner, and less chance of forgetting to unlock on newly added
code paths.
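
That is, presumably the standard Go idiom:

    t.mu.Lock()
    defer t.mu.Unlock() // released on every return path, including ones added later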

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

---------

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
2026-01-13 17:49:46 -08:00


package proxy

import (
    "crypto/tls"
    "runtime"
    "sync/atomic"
    "time"

    "github.com/coredns/coredns/plugin/pkg/log"
    "github.com/coredns/coredns/plugin/pkg/up"
)

// Proxy defines an upstream host.
type Proxy struct {
    fails uint32
    addr  string

    proxyName string
    transport *Transport

    readTimeout time.Duration

    // health checking
    probe  *up.Probe
    health HealthChecker
}

// NewProxy returns a new proxy.
func NewProxy(proxyName, addr, trans string) *Proxy {
    p := &Proxy{
        addr:        addr,
        fails:       0,
        probe:       up.New(),
        readTimeout: 2 * time.Second,
        transport:   newTransport(proxyName, addr),
        health:      NewHealthChecker(proxyName, trans, true, "."),
        proxyName:   proxyName,
    }
    runtime.SetFinalizer(p, (*Proxy).finalizer)
    return p
}

// Addr returns the address to forward to.
func (p *Proxy) Addr() string { return p.addr }

// SetTLSConfig sets the TLS config in the lower p.transport and in the healthchecking client.
func (p *Proxy) SetTLSConfig(cfg *tls.Config) {
    p.transport.SetTLSConfig(cfg)
    p.health.SetTLSConfig(cfg)
}

// SetExpire sets the expire duration in the lower p.transport.
func (p *Proxy) SetExpire(expire time.Duration) { p.transport.SetExpire(expire) }

// SetMaxIdleConns sets the maximum idle connections per transport type.
// A value of 0 means unlimited (default).
func (p *Proxy) SetMaxIdleConns(n int) { p.transport.SetMaxIdleConns(n) }

// GetHealthchecker returns the health checker of this proxy.
func (p *Proxy) GetHealthchecker() HealthChecker {
    return p.health
}

// GetTransport returns the transport of this proxy.
func (p *Proxy) GetTransport() *Transport {
    return p.transport
}

// Fails returns the current number of recorded failures for this proxy.
func (p *Proxy) Fails() uint32 {
    return atomic.LoadUint32(&p.fails)
}

// Healthcheck kicks off a round of health checks for this proxy.
func (p *Proxy) Healthcheck() {
    if p.health == nil {
        log.Warning("No healthchecker")
        return
    }

    p.probe.Do(func() error {
        return p.health.Check(p)
    })
}

// Down returns true if this proxy is down, i.e. has *more* fails than maxfails.
func (p *Proxy) Down(maxfails uint32) bool {
    if maxfails == 0 {
        return false
    }

    fails := atomic.LoadUint32(&p.fails)
    return fails > maxfails
}

// Stop stops the health checking goroutine.
func (p *Proxy) Stop() { p.probe.Stop() }

func (p *Proxy) finalizer() { p.transport.Stop() }

// Start starts the proxy's healthchecking.
func (p *Proxy) Start(duration time.Duration) {
    p.probe.Start(duration)
    p.transport.Start()
}

// SetReadTimeout sets the duration to wait for a reply from the upstream.
func (p *Proxy) SetReadTimeout(duration time.Duration) {
    p.readTimeout = duration
}

// incrementFails increments the number of fails safely.
func (p *Proxy) incrementFails() {
    curVal := atomic.LoadUint32(&p.fails)
    if curVal > curVal+1 {
        // overflow occurred, do not update the counter again
        return
    }
    atomic.AddUint32(&p.fails, 1)
}

const (
    maxTimeout = 2 * time.Second
)