* perf(proxy): use mutex-based connection pool
The proxy package (used for example by the forward plugin) utilized
an actor model where a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through a single goroutine, creating a bottleneck under high
concurrency. This was observable as a performance degradation when
using a single upstream backend compared to multiple backends
(which sharded the bottleneck).
Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
connection slice, allowing for fast concurrent access without
context switching.
- Downgraded connManager to a simple background cleanup loop that
only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
instead of channel sends.
- Updated tests to reflect the removal of internal channels.
Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.
The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).
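
The core idea can be sketched as follows. This is a minimal illustration, not the actual Transport code: the `conn` and `pool` types and the `get`/`put` method names are hypothetical stand-ins for the real connection type and for Dial()/Yield(). Selection here is LIFO, which a later commit in this series replaces with FIFO.

```go
package main

import "sync"

// conn is a hypothetical stand-in for a pooled upstream connection.
type conn struct{ id int }

// pool is an illustrative idle-connection cache guarded by a mutex.
// It is not the real Transport type.
type pool struct {
	mu   sync.Mutex // guards idle
	idle []*conn
}

// get pops the most recently returned connection, or returns nil when
// the caller needs to dial a new one.
func (p *pool) get() *conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	n := len(p.idle)
	if n == 0 {
		return nil
	}
	c := p.idle[n-1]
	p.idle = p.idle[:n-1]
	return c
}

// put hands a connection back to the pool for reuse.
func (p *pool) put(c *conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, c)
}
```

Callers acquire and release connections directly through these methods, so no goroutine has to sit between them and the slice.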
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address PR review for persistent.go
- Named mutex field instead of embedding, to not expose
Lock() and Unlock()
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change stop channel to struct
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address code review feedback for conn pool
- Switch from LIFO to FIFO connection selection for source port
diversity, reducing DNS cache poisoning risk (RFC 5452).
- Remove "clear entire cache" optimization as it was LIFO-specific.
FIFO naturally iterates and skips expired connections.
- Remove all goroutines for closing connections; collect connections
while holding lock, close synchronously after releasing lock.
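
A rough sketch of the FIFO selection and the collect-then-close pattern described above. The `fakeConn` and `fifoPool` types and the `take` method are hypothetical names for illustration; the real code operates on the proxy's own connection type.

```go
package main

import (
	"sync"
	"time"
)

// fakeConn is a hypothetical connection that records whether it was closed.
type fakeConn struct {
	used   time.Time // when the connection was last returned to the pool
	closed bool
}

func (c *fakeConn) Close() { c.closed = true }

// fifoPool is an illustrative pool; conns is ordered oldest first.
type fifoPool struct {
	mu     sync.Mutex
	conns  []*fakeConn
	expire time.Duration
}

// take returns the oldest connection that has not expired (FIFO), or nil
// if none is left. Expired connections are only collected while the lock
// is held and are closed synchronously after it has been released.
func (p *fifoPool) take(now time.Time) *fakeConn {
	var toClose []*fakeConn
	var picked *fakeConn

	p.mu.Lock()
	for len(p.conns) > 0 {
		c := p.conns[0]
		p.conns = p.conns[1:]
		if now.Sub(c.used) < p.expire {
			picked = c
			break
		}
		toClose = append(toClose, c)
	}
	p.mu.Unlock()

	for _, c := range toClose {
		c.Close()
	}
	return picked
}
```

Because the loop walks from the oldest entry, expired connections are skipped as a side effect of normal iteration and no separate cache-clearing step is needed.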
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: remove unused error consts
No longer utilised after refactoring the channel based approach.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* feat(forward): add max_idle_conns option
Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.
Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup
- Apply setting to each proxy during configuration
- Update forward plugin README with new option
By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.
Also add a yield related test.
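
The limit behaviour can be sketched like this. The `cappedPool` and `idleConn` types and the `yield` method are hypothetical; only the semantics (0 means unbounded, excess connections are closed instead of cached) come from the description above.

```go
package main

import "sync"

// idleConn is a hypothetical connection that records whether it was closed.
type idleConn struct{ closed bool }

func (c *idleConn) Close() { c.closed = true }

// cappedPool is an illustrative pool with an optional idle limit.
type cappedPool struct {
	mu      sync.Mutex
	conns   []*idleConn
	maxIdle int // 0 means unbounded
}

// yield caches c, or closes it right away when the pool already holds
// maxIdle connections.
func (p *cappedPool) yield(c *idleConn) {
	p.mu.Lock()
	if p.maxIdle > 0 && len(p.conns) >= p.maxIdle {
		p.mu.Unlock()
		c.Close() // close outside the lock
		return
	}
	p.conns = append(p.conns, c)
	p.mu.Unlock()
}
```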
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore(proxy): simplify Dial by closing conns inline
Remove the toClose slice collection and instead close expired
connections directly while iterating. This reduces complexity with
negligible lock-time impact.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore: fewer explicit Unlock calls
Cleaner, with less chance of forgetting to unlock on code paths
added later.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
---------
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Fixes a bug in the forward plugin where an immediate connection
failure (e.g., TCP RST) could trigger an infinite busy loop. The
retry logic failed to increment the "fails" counter when a
connection error occurred, causing the loop condition to
remain permanently true. This patch fixes it and adds a
regression test.
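
The class of bug can be illustrated with a simplified retry loop. This is a sketch under assumptions: `tryUpstreams`, `maxFails`, and `errConn` are hypothetical, and the real loop in the forward plugin has a different condition and more state.

```go
package main

import "errors"

// errConn is a hypothetical error standing in for an immediate
// connection failure such as a TCP RST.
var errConn = errors.New("connection reset")

// tryUpstreams calls attempt until it succeeds or maxFails attempts have
// failed. It returns the number of attempts made and the last error.
func tryUpstreams(maxFails int, attempt func() error) (int, error) {
	fails := 0
	tries := 0
	var err error
	for fails < maxFails {
		tries++
		if err = attempt(); err != nil {
			// Without this increment the loop condition never
			// changes and the loop spins forever on a failing upstream.
			fails++
			continue
		}
		return tries, nil
	}
	return tries, err
}
```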
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Previously the parsing logic in the forward plugin setup failed to
recognise NOERROR when it was used as a failover RCODE criterion,
because the check was in the wrong code branch. This PR fixes that,
adds validation tests, and updates the plugin README.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
In CI, the first two upstream attempts can stall on UDP and each
consume the default 2s read timeout, possibly exhausting most of
the 5s forward deadline before the healthy third upstream is tried.
Lower the read timeout to make retries faster.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable nilness linter in govet.
Plugin-by-plugin rationale:
- plugin/transfer: reuse error instead of shadowing it inside the for
loop by declaring "ret" outside of the loop
- plugin/view: remove redundant err check
- plugin/dnstap: avoid possible nil dereference in error reporting
path in setup test
- plugin/forward: prevent nil dereference or empty-slice dereference on
error paths in setup test
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Replace manual host:port parsing using net.SplitHostPort +
strconv.ParseUint with the standard library net/netip function
ParseAddrPort. This eliminates integer conversion warnings and
improves type safety.
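
A minimal sketch of the replacement, assuming a hypothetical helper named `parseHostPort`. Note that netip.ParseAddrPort only accepts IP literals, not hostnames, and returns the port as a uint16, so no manual integer conversion is needed.

```go
package main

import "net/netip"

// parseHostPort parses "ip:port" or "[ipv6]:port" into an address and a
// uint16 port. It is a hypothetical helper for illustration.
func parseHostPort(s string) (netip.Addr, uint16, error) {
	ap, err := netip.ParseAddrPort(s)
	if err != nil {
		return netip.Addr{}, 0, err
	}
	return ap.Addr(), ap.Port(), nil
}
```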
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable intrange linter to enforce modern Go range syntax over
traditional for loops, by converting:
    for i := 0; i < n; i++
to:
    for i := range n
Type conversions are added where needed for compatibility
with existing uint64 parameters.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Allows the forward plugin to execute the next plugin based on the return code. Similar to the externally maintained alternate plugin https://github.com/coredns/alternate
Based on the idea of chrisohaver@ in #6549 (comment)
Also incorporated the request to rename `alternate` to `next` as an option
I am having issues adding a proper test for this functionality. Primarily, I do not know the code base well enough, and having multiple `dnstest.NewServer` instances with a ResponseWriter does not work. From my testing these are "singletons" and only the last defined response writer is used for all servers
Signed-off-by: Jasper Bernhardt <jasper.bernhardt@live.de>
* plugin/forward: Move Proxy into plugin/pkg/proxy, to allow forward.Proxy to be used outside of forward plugin.
Signed-off-by: Patrick Downey <patrick.downey@dioadconsulting.com>
* plugin/forward: convert the specified domain of health_check to Fqdn
* plugin/forward: update readme for health check
Signed-off-by: vanceli <vanceli@tencent.com>
* Add forwardcrd plugin README.md
Co-authored-by: Aidan Obley <aobley@vmware.com>
Signed-off-by: Christian Ang <angc@vmware.com>
* Create forwardcrd plugin
- Place forwardcrd before forward plugin in plugin list. This keeps
forward from preventing the forwardcrd plugin from handling any queries
when a server block has a default upstream forwarder (as is the case
in the default Kubernetes Corefile).
Co-authored-by: Aidan Obley <aobley@vmware.com>
Signed-off-by: Christian Ang <angc@vmware.com>
* Add Forward CRD
Signed-off-by: Christian Ang <angc@vmware.com>
* Add NewWithConfig to forward plugin
- allows external packages to instantiate forward plugins
Co-authored-by: Aidan Obley <aobley@vmware.com>
Signed-off-by: Christian Ang <angc@vmware.com>
* ForwardCRD plugin handles requests for Forward CRs
- add a Kubernetes controller that can read Forward CRs
- instances of the forward plugin are created based on Forward CRs from
the Kubernetes controller
- DNS requests are handled by calling matching Forward plugin instances
based on zone name
- Defaults to the kube-system namespace to align with Corefile RBAC
Signed-off-by: Christian Ang <angc@vmware.com>
Use klog v2 in forwardcrd plugin
* Refactor forward setup to use NewWithConfig
Co-authored-by: Christian Ang <angc@vmware.com>
Signed-off-by: Edwin Xie <exie@vmware.com>
* Use ParseInt instead of Atoi
- to ensure that the bitsize is 32 for later casting to uint32
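
A minimal sketch of the difference, using a hypothetical helper `parseCount`. strconv.Atoi returns a platform-sized int, while ParseInt with a bit size of 32 rejects values that do not fit in an int32. The negative check is an assumption added here so that the uint32 conversion is always safe.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCount parses s as a base-10 integer that fits in 32 bits and
// converts it to uint32. It is a hypothetical helper for illustration.
func parseCount(s string) (uint32, error) {
	n, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, err
	}
	if n < 0 {
		// Assumption: negative values are not meaningful for the caller.
		return 0, fmt.Errorf("value %d must not be negative", n)
	}
	return uint32(n), nil
}
```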
Signed-off-by: Christian Ang <angc@vmware.com>
* Add @christianang to CODEOWNERS for forwardcrd
Signed-off-by: Christian Ang <angc@vmware.com>
Co-authored-by: Edwin Xie <exie@vmware.com>