* perf(proxy): use mutex-based connection pool
The proxy package (used for example by the forward plugin) utilized
an actor model where a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through a single goroutine, creating a bottleneck under high
concurrency. This was observable as a performance degradation when
using a single upstream backend compared to multiple backends
(which sharded the bottleneck).
Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
connection slice, allowing for fast concurrent access without
context switching.
- Downgraded connManager to a simple background cleanup loop that
only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
instead of channel sends.
- Updated tests to reflect the removal of internal channels.
Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.
The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address PR review for persistent.go
- Named mutex field instead of embedding, to not expose
Lock() and Unlock()
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change stop channel to struct
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address code review feedback for conn pool
- Switch from LIFO to FIFO connection selection for source port
diversity, reducing DNS cache poisoning risk (RFC 5452).
- Remove "clear entire cache" optimization as it was LIFO-specific.
FIFO naturally iterates and skips expired connections.
- Remove all goroutines for closing connections; collect connections
while holding lock, close synchronously after releasing lock.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: remove unused error consts
No longer utilised after refactoring the channel based approach.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* feat(forward): add max_idle_conns option
Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.
Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup
- Apply setting to each proxy during configuration
- Update forward plugin README with new option
By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.
Also add a yield related test.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore(proxy): simple Dial by closing conns inline
Remove toClose slice collection to reduce complexity. Instead close
expired connections directly while iterating. Reduces complexity with
negligible lock-time impact.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore: fewer explicit Unlock calls
Cleaner and less chance of forgetting to unlock on new possible
code paths.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
---------
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Replace http.Serve() with http.Server{} configured with timeouts to
address G114 gosec findings (HTTP server without timeouts). This
prevents potential slowloris attacks and resource exhaustion.
Changes:
- Add ReadTimeout, WriteTimeout, IdleTimeout (5s each) to HTTP servers
- Use srv.Shutdown(ctx) for graceful shutdown instead of ln.Close()
- Follow existing pattern from plugin/metrics
Fixes part of #7793
Signed-off-by: Azeez Syed <syedazeez337@gmail.com>
Remove expensive runtime.Caller calls from metrics Recorder.WriteMsg
by tracking the responding plugin through the plugin chain instead.
- Add PluginTracker interface and pluginWriter wrapper in plugin.go
- Modify NextOrFailure to wrap ResponseWriter with plugin name
- Update metrics Recorder to implement PluginTracker
- Remove authoritativePlugin method using filepath inspection
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable "gosec" linter.
Exclude:
- All G115 (integer overflow) findings, to be fixed separately.
Add targeted gosec annotations for:
- non-crypto math/rand usage
- md5 used only for file change detection
- G114 ("net/http serve with no timeout settings"), to be fixed
separately.
Other findings fixed.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* Improve SOA error handling/reporting.
Signed-off-by: Ross Golder <ross@golder.org>
* Add tests for malformed SOA records
Signed-off-by: Ross Golder <ross@golder.org>
* Address review comments: assert exact parse errors in SOA tests and fix gofmt
Signed-off-by: Ross Golder <ross@golder.org>
---------
Signed-off-by: Ross Golder <ross@golder.org>
Add configurable resource limits to prevent potential DoS vectors
via connection/stream exhaustion on gRPC, HTTPS, and HTTPS/3 servers.
New configuration plugins:
- grpc_server: configure max_streams, max_connections
- https: configure max_connections
- https3: configure max_streams
Changes:
- Use netutil.LimitListener for connection limiting
- Use gRPC MaxConcurrentStreams and message size limits
- Add QUIC MaxIncomingStreams for HTTPS/3 stream limiting
- Set secure defaults: 256 max streams, 200 max connections
- Setting any limit to 0 means unbounded/fallback to previous impl
Defaults are applied automatically when plugins are omitted from
config.
Includes tests and integration tests.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Replace "reload 2s" with "quic" in quicReloadCorefile to avoid
spawning a background goroutine that reads dnsserver.Port while
TestReadme modifies it. The test TestQUICReloadDoesNotPanic still
verifies the QUIC reload panic fix via explicit inst.Restart() call.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
The test "TestDial_TransportStoppedDuringRetWait" replaced
tr.dial and tr.ret with test-controlled channels, then called
tr.Start(). Since connManager reads from t.dial, both the test
and connManager were racing to read from the same channel.
Remove tr.Start() since the test manually simulates connManager
behavior.
Also changed some test log formatting to align with other tests.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>