* perf(proxy): use mutex-based connection pool
The proxy package (used for example by the forward plugin) utilized
an actor model where a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through a single goroutine, creating a bottleneck under high
concurrency. This was observable as a performance degradation when
using a single upstream backend compared to multiple backends
(which sharded the bottleneck).
Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
connection slice, allowing for fast concurrent access without
context switching.
- Downgraded connManager to a simple background cleanup loop that
only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
instead of channel sends.
- Updated tests to reflect the removal of internal channels.
Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.
The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).
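For illustration, the mutex-protected pool described above can be sketched
roughly as follows. This is a minimal sketch, not the actual proxy code: the
pool and persistConn types and the get, yield, and demo functions are
hypothetical names, and the real Transport keeps more state than this.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// persistConn records when a connection was last returned to the pool.
type persistConn struct {
	c    net.Conn
	used time.Time
}

// pool is a hypothetical, simplified stand-in for the proxy Transport.
type pool struct {
	mu     sync.Mutex // named field, so Lock and Unlock are not exposed
	conns  []*persistConn
	expire time.Duration
}

// get returns a cached connection that has not expired, or nil if there is none.
func (p *pool) get() net.Conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.conns) > 0 {
		pc := p.conns[len(p.conns)-1]
		p.conns = p.conns[:len(p.conns)-1]
		if time.Since(pc.used) < p.expire {
			return pc.c
		}
		pc.c.Close() // expired: drop it and keep looking
	}
	return nil
}

// yield hands a connection back to the pool for later reuse.
func (p *pool) yield(c net.Conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.conns = append(p.conns, &persistConn{c: c, used: time.Now()})
}

// demo yields one connection, then reports whether it is reused and whether
// the pool is empty afterwards.
func demo() (reused, empty bool) {
	p := &pool{expire: time.Minute}
	a, b := net.Pipe() // in-memory connection pair, no network needed
	defer a.Close()
	defer b.Close()
	p.yield(a)
	reused = p.get() == a
	empty = p.get() == nil
	return reused, empty
}

func main() {
	fmt.Println(demo())
}
```

Callers take the lock only for the slice operation, so no goroutine hand-off
is needed on the hot path.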
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address PR review for persistent.go
- Named mutex field instead of embedding, to not expose
Lock() and Unlock()
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change stop channel to struct
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address code review feedback for conn pool
- Switch from LIFO to FIFO connection selection for source port
diversity, reducing DNS cache poisoning risk (RFC 5452).
- Remove "clear entire cache" optimization as it was LIFO-specific.
FIFO naturally iterates and skips expired connections.
- Remove all goroutines for closing connections; collect connections
while holding lock, close synchronously after releasing lock.
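A rough sketch of the FIFO selection and close-after-unlock pattern from the
list above, under stated assumptions: fakeConn, pool, take, and demo are
hypothetical names used only for illustration, and fakeConn stands in for a
real network connection.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeConn is a hypothetical stand-in for a pooled network connection.
type fakeConn struct {
	id     int
	used   time.Time
	closed bool
}

func (c *fakeConn) Close() { c.closed = true }

// pool keeps idle connections ordered oldest first.
type pool struct {
	mu     sync.Mutex
	conns  []*fakeConn
	expire time.Duration
}

// take returns the oldest unexpired connection (FIFO), or nil if none is left.
// Expired connections are collected under the lock and closed after it is
// released, so Close never runs inside the critical section.
func (p *pool) take(now time.Time) *fakeConn {
	var stale []*fakeConn
	var found *fakeConn

	p.mu.Lock()
	for len(p.conns) > 0 {
		c := p.conns[0]
		p.conns = p.conns[1:]
		if now.Sub(c.used) < p.expire {
			found = c
			break
		}
		stale = append(stale, c)
	}
	p.mu.Unlock()

	for _, c := range stale {
		c.Close()
	}
	return found
}

// demo builds a pool with one expired and two fresh connections and takes one.
func demo() (id int, staleClosed bool, remaining int) {
	now := time.Now()
	old := &fakeConn{id: 1, used: now.Add(-2 * time.Minute)}
	p := &pool{
		expire: time.Minute,
		conns: []*fakeConn{
			old,
			{id: 2, used: now.Add(-10 * time.Second)},
			{id: 3, used: now},
		},
	}
	c := p.take(now)
	return c.id, old.closed, len(p.conns)
}

func main() {
	fmt.Println(demo())
}
```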
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: remove unused error consts
These are no longer used after refactoring away the channel-based approach.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* feat(forward): add max_idle_conns option
Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.
Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup
- Apply setting to each proxy during configuration
- Update forward plugin README with new option
By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.
Also add a yield related test.
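The limit described above could look roughly like this. It is a sketch under
assumptions, not the forward plugin's code: fakeConn, pool, yield, and demo
are hypothetical names, and only the "0 means unbounded" and "close excess
connections immediately" behaviour is taken from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn is a hypothetical stand-in for a network connection.
type fakeConn struct{ closed bool }

func (c *fakeConn) Close() { c.closed = true }

type pool struct {
	mu           sync.Mutex
	conns        []*fakeConn
	maxIdleConns int // 0 means unbounded
}

// yield caches c for reuse, or closes it right away when the pool is full.
// It reports whether the connection was cached.
func (p *pool) yield(c *fakeConn) bool {
	p.mu.Lock()
	full := p.maxIdleConns > 0 && len(p.conns) >= p.maxIdleConns
	if !full {
		p.conns = append(p.conns, c)
	}
	p.mu.Unlock()

	if full {
		c.Close() // excess connection: close instead of caching
	}
	return !full
}

// demo yields three connections into a pool limited to two idle connections.
func demo() (cached int, thirdClosed bool) {
	p := &pool{maxIdleConns: 2}
	third := &fakeConn{}
	p.yield(&fakeConn{})
	p.yield(&fakeConn{})
	p.yield(third)
	return len(p.conns), third.closed
}

func main() {
	fmt.Println(demo())
}
```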
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore(proxy): simplify Dial by closing conns inline
Remove the toClose slice collection and instead close expired
connections directly while iterating. This reduces complexity with
negligible lock-time impact.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore: fewer explicit Unlock calls
Cleaner and less chance of forgetting to unlock on new possible
code paths.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
---------
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable "gosec" linter.
Exclude:
- All G115 (integer overflow) findings, to be fixed separately.
Add targeted gosec annotations for:
- non-crypto math/rand usage
- md5 used only for file change detection
- G114 ("net/http serve with no timeout settings"), to be fixed
separately.
Other findings fixed.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
The test "TestDial_TransportStoppedDuringRetWait" replaced
tr.dial and tr.ret with test-controlled channels, then called
tr.Start(). Since connManager reads from t.dial, both the test
and connManager were racing to read from the same channel.
Remove tr.Start() since the test manually simulates connManager
behavior.
Also changed some test log formatting to align with other tests.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Add RWMutex to protect concurrent map access in Set, Unset, and ForEach methods.
Change New() to return *U pointer type for proper synchronization.
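As a rough illustration of the locking scheme, assuming a much simplified U
(the map value type and the demo function are hypothetical, and the real type
tracks more state per key):

```go
package main

import (
	"fmt"
	"sync"
)

// U is a simplified, hypothetical registry of functions keyed by string.
type U struct {
	mu sync.RWMutex
	m  map[string]func() error
}

// New returns a pointer so that every caller shares the same mutex; copying a
// struct value would also copy the lock.
func New() *U { return &U{m: make(map[string]func() error)} }

// Set registers f under key unless the key is already present.
func (u *U) Set(key string, f func() error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if _, ok := u.m[key]; ok {
		return
	}
	u.m[key] = f
}

// Unset removes key.
func (u *U) Unset(key string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	delete(u.m, key)
}

// ForEach calls every registered function and returns the first error.
// The functions must not call Set or Unset, since the lock is held.
func (u *U) ForEach() error {
	u.mu.RLock()
	defer u.mu.RUnlock()
	for _, f := range u.m {
		if err := f(); err != nil {
			return err
		}
	}
	return nil
}

// demo sets ten distinct keys from fifty goroutines, removes one key, and
// returns how many functions ForEach invoked.
func demo() int {
	u := New()
	calls := 0
	f := func() error { calls++; return nil }

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			u.Set(fmt.Sprintf("key-%d", i%10), f)
		}(i)
	}
	wg.Wait()

	u.Unset("key-0")
	if err := u.ForEach(); err != nil {
		return -1
	}
	return calls
}

func main() {
	fmt.Println(demo())
}
```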
Signed-off-by: Cangming H <cangmingh@gmail.com>
* perf: avoid string concatenation in loops
Apply perfsprint linter
Signed-off-by: Philippe Antoine <contact@catenacyber.fr>
* ci: enable perfsprint
Signed-off-by: Philippe Antoine <contact@catenacyber.fr>
---------
Signed-off-by: Philippe Antoine <contact@catenacyber.fr>
Enable intrange linter to enforce modern Go range syntax over
traditional for loops, by converting:
for i := 0; i < n; i++
to:
for i := range n
Add type conversions where needed for compatibility
with existing uint64 parameters.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable canonicalheader linter to enforce proper HTTP header casing.
This ensures headers use Go's canonical format (e.g., "Content-Type"
instead of "content-type") for consistency.
Fixes header casing in DoH implementation.
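To show why the casing matters, here is a short sketch using only net/http.
The demo function is a hypothetical helper; application/dns-message is the
DoH media type.

```go
package main

import (
	"fmt"
	"net/http"
)

// demo returns the canonical form of a lower-case header name and reports
// whether a raw map lookup with the lower-case key finds the value.
func demo() (canonical string, rawHit bool) {
	h := http.Header{}
	// Set canonicalizes the key, so the value is stored under "Content-Type".
	h.Set("content-type", "application/dns-message")

	// Indexing the map directly bypasses canonicalization and misses.
	_, rawHit = h["content-type"]

	return http.CanonicalHeaderKey("content-type"), rawHit
}

func main() {
	canonical, rawHit := demo()
	fmt.Println(canonical, rawHit)
}
```

Writing the canonical form in source keeps map-style access and the
Get/Set helpers consistent with each other.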
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Ensure Dial exits early or returns error when Transport has been
stopped, instead of blocking on the dial or ret channels. This removes
a potential goroutine leak where callers could pile up waiting
forever under heavy load.
Add select guards before send and receive, and propagate clear error
values so callers can handle shutdown gracefully.
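The guard pattern can be sketched as below. This is an illustration under
assumptions, not the actual Dial code: the transport type, its get method,
errStopped, and the channel element types are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errStopped is a hypothetical error value returned after shutdown.
var errStopped = errors.New("transport stopped")

type transport struct {
	dial chan string   // requests sent to the manager goroutine
	ret  chan int      // replies from the manager goroutine
	stop chan struct{} // closed when the transport shuts down
}

// get asks the manager for a value. Both the send and the receive are guarded
// by the stop channel, so a caller never blocks forever after shutdown.
func (t *transport) get(proto string) (int, error) {
	select {
	case t.dial <- proto:
	case <-t.stop:
		return 0, errStopped
	}
	select {
	case v := <-t.ret:
		return v, nil
	case <-t.stop:
		return 0, errStopped
	}
}

// demo performs one successful request, stops the transport, and tries again.
func demo() (int, error) {
	t := &transport{
		dial: make(chan string),
		ret:  make(chan int),
		stop: make(chan struct{}),
	}
	// A one-shot manager that answers a single request.
	go func() {
		<-t.dial
		t.ret <- 7
	}()

	v, _ := t.get("udp")
	close(t.stop)
	_, err := t.get("udp") // no manager is listening; returns instead of blocking
	return v, err
}

func main() {
	fmt.Println(demo())
}
```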
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Add test suite covering thread-safe random number generator with
tests for:
- Constructor with various seed values (positive, zero, negative)
- Deterministic behavior verification with same seeds
- Permutation generation and validation
- Concurrent access safety with multiple goroutines
- Mixed operations under concurrent load
Also clarify package documentation to explicitly state this is
for load balancing and server selection, not cryptographic use.
The math/rand usage is intentional for performance in non-security
contexts like upstream server selection and DNS record shuffling.
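A minimal sketch of such a generator and the properties the tests check,
assuming a mutex around a math/rand source. The names safeRand, newSafeRand,
and the helper functions are hypothetical and not the package's API.

```go
package main

import (
	"fmt"
	"math/rand"
	"slices"
	"sync"
)

// safeRand wraps a math/rand generator with a mutex. It is meant for
// non-security uses such as server selection, not for cryptography.
type safeRand struct {
	mu sync.Mutex
	r  *rand.Rand
}

func newSafeRand(seed int64) *safeRand {
	return &safeRand{r: rand.New(rand.NewSource(seed))}
}

// Perm returns a pseudo-random permutation of 0..n-1.
func (s *safeRand) Perm(n int) []int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.r.Perm(n)
}

// isPermutation reports whether p contains each of 0..len(p)-1 exactly once.
func isPermutation(p []int) bool {
	seen := make([]bool, len(p))
	for _, v := range p {
		if v < 0 || v >= len(p) || seen[v] {
			return false
		}
		seen[v] = true
	}
	return true
}

// sameSeedAgrees reports whether two generators with the same seed produce
// the same permutation.
func sameSeedAgrees(seed int64, n int) bool {
	return slices.Equal(newSafeRand(seed).Perm(n), newSafeRand(seed).Perm(n))
}

// concurrentPermsValid calls Perm from several goroutines on one generator
// and reports whether every result is a valid permutation.
func concurrentPermsValid(workers, n int) bool {
	s := newSafeRand(1)
	results := make([]bool, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			results[w] = isPermutation(s.Perm(n))
		}(w)
	}
	wg.Wait()
	return !slices.Contains(results, false)
}

func main() {
	fmt.Println(sameSeedAgrees(42, 10), concurrentPermsValid(8, 20))
}
```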
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable the usetesting linter in golangci.yml configuration to
enforce proper testing practices. Replace manual temporary
directory and file creation with t.TempDir() in test files.
This improves test reliability by ensuring proper cleanup and
follows Go testing best practices.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Add a NativeHistogramBucketFactor parameter to the use of
`NewHistogramVec` in order to enable use of Prometheus Native
Histograms.
This will store automatically computed sparse buckets in CoreDNS.
If a compatible Prometheus requests native histograms, this data will
be returned instead of the static buckets.
The default factor of 1.05 should provide high quality resolution data.
Signed-off-by: SuperQ <superq@gmail.com>
When packing the empty domain name, miekg/dns can end up creating
corrupt DNS messages. With some planned unpacking changes, this now
trips an error condition and causes these tests to fail. Correct this
by using the root domain explicitly as this gets correctly encoded on
the wire.
Signed-off-by: Tom Thorogood <me+github@tomthorogood.net>
* plugin/forward: Move Proxy into plugin/pkg/proxy, to allow forward.Proxy to be used outside of the forward plugin.
Signed-off-by: Patrick Downey <patrick.downey@dioadconsulting.com>
* plugin/edns: remove truncating of question section on bad EDNS version
For EDNS requests with an "Unknown Version", the question section was removed altogether.
It is unclear why, since this is not required (see [link](https://kb.isc.org/docs/edns-compatibility-dig-queries)).
This causes issues with DNS solutions that use this information (the initially queried name, type, and class) to route the response to the right client (e.g. PDNS).
The change here is to keep the initial question section as is.
Signed-off-by: Ben Kaplan <ben.kaplan@redis.com>
* adding tests for edns0 version check
Signed-off-by: Ben Kaplan <ben.kaplan@redis.com>
* adding tests for non-edns0 version check
Signed-off-by: Ben Kaplan <ben.kaplan@redis.com>
Signed-off-by: Ben Kaplan <ben.kaplan@redis.com>
* introduce new interface "dnsserver.Viewer", that allows a plugin implementing it to decide if a query should be routed into its server block.
* add new plugin "view", that uses the new interface to enable a user to define expression based conditions that must be met for a query to be routed to its server block.
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>