The PowerDNS Recursor collects many statistics about itself.
Every half hour or so (configurable with statistics-interval), the Recursor outputs a line with statistics. To force the output of statistics, send the process a SIGUSR1. A line of statistics looks like this:
Feb 10 14:16:03 stats: 125784 questions, 13971 cache entries, 309 negative entries, 84% cache hits, outpacket/query ratio 37%, 12% throttled
This means that there are 13971 different names cached, each of which may have multiple records attached to it. There are 309 items in the negative cache: names that are known not to exist and that will not do so in the near future. 84% of incoming questions could be answered without any additional queries going out to the net.
The outpacket/query ratio means that on average, 0.37 packets were needed to answer a question. Initially this ratio may be well over 100% as additional queries may be needed to actually recurse the DNS and figure out the addresses of nameservers.
Finally, 12% of queries were not performed because identical queries had gone out previously, saving load on servers worldwide.
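As a rough illustration of how the figures in such a line relate to each other, the following sketch parses the sample line above and converts the outpacket/query ratio into packets per question. The regular expression and the field names (such as `out_ratio_pct`) are assumptions made for this example and only match the exact layout shown here.

```python
import re

# Pattern for the sample statistics line shown above. The group names are
# hypothetical labels chosen for this sketch, not PowerDNS identifiers.
STATS_RE = re.compile(
    r"stats: (?P<questions>\d+) questions, "
    r"(?P<cache_entries>\d+) cache entries, "
    r"(?P<negative_entries>\d+) negative entries, "
    r"(?P<cache_hit_pct>\d+)% cache hits, "
    r"outpacket/query ratio (?P<out_ratio_pct>\d+)%, "
    r"(?P<throttled_pct>\d+)% throttled"
)


def parse_stats_line(line):
    """Return the numbers in a statistics line as a dict, or None if no match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "Feb 10 14:16:03 stats: 125784 questions, 13971 cache entries, "
    "309 negative entries, 84% cache hits, outpacket/query ratio 37%, "
    "12% throttled"
)
stats = parse_stats_line(line)

# A ratio of 37% means 0.37 outgoing packets per question on average.
packets_per_question = stats["out_ratio_pct"] / 100
# Approximate number of outgoing packets implied by that ratio.
approx_outgoing = round(stats["questions"] * packets_per_question)
```

Because the ratio is logged as a rounded percentage, `approx_outgoing` is only an estimate of the real outgoing packet count.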
For carbon/graphite/metronome, we use the following namespace. Everything starts with ‘pdns.’, which is then followed by the local hostname. Thirdly, we add ‘recursor’ to signify the daemon generating the metrics. This is then rounded off with the actual name of the metric. As an example: ‘pdns.ns1.recursor.questions’.
Care has been taken to make the sending of statistics as unobtrusive as possible: the daemons will not be hindered by an unreachable carbon server, timeouts, or connection-refused situations.
To benefit from our carbon/graphite support, either install Graphite, or use our own lightweight statistics daemon, Metronome, currently available on GitHub.
To enable sending metrics, set carbon-server, possibly carbon-interval and possibly carbon-ourname in the configuration.
Warning
If your hostname includes dots, they will be replaced by underscores so as not to confuse the namespace.
If you include dots in carbon-ourname, they will not be replaced by underscores, as PowerDNS assumes you know what you are doing if you override your hostname.
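The naming rules described above can be sketched as a small function. This is an illustration of the scheme only, not the Recursor's own code; the function name `carbon_metric_name` and its parameters are hypothetical.

```python
def carbon_metric_name(metric, hostname, ourname=None):
    """Build a metric name of the form pdns.<host>.recursor.<metric>.

    `ourname` stands in for a carbon-ourname override. It is used verbatim,
    dots included, whereas dots in the local hostname become underscores.
    """
    if ourname is not None:
        node = ourname  # override: dots are kept as-is
    else:
        node = hostname.replace(".", "_")  # avoid extra namespace levels
    return f"pdns.{node}.recursor.{metric}"


plain = carbon_metric_name("questions", "ns1")
dotted = carbon_metric_name("questions", "ns1.example.com")
overridden = carbon_metric_name("questions", "ns1.example.com", ourname="dc1.ns1")
```

With an override such as `dc1.ns1`, the dots deliberately create an additional level in the namespace.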
New in version 4.1.0.
The Recursor can export statistics over SNMP and send traps from Lua, provided support is compiled into the Recursor and snmp-agent is set.
Should Carbon not be the preferred way of receiving metrics, several other techniques can be employed to retrieve them.
The API exposes a statistics endpoint at GET /api/v1/servers/:server_id/statistics.
This endpoint exports all statistics in a single JSON document.
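A minimal sketch of reading this endpoint with the Python standard library follows. Several details are assumptions not stated above: that the API key is sent in an `X-API-Key` header, that the server id is `localhost`, and that the JSON document is a list of objects carrying `name` and `value` fields. Check them against your own installation before relying on them.

```python
import json
import urllib.request


def fetch_statistics(base_url, api_key, server_id="localhost"):
    """GET /api/v1/servers/:server_id/statistics and return the parsed JSON.

    The X-API-Key header and the default server id are assumptions.
    """
    url = f"{base_url}/api/v1/servers/{server_id}/statistics"
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def statistics_to_dict(items):
    """Flatten an assumed list of {"name": ..., "value": ...} objects.

    Entries that do not have both keys are skipped rather than guessed at.
    """
    flat = {}
    for item in items:
        if isinstance(item, dict) and "name" in item and "value" in item:
            flat[item["name"]] = item["value"]
    return flat
```

A call such as `statistics_to_dict(fetch_statistics("http://127.0.0.1:8082", "secret"))` would then give a name-to-value mapping; the address, port, and key shown here are placeholders.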
Metrics can also be gathered on the system itself by invoking rec_control:
rec_control get-all
Single statistics can also be retrieved with the get command, e.g.:
rec_control get all-outqueries
External programs can use this technique to scrape metrics.
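An external scraper along these lines might look as follows. The sketch assumes that `rec_control get-all` prints one metric per line as a name followed by whitespace and a value; lines that do not fit that shape are ignored.

```python
import subprocess


def parse_get_all(text):
    """Parse 'name value' lines (assumed output layout) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        try:
            stats[name] = int(value)
        except ValueError:
            stats[name] = value  # keep non-numeric values as text
    return stats


def scrape(rec_control="rec_control"):
    """Run `rec_control get-all` and return the parsed metrics."""
    result = subprocess.run(
        [rec_control, "get-all"],
        capture_output=True,
        text=True,
        check=True,  # raise if rec_control exits with an error
    )
    return parse_get_all(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to test the parser against saved output without a running Recursor.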
The following statistics are gathered.
It should be noted that answers0-1 + answers1-10 + answers10-100 + answers100-1000 + answers-slow + packetcache-hits + over-capacity-drops + policy-drops = questions.
Also note that unauthorized-tcp and unauthorized-udp packets do not end up in the ‘questions’ count.
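The identity above can be used as a sanity check on scraped values. The sketch below assumes the metrics are already available as a dict of integers keyed by metric name; the function name and the sample numbers are made up for illustration.

```python
# Metrics that, per the identity above, should add up to 'questions'.
ANSWER_COMPONENTS = (
    "answers0-1",
    "answers1-10",
    "answers10-100",
    "answers100-1000",
    "answers-slow",
    "packetcache-hits",
    "over-capacity-drops",
    "policy-drops",
)


def questions_discrepancy(stats):
    """Return questions minus the sum of its components (0 if they agree)."""
    return stats["questions"] - sum(stats[name] for name in ANSWER_COMPONENTS)


# Made-up sample values, chosen so that the identity holds exactly.
sample = {
    "answers0-1": 50,
    "answers1-10": 30,
    "answers10-100": 10,
    "answers100-1000": 5,
    "answers-slow": 1,
    "packetcache-hits": 100,
    "over-capacity-drops": 2,
    "policy-drops": 2,
    "questions": 200,
}
discrepancy = questions_discrepancy(sample)
```

On a live system the counters may not all be read at the same instant, so a small non-zero result does not necessarily indicate a problem.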
counts the number of outgoing UDP queries since starting
counts the number of queries answered after 1 second
counts the number of queries answered within 1 millisecond
counts the number of queries answered within 10 milliseconds
counts the number of queries answered within 100 milliseconds
counts the number of queries answered within 1 second
counts the number of queries answered by auth4s after 1 second (4.0)
counts the number of queries answered by auth4s within 1 millisecond (4.0)
counts the number of queries answered by auth4s within 10 milliseconds (4.0)
counts the number of queries answered by auth4s within 100 milliseconds (4.0)
counts the number of queries answered by auth4s within 1 second (4.0)
counts the number of queries answered by auth6s after 1 second (4.0)
counts the number of queries answered by auth6s within 1 millisecond (4.0)
counts the number of queries answered by auth6s within 10 milliseconds (4.0)
counts the number of queries answered by auth6s within 100 milliseconds (4.0)
counts the number of queries answered by auth6s within 1 second (4.0)
counts the number of queries to locally hosted authoritative zones (auth-zones) since starting
size of the cache in bytes
shows the number of entries in the cache
counts the number of cache hits since starting; this does not include hits that were answered from the packet cache
counts the number of cache misses since starting
counts the number of mismatches in character case since starting
number of queries chained to existing outstanding query
counts number of client packets that could not be parsed
shows the number of MThreads currently running
number of records dropped because of delegation-only setting
number of queries received with the DO bit set
number of DNSSEC validations that had the Bogus state
number of DNSSEC validations that had the Indeterminate state
number of DNSSEC validations that had the Insecure state
number of DNSSEC validations that had the NTA (negative trust anchor) state
number of DNSSEC validations that had the Secure state
number of DNSSEC validations performed
number of outgoing queries dropped because of dont-query setting (since 3.3)
number of outgoing queries adorned with an EDNS Client Subnet option (since 4.1)
number of responses received from authoritative servers with an EDNS Client Subnet option we used (since 4.1)
number of servers that sent a valid EDNS PING response
number of servers that sent an invalid EDNS PING response
number of servers that failed to resolve
counts the number of non-query packets received on server sockets that should only get query packets
number of outgoing queries over IPv6
counts all end-user initiated queries with the RD bit set, received over IPv6 UDP
returns the number of bytes allocated by the process (broken, always returns 0)
currently configured maximum number of cache entries
currently configured maximum number of packet cache entries
maximum amount of thread stack ever used
shows the number of entries in the negative answer cache
number of erroneous received packets
number of queries sent out without EDNS
counts the number of times it answered NOERROR since starting
number of queries sent out without EDNS PING
number of times an nsset was dropped because it no longer worked
shows the number of entries in the NS speeds map
counts the number of times it answered NXDOMAIN since starting
counts the number of timeouts on outgoing UDP queries since starting
counts the number of timeouts on outgoing UDP IPv4 queries since starting (since 4.0)
counts the number of timeouts on outgoing UDP IPv6 queries since starting (since 4.0)
questions dropped because over maximum concurrent query limit (since 3.2)
size of the packet cache in bytes (since 3.3.1)
size of packet cache (since 3.2)
packet cache hits (since 3.2)
packet cache misses (since 3.2)
packets dropped because of (Lua) policy decision
packets that were not acted upon by the RPZ/filter engine
packets that were dropped by the RPZ/filter engine
packets that were replied to with NXDOMAIN by the RPZ/filter engine
packets that were replied to with no data by the RPZ/filter engine
packets that were forced to TCP by the RPZ/filter engine
packets that were sent a custom answer by the RPZ/filter engine
shows the current latency average, in microseconds, exponentially weighted over past ‘latency-statistic-size’ packets
counts all end-user initiated queries with the RD bit set
counts number of queries that could not be performed because of resource limits
security status based on Security Polling
counts number of server replied packets that could not be parsed
counts the number of times it answered SERVFAIL since starting
number of times PowerDNS considered itself spoofed, and dropped the data
number of CPU milliseconds spent in ‘system’ mode
number of times an IP address was denied TCP access because it already had too many connections
counts the number of currently active TCP/IP clients
counts the number of outgoing TCP queries since starting
counts all incoming TCP queries (since starting)
shows the number of entries in the throttle map
counts the number of throttled outgoing UDP queries since starting
same as throttled-out
questions dropped that were too old
number of TCP questions denied because of allow-from restrictions
number of UDP questions denied because of allow-from restrictions
number of answers from remote servers that were unexpected (might point to spoofing)
number of times nameservers were unreachable since starting
number of seconds process has been running (since 3.1.5)
number of CPU milliseconds spent in ‘user’ mode
New in version 4.1: Not yet proven to be reliable
PowerDNS measures per query how much time has been spent waiting on authoritative servers. In addition, the Recursor measures the total amount of time needed to answer a question. The difference between these two durations is a measure of how much time was spent within PowerDNS. This metric is the average of that difference, in microseconds.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 0 and 1 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 1 and 2 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 2 and 4 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 4 and 8 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 8 and 16 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where between 16 and 32 milliseconds was spent within the Recursor. See x-our-latency for further details.
New in version 4.1: Not yet proven to be reliable
Counts responses where more than 32 milliseconds was spent within the Recursor. See x-our-latency for further details.
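The relationship between x-our-latency and the buckets above can be sketched as follows. This is an illustration only: the bucket labels are made up for the example, and how exact boundary values (for instance precisely 1 millisecond) are assigned is an assumption, since the descriptions above do not say.

```python
# Upper bounds, in milliseconds, of the buckets described above.
BUCKET_UPPER_MS = (1, 2, 4, 8, 16, 32)


def our_time_usec(total_usec, auth_wait_usec):
    """Time spent inside the Recursor: total time minus time waiting on auths."""
    return total_usec - auth_wait_usec


def ourtime_bucket(our_usec):
    """Return an illustrative bucket label such as '2-4' or 'slow'.

    Assumption: each bucket includes its lower bound and excludes its upper.
    """
    our_ms = our_usec / 1000
    lower = 0
    for upper in BUCKET_UPPER_MS:
        if our_ms < upper:
            return f"{lower}-{upper}"
        lower = upper
    return "slow"


# Example: 5000 us in total, of which 4200 us was spent waiting on auths.
spent = our_time_usec(5000, 4200)
label = ourtime_bucket(spent)
```

Averaging `our_time_usec` over many queries gives a figure comparable in spirit to x-our-latency, though the Recursor's own averaging method is not specified here.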