I've set up Pi-hole and unbound with root hints.
Pi-hole is working perfectly, but unbound takes a long time to return an IP if the domain is not cached yet.
I understand that query times can be higher while the cache is being built, but 4 seconds feels like too much.
I tried some of the performance settings from the unbound documentation, but they didn't change much.
Anyone with thoughts on what might be going on?
~ dig google.com @127.0.0.1 -p 5335
; <<>> DiG 9.18.14 <<>> google.com @127.0.0.1 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52483
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 298 IN A 172.217.162.110
;; Query time: 4126 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1) (UDP)
;; WHEN: Tue May 02 21:05:06 -03 2023
;; MSG SIZE rcvd: 55
~ dig facebook.com @127.0.0.1 -p 5335
; <<>> DiG 9.18.14 <<>> facebook.com @127.0.0.1 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8256
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;facebook.com. IN A
;; ANSWER SECTION:
facebook.com. 300 IN A 31.13.85.36
;; Query time: 4016 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1) (UDP)
;; WHEN: Tue May 02 21:18:32 -03 2023
;; MSG SIZE rcvd: 57
~ less /etc/unbound/unbound.conf
server:
# If no logfile is specified, syslog is used
# logfile: "/var/log/unbound/unbound.log"
#verbosity: 1
interface: 127.0.0.1
port: 5335
do-ip4: yes
do-udp: yes
do-tcp: yes
# May be set to yes if you have IPv6 connectivity
do-ip6: yes
# You want to leave this set to no unless you have *native* IPv6. With 6to4 and
# Teredo tunnels your web browser should favor IPv4 for the same reasons
prefer-ip6: yes
# Use this only when you downloaded the list of primary root servers!
# If you use the default dns-root-data package, unbound will find it automatically
root-hints: "/etc/unbound/root.hints"
# Trust glue only if it is within the server's authority
harden-glue: yes
# Require DNSSEC data for trust-anchored zones, if such data is absent, the zone becomes BOGUS
harden-dnssec-stripped: yes
trust-anchor-file: /etc/unbound/trusted-key.key
# Don't use capitalization randomization, as it is known to sometimes cause DNSSEC issues
# see https://discourse.pi-hole.net/t/unbound-stubby-or-dnscrypt-proxy/9378 for further details
use-caps-for-id: no
# Reduce EDNS reassembly buffer size.
# IP fragmentation is unreliable on the Internet today, and can cause
# transmission failures when large DNS messages are sent via UDP. Even
# when fragmentation does work, it may not be secure; it is theoretically
# possible to spoof parts of a fragmented DNS message, without easy
# detection at the receiving end. Recently, there was an excellent study
# >>> Defragmenting DNS - Determining the optimal maximum UDP response size for DNS <<<
# by Axel Koolhaas, and Tjeerd Slokker (https://indico.dns-oarc.net/event/36/contributions/776/)
# in collaboration with NLnet Labs explored DNS using real world data from
# the RIPE Atlas probes and the researchers suggested different values for
# IPv4 and IPv6 and in different scenarios. They advise that servers should
# be configured to limit DNS messages sent over UDP to a size that will not
# trigger fragmentation on typical network links. DNS servers can switch
# from UDP to TCP when a DNS response is too big to fit in this limited
# buffer size. This value has also been suggested in DNS Flag Day 2020.
edns-buffer-size: 1232
# Perform prefetching of close to expired message cache entries
# This only applies to domains that have been frequently queried
prefetch: yes
prefetch-key: yes
# One thread should be sufficient; it can be increased on beefy machines. In reality,
# for most users running on small networks or on a single machine, it should be
# unnecessary to seek performance enhancement by increasing num-threads above 1.
num-threads: 4
# power of 2 close to num-threads
msg-cache-slabs: 4
rrset-cache-slabs: 4
infra-cache-slabs: 4
key-cache-slabs: 4
# more outgoing connections
# depends on number of cores: 1024/cores - 50
outgoing-range: 200
num-queries-per-thread: 100
# Ensure kernel buffer is large enough to not lose messages in traffic spikes
so-rcvbuf: 2m
so-sndbuf: 2m
# Reuse ports to improve UDP performance
so-reuseport: yes
# Ensure privacy of local IP ranges
private-address: 192.168.0.0/16
private-address: 169.254.0.0/16
private-address: 172.16.0.0/12
private-address: 10.0.0.0/8
private-address: fd00::/8
private-address: fe80::/10
# Cache settings
msg-cache-size: 64m
rrset-cache-size: 128m
neg-cache-size: 32m
serve-expired: yes
remote-control:
# Enable remote control with unbound-control(8) here.
# set up the keys and certificates with unbound-control-setup.
control-enable: yes
Just a hunch: "Un-prefer" and disable IPv6 in unbound and try again.
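In unbound.conf, that change would look something like this (followed by a restart of the unbound service):

```
server:
    # Resolve over IPv4 only, and stop preferring IPv6 upstreams
    do-ip6: no
    prefer-ip6: no
```

With `prefer-ip6: yes` but no working native IPv6 path, unbound may try unreachable IPv6 addresses of authoritative servers first and only fall back to IPv4 after timeouts, which would explain multi-second cold-cache lookups.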
It might have improved, but I'm having a hard time getting consistent results.
It dropped to about 500ms; however, I re-enabled IPv6 to confirm and it was still around the 600ms mark.
Sometimes I still get about 2000ms even with IPv6 off. Maybe it's the way I'm testing?
I'm restarting the unbound service and immediately using dig or drill to prevent getting a cached result.
Since I'm querying unbound's port directly and not Pi-hole, I'm pretty sure it doesn't matter that Pi-hole is still running.
Could it be that after the restart there is a queue of requests, and that's why it takes longer?
Also, sometimes I get one or a couple of:
communications error to 127.0.0.1#5335: connection refused
followed either by the expected response from dig/drill, or the command returns an error and the next attempt gets the expected response.
In both cases the timings are still somewhat high (1000ms+).
I'm not sure what the best way to test this is... The results are too inconsistent.
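One way to get repeatable cold-cache timings without restarting the service (which likely explains the "connection refused" errors while unbound comes back up) is to drop a single name from the cache with `unbound-control flush google.com` and query it again. As a rough sketch, a stdlib-only timing harness could look like this; the server address and port 5335 are just this setup's values:

```python
import socket
import struct
import time

def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: 12-byte header with RD set, one IN-class question."""
    # ID 0x1234, flags 0x0100 (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # Encode the name as length-prefixed labels, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode() for label in name.split("."))
    return header + qname + b"\x00" + struct.pack(">HH", qtype, 1)

def time_query(name: str, server: str = "127.0.0.1", port: int = 5335) -> float:
    """Send one UDP query to the resolver and return the round-trip time in ms."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(10)
        start = time.perf_counter()
        sock.sendto(build_query(name), (server, port))
        sock.recv(4096)
        return (time.perf_counter() - start) * 1000.0
```

Flush between runs with `unbound-control flush <name>` (remote control is already enabled in the config above), then call e.g. `time_query("google.com")` a few times; that way the daemon stays up and there is no restart queue to worry about.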
Maybe your expectations are too high.
Are you aware of the number of queries a caching nameserver has to make when - for example - www.ibm.com must be resolved?
In some cases - when CDNs are involved - more than 20.
When I query uncached domains I get results between 200 and 500 ms, and I consider this perfectly normal.
That said - higher values are not acceptable, and maybe it's time to turn on detailed logging in unbound and have a look there.
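For the detailed logging, something along these lines in the server: section should do it; these option names are from the unbound.conf man page, and log-replies prints the response time per query:

```
server:
    verbosity: 2
    logfile: "/var/log/unbound/unbound.log"
    log-time-ascii: yes
    log-queries: yes
    log-replies: yes
```

Verbosity 2 logs detailed operational information, which should show where the time goes during recursion; remember to turn it back down afterwards, since query logging is chatty.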
Last edited by -thc (2023-05-10 06:02:10)