My busy Linux-based nameserver is giving unreasonably slow responses. How do I know if Linux connection tracking is causing the problem I am having?
If you are seeing slow responses and timeouts from your nameserver, check its kernel log output ("dmesg" is one way to do this). You might find hundreds of entries similar to this:
Conntrack table full; dropping packet
If so, this article is definitely for you; read on. If not, and you are running BIND named on GNU/Linux, it won't hurt to read on anyway. (Lack of the problem could mean that your nameserver is just not busy enough ... yet.)
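To put a number on it, you can count the matching log lines. This is a minimal sketch: the function name count_conntrack_drops is our own invention, and the pattern is deliberately loose on the assumption that the exact wording and prefix of the message differ between kernel versions.

```shell
# Count kernel log lines that report a full connection tracking table.
# Reads log text on standard input and prints the number of matching lines.
count_conntrack_drops() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps that from being treated as a failure.
    grep -ciE 'conntrack.*table full' || true
}

# Typical use (reading the kernel log may require root on some systems):
#   dmesg | count_conntrack_drops
```

A result of zero does not prove that conntrack is harmless on your system, only that the kernel has not logged a full table since the log buffer last wrapped.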
Linux Netfilter connection tracking is a very powerful resource for firewall engineers and system administrators. But on (or in front of) a nameserver, there is generally no point in tracking UDP DNS queries. Also, Linux kernel defaults for the size of the connection tracking table are unreasonably low for a busy router or nameserver.
"But UDP is connectionless, how can you track connections?" Yes, that is true. For you and me to communicate via UDP, you throw a packet at me, and I throw one back at you. The protocol has no means to establish a "connection" nor to verify for either of us that the other received the sent packet[s].
Netfilter connection tracking, however, is protocol-agnostic. A "connection" is simply an identified source[:port]/destination[:port]/protocol where packets are going (or have gone) in both directions.
Protocols which use UDP transport sometimes provide a means in the higher-level protocol to track communication. In the case of DNS, a client (resolver) sends an ID number in each query, so the software can use that (in addition to the source/destination IP addresses and ports) to match queries with the answers received.
A typical UDP "connection" for DNS is exactly two packets: a query comes in, an answer is returned. (From the resolver/client's view it's reversed: a query going out, and an answer coming back.) As we have seen, a busy named server can have lots of these entries in its conntrack table. Each entry requires kernel-space memory of course, and each entry counts against the total number of entries that the table can accommodate. And each entry remains in the conntrack table until it times out, minutes later, an unreasonably long period for DNS.
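If you want to see how much of the table is taken up by DNS over UDP, you can filter a listing of the table. The sketch below assumes a listing with one entry per line that contains the protocol name and "dport=N" fields, which is the layout we expect from /proc/net/nf_conntrack or from "conntrack -L" (part of conntrack-tools, which may not be installed). The function name is hypothetical.

```shell
# Count tracked UDP entries that mention destination port 53.
# Reads a conntrack table listing on standard input, one entry per line.
count_dns_udp_entries() {
    # Require "udp" as a whole word and "dport=53" followed by a space or
    # end of line, so that ports such as 5353 are not counted.
    awk '/(^| )udp / && / dport=53( |$)/ { n++ } END { print n + 0 }'
}

# Typical use, as root:
#   cat /proc/net/nf_conntrack | count_dns_udp_entries
#   conntrack -L 2>/dev/null | count_dns_udp_entries
```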
An authoritative nameserver is generally going to accept all packets on 53/UDP from anywhere. A recursive resolver is going to accept all packets on 53/UDP from its own networks. Firewall query rate limiting is possible in each case, but ISC does not recommend it.
Therefore you might as well disable connection tracking for your 53/UDP DNS queries and replies. Fortunately this is very easy to do, and it should be supported on all recent mainstream GNU/Linux distributions. (It might not be possible on custom kernels, if the Netfilter modules are not available. In that case the answer is to fix your kernel.)
The Netfilter raw table and the NOTRACK target were introduced some time during the heyday of the Linux 2.4 kernels. Later on NOTRACK was superseded by the CT target with the "--notrack" option. If your Linux kernel is 2.6 or later, you should have CT and --notrack. (If not, have you considered upgrading? Even 2.6 is getting old now.)
Linux Netfilter iptables consists of several independent "tables" which then have predefined "chains". iptables(8) is the userspace binary which manipulates rules in the kernel. Contrary to what you might think, there is no daemon process running; "to start iptables" or "to stop iptables" is an inaccurate way of saying, "to change the kernel's Netfilter rules."
In this article we are mainly concerned with the "raw" table, but we will also touch on the "filter" table. Then we will briefly mention the "nat" table.
The raw table is so named because it sees raw network traffic, before any Netfilter rules are applied to it. The main purpose of raw is to disable connection tracking for selected packets. The raw table provides the following built-in chains: PREROUTING (for packets arriving via any network interface) and OUTPUT (for packets generated by local processes).
The filter table is the place for filtering packets, typically the main purpose of a firewall. The filter table has three builtin chains: INPUT (for packets destined to local sockets), FORWARD (for packets being routed through the box), and OUTPUT (for locally-generated packets).
This article cannot go into detail on how to set up your firewall's filtering; we will simply show the few rules you are likely to need for bypassing conntrack for DNS in UDP. Also, it assumes that you need a firewall; perhaps if you are behind an upstream firewall, you can simply disable the one on the nameserver.
Finally, let's talk about the nat table. This is for network address translation (NAT), and if you are doing NAT on your DNS packets, you are not going to be able to use the following sample rules. NAT depends on connection tracking. If this is the case for you, skip down to the bottom, "What do I do if I must have connection tracking?"
These are in iptables-save(8)/iptables-restore(8) format. This can be converted easily into iptables(8) commands, simply by preceding each rule with "iptables" and the "-t table" argument if "table" is other than filter.
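The conversion just described can be sketched as a small shell filter. The function name save_to_commands is hypothetical, and the sketch only handles "-A" rule lines; comments, chain policy lines and anything else are silently skipped.

```shell
# Convert iptables-save(8) rule lines into iptables(8) command lines.
# Usage: save_to_commands TABLE < rules-file
save_to_commands() {
    table=$1
    while IFS= read -r line; do
        case $line in
            -A\ *)
                if [ "$table" = "filter" ]; then
                    # filter is the default table, so "-t" is not needed.
                    printf 'iptables %s\n' "$line"
                else
                    printf 'iptables -t %s %s\n' "$table" "$line"
                fi
                ;;
        esac
    done
}
```

The output is a list of commands to review, not something the function runs for you; each command needs root when you do run it.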
Here is the raw table in its entirety, including the comments added by iptables-save(8):
# Generated by iptables-save v1.4.20 on Fri May 16 12:42:55 2014
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p udp -m udp --dport 53 -j CT --notrack
-A PREROUTING -p udp -m udp --sport 53 -j CT --notrack
-A OUTPUT -p udp -m udp --dport 53 -j CT --notrack
-A OUTPUT -p udp -m udp --sport 53 -j CT --notrack
COMMIT
# Completed on Fri May 16 12:42:55 2014
In each of PREROUTING and OUTPUT, the first rule matches UDP packets with destination port 53 ("--dport") and the second matches UDP packets with source port 53 ("--sport"); both tell the kernel not to track those packets.
Note: older kernels might not have the "CT --notrack" target, but the now deprecated "NOTRACK" target is functionally the same.
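Because the two targets differ only in the text after "-j", one small generator can cover both cases. The following is a sketch, not a tested deployment script: the function name and its "ct"/"notrack" arguments are our own, and the "*raw" and "COMMIT" lines are the table delimiters that iptables-restore(8) expects.

```shell
# Print a raw-table ruleset in iptables-restore(8) format that exempts
# UDP port 53 traffic from connection tracking.
# Usage: print_raw_notrack_rules [ct|notrack]
print_raw_notrack_rules() {
    case ${1:-ct} in
        ct)      target='CT --notrack' ;;   # current kernels
        notrack) target='NOTRACK' ;;        # older kernels without CT
        *)       echo "unknown target: $1" >&2; return 1 ;;
    esac
    printf '%s\n' '*raw' ':PREROUTING ACCEPT [0:0]' ':OUTPUT ACCEPT [0:0]'
    for chain in PREROUTING OUTPUT; do
        for port_opt in --dport --sport; do
            printf '%s\n' "-A $chain -p udp -m udp $port_opt 53 -j $target"
        done
    done
    printf '%s\n' 'COMMIT'
}

# Review the output first; to check that it parses without applying it:
#   print_raw_notrack_rules ct | iptables-restore --test
```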
Next, the filter table. Again, we can't cover the entire filter table here. Many sample rulesets you can find will have a "RELATED,ESTABLISHED" rule like one of these:
# OLD filter rules; each pair is functionally equivalent
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
Typically these rules should be at or near the beginning of each chain's rules. Note that you should have one or the other, not both, for each of INPUT and FORWARD.
We need to switch any "-m state --state" rule to "-m conntrack --ctstate", and to add ",UNTRACKED" to the "--ctstate" list: UNTRACKED is a virtual packet state which is not available in the older and less complete "state" match extension. Also note that the order of the arguments in the "--ctstate" list is not significant. "ESTABLISHED,UNTRACKED,RELATED" will work just as well. Your new rules should look something like this:
# NEW filter rules
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED,UNTRACKED -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED,UNTRACKED -j ACCEPT
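If you keep your rules in a saved file, the change can be made mechanically. This is a minimal sketch with a hypothetical function name. It assumes one rule per line in iptables-save(8) format, and it only adds UNTRACKED to state lists that already contain ESTABLISHED, so that a rule such as "--ctstate INVALID -j DROP" is not turned into one that drops your DNS traffic.

```shell
# Rewrite "-m state --state LIST" as "-m conntrack --ctstate LIST", and
# append UNTRACKED to any LIST that contains ESTABLISHED but not UNTRACKED.
# Reads rules on standard input and writes the result to standard output.
add_untracked_state() {
    sed -E \
        -e 's/-m state --state /-m conntrack --ctstate /' \
        -e '/--ctstate [A-Z,]*ESTABLISHED/{/UNTRACKED/!s/(--ctstate [A-Z,]+)/\1,UNTRACKED/;}'
}

# Typical use: write to a new file and compare before loading anything.
#   add_untracked_state < rules.old > rules.new
#   diff rules.old rules.new
```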
A brief word about the filter table's OUTPUT chain: if you are filtering traffic on OUTPUT, I trust that you know what you are doing. If you don't know what you are doing, please consider not trying to filter OUTPUT. You can break a lot of things which will be very difficult to fix, and you are unlikely to be addressing any real security concern.
For this reason a sample OUTPUT rule is not shown, but you might need it if there are any blocking rules in OUTPUT. The only difference is the name of the chain after the "-A".
How and where a GNU/Linux distribution stores firewall rules to restore on reboot or reload varies widely. Some might provide a means to save rules (such as Red Hat's "service iptables save" command, for example), and others might require you to edit (or to redirect iptables-save(8) output to) a file. This article cannot attempt to document all these varied procedures.
What do I do if I must have connection tracking?
Don't worry. It's not that bad. The default values for the conntrack table are very conservative of memory. Most modern systems which can handle the modern needs of DNS will have plenty of RAM at their disposal. You only have to increase the size of the table.
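Before picking a new size, it helps to know how full the table gets. The sketch below just does the arithmetic; the function name is hypothetical, and the /proc paths in the usage comment are an assumption that holds on kernels which expose nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter.

```shell
# Print how full the conntrack table is, as a whole-number percentage.
# Usage: conntrack_usage_pct COUNT MAX
conntrack_usage_pct() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "MAX must be greater than zero" >&2
        return 1
    fi
    # Integer division: the result is rounded down.
    echo $(( count * 100 / max ))
}

# Typical use on a kernel that provides these files:
#   conntrack_usage_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                       "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

Sampling this during your busiest hour gives a better picture than a single reading.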
This is on an aging 3.2.13 system with 4GB of physical RAM:
chuck@chestnut:~$ cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
This is on my work laptop, kernel 3.12.7 and 16GB RAM:
cba@tp:~$ cat /proc/sys/net/nf_conntrack_max
And this is from an ancient Slackware 10.0 machine, 2.4.26 kernel and 1GB RAM:
cba@sorry:~$ cat /proc/sys/net/ipv4/ip_conntrack_max
As you can see, over the years this sysctl(8) setting has changed a few times. It's no problem to increase it. Here are samples for each of the above:
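# Each value is a maximum number of table entries, not a size in bytes.
# Every entry uses kernel memory, so choose a value that fits your RAM.
# 134217728 entries (128 Mi)
net.ipv4.netfilter.ip_conntrack_max = 134217728
# 536870912 entries (512 Mi)
net.nf_conntrack_max = 536870912
# 524288 entries (512 Ki)
net.ipv4.ip_conntrack_max = 524288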
Add the line that matches your kernel to /etc/sysctl.conf, then run "sysctl -p" as root to apply the setting.
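Since the key has changed names over the years, a script can probe for whichever one the running kernel provides. This sketch checks only the three names shown in this article; the function name and its optional directory argument (useful for testing against a copy of the tree) are our own.

```shell
# Print the sysctl key for the conntrack table size on this kernel.
# Usage: conntrack_max_key [PROC_SYS_DIR]   (defaults to /proc/sys)
conntrack_max_key() {
    root=${1:-/proc/sys}
    for key in net.nf_conntrack_max \
               net.ipv4.netfilter.ip_conntrack_max \
               net.ipv4.ip_conntrack_max; do
        # A sysctl key maps to a path under /proc/sys with dots as slashes.
        path=$root/$(printf '%s' "$key" | tr . /)
        if [ -e "$path" ]; then
            printf '%s\n' "$key"
            return 0
        fi
    done
    return 1
}

# Typical use: print a line suitable for /etc/sysctl.conf, where NEW_MAX
# stands for the value you have chosen.
#   printf '%s = %s\n' "$(conntrack_max_key)" "$NEW_MAX"
```

If the function prints nothing and returns non-zero, the conntrack module is probably not loaded, or your kernel uses a name not covered here.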
For IPv6, the sample rules shown above are identical, but they have to be loaded by a different set of commands: ip6tables(8) for individual rule changes and ip6tables-restore(8) to load an entire ruleset at once.
IPv4 and IPv6 rules are maintained and manipulated separately in the Linux kernel. Rules which are entered for one IP version do not affect the other.
The sysctl(8) settings in the above section are the same, but replace all instances of "ipv4" with "ipv6".
To date we are not aware of any Linux-based BIND nameservers which have had this problem associated with TCP DNS queries. Note also that a TCP DNS query involves more than just two packets; there is the overhead of setting up (and later tearing down) the TCP connection. There could also be more than one packet in the response to the query.
Therefore we see no need to disable connection tracking for DNS in TCP. In general connection tracking is a good thing. It's only DNS in UDP where it can get out of hand, keeping too many old, stale connections in the tracking table.
The best references for Linux Netfilter and iptables are the manuals which are provided with the software, most notably: iptables(8)/ip6tables(8) and iptables-extensions(8). (The latter might not exist in older Netfilter releases, but all the match and target extensions were then documented in the main iptables(8) manual. The extensions are mostly the same for IP versions 4 and 6, so there is no separate manual for IPv6.) See also iptables-save(8)/ip6tables-save(8) and iptables-restore(8)/ip6tables-restore(8). Ideally one should refer to the local copies of these manuals rather than online copies, because there are always slight version differences which can cause confusion.
http://en.wikipedia.org/wiki/Netfilter The Wikipedia page gives a good overview of Netfilter.
http://www.netfilter.org/ The Netfilter project's own site.
http://inai.de/links/iptables/ Some good original content, including the packet diagram, and external links; information about IRC help for Netfilter.
ISC provides professional support for BIND 9, and our support services can include Linux Netfilter assistance. Please see http://www.isc.org/support/ for more information.
Your distribution of GNU/Linux has its own web site and user community. Distrowatch is a site which probably has links to them all.
© 2001-2017 Internet Systems Consortium