What causes "refresh: failure trying master ...: operation canceled" error messages?
Author: Brian Conry
Reference Number: AA-01213
Created: 2014-10-21 13:46
Last Updated: 2014-10-22 13:26

Problem:

Multiple operators have reported to us that, on some Linux systems running BIND as a slave, zones can fall behind their master and messages like the following are logged:

zone my.example.zone/IN: refresh: failure trying master 10.1.2.3#53 (source 0.0.0.0#0): operation canceled
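To see how often this happens and which zones and masters are affected, the messages can be counted. The following is a minimal Python sketch; the regular expression is an assumption modelled on the single example message above (not on BIND's source code), and the function name `count_canceled_refreshes` is hypothetical. It uses `search`, so a syslog timestamp or hostname prefix in front of the message is ignored.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION: this pattern is modelled on the one example message in this
# article; other BIND versions may word the message differently.
REFRESH_CANCELED = re.compile(
    r"zone (?P<zone>\S+)/(?P<cls>\w+): refresh: failure trying master "
    r"(?P<master>\S+)#(?P<port>\d+) \(source (?P<source>\S+)#(?P<sport>\d+)\): "
    r"operation canceled"
)


def count_canceled_refreshes(lines: Iterable[str]) -> Counter:
    """Count 'operation canceled' refresh failures per (zone, master) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = REFRESH_CANCELED.search(line)
        if match:
            counts[(match.group("zone"), match.group("master"))] += 1
    return counts
```

For example, passing an open log file (its location depends on your syslog configuration) and calling `.most_common()` on the result lists the most frequently affected zone/master pairs first.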

Answer:

For the most part, this message doesn't indicate a serious problem. BIND will retry the refresh, either when it receives another NOTIFY from the master or when the zone's refresh/retry timer fires. The retry usually succeeds, so the zones don't fall far behind.
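To confirm whether a slave has actually fallen behind, compare its SOA serial with the master's. SOA serials wrap around at 2**32, so a plain integer comparison can give the wrong answer; the sketch below uses the serial number arithmetic of RFC 1982 instead. It assumes you have already fetched both SOA records (for example with `dig @server my.example.zone SOA +short`, whose third field is the serial); the function names are hypothetical.

```python
HALF = 1 << 31  # 2**(SERIAL_BITS - 1) for 32-bit SOA serials (RFC 1982)


def serial_lt(s1: int, s2: int) -> bool:
    """RFC 1982 'less than' for 32-bit serials.

    A difference of exactly 2**31 is undefined in RFC 1982; this sketch
    simply returns False in that case.
    """
    return s1 != s2 and (
        (s1 < s2 and s2 - s1 < HALF) or (s1 > s2 and s1 - s2 > HALF)
    )


def slave_is_behind(master_serial: int, slave_serial: int) -> bool:
    """True if the slave's serial is older than the master's."""
    return serial_lt(slave_serial, master_serial)


def serial_from_dig_short(soa_line: str) -> int:
    # ASSUMPTION: input is one line of "+short" SOA output:
    # mname rname serial refresh retry expire minimum
    return int(soa_line.split()[2])
```

Using serial arithmetic rather than `<` means a master whose serial has just wrapped past 2**32 is still correctly recognised as newer than its slaves.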

The cases we have been able to troubleshoot lead us to believe that the problem is caused by one of the Linux netfilter kernel modules. That module appears to sometimes erroneously return a DROP verdict for the outgoing SOA query that starts the zone refresh. The sendmsg(2) call then fails with EPERM, which results in the error message above.
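To make the failure mode concrete: when netfilter drops a locally generated UDP datagram, the sending process sees an error from the send call rather than silent packet loss. The following Python sketch builds a minimal SOA query in RFC 1035 wire format and sends it with `socket.sendmsg`, which wraps sendmsg(2); an EPERM from the kernel surfaces as an `OSError` with `errno.EPERM`. It only illustrates the system-call behaviour described above and is not how BIND itself performs a refresh; the function names and the fixed query ID are hypothetical choices for the example.

```python
import errno
import socket
import struct

QTYPE_SOA = 6
QCLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_soa_query(zone: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal non-recursive SOA query (RFC 1035 wire format)."""
    # Header: ID, flags=0 (standard query, RD clear), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", QTYPE_SOA, QCLASS_IN)
    return header + question


def send_soa_query(zone: str, master: str, port: int = 53) -> int:
    """Send one SOA query over UDP with sendmsg(2); returns bytes sent.

    This sketch does not wait for or parse a response.
    """
    family = socket.AF_INET6 if ":" in master else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            return sock.sendmsg([build_soa_query(zone)], [], 0, (master, port))
        except OSError as exc:
            if exc.errno == errno.EPERM:
                # On Linux, a locally generated UDP datagram dropped by a
                # netfilter hook is typically reported to the sender as EPERM
                # ("Operation not permitted").
                print(f"SOA query for {zone} to {master} rejected locally: {exc}")
            raise
```

For example, `send_soa_query("my.example.zone", "192.0.2.1")` sends a single query to a documentation address (substitute your master's address).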

We suspect that this is a race condition of some sort: the error seems to occur only on very busy servers, and intrusive diagnostics (e.g. strace) prevent it from occurring.

One known work-around is unloading the kernel netfilter modules, assuming that you aren't using them.
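Before unloading anything, it is worth checking which netfilter modules are loaded and whether anything depends on them. The sketch below parses lines in the /proc/modules format (name, size, reference count, dependents, state, address) and keeps modules whose names match common netfilter prefixes. The prefix list and the function name `netfilter_modules` are assumptions for illustration; adjust them for your kernel.

```python
from typing import Iterable, List, NamedTuple, Optional

# ASSUMPTION: name prefixes that commonly indicate netfilter/iptables/nftables
# modules. Adjust for your kernel; this list is illustrative, not exhaustive.
NETFILTER_PREFIXES = (
    "nf_", "nft_", "xt_", "x_tables", "ip_tables", "ip6_tables", "arp_tables",
    "iptable_", "ip6table_", "arptable_", "ebtable", "ebt_", "ipt_", "ip6t_",
)


class Module(NamedTuple):
    name: str
    refcount: Optional[int]  # None when the kernel prints "-" (unknown)
    used_by: List[str]


def netfilter_modules(proc_modules: Iterable[str]) -> List[Module]:
    """Return netfilter-looking entries from /proc/modules-format lines.

    Each line is: name size refcount used_by state address, where used_by
    is "-" or a comma-separated list (with a trailing comma) of dependents.
    """
    found = []
    for line in proc_modules:
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(NETFILTER_PREFIXES):
            continue
        refcount = int(fields[2]) if fields[2].isdigit() else None
        used_by = [m for m in fields[3].split(",") if m and m != "-"]
        found.append(Module(fields[0], refcount, used_by))
    return found
```

On the host, passing an open `/proc/modules` to `netfilter_modules` lists the candidates. A module with a non-zero reference count or a non-empty `used_by` list is still in use, and the kernel will refuse to unload it (for example with `modprobe -r`) until its users have been removed.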

We've received reports of this from RHEL and Debian systems. We think it probably has more to do with the kernel version than with the distribution, because in some of the existing reports the kernel version is the only difference between a system experiencing the issue and systems that are not.
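Because the kernel version may matter more than the distribution, it helps to record both when comparing an affected system with an unaffected one. A minimal sketch, assuming the host provides /etc/os-release (present on most current distributions; older releases may lack it); the function names are hypothetical.

```python
import os
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the os-release(5) style, honouring quotes."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            parts = shlex.split(value)
        except ValueError:  # unbalanced quotes: keep the raw value
            parts = [value]
        info[key] = parts[0] if parts else ""
    return info


def system_summary(os_release_path: str = "/etc/os-release") -> Dict[str, str]:
    """Return the running kernel release and the distribution's PRETTY_NAME."""
    summary = {"kernel": os.uname().release}
    try:
        with open(os_release_path) as f:
            pretty = parse_os_release(f.read()).get("PRETTY_NAME", "unknown")
        summary["distribution"] = pretty
    except FileNotFoundError:
        # ASSUMPTION: older systems may not ship /etc/os-release at all.
        summary["distribution"] = "unknown (no os-release file)"
    return summary
```

Running `system_summary()` on each affected and unaffected host gives a quick side-by-side view of kernel release versus distribution.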

If you have encountered this error and wish to submit a report, you can use our online form.

We welcome any new information or insight into this problem. Even if you are unable to provide new evidence, submitting a bug report lets us add you to the list of those experiencing this issue.


© 2001-2016 Internet Systems Consortium
