Recommendations for restarting a DHCP failover pair
Author: Cathy Almond; Reference Number: AA-01043; Created: 2013-09-03 15:03; Last Updated: 2015-03-18 11:19
Question:

It's sometimes necessary to restart an ISC DHCP server.  If you have two servers running as a failover pair, then there shouldn't be any significant interruption to client service during the restart - but when you need to restart both servers, what is the recommended process?

Answer:

Individual deployments vary, but here follows a generic process that you can tailor to your specific environment's needs.

  • When restarting a failover pair, restart one first, and then the other - not both at the same time.

  • While one is being restarted, the other will go into 'communications interrupted' state (and will change its behaviour on granting leases accordingly) but then should recover when its partner comes back online.

  • The servers will log all of these transitions and the server status can also be confirmed via OMAPI.

  • When restarting the pair, wait for the restart of the first one to complete fully before restarting the second one. That allows them time to reestablish communications and do pool balancing before you take the second one offline for its restart.
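
The OMAPI status check mentioned above can be scripted. The following is a minimal sketch only, not a documented ISC procedure. It assumes that omshell is installed, that dhcpd.conf enables OMAPI with "omapi-port 7911;" and no omapi-key (add a "key" line to the script otherwise), that the failover peer is named "foo", that omshell prints integer attributes in the form "local-state = 00:00:00:02", and that the state codes match the list in the dhcpd(8) man page. Check each of these against your own installation.

```python
import re
import subprocess

# Failover state codes as listed in dhcpd(8); an assumption to verify
# against the man page shipped with your version.
FAILOVER_STATES = {
    1: "startup",
    2: "normal",
    3: "communications-interrupted",
    4: "partner-down",
    5: "potential-conflict",
    6: "recover",
    7: "paused",
    8: "shutdown",
    9: "recover-done",
    10: "resolution-interrupted",
    11: "conflict-done",
    254: "recover-wait",
}

_STATE_RE = re.compile(r"\b(local-state|partner-state)\s*=\s*([0-9A-Fa-f:]+)")


def failover_query_script(peer, host="127.0.0.1", port=7911):
    """Build omshell input that opens the failover-state object for one peer."""
    lines = [
        f"server {host}",
        f"port {port}",
        "connect",
        "new failover-state",
        f'set name = "{peer}"',
        "open",
    ]
    return "\n".join(lines) + "\n"


def parse_failover_states(output):
    """Extract local-state and partner-state from omshell output."""
    states = {}
    for key, raw in _STATE_RE.findall(output):
        code = int(raw.replace(":", ""), 16)  # "00:00:00:02" -> 2
        states[key] = FAILOVER_STATES.get(code, f"unknown({code})")
    return states


def query_failover_states(peer, **kwargs):
    """Run omshell (must be on PATH) and return the parsed states."""
    result = subprocess.run(
        ["omshell"],
        input=failover_query_script(peer, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failover_states(result.stdout)
```

A caller would treat the pair as settled only when both "local-state" and "partner-state" come back as "normal".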

Which server should be restarted first?

If you've made significant changes such as extending or removing failover ranges, then on restarting one server, there will be a mismatch and some errors logged until both have been restarted.  Restarting the secondary first is held to be better in this situation - but probably won't make that much of a difference to the overall start-up times.

The aim is that the non-partnership parts of the configuration (that is, the parts describing the addresses, pools and so on) are identical on the two peers once both have restarted, although there may be a period during the transition when they are not.

For making the transition, the process flow would be similar to:

  • Modify the secondary's configuration file
  • Stop and restart the secondary using the new configuration
  • Modify the primary's configuration file
  • Stop and restart the primary
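
The process flow above could be driven by a small script along the following lines. This is a sketch under stated assumptions: apply_config, restart and is_normal are hypothetical callbacks that you would supply for your own environment (for example, copying a configuration file, calling your init system, and checking the logs or OMAPI). None of them is part of ISC DHCP.

```python
import time


def rolling_restart(servers, apply_config, restart, is_normal,
                    timeout=300.0, poll=5.0, sleep=time.sleep):
    """Update and restart each server in turn, secondary first.

    The next server is not touched until is_normal(server) reports that
    the one just restarted is back in normal failover state.
    """
    for server in servers:
        apply_config(server)   # hypothetical: install the new dhcpd.conf
        restart(server)        # hypothetical: stop and start dhcpd
        deadline = time.monotonic() + timeout
        while not is_normal(server):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{server} did not return to normal")
            sleep(poll)


if __name__ == "__main__":
    # Dry run with stand-in callbacks that only print what they would do.
    rolling_restart(
        ["secondary", "primary"],
        apply_config=lambda s: print("update config on", s),
        restart=lambda s: print("restart", s),
        is_normal=lambda s: True,
    )
```

Raising on timeout rather than continuing means the second server is never taken offline while the first is still recovering.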

How can I tell that the first server's restart has completed fully?

Two strategies exist for script-managing this (apart from manually observing the servers as they are logging):

  1. Inspect the log entries as the server is restarting:
    The logged entries as the DHCP server completes the start process should look similar to those below:
    dhcpd: Wrote 0 deleted host decls to leases file.
    dhcpd: Wrote 0 new dynamic host decls to leases file.
    dhcpd: Wrote 72468 leases to leases file.
    dhcpd: Listening on LPF/eth0/00:19:b9:df:24:3b/172.16.201.32/27
    dhcpd: Sending on LPF/eth0/00:19:b9:df:24:3b/172.16.201.32/27
    dhcpd: Sending on Socket/fallback/fallback-net

    Then, on the server that has just been restarted, you should next see communications reestablished with each failover partner (if there are multiple partners, you will need to check for all sets of failover peer communications status messages):
    failover peer foo: I move from normal to startup
    failover peer foo: peer moves from normal to communications-interrupted
    failover peer foo: I move from startup to normal
    balancing pool 28593100 172.16.132.0/24  total 11  free 7  backup 4  lts 1  max-own (+/-)1
    balanced pool 28593100 172.16.132.0/24  total 11  free 7  backup 4  lts 1  max-misbal 2
    failover peer foo: peer moves from communications-interrupted to normal

    From ISC DHCP version 4.3, dhcpd will log an explicit message to indicate that it has completed its start process.  (This will be documented in the ISC DHCP release notes with reference RT #33208)

    In the simple case where two DHCP servers partner only with each other, you could instead monitor the server being restarted from its partner, which is waiting for it to become fully operational again before being restarted itself.  On the server that has not been restarted, the failover peer communications status messages should look similar to these:

    peer foo: disconnected
    failover peer foo: I move from normal to communications-interrupted
    failover peer foo: peer moves from normal to normal
    failover peer foo: I move from communications-interrupted to normal
    balancing pool 285ff100 172.16.132.0/24  total 11  free 7  backup 4  lts -1  max-own (+/-)1
    balanced pool 285ff100 172.16.132.0/24  total 11  free 7  backup 4  lts -1  max-misbal 2
    peer foo: disconnected


  2. Use OMAPI to test the server state, and restart the second partner only when the first one has restarted, finished syncing with its partner and reestablished normal communication with it.
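
For the log-based strategy, the state transitions shown in the samples above can be tracked with a short parser. This is an illustrative sketch: it assumes the message wording matches the samples in this article (it may differ between versions) and that you feed it only the lines logged since the restart began. The function names are invented for this example.

```python
import re

# Matches e.g. "failover peer foo: I move from startup to normal"
# and "failover peer foo: peer moves from normal to communications-interrupted".
_MOVE_RE = re.compile(
    r"failover peer (\S+): (I move|peer moves) from (\S+) to (\S+)"
)


def failover_states_from_log(lines):
    """Return the most recent local and partner state seen for each peer."""
    states = {}
    for line in lines:
        match = _MOVE_RE.search(line)
        if not match:
            continue
        peer, who, _old, new = match.groups()
        side = "local" if who == "I move" else "partner"
        states.setdefault(peer, {})[side] = new
    return states


def restart_settled(lines, peers):
    """True once both sides of every named failover peer are 'normal'."""
    states = failover_states_from_log(lines)
    return all(
        states.get(peer, {}).get("local") == "normal"
        and states.get(peer, {}).get("partner") == "normal"
        for peer in peers
    )
```

Passing every configured peer name in peers covers the case of multiple partners: the restart counts as complete only when all of them have reported both transitions back to normal.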

What is the recommended way to cleanly stop a DHCP server?

'kill' is the recommended option, except where there is a high turnover of leases and the production environment requires a high degree of reliability from DHCP. In that case, we'd suggest that administrators consider using OMAPI to control the daemon instead and to request a graceful shutdown.

The reason for this is that there is the slight possibility that by using kill, administrators may stop dhcpd in the middle of appending a lease to the leases file (in which case it may become corrupted).  This risk, while tiny, may be significant enough for some administrators to prefer to use OMAPI instead.
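
A graceful shutdown over OMAPI might be requested as sketched below. The omshell commands follow the control object described in the dhcpd(8) man page (open the control object and set its state attribute to 2); confirm this against the man page for your version. The host, port, key name and key secret are placeholders, and omshell is assumed to be on the PATH.

```python
import subprocess


def shutdown_script(host="127.0.0.1", port=7911,
                    key_name=None, key_secret=None):
    """Build omshell input that asks dhcpd to shut down gracefully."""
    lines = [f"server {host}", f"port {port}"]
    if key_name and key_secret:
        # Only needed when dhcpd.conf sets an omapi-key.
        lines.append(f"key {key_name} {key_secret}")
    lines += [
        "connect",
        "new control",
        "open",
        "set state = 2",   # 2 requests shutdown, per dhcpd(8)
        "update",
    ]
    return "\n".join(lines) + "\n"


def graceful_shutdown(**kwargs):
    """Feed the script to omshell and return whatever it printed."""
    result = subprocess.run(
        ["omshell"],
        input=shutdown_script(**kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```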

How should a corrupted lease file be mended?

The workaround in this situation is to manually edit the lease file to remove the truncated lease.
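
If you would rather script that edit, the sketch below shows one possible approach. It is not an ISC tool. It assumes that the damage is confined to an unterminated declaration at the end of the file, and it finds that declaration by naive brace counting (ignoring braces inside quoted strings and comment lines). Run it only on a copy of the leases file, with dhcpd stopped, and inspect the result before putting it back.

```python
import re

# Quoted strings (such as uid values) may contain braces; blank them out
# before counting.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')


def _is_complete(line, depth):
    """A line closes a declaration if no block is open and it ended cleanly."""
    stripped = line.rstrip()
    return depth <= 0 and (
        line.endswith("\n") or stripped.endswith("}") or stripped.endswith(";")
    )


def drop_truncated_tail(text):
    """Return text without a trailing, unterminated declaration."""
    depth = 0
    keep = []      # lines belonging to complete declarations
    pending = []   # lines of the declaration currently being read
    for line in text.splitlines(keepends=True):
        pending.append(line)
        if not line.lstrip().startswith("#"):
            bare = _QUOTED.sub('""', line)
            depth += bare.count("{") - bare.count("}")
        if _is_complete(line, depth):
            keep.extend(pending)
            pending = []
            depth = 0
    return "".join(keep)
```

An intact file passes through unchanged, so the function can also serve as a quick check: if the output differs from the input, the tail was truncated.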

Why is "kill" preferred to OMAPI in these recommendations?

The risks of using kill to stop dhcpd are minimal and acceptable in most environments.  Administrators who are not already using OMAPI would find it cumbersome to set up solely for the purpose of controlling the dhcpd shutdown.  However, if you already have OMAPI set up, then there is no disadvantage to using it to shut down the server.
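
For completeness, the kill approach can be scripted as well. This sketch sends SIGTERM (the default signal sent by kill) to the process named in a PID file and then waits for it to disappear. The PID file location is an assumption: /var/run/dhcpd.pid is a common default, but check your own build or startup options. The helper names are made up for this example.

```python
import os
import signal
import time


def read_pid(pid_file):
    """Read the process ID that dhcpd wrote to its PID file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def send_sigterm(pid_file="/var/run/dhcpd.pid"):
    """Send SIGTERM to the process named in pid_file and return its PID."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGTERM)
    return pid


def wait_for_exit(pid, timeout=30.0, poll=0.5):
    """Poll until the process is gone; return False if it is still there."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)   # signal 0 only tests whether the PID exists
        except ProcessLookupError:
            return True
        except PermissionError:
            pass              # exists, but is owned by another user
        time.sleep(poll)
    return False
```

Waiting for the process to exit before starting the replacement avoids two daemons briefly sharing one leases file.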

Why is there no method to signal dhcpd to shut down gracefully, outside of OMAPI?

Although the two options for stopping DHCP documented above have worked well over the years, it's our intent to make it possible to signal to DHCP to terminate gracefully by sending SIGINT (sent when pressing ctrl-c) or SIGTERM (the default signal sent by kill).  This signal-based shutdown will be usable with the server, relay and client alike.  (When released, this will be documented in the ISC DHCP release notes with reference RT #32692)


© 2001-2016 Internet Systems Consortium
