We experienced a DHCP server failure at our site. During recovery, with only one server in service, it was operating in communications-interrupted mode. During this time we observed that it was handing out very short leases of apparently random lengths - for example 30 minutes, 15 minutes, 9 minutes and so on. Once the two servers were brought back into communication, it reverted to providing leases of the usual configured length.
What is the explanation for the random times on the short leases during the communications-interrupted period of operation? Shouldn't they all be given out as the configured Maximum Client Lead Time (MCLT)?
This is the expected operation - here follows an explanation that tries to avoid delving too deeply into the algorithms that are used to calculate lease times given out by a failover pair.
The Maximum Client Lead Time (MCLT) is the longest time by which a lease may extend beyond the lease end time that both partners of a failover pair know about.
Whilst the servers are in communication, they send each other updates on lease end times as they assign leases. However, because handing out leases is a time-critical activity, a server has to respond to the client first - before it can complete the dialogue with its partner.
This is why a newly booting client is initially assigned a very short lease (of length MCLT). Having done this, the server assigning the lease notifies its partner with a timestamp that is far enough in the future for it to be able to provide a lease of the proper length when the client renews halfway through the initial lease period. Assuming the partner acknowledges this new, further-away time, all will work as expected.
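The sequence above can be sketched in a few lines of Python. This is a simplified illustration of the principle only, not the server's actual algorithm: the function name, the configured values, and the way the far-future timestamp is chosen are all assumptions made for the example.

```python
def grantable_lease(now, desired, mclt, partner_acked_end):
    """Longest lease (in seconds) that may be granted at time `now`.

    partner_acked_end is the lease end time the partner has acknowledged,
    or None if the partner knows nothing about this lease yet.
    Hypothetical helper, written only to illustrate the MCLT rule.
    """
    if partner_acked_end is None or partner_acked_end <= now:
        # Nothing useful acknowledged: only MCLT may be promised.
        limit = mclt
    else:
        # At most MCLT beyond what the partner has acknowledged.
        limit = (partner_acked_end - now) + mclt
    return min(desired, limit)


DESIRED = 24 * 3600  # assumed configured lease length: 1 day
MCLT = 30 * 60       # assumed MCLT: 30 minutes

# A new client at t=0: the partner knows nothing, so the lease is MCLT.
first = grantable_lease(0, DESIRED, MCLT, None)

# The server then sends its partner a far-future end time. Assumption for
# this sketch: the expected renewal time plus the desired lease length,
# and the partner acknowledges it.
renew_at = first // 2
acked_end = renew_at + DESIRED

# Renewing halfway through the first lease now yields the full length.
second = grantable_lease(renew_at, DESIRED, MCLT, acked_end)

print(first, second)  # 1800 86400
```

With these assumed values the first lease is 30 minutes and the renewal receives the full day.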
Now - think about what happens when a server is down, and/or when a failover pair can't communicate with each other.
The server running alone will be running with the state that it last had, including the lease end times that its partner acknowledged. Given this state information about what the other server knows, the configured MCLT, and the fact that it is still out of communication with its peer, the new lease times it calculates get shorter and head towards MCLT. They won't necessarily be that small immediately on a lease renewal, though, because of the existing partner-acknowledged lease end time.
The variance on the times is going to depend on how much time there was left on the current leases when the clients renew. Further complications such as network outages could mean that some clients would have been trying to renew their leases for longer than others.
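A short Python sketch shows how this produces apparently random lease lengths. It is a simplification under stated assumptions: the helper name, the client names, and the 5-minute MCLT and 8-hour lease length are hypothetical, chosen so that the output matches the lease lengths quoted in the question.

```python
def lease_while_alone(now, desired, mclt, partner_acked_end):
    """Lease length (seconds) a lone server may grant at time `now`.

    Hypothetical helper: the lease may run at most MCLT beyond the end
    time that the partner last acknowledged, and never less than MCLT
    (unless the desired length itself is shorter).
    """
    remaining = max(partner_acked_end - now, 0)
    return min(desired, remaining + mclt)


MINUTE = 60
DESIRED = 8 * 60 * MINUTE  # assumed configured lease length: 8 hours
MCLT = 5 * MINUTE          # assumed MCLT: 5 minutes

now = 0
# Time left (in seconds) before each client's partner-acknowledged end
# time when it renews. A negative value means that time has passed.
acked_remaining = {
    "client-a": 25 * MINUTE,
    "client-b": 10 * MINUTE,
    "client-c": 4 * MINUTE,
    "client-d": -20 * MINUTE,
}

leases = {
    name: lease_while_alone(now, DESIRED, MCLT, now + left) // MINUTE
    for name, left in acked_remaining.items()
}
print(leases)
# {'client-a': 30, 'client-b': 15, 'client-c': 9, 'client-d': 5}
```

Each client gets whatever was left of its acknowledged lease plus MCLT, so the lengths differ from client to client and settle at MCLT once the acknowledged end time has passed.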
A large amount of technical detail is omitted here - including the fact that it's not just a single timestamp controlling what happens in the communication between failover peers - the intention is simply to explain the underlying principle being followed.
For more information, please read the failover protocol documentation:
(Section 5.2.1 onwards is a good starting point for those already partly familiar with it)
© 2001-2017 Internet Systems Consortium