It can be tempting to use high-speed but volatile storage for the dhcpd leases file in an attempt to improve performance where there are I/O bottlenecks. Often this is accompanied by a plan to routinely copy the leases file to hard disk, for example via cron. The intent is to have a "reasonably up to date database" to recover from. This would be great if it worked that way - but it doesn't.
Both DHCP and the failover protocol operate on the fundamental presumption that lease information has been synced to disk before replies are sent over the wire. Therefore what is in the leases file is regarded as accurate and representative. So what can go wrong if DHCP is restarted with an inaccurate (older) version of the leases file?
- Using an old lease state structure for an active lease could cause the server to expire it early upon restarting, and subsequently offer it to a different client, causing an IP address conflict and taking both clients off the network.
- With failover in particular, the protocol channel does not resynchronise the lease database. It works on the assumption that the peer's recorded lease state hasn't changed unless there is a protocol-level message adjusting state, so a "partially recovered" lease database actually creates inconsistency between the peers' lease databases.
If you are not in a failover situation, then you are at least better off with a partially recovered lease database than with a completely empty one; it reduces the chances of address conflicts.
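The first failure mode can be illustrated with a toy model. The sketch below is not ISC DHCP code; the `Lease` record, the `is_free` helper, the address and the timestamps are all hypothetical, chosen only to show how a snapshot taken before a renewal can make an active lease look expired after a restart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    client: str  # hypothetical client identifier
    ends: int    # expiry time, in seconds on an arbitrary clock


def is_free(leases, ip, now):
    """Return True if this copy of the database says `ip` may be offered."""
    lease = leases.get(ip)
    return lease is None or lease.ends <= now


# t=0: client A gets a one-hour lease; cron copies the RAM disk file to hard disk.
live = {"192.0.2.10": Lease("client-A", ends=3600)}
snapshot = dict(live)

# t=1800: client A renews; only the volatile copy records the new expiry.
live["192.0.2.10"] = Lease("client-A", ends=1800 + 3600)

# t=4000: the server crashes and restarts from the stale snapshot.
now = 4000
free_per_live = is_free(live, "192.0.2.10", now)
free_per_snapshot = is_free(snapshot, "192.0.2.10", now)

print(free_per_live)      # False: client A holds the address until t=5400
print(free_per_snapshot)  # True: the restored server could offer it to another client
```

Client A was told its lease runs until t=5400 and keeps using the address, while the restored server believes the lease ended at t=3600. That disagreement is what leads to the conflict.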
However, if you have a server failure and lose the most recent version of the leases file in a failover situation, then it's actually preferable to "fault" the lease database and rely on the partner to have maintained a complete database. The only downside this has on restart is having to wait through MCLT delays associated with that operation.
In a failover pair, this means restarting one of the pair with the leases file removed so that the restarting partner has to re-synchronise from scratch - this can take some time but it does ensure that there are no inconsistencies.
We recommend using a RAM disk or other non-recoverable media for the lease database as a temporary measure only: for example, for diagnostic purposes to confirm the source of a performance problem or bottleneck, or as an interim solution while waiting on the installation of battery-backed RAID or other storage media with adequate sync-rate performance.
Even then, for an interim solution, it might be preferable instead to raise the lease-time if that is administratively permitted (and won't needlessly starve the lease pools), as this will directly lower the load placed on the servers and may bring them below the storage performance limit.
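As a rough illustration of why a longer lease-time lowers the load: clients normally attempt renewal at half the lease time (the default T1 in RFC 2131), and each renewal the server answers involves a lease write that must be synced to disk first. The sketch below assumes a steady population in which every client renews exactly at T1, and it ignores new clients, releases and failover updates. The function name and the client count are illustrative only.

```python
def renewals_per_second(clients, lease_time_s, t1_fraction=0.5):
    """Approximate steady-state renewal rate, assuming each client
    renews once every t1_fraction * lease_time_s seconds."""
    return clients / (t1_fraction * lease_time_s)


# Hypothetical site with 50,000 active clients.
for lease_time in (600, 3600, 86400):
    rate = renewals_per_second(50_000, lease_time)
    print(f"lease-time {lease_time:>6} s: about {rate:.2f} renewals/s")
```

Under these assumptions the renewal rate is inversely proportional to the lease time, so doubling the lease-time roughly halves the number of synced writes per second.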
© 2001-2017 Internet Systems Consortium