We have put a DNS server online running DJBDNS v1.06 (ndjbdns-1.06-1.el6.x86_64) on a 64-bit CentOS 6.6 server. We did some limited testing on the machine, which it passed: dnscache was talking to tinydns, the queries went through fine, etc.
As soon as we put it online and subjected it to live load, the following happened:
1) Within about a minute, the dnscache process reached 100% CPU utilisation.
2) The process would then die, writing the following message to the log:
dnscache: BUG: out of in progress slots
NOTE: Random sampling indicates that the load never exceeded 200 requests per second at any point sampled. In earlier tests the DNS server successfully handled tens of thousands of requests per second.
We then edited the following parameters in dnscache.conf, as they were the only ones that seemed relevant: DATALIMIT and CACHESIZE. They are described as limits (in bytes) on the total data memory allocation and on the cache; the default values are 80000000 and 50000000 respectively.
Experimenting with these produced some highly counterintuitive results:
1) Setting the values lower (say, an order of magnitude lower) made the dnscache process run longer.
2) Narrowing the relative gap between the two values (for instance, setting DATALIMIT to 52000 and CACHESIZE to 50000) made it run for about an hour versus about 1 minute, with the load seeming to be about the same.
3) Running it with DATALIMIT not set was possible, though it eventually failed anyway.
4) Running it with CACHESIZE not set was not possible at all.
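For reference, the gap between the two limits in the configurations above can be worked out with a small script like the one below. This is only a sketch: it assumes dnscache.conf uses plain KEY=VALUE lines with # comments, and the helper names parse_conf and headroom are our own for illustration, not part of ndjbdns.

```python
def parse_conf(text):
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments.

    Assumption: dnscache.conf follows this simple KEY=VALUE layout.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def headroom(settings):
    """Return DATALIMIT - CACHESIZE in bytes, or None if either is unset or not an integer."""
    try:
        return int(settings["DATALIMIT"]) - int(settings["CACHESIZE"])
    except (KeyError, ValueError):
        return None


# The two configurations mentioned in this post.
DEFAULTS = "DATALIMIT=80000000\nCACHESIZE=50000000\n"
NARROW = "DATALIMIT=52000\nCACHESIZE=50000\n"

for label, text in (("defaults", DEFAULTS), ("narrow gap", NARROW)):
    print(label, headroom(parse_conf(text)))
```

With the default values the gap is 30000000 bytes; with the narrowed values it is only 2000 bytes, which is the configuration that ran longest for us.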
The issue is still unresolved and we are stuck.
Any advice will be much appreciated.