Question : DNS zone file corruption

Hi All

For the last couple of days I've been having DNS problems in the afternoon. It happened a couple of weeks ago too. We are a branch office.  Head office owns 2 domains: zepcom.com and zepcom.net

At our office we are a separate forest/domain. We have a 3 way VPN between us, NZ & the states.

We have an internal DC which is running 3 zones

1. mel.aus.local (internal mail etc.)
2. zepcom.com
3. zepcom.net

For some strange reason (maybe link cost) our mail goes to a server in the states first over the vpn.  This server is riddler.zepcom.com 192.168.200.14  

We've got one giant ISP DNS server and 2 of our own ISP's DNS servers set as forwarders.  The default DNS timeout is 5 sec, so surely it would have got to 2 of those.

Our ISP does dns proxying.

mail.zepcom.net 192.168.100.15 (NZ) is not defined in any of our local zones, but if we can ping it, mail works.  If we can't, it doesn't.

The problem I'm getting is that DNS resolution for both external zones goes down, along with our mail and intranet of course.  I delete and re-create a record manually, flush the DNS cache on the server and the client, and everything works again.  When it's hosed, pinging things by name returns weird IP addresses.

The only error (at 3.37 today) in the server's DNS log is:

The DNS server was unable to complete directory service enumeration of zone ..  This DNS server is configured to use information obtained from Active Directory for this zone and is unable to load the zone without it.  Check that the Active Directory is functioning properly and repeat enumeration of the zone. The event data contains the error.

In the system log there's a couple of time related warnings over the last few days:

The Windows Time Service was not able to find a Domain Controller. A time and date update was not possible.

AND

Because of repeated network problems, the time service has not been able to find a domain controller to synchronize with for a long time. To reduce network traffic, the time service will wait 960 minutes before trying again. No synchronization will take place during this interval, even if network connectivity is restored. Accumulated time errors may cause certain network operations to fail. To tell the time service that network connectivity has been restored and that it should resynchronize, execute "w32tm /s" from the command line.

We don't have a huge number of servers, so what I'm suggesting is:

1.  Delete the zones off  the DC (obviously after hours)
2.  Make the zones all standard primary (I trust my Ex more than I do AD)
3.  Install a timesync util on all the servers

Questions:

Should all the servers timesync off the net, or just the DC, with the rest syncing to it (and if so, how do I do that)?

What the heck's going on, or at least how do I isolate the problems?

Any good real-world links on nslookup? I used to have a great article from TechRepublic but don't have it any more.

Any chance that the internal stuff is coincidence and my ISP has old server addresses cached? If so, how do I check that?

Much TIA


Answer : DNS zone file corruption

First thing I'd do is run dcdiag and figure out why AD is not available at times.

If your NTP server is unavailable, either you have a link down, a routing problem, a flaky NTP server, or you've specified an FQDN as your NTP server and can't resolve it. Follow this trail until you resolve it, and it may reveal where some of the other problems lie.
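To follow that trail, it helps to separate "the name won't resolve" from "the server won't answer". Below is a minimal Python sketch, not a replacement for the Windows Time Service: it resolves a time source name and then sends one SNTP query over UDP port 123. The function names are made up for illustration, and the host you pass in is whatever time source you have configured.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp as Unix time."""
    if len(packet) < 48:
        raise ValueError("SNTP reply is shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def check_time_source(host: str, timeout: float = 5.0) -> str:
    """Report whether host resolves and whether it answers an SNTP query."""
    try:
        info = socket.getaddrinfo(host, 123, socket.AF_INET, socket.SOCK_DGRAM)
    except socket.gaierror as exc:
        return f"name resolution failed: {exc}"
    addr = info[0][4]  # (ip, port) of the first IPv4 result
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_sntp_request(), addr)
            reply, _ = sock.recvfrom(512)
            offset = parse_transmit_time(reply) - time.time()
        except (OSError, ValueError) as exc:
            return f"no usable reply from {addr[0]}: {exc}"
    return f"{addr[0]} answered, clock offset about {offset:+.2f}s"

# Example call (the host name is a placeholder):
#   print(check_time_source("your.time.server"))
```

If the first message comes back, the problem is DNS; if the second does, look at the link, routing, or the time server itself. The offset is rough because it ignores network delay.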

The mail servers you refer to are in private IP address space but are in a real in-addr.arpa domain... is your ISP using inverse NAT and private IP?

Nothing but your internal zone should be on your DC. If you have multiple internal zones and nameservers, you should consider doing zone transfers from the authority rather than relying on cache or forwarding. As for external zones, caching is fine. Forwarding just creates unnecessary traffic.

You say this happens "in the afternoon". Are you watching your links for saturation and your server for load?
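To pin down exactly when names start resolving to the wrong addresses, a timestamped log of lookups can be lined up against link and server load. The following is a rough Python sketch using only the OS resolver; the two name/address pairs are the ones quoted in the question, and `check_names` and `system_lookup` are illustrative names, not part of any existing tool.

```python
import socket
from datetime import datetime
from typing import Callable, Dict, List

# Names and addresses as quoted in the question; adjust as needed.
EXPECTED: Dict[str, str] = {
    "riddler.zepcom.com": "192.168.200.14",
    "mail.zepcom.net": "192.168.100.15",
}


def system_lookup(name: str) -> List[str]:
    """Resolve name through the OS resolver; return [] if resolution fails."""
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:
        return []
    return sorted(set(addresses))


def check_names(
    expected: Dict[str, str],
    lookup: Callable[[str], List[str]] = system_lookup,
) -> List[str]:
    """Return one timestamped log line per name: OK, WRONG or FAIL."""
    now = datetime.now().isoformat(timespec="seconds")
    lines = []
    for name, want in expected.items():
        got = lookup(name)
        if not got:
            status = "FAIL"   # nothing came back at all
        elif want in got:
            status = "OK"     # the expected address is among the answers
        else:
            status = "WRONG"  # an answer came back, but not the expected one
        shown = ",".join(got) or "-"
        lines.append(f"{now} {status} {name} expected={want} got={shown}")
    return lines
```

Run every few minutes from a scheduled task and appended to a file, the output shows whether the WRONG and FAIL entries cluster in the afternoon and which addresses are being handed out when they do.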


