Question : Problems with NTFRS (File Replication Service)

I’m having a rather big problem which I can’t figure out. The problem is the File Replication Service. PLEASE READ ON!

I’m running a school network domain with over 600 users. There are about 10 servers running Windows Server 2003 Std.; 2 of those are Domain Controllers.

We had some rather unfortunate problems: we lost power (with a failing UPS!!!!). This resulted in the NTDS database being corrupted on BOTH DCs. NETLOGON and SYSVOL disappeared on both domain controllers.

My PDC is called Ares and my other DC is Zeus.

I read an MS article about temporarily stabilizing SYSVOL. I had to edit “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters” and change the DWORD SYSVOLREADY to 1 or 2, which made NETLOGON and SYSVOL reappear, and then copy all the scripts and policies into %systemroot%\SYSVOL.
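
For readers who want to script the registry change described above, here is a minimal sketch. It assumes the stock reg.exe tool is available on the DC and uses a value of 1; the "--apply" switch is a hypothetical safeguard of this sketch, not part of any Microsoft tool. By default it only prints the command.

```python
import subprocess
import sys

# Registry key named in the question (HKLM is reg.exe's short form of
# HKEY_LOCAL_MACHINE).
NETLOGON_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"


def sysvol_ready_command(value: int = 1) -> list[str]:
    """Build a reg.exe command that sets the SysvolReady DWORD."""
    return [
        "reg", "add", NETLOGON_PARAMS,
        "/v", "SysvolReady",
        "/t", "REG_DWORD",
        "/d", str(value),
        "/f",  # overwrite the existing value without prompting
    ]


if __name__ == "__main__":
    cmd = sysvol_ready_command(1)
    print(" ".join(cmd))
    # Only touch the registry on Windows, and only when explicitly asked to
    # ("--apply" is a made-up switch for this sketch).
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(cmd, check=True)
```

As the question goes on to say, this only makes the shares reappear; it does not by itself repair FRS replication.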

This attempt did make NETLOGON and SYSVOL reappear, but FRS still reported problems (Ares is having problems with ZEUS and vice versa). I also found several Experts-Exchange links, but none that helped me.

I also used an MS article on how to rebuild the SYSVOL tree and its content and re-create the junctions, but that did not fix my problem. http://support.microsoft.com/default.aspx?scid=kb;en-us;315457

I then took a backup of my %systemroot%\SYSVOL. I then demoted ZEUS to a normal member server and promoted it again. I then got event ID 13508 (http://www.eventid.net/display.asp?eventid=13508&eventno=349&source=NtFrs&phase=1) and event ID 13565.

After promoting ZEUS again, I demoted ARES, but this did not fix anything. I still got the same problems. On both domain controllers the event logs say that %servername% can’t become a DC before SYSVOL and NETLOGON have been replicated (event ID 13565), followed by event ID 13508 (same as above). It told me to wait until it had replicated throughout the domain (which it always says when adding a new DC). I left the server alone for 24 hours, but only returned to find 13508 event errors. The NETLOGON and NTFRS services have no problems starting.
I’m using DELL PowerEdge 2600 & 2800 servers, so the hardware is no problem.

The strange part: while I was having these FRS problems, demoting and promoting the DCs didn’t report any errors. After promoting, the servers did not have any problems adding a replica set in “Sites and Trusts” under NTDS, and when forcing a replication the message “%servername% has replicated its connections” is returned.

I also tested AD by adding users on Ares and checking on ZEUS. This worked. I then disabled the users and changed the properties on ZEUS and checked on ARES. This worked. No problems transferring FSMOs etc. either.
It seems that I’m only having a problem with replication at the “file level”, if I can call it that. FRS!

I also tried some tools located on the Server 2003 CD: NLTEST, DcDiag and NetDiag.

DcDiag returned some DcGetDcName and some Advertising problems on Ares, and FRS problems on ZEUS.
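
For anyone repeating the DcDiag checks mentioned above, here is a rough sketch that runs only the FRS-related and advertising tests against each DC. It assumes dcdiag from the Support Tools is on the PATH and uses the host names from the question; the "--run" switch is a hypothetical flag of this sketch. Without it the commands are only printed.

```python
import subprocess
import sys

# Host names taken from the question; frssysvol, frsevent and Advertising
# are dcdiag test names that cover SYSVOL/FRS state and DC advertising.
DOMAIN_CONTROLLERS = ["Ares", "Zeus"]
FRS_TESTS = ["frssysvol", "frsevent", "Advertising"]


def dcdiag_commands(servers, tests):
    """Return one dcdiag argument list per (server, test) pair."""
    return [
        ["dcdiag", f"/s:{server}", f"/test:{test}"]
        for server in servers
        for test in tests
    ]


if __name__ == "__main__":
    for cmd in dcdiag_commands(DOMAIN_CONTROLLERS, FRS_TESTS):
        print(" ".join(cmd))
        # "--run" is a made-up switch for this sketch; dcdiag reports
        # failures in its output, so the exit code is not checked here.
        if sys.platform == "win32" and "--run" in sys.argv:
            subprocess.run(cmd, check=False)
```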

After I demoted Ares, Zeus became my PDC, which isn’t a big deal; that will be corrected WHEN or IF I get this up and running again.

There shouldn’t be any DNS problems in my domain. I can ping names and FQDNs without problems. I believe all the correct SRV, A (host) and other records are in DNS. 99.99999999% sure! :) I have checked this a hundred times!

I’m an educated IT-professional, but I’m down on my knees with this one. I need help. I can’t afford to do a clean install with over 600 users on the domain! No way! :)

Sincerely,
Annfinn Thomsen

Sorry for this long and boring essay! Plz ask if you don’t understand part of the description! :)

Answer : Problems with NTFRS (File Replication Service)

Making that change to roll back to 2K may work, or it may not. It depends on how old the info is: AD doesn't like jumping backwards in data too often, and the further back you go, the less it likes it.

At this point, it would be better to export everything. There are a few tools for exporting: you can use the GUI versions, or there is LDIFDE.exe, a tool for bulk export/import of AD objects.

http://support.microsoft.com/kb/q237677/
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/ServerHelp/32872283-3722-4d9b-925a-82c516a1ca14.mspx
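
As a rough illustration of the LDIFDE export mentioned above, the sketch below builds one export command per object type. The base DN "DC=school,DC=local" and the output file names are hypothetical, since the thread never states the domain name; the server name Zeus comes from the steps further down. Only the standard ldifde switches -f, -s, -d, -p and -r are used.

```python
# Hypothetical values: the thread does not give the domain name or file paths.
BASE_DN = "DC=school,DC=local"
SOURCE_DC = "Zeus"  # the answer's step 2 exports from Zeus

# Output file -> LDAP filter. Note that computer accounts also match
# (objectClass=user), because the computer class derives from user.
EXPORTS = {
    "ous.ldf": "(objectClass=organizationalUnit)",
    "users.ldf": "(objectClass=user)",
    "groups.ldf": "(objectClass=group)",
}


def ldifde_export_command(out_file, ldap_filter, server=SOURCE_DC, base_dn=BASE_DN):
    """Build an ldifde export command as an argument list."""
    return [
        "ldifde",
        "-f", out_file,     # file to write
        "-s", server,       # DC to read from
        "-d", base_dn,      # root of the search
        "-p", "subtree",    # search the whole tree below the root
        "-r", ldap_filter,  # which objects to export
    ]


def as_command_line(args):
    """Quote arguments containing '=' or spaces so they can be pasted into cmd."""
    return " ".join(f'"{a}"' if ("=" in a or " " in a) else a for a in args)


if __name__ == "__main__":
    for out_file, ldap_filter in EXPORTS.items():
        print(as_command_line(ldifde_export_command(out_file, ldap_filter)))
```

Exporting OUs to their own file is a design choice of this sketch: on import, containers generally need to exist before the objects that live in them.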

Please note that the export cannot include passwords, so the passwords on all user and computer accounts would have to be reset. You may need to visit some of the computers in person to complete this. You can use commands like dsmod and dsquery to change the passwords after the fact, or you can use ADModify to do it with a GUI. Or you can use ldifde to set them when the accounts are being imported.
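
The dsquery/dsmod approach mentioned above could look roughly like the sketch below, which pipes every user found under a container into dsmod to set a temporary password and force a change at next logon. The OU path and the temporary password are hypothetical placeholders, and the sketch only prints the pipeline rather than running it.

```python
def password_reset_pipeline(start_node, temp_password):
    """Build a 'dsquery user | dsmod user' pipeline for a bulk password reset."""
    # -limit 0 removes dsquery's default cap on the number of results.
    query = f'dsquery user "{start_node}" -limit 0'
    # -mustchpwd yes forces each user to pick a new password at next logon.
    modify = f'dsmod user -pwd "{temp_password}" -mustchpwd yes'
    return f"{query} | {modify}"


if __name__ == "__main__":
    # Hypothetical OU and password, for illustration only.
    print(password_reset_pipeline("OU=Students,DC=school,DC=local", "Chang3-Me!"))
```

This covers user accounts only; computer accounts are usually handled by rejoining the machines to the domain, as the steps below describe.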

The Group Policy Management Console (GPMC) can back up and export your group policies, including your default domain policy.

I think that's what you need. You can also back up/export the DNS records, but it would probably be safer to start fresh for DNS if DNS is AD-integrated.

All file permissions and the like on the file servers will have to be reset.

What to do:

1) Choose a computer. I selected Ares, just because.
2) Export all objects from Zeus and save them elsewhere.
3) Ghost Ares (just to have a backup).
4) Install a new UPS, then format and re-install Windows 2003 on Ares.
5) Patch and configure Ares.
6) UNPLUG ZEUS (so dcpromo can't find it).
7) Run dcpromo and configure your server with DNS, etc.
8) Import all objects.
9) Set passwords, and rejoin a few computers to the new domain if needed.
10) Test.
11) Test.
12) Test.
13) When you have tested with a number of accounts, computers, etc. and things look right, rejoin the rest of the computers to the network.
14) Ghost Zeus.
15) Format and re-install Windows 2003 on Zeus, then patch and configure it.
16) Run dcpromo and set up Zeus, i.e. DNS.
17) Test, test, test.
18) Invest in a backup system for AD. This could have been a lot worse... Even if you just use NTBackup, back up the system state, save it on another server, and burn it to a CD.
19) Test your backups.
20) Go on vacation.   :)
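
For step 18, a scheduled system state backup with NTBackup might be scripted along these lines. This is only a sketch: the share "\\backupsrv\adbackup" and the job name are hypothetical, and the script just prints the ntbackup command line rather than running it.

```python
from datetime import date

# Hypothetical destination on another server, as step 18 suggests.
BACKUP_SHARE = r"\\backupsrv\adbackup"


def backup_file_for(day, folder=BACKUP_SHARE):
    """Return a dated .bkf path so each run keeps its own backup file."""
    return folder + "\\" + f"systemstate-{day.isoformat()}.bkf"


def system_state_backup_command(job_name, backup_file):
    """Build an ntbackup command line that backs up the system state."""
    return f'ntbackup backup systemstate /J "{job_name}" /F "{backup_file}"'


if __name__ == "__main__":
    target = backup_file_for(date.today())
    print(system_state_backup_command("AD system state", target))
```

A command like this could be run from a scheduled task on each DC; step 19 still applies, since a backup that has never been restored is unproven.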

As for the P.S.: AD is fairly resilient in its design, partly because the whole backend is in LDAP. It's not too often that people have two servers that lose FRS and have everything else working, and it is even rarer for them to have a good backup. AD does keep some minor abilities to restore on the machines, but not much.


Last summer I had a worse disaster than this. I called Microsoft for help and explained the problem and what I wanted to do to correct it. They laughed at me, told me it was impossible, and then wished me luck. Three days and 2 long nights later, I had things back up in a working state.

Sorry it's gotta be this...

GL with the restore.

~CH
