I have been where you are many times.... I have promoted and demoted hundreds of DCs, and I have to tell you, there IS magic involved and some luck... or at least it always seems that way.
Here is my experience, and my rules surrounding DCPROMO:
1. NEVER replace a dead DC server with a server with the same name! You can NTDSUTIL until your fingers fall off, but there will always be some record on some late-replicating DC somewhere that is going to screw things up for you!
2. NEVER DCPROMO a box that doesn't have DNS already installed.
3. NEVER create a zone on a DNS server before you promote it... you will NEVER get it to sync correctly.
4. NEVER try to DCPROMO a box that doesn't have its NIC pointing to your PDC emulator or RID master for DNS.
5. NEVER configure a newly promoted DC's NIC to point to itself for DNS until several hours have passed or the NTFRS logs show complete replication.
6. NEVER, EVER, EVER do a forced removal. If the box can replicate ANYWHERE, it will throw ghost GUIDs out to another DC, and when it replicates back its other partner will see changes that were already committed and refuse replication. You were right on the money when you used the NTDS utility. I think the official Microsoft term for it is "lingering objects" (see below).
7. NEVER, EVER, NEVER, EVER, EEEVVVEEERRRR bring up a DC on its own and make changes to it before it has a chance to do its sanity checks and clear FRS, NTDS, and DNS. If it is the only server it can talk to, it could take an HOUR, yes, AN HOUR, before you can make any changes that will not royally screw up your domain. This has to be to guard against the potential DNS island problems that can occur when you have a circuit failure or something akin to that...
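If you want to turn rules 1 through 5 into something you can run down before touching DCPROMO, here is a rough Python sketch. Everything in it is made up for illustration: the PromotionPlan fields and the preflight function are hypothetical names, and it only checks facts you type in yourself. It does not query AD, DNS, or the server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromotionPlan:
    """Hypothetical, hand-filled description of a box about to be promoted."""
    new_name: str
    dead_dc_names: List[str]      # names of DCs that died or were removed
    dns_installed: bool           # DNS role already on the box?
    zone_precreated: bool         # did someone create a zone by hand?
    nic_dns_servers: List[str]    # DNS servers configured on the NIC, in order
    own_ip: str
    role_holder_ips: List[str]    # IPs of the PDC emulator / RID master


def preflight(plan: PromotionPlan) -> List[str]:
    """Return the rules (1-5 above) that the plan would break."""
    problems = []
    dead = {name.lower() for name in plan.dead_dc_names}
    if plan.new_name.lower() in dead:
        problems.append("rule 1: name reuses a dead DC's name")
    if not plan.dns_installed:
        problems.append("rule 2: DNS is not installed")
    if plan.zone_precreated:
        problems.append("rule 3: a zone was created before promotion")
    # Rule 4: the first DNS server on the NIC should be a role holder.
    if not plan.nic_dns_servers or plan.nic_dns_servers[0] not in plan.role_holder_ips:
        problems.append("rule 4: NIC does not point to the PDC emulator or RID master")
    # Rule 5: the box must not list itself as a DNS server yet.
    if plan.own_ip in plan.nic_dns_servers or "127.0.0.1" in plan.nic_dns_servers:
        problems.append("rule 5: NIC points to itself for DNS")
    return problems
```

An empty list back means the plan does not trip any of the first five rules. Rules 6 and 7 are about what you do after promotion, so they are not something a checklist like this can catch.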
My thinking is you never had a good sync with the second server to begin with... yes, I know, you say I am nuts, but too many times I have seen the DCPROMO process complete FULLY and leave a totally worthless DC that never fully replicates, or whose DNS server duplicates or drops DNS host and server records. If you happen to have Event Viewer logs from the servers, I would be happy to take a look at them for you and tell you exactly why it happened.
Are you sure you weren't on the second box trying to transfer the roles? If you wanted the roles to be on the second server, you would have had to be on it; otherwise it would not have shown the server you wanted in the box below the CHANGE button. They make you pull the role you want over from the existing role holder to the new one.
Most likely the servers were looking at each other as corrupt partners, or at least one was, which would explain why you were able to transfer everything to the new box during DCPROMO and never get any love back.
I have also run into situations where I am absolutely sure all DCs are cool, yet I still get a replication error due to "lingering objects". If you are 100% sure you don't have a DC that has experienced a USN rollback of some kind, you can clear that error by changing the registry on the DC that is throwing the error:
Stop the Netlogon service
Cruise to:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters
Add a REG_DWORD called:
"Allow Replication With Divergent and Corrupt Partner"
and set the decimal value to 1
Start Netlogon
I have to warn you though, if you do have a situation where some goof-ball has restored some objects and that is the cause of your problem, you will have made the problem considerably worse. Not in a 2 DC situation, but in one with 100 DCs in 20 states, you will be chasing down 40 or 50 servers with the lingering objects message by the end of the next day...
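If you would rather script the registry steps above than click through regedit, here is a minimal Python sketch that only builds the command lines and does not run anything. The key path and value name are the ones given above; the function name build_commands is my own invention, and wrapping the steps in "net stop", "reg add", and "net start" is my assumption about how you would do it from a command prompt. Read the warning above before you run any of it.

```python
from typing import List

# Key and value name exactly as given in the steps above.
NTDS_PARAMS_KEY = r"HKLM\System\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Allow Replication With Divergent and Corrupt Partner"


def build_commands(enable: bool = True) -> List[str]:
    """Return the three commands, in order, as strings.

    Hypothetical helper: it only formats text, so you can review
    the commands before pasting them into an elevated prompt.
    """
    data = 1 if enable else 0
    return [
        "net stop netlogon",
        f'reg add "{NTDS_PARAMS_KEY}" /v "{VALUE_NAME}" /t REG_DWORD /d {data} /f',
        "net start netlogon",
    ]
```

Calling build_commands() and printing each string gives you something to eyeball first. Passing enable=False builds the same commands with the value set back to 0, which is what I would do once replication is flowing again.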
Rest assured and take great comfort in knowing this is one of the very hardest experiences in all of IT. Second only perhaps to SMS and SCCM troubleshooting. The good news is that in both cases the answers are right there in front of you. However, it seemed to me for YEARS to be AES encrypted. After paying a worthless "expert" 6 grand to come in and fix our AD problems, I was forced to learn way more than I ever wanted to...
Hope this helps,
Captain Clam
P.S. ADSI Edit and the System tree of ADUC are extremely helpful in determining problems like this too. ADSI Edit allows you to see the USN, GUID and object information as well as resource registrations... the other tool that you HAVE to get to know if you get stuck doing this again is LDP. Finally, if you end up with a bunch of DCs and the crap is hitting the fan again, download an eval of "Spotlight on Active Directory". You can't possibly believe how much easier your life will become... it makes it almost embarrassingly simple to find problems, and it has all kinds of cool graphics and dynamic images that show the replication traffic moving and stuff... the boss will think you are running NORAD.