DFS Replication issue with Event ID 5014 and 14526

DFS Replication cannot replicate with partner due to a communication error

DFS Replication failed with error ID: 9032 (The connection is shutting down). Event ID: 5002

DFS Replication failed with error ID: 1722 (The RPC server is unavailable.). Event ID: 5008

Event ID: 5014 - DFS Replication service is stopping communication with partner for replication group due to an error

Event ID: 14526 - DFS could not contact the Active Directory. DFS will be using cached data

Q: The main site is B and the remote branch is C. The replicated folder is "\\server\d$\dfsharet" on both servers. The two sites are connected through a VPN tunnel, and both are running Windows 2008.

I can ping each site by name. The firewall is off on both servers. I can telnet to ports 135 and 139. Each server can reach the other through Remote Desktop. The nslookup results look good. The DFS Replication service and the DFS Namespace service are both started.
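The port checks described above can be scripted so they are easy to repeat from either server. The following is only a minimal sketch: the function name `check_tcp_ports` is made up for illustration, and the host name in the usage comment is the anonymized partner name from this thread. Keep in mind that a successful connection to port 135 only shows that the RPC endpoint mapper is reachable. RPC-based services such as DFS Replication are normally handed a second, dynamically assigned port after that, so a clean result here does not rule out error 1722.

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Return {port: True/False}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution failures.
            results[port] = False
    return results


# Example call (host name taken from the thread, adjust to your environment):
# print(check_tcp_ports("B.domain.local", [135, 139]))
```

A port that shows `False` over the VPN but `True` on the local LAN would point at the tunnel or a device in between rather than at the DFS services themselves.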

When trying to create a DFS health report, I get this error:

Communication errors are preventing replication with B
	Affected replicated folders:	All replicated folders on this server.
	Description:	DFS Replication cannot replicate with partner B due to a communication error. The DFS Replication service used partner DNS name B.domain.local, IP address 172.16.0.15, and WINS address B but failed with error ID: 9032 (The connection is shutting down). Event ID: 5002
	Last occurred:	Thursday, November 04, 2010 at 5:02:53 AM (GMT-6:00)
	Suggested action:	Check for network connectivity problems, for troubleshooting RPC issues see RPC KB 839880 and for additional troubleshooting information, see The Microsoft Web Site.


	Communication errors are preventing replication with partner B.
	Affected replicated folders:	All replicated folders on this server.
	Description:	DFS Replication cannot replicate with partner B due to a communication error. This error can occur if the host is unreachable, or if the DFS Replication service is not running on the server. The DFS Replication service used partner DNS name B.domain.local, IP address 172.16.0.15, and WINS address B but failed with error ID: 1722 (The RPC server is unavailable.). Event ID: 5008
	Last occurred:	Wednesday, November 03, 2010 at 5:04:10 AM (GMT-6:00)
	Suggested action:	Check for network connectivity and service related problems. For troubleshooting RPC issues see RPC KB 839880 and for additional troubleshooting information, see The Microsoft Web Site.

I also found these Event IDs.

Log Name: DFS Replication

Source: DFSR

Date: 11/3/2010 5:04:10 AM

Event ID: 5008

Task Category: None

Level: Error

Keywords: Classic

User: N/A

Computer: C.domain.local

Description:

The DFS Replication service failed to communicate with partner B for replication group Dfsharet. This error can occur if the host is unreachable, or if the DFS Replication service is not running on the server.

Partner DNS Address: B.domain.local

Optional data if available:

Partner WINS Address: B

Partner IP Address: 172.16.0.15

The service will retry the connection periodically.

Additional Information:

Error: 1722 (The RPC server is unavailable.)

Connection ID: 3D32F565-DDE5-4184-B7F5-832B96EC27BD

Replication Group ID: 34CAE6E2-F48C-45D2-98B4-A5FEA4ECF8AD

Log Name: DFS Replication

Source: DFSR

Date: 11/2/2010 9:00:00 PM

Event ID: 5002

Task Category: None

Level: Error

Keywords: Classic

User: N/A

Computer: C.domain.local

Description:

The DFS Replication service encountered an error communicating with partner B for replication group Dfsharet.

Partner DNS address: B.domain.local

Optional data if available:

Partner WINS Address: B

Partner IP Address: 172.16.0.15

The service will retry the connection periodically.

Additional Information:

Error: 9032 (The connection is shutting down)

Connection ID: 3D32F565-DDE5-4184-B7F5-832B96EC27BD

Replication Group ID: 34CAE6E2-F48C-45D2-98B4-A5FEA4ECF8AD

A1: After some research, I should first mention that your original problem may not be too difficult to fix. Although computers at site C are actually going to the file server at site B, that is just a DFS referral problem.

You mentioned: "however it didn't replicate entirely"
Do you mean that the "dfsharet" folder currently does not replicate between the two DFS servers?

If so, the issue has become a bigger problem to fix.

Based on the situation, I suggest removing the entire folder from the DFS namespace, disabling sharing of the "dfsharet" folders on all DFS servers, renaming those shared folders, re-sharing them, and then creating a brand-new folder and its replication link in the DFS namespace.

Q2: After I re-created DFS, the health report no longer shows any communication errors and looks good. However, the servers still do not replicate with each other, and I see these Event IDs.

On Server B:

Log Name: DFS Replication

Source: DFSR

Date: 11/6/2010 10:09:40 AM

Event ID: 5014

Task Category: None

Level: Warning

Keywords: Classic

User: N/A

Computer: B

Description:

The DFS Replication service is stopping communication with partner C for replication group domain.local\dfs\dfsharet due to an error. The service will retry the connection periodically.

Additional Information:

Error: 1722 (The RPC server is unavailable.)

Connection ID: 2015EE93-701E-4D2A-8C03-D5B7BE596BF3

Replication Group ID: 8993FDDC-DB78-4680-B278-5D151F02BAFF

On Server C:

Log Name: System

Source: Microsoft-Windows-DfsSvc

Date: 11/6/2010 9:47:24 AM

Event ID: 14526

Task Category: None

Level: Warning

Keywords: Classic

User: N/A

Computer: C

Description:

DFS could not contact the B Active Directory. DFS will be using cached data. The return code is in the record data.

If I run repadmin /showreps, both servers show "DsReplicaGetInfo() failed with status 8453 (0x2105): Replication access was denied".

I can't access the servers right now. I will run the MPS report when I get a chance.

A2: I have found a lot of these errors in your logs:
+ [Error:9053(0x235d) RpcFinalizeContext downstreamtransport.cpp:1117 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) Rdc::SyncClientState::Download rdc.cpp:2640 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) DownstreamTransport::DownloadFile downstreamtransport.cpp:5937 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) RpcFinalizeContext downstreamtransport.cpp:1117 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) DownstreamTransport::DownloadFileAsync downstreamtransport.cpp:6444 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) DownloadWriter::AllocateFileSize meet.cpp:1055 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:659 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:651 3924 C The staging quota was exceeded]
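When the debug log is large, it helps to count how often each error appears instead of reading it line by line. The sketch below is one way to do that, assuming the log entries have the same shape as the excerpts above. The regular expression and the function name `summarize_errors` are my own, and I make no claim about what the two fields after the source line number mean, so they are only matched and not interpreted.

```python
import re
from collections import Counter

# Matches entries shaped like the excerpts above, for example:
# [Error:9053(0x235d) RpcFinalizeContext downstreamtransport.cpp:1117 3924 C The staging quota was exceeded]
ERROR_RE = re.compile(
    r"\[Error:(?P<code>\d+)\(0x(?P<hex>[0-9a-fA-F]+)\)\s+"
    r"(?P<func>\S+)\s+(?P<file>[\w.]+):(?P<line>\d+)\s+"
    r"\d+\s+\w\s+"          # two fields whose meaning is not assumed here
    r"(?P<msg>[^\]]*)\]"
)


def summarize_errors(lines):
    """Count (error code, message) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for match in ERROR_RE.finditer(line):
            code = int(match.group("code"))
            # Skip entries whose decimal and hex codes disagree (likely garbled).
            if code != int(match.group("hex"), 16):
                continue
            counts[(code, match.group("msg").strip())] += 1
    return counts


# Example: feed it the lines of a log file that has already been opened.
# for (code, msg), n in summarize_errors(open_log_file).most_common():
#     print(n, code, msg)
```

A summary dominated by 9053 (The staging quota was exceeded), as in your logs, points at the staging quota rather than at the network.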

Based on these errors, I'd like to first share some suggestions:
DFSR Prestaging Best Practice

1. Ideally you should not allow users to access files on any member until initial replication completes on all members.
If users are able to make changes to the data during initial replication this will increase the time it takes for initial replication to complete.
Also, if a user changes a file on a non-primary member when that member is still doing initial replication, the file will be overwritten by the version on the primary member, or it will be moved to the PreExisting folder if there was no version of that file on the primary member.
Either way this makes for a bad user experience and an administrative headache.

2. If the servers are running Windows Server 2003, install hotfix 931685 / 943661 on all servers participating in DFS Replication.
It should be applied whether the server is on SP1 or SP2, and on both Enterprise and Standard Edition servers.

3. Prestage the data on the desired servers. Common methods include Robocopy, Xcopy, Windows Backup (NTBackup), and Windows Explorer.
I did not find a preferred method. They all work as long as you follow the best practices.

Robocopy - works fine as long as you do not use /copyall or /copy:S. As long as the root of the replicated folder has exactly the same ACL (including inheritance bits) on both machines, using Robocopy without /copyall (or /copy:s) will work as expected.
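As an illustration of the Robocopy guidance above, here is a small sketch that builds a prestaging command line and checks an argument list against the rule "no /copyall, no /copy:S". The function names, the retry values, and the choice to exclude the DfsrPrivate folder with /XD are my assumptions and not part of the original recommendation. The /SEC check is included because Robocopy documents /SEC as equivalent to /COPY:DATS.

```python
def build_prestage_command(source, destination, log_file=None):
    """Build a robocopy argument list that copies data without security info."""
    cmd = [
        "robocopy", source, destination,
        "/E",                 # copy subdirectories, including empty ones
        "/COPY:DAT",          # data, attributes, timestamps; no S (security)
        "/R:2", "/W:5",       # assumed retry count and wait time in seconds
        "/XD", "DfsrPrivate"  # assumption: skip DFSR's private folder
    ]
    if log_file:
        cmd.append("/LOG:" + log_file)
    return cmd


def violates_guidance(args):
    """Return True if the arguments would copy ACLs via /COPYALL, /SEC or /COPY:..S.."""
    for arg in args:
        upper = arg.upper()
        if upper in ("/COPYALL", "/SEC"):
            return True
        if upper.startswith("/COPY:") and "S" in upper[len("/COPY:"):]:
            return True
    return False


# Example with hypothetical paths:
# cmd = build_prestage_command(r"D:\dfsharet", r"\\C\d$\dfsharet")
# assert not violates_guidance(cmd)
```

The returned list can be passed to `subprocess.run` on a Windows server, or joined with spaces to review the command before running it by hand.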

Xcopy - Xcopy with the /O switch will copy the ACL correctly.

Windows Backup (NTBACKUP) - The Windows Backup tool by default will restore the ACLs correctly (unless you uncheck the Advanced Restore Option for Restore security setting, which is checked by default).

Windows Explorer - no known issues, however Explorer's file copy is generally not the fastest as compared to other tools.

NOTE The checked version of DFSR.EXE can be used to verify the hashes are identical between members.

4. Create the replication group and configure an adequate staging quota for the amount of data being replicated.

To change the staging quota size, in DFS Management (dfsmgmt.msc) select the replication group, click the Memberships tab, double-click each replicated folder and select the Advanced tab.

Basic recommendations for staging quota:
- If possible change the staging path to a volume on a different spindle (staging path is configured in the same place as staging quota size)
- Set the staging quota size as close to the size of the replicated folder as possible. A large staging quota is especially important during initial replication.
- At the very minimum the staging quota should be equal to the combined size of the five largest files in the replicated folder. (You’ve received the rules of thumb earlier).
- Increase the staging quota by 50% if you see events 4202 and 4204, or 4208 and 4206. This was exactly your approach :)
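The two numeric rules of thumb above (the combined size of the five largest files as a minimum, and a 50% increase when the staging events appear) are easy to calculate. The following is a rough sketch with made-up function names. It walks a folder, so run it against a copy of the data or during a quiet period, and treat the result as a lower bound, not as a recommended setting.

```python
import heapq
import math
import os


def sum_of_largest(sizes, n=5):
    """Return the combined size of the n largest values in sizes."""
    return sum(heapq.nlargest(n, sizes))


def file_sizes(folder):
    """Yield the size in bytes of every file under folder, skipping unreadable ones."""
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                yield os.path.getsize(os.path.join(root, name))
            except OSError:
                continue


def minimum_staging_quota_mb(folder):
    """Minimum staging quota in MB: combined size of the five largest files."""
    return math.ceil(sum_of_largest(file_sizes(folder)) / (1024 * 1024))


def increase_by_half(current_quota_mb):
    """Apply the 50% increase suggested when the staging events are logged."""
    return math.ceil(current_quota_mb * 1.5)


# Example with a hypothetical path and an example quota value:
# print(minimum_staging_quota_mb(r"D:\dfsharet"))
# print(increase_by_half(4096))
```

If the minimum comes out close to the size of the whole replicated folder, a few very large files dominate the data, and setting the quota near the folder size as recommended above is the safer choice.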

Additional information:
http://blogs.technet.com/filecab/archive/2006/03/20/422544.aspx

5. Wait for initial replication to complete.
First, on the primary member you will see a 4112 indicating it is the primary member. Then on the non-primary you will see a 4102 (initial replication started) and then a 4104 once initial replication completes. You can monitor the progress using DFSR performance counters.
If the files were prestaged properly the file hashes will be identical and DFSR will replicate a minimal amount of information over the network to indicate the files are the same.

If the file hashes are not identical (for example, if robocopy /copyall was used to prestage, or if the files have simply changed after they were prestaged), all or nearly all the files will be in conflict, resulting in a flood of 4412 events on the downstream (non-primary) member.
DFSR will reuse the bits from the conflict loser (sitting in ConflictAndDeleted) to prevent replicating the entire conflict winner over the wire.
However if hotfix 931685 has not been applied, the entire conflict winner will be replicated over the wire. Also, the 660 MB default quota size for the ConflictAndDeleted folder will quickly be reached in most cases where most of the replicated folder is in conflict because of incorrect prestaging.
You can configure a larger ConflictAndDeleted quota size but the focus should always be on prestaging properly so we do not see excessive file conflicts during initial replication.
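To keep track of where a member is in the sequence from step 5, the event IDs can be mapped to a simple state. This is only a sketch: it assumes you have already collected the DFS Replication event IDs for one member into a list (how you export them is up to you), and the function names and state labels are made up for illustration.

```python
# Event IDs and meanings as described in step 5 above.
DFSR_EVENTS = {
    4112: "member is the primary member",
    4102: "initial replication started",
    4104: "initial replication completed",
    4412: "file conflict during replication",
}


def initial_replication_state(event_ids, is_primary):
    """Classify a member's initial replication progress from its event IDs."""
    seen = set(event_ids)
    if is_primary:
        return "primary confirmed" if 4112 in seen else "waiting for 4112"
    if 4104 in seen:
        return "completed"
    if 4102 in seen:
        return "in progress"
    return "not started"


def conflict_count(event_ids):
    """Count 4412 events; a large number suggests the prestaged hashes did not match."""
    return sum(1 for event_id in event_ids if event_id == 4412)


# Example for a non-primary member:
# initial_replication_state([4102, 4412, 4412], is_primary=False)
```

A member that stays "in progress" while its conflict count keeps climbing is the situation described above, where the prestaged files do not match the primary member.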

A2: It is working now. Here is what I did:

1. Re-created DFS. Please refer to this link: DFS Replication How to

2. Troubleshot the DFS replication issue over the WAN link. Please refer to this post: DFS Replication just stops

3. Prestaged the data using robocopy.

4. Re-created the replication group and configured an adequate staging quota for the amount of data being replicated. Refer to this link: DFS Replication How to

5. Restarted the DFS services.

6. Ran the DFS health report and made sure there were no errors.
