DFS Replication issue with Event ID 5014 and 14526
DFS Replication cannot replicate with partner due to a communication error
DFS Replication failed with error ID: 9032 (The connection is shutting down). Event ID: 5002
DFS Replication failed with error ID: 1722 (The RPC server is unavailable.). Event ID: 5008
Event ID: 5014 - DFS Replication service is stopping communication with partner for replication group due to an error
Event ID: 14526 - DFS could not contact the Active Directory. DFS will be using cached data
Q: Main site: B; remote branch: C. Replicated folder: "\\server\d$\dfsharet" on both servers.
Both sites are connected via a VPN tunnel, and both servers run Windows Server 2008.
I can ping each site by name. The firewall is off on both servers. I can
telnet to ports 135 and 139, and I can reach each server via Remote Desktop.
The nslookup results look good. The DFS Replication service and the DFS Namespace
service are both started.
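The telnet test above can be scripted so it is easy to repeat from both ends of the VPN tunnel. This is a minimal Python sketch, assuming Python 3 is available on the machine you test from; the function name `tcp_port_open` is hypothetical. It only proves that a TCP connection to the port succeeds. By default, DFSR first contacts the RPC endpoint mapper on port 135 and then replicates over a dynamically assigned RPC port, so an open port 135 does not by itself rule out error 1722 if the tunnel filters other ports.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Equivalent to a successful "telnet host port": it shows the port is
    reachable, not that RPC calls made by DFSR will succeed.
    """
    try:
        # create_connection resolves the name and connects; DNS failures,
        # refusals and timeouts all surface as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_port_open("B.domain.local", 135)` returns `True` only if B's endpoint mapper accepts the connection within three seconds.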
When I try to create a DFS health report, I get these errors:

Communication errors are preventing replication with B.

Affected replicated folders: All replicated folders on this server.

Description: DFS Replication cannot replicate with partner B due to a communication error. The DFS Replication service used partner DNS name B.domain.local, IP address 172.16.0.15, and WINS address B but failed with error ID: 9032 (The connection is shutting down). Event ID: 5002

Last occurred: Thursday, November 04, 2010 at 5:02:53 AM (GMT-6:00)

Suggested action: Check for network connectivity problems. For troubleshooting RPC issues, see KB 839880, and for additional troubleshooting information, see the Microsoft Web site.

Communication errors are preventing replication with partner B.

Affected replicated folders: All replicated folders on this server.

Description: DFS Replication cannot replicate with partner B due to a communication error. This error can occur if the host is unreachable, or if the DFS Replication service is not running on the server. The DFS Replication service used partner DNS name B.domain.local, IP address 172.16.0.15, and WINS address B but failed with error ID: 1722 (The RPC server is unavailable.). Event ID: 5008

Last occurred: Wednesday, November 03, 2010 at 5:04:10 AM (GMT-6:00)

Suggested action: Check for network connectivity and service-related problems. For troubleshooting RPC issues, see KB 839880, and for additional troubleshooting information, see the Microsoft Web site.

I also found these events:
Log Name: DFS Replication
Source: DFSR
Date: 11/3/2010 5:04:10 AM
Event ID: 5008
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: C.domain.local
Description:
The DFS Replication service failed to communicate with partner B for
replication group Dfsharet. This error can occur if the host is unreachable,
or if the DFS Replication service is not running on the server.
Partner DNS Address: B.domain.local
Optional data if available:
Partner WINS Address: B
Partner IP Address: 172.16.0.15
The service will retry the connection periodically.
Additional Information:
Error: 1722 (The RPC server is unavailable.)
Connection ID: 3D32F565-DDE5-4184-B7F5-832B96EC27BD
Replication Group ID: 34CAE6E2-F48C-45D2-98B4-A5FEA4ECF8AD

Log Name: DFS Replication
Source: DFSR
Date: 11/2/2010 9:00:00 PM
Event ID: 5002
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: C.domain.local
Description:
The DFS Replication service encountered an error communicating with
partner B for replication group Dfsharet.
Partner DNS address: B.domain.local
Optional data if available:
Partner WINS Address: B
Partner IP Address: 172.16.0.15
The service will retry the connection periodically.
Additional Information:
Error: 9032 (The connection is shutting down)
Connection ID: 3D32F565-DDE5-4184-B7F5-832B96EC27BD
Replication Group ID: 34CAE6E2-F48C-45D2-98B4-A5FEA4ECF8AD

A1: After some research, I should first mention that your original
problem may not be too difficult to fix. Although computers in site C
actually go to the file server in site B, that is just a DFS referral problem.
You mentioned: "however it didn't replicate entirely"
Do you mean that the "dfsharet" folder currently does not replicate between
the two DFS servers?
If so, the issue is a bigger problem to fix.
Based on the situation, I suggest that you remove the entire folder from the
DFS namespace, disable sharing of all "dfsharet" folders on all DFS servers,
rename those shared folders, re-share them, and then re-create a brand-new
folder and its replication link in the DFS namespace.
Q2: After I re-created the DFS folder, the health report doesn't show any
communication errors and looks good. However, the servers still don't
replicate with each other, and I see these events.
On server B:
Log Name: DFS Replication
Source: DFSR
Date: 11/6/2010 10:09:40 AM
Event ID: 5014
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: B
Description:
The DFS Replication service is stopping communication with partner C for
replication group domain.local\dfs\dfsharet due to an error. The service
will retry the connection periodically.
Additional Information:
Error: 1722 (The RPC server is unavailable.)
Connection ID: 2015EE93-701E-4D2A-8C03-D5B7BE596BF3
Replication Group ID: 8993FDDC-DB78-4680-B278-5D151F02BAFF
On server C:
Log Name: System
Source: Microsoft-Windows-DfsSvc
Date: 11/6/2010 9:47:24 AM
Event ID: 14526
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: C
Description:
DFS could not contact the B Active Directory. DFS will be using cached
data. The return code is in the record data.
If I run repadmin /showreps, both servers show "DsReplicaGetInfo() failed
with status 8453 (0x2105): Replication access was denied".
I can't access the servers. I will collect the MPS reports when I get a chance.
A2: I found many of these errors in your logs:
+ [Error:9053(0x235d) RpcFinalizeContext downstreamtransport.cpp:1117 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) Rdc::SyncClientState::Download rdc.cpp:2640 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) DownstreamTransport::DownloadFile downstreamtransport.cpp:5937 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) RpcFinalizeContext downstreamtransport.cpp:1117 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) DownstreamTransport::DownloadFileAsync downstreamtransport.cpp:6444 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) DownloadWriter::AllocateFileSize meet.cpp:1055 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:659 3924 C The staging quota was exceeded]
+ [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:651 3924 C The staging quota was exceeded]
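When a debug log is large, counting these entries shows quickly which error dominates. This is a minimal Python sketch, assuming each bracketed entry sits on a single line in the layout shown in the excerpts above (decimal code, hex code, function, source location, two further fields, then the message). The regular expression is inferred from these samples, not from a documented DFSR log format, and `count_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Inferred from entries such as:
# + [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:659 3924 C The staging quota was exceeded]
ERROR_RE = re.compile(
    r"\[Error:(?P<code>\d+)\((?P<hex>0x[0-9a-fA-F]+)\)\s+"
    r"(?P<function>\S+)\s+(?P<location>\S+)\s+\d+\s+\S+\s+(?P<message>[^\]]*)\]"
)


def count_errors(lines):
    """Count (error code, message) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for match in ERROR_RE.finditer(line):
            key = (int(match.group("code")), match.group("message").strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "+ [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:659 3924 C The staging quota was exceeded]",
        "+ [Error:9053(0x235d) StageWriter::ReserveSpace staging.cpp:651 3924 C The staging quota was exceeded]",
    ]
    for (code, message), n in count_errors(sample).most_common():
        print(code, message, n)  # prints: 9053 The staging quota was exceeded 2
```

Any iterable of text lines works, so an open log file object can be passed directly instead of the sample list.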
Based on these errors, I'd like to first share some suggestions:
DFSR Prestaging Best Practice
1. Ideally you should not allow users to access files on any member until
initial replication completes on all members.
If users are able to make changes to the data during initial replication
this will increase the time it takes for initial replication to complete.
Also, if a user changes a file on a non-primary member when that member is
still doing initial replication, the file will be overwritten by the version
on the primary member, or it will be moved to the PreExisting folder if
there was no version of that file on the primary member.
Either way this makes for a bad user experience and an administrative
headache.
2. If the servers run Windows Server 2003 R2, install hotfix 931685 / 943661
on all servers participating in DFS Replication.
It should be applied whether the server is on SP1 or SP2, on both
Enterprise and Standard Edition servers.
3. Prestage the data on the desired servers. Common methods include
Robocopy, Xcopy, Windows Backup (NTBackup), and Windows Explorer.
I did not find a preferred method; they all work as long as you follow the
best practices.
Robocopy - works fine as long as you do not use /copyall or /copy:S. As
long as the root of the replicated folder has exactly the same ACL
(including inheritance bits) on both machines, using Robocopy without /copyall
(or /copy:S) will work as expected.
Xcopy - Xcopy with the /O switch will copy the ACL correctly.
Windows Backup (NTBACKUP) - The Windows Backup tool by default will
restore the ACLs correctly (unless you uncheck the Advanced Restore Option
for Restore security setting, which is checked by default).
Windows Explorer - no known issues, although Explorer's file copy is
generally slower than the other tools.
NOTE: The checked build of DFSR.EXE can be used to verify that the hashes are
identical between members.
4. Create the replication group and configure an adequate staging quota
for the amount of data being replicated.
To change the staging quota size, in DFS Management (dfsmgmt.msc) select
the replication group, click the Memberships tab, double-click each
replicated folder and select the Advanced tab.
Basic recommendations for staging quota:
- If possible change the staging path to a volume on a different spindle
(staging path is configured in the same place as staging quota size)
- Set the staging quota size as close to the size of the replicated folder
as possible. A large staging quota is especially important during initial
replication.
- At the very minimum the staging quota should be equal to the combined size
of the five largest files in the replicated folder. (You’ve received the
rules of thumb earlier).
- Increase the staging quota by 50% if you see events 4202 and 4204, or 4206
and 4208; this was exactly your approach.
Additional information:
http://blogs.technet.com/filecab/archive/2006/03/20/422544.aspx
5. Wait for initial replication to complete.
First, on the primary member you will see event 4112 indicating it is the
primary member. Then on the non-primary member you will see event 4102 (initial
replication started) and then event 4104 once initial replication completes.
You can monitor the progress using DFSR performance counters.
If the files were prestaged properly the file hashes will be identical and
DFSR will replicate a minimal amount of information over the network to
indicate the files are the same.
If the file hashes are not identical (for example, if robocopy /copyall
was used to prestage, or if the files have simply changed after they
were prestaged), all or nearly all of the files will be in conflict, resulting
in a flood of 4412 events on the downstream (non-primary) member.
DFSR will reuse the bits from the conflict loser (sitting in
ConflictAndDeleted) to prevent replicating the entire conflict winner over
the wire.
However, if hotfix 931685 has not been applied, the entire conflict winner
will be replicated over the wire. Also, the 660 MB default quota size for
the ConflictAndDeleted folder will quickly be reached in most cases where
most of the replicated folder is in conflict because of incorrect prestaging.
You can configure a larger ConflictAndDeleted quota size but the focus
should always be on prestaging properly so we do not see excessive file
conflicts during initial replication.
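The staging quota rules in step 4 can be turned into a quick calculation. This is a minimal Python sketch, assuming you run it against a local copy of the replicated folder; the function names are hypothetical, and the five-largest-files floor and the 50% increase come straight from the recommendations above rather than from any official formula. It reports both the preferred size (close to the whole replicated folder) and the minimum (the combined size of the largest files).

```python
import heapq
import math
import os

MB = 1024 * 1024


def file_sizes(root: str) -> list:
    """Return the size in bytes of every file under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return sizes


def staging_quota_estimate_mb(root: str, n_largest: int = 5) -> dict:
    """Suggest staging quota bounds, in whole MB, for the folder at root."""
    sizes = file_sizes(root)
    floor_bytes = sum(heapq.nlargest(n_largest, sizes))
    return {
        # Minimum: combined size of the n largest files, rounded up.
        "minimum_mb": max(1, math.ceil(floor_bytes / MB)),
        # Preferred: as close to the size of the whole folder as possible.
        "preferred_mb": max(1, math.ceil(sum(sizes) / MB)),
    }


def increase_after_quota_events(current_mb: int) -> int:
    """Raise the quota by 50% after seeing events 4202/4204 or 4206/4208."""
    return math.ceil(current_mb * 1.5)
```

For example, `increase_after_quota_events(4096)` returns `6144`. The result is a starting value to enter on the Advanced tab in DFS Management, not a guarantee that the staging errors will stop.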
A2: It is working now. Here is what I did:
1. Re-created DFS. Please refer to this link:
DFS Replication How to
2. Troubleshot the DFS Replication issue over the WAN link; please
refer to this post: DFS Replication just stops
3. Prestaged the data using robocopy.
4. Re-created the replication group and configured an adequate staging
quota for the amount of data being replicated. Refer to this link:
DFS Replication How to
5. Restarted the DFS services.
6. Ran a DFS health report and made sure there were no errors.
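Before re-creating the replication group, a rough spot-check of the prestaged copy can catch files that robocopy missed or that changed after the copy. This is a minimal Python sketch, assuming both trees are reachable as paths from one machine; the helper names and the example paths are hypothetical. It compares file contents only. As the prestaging best practices explain, ACL differences also make DFSR's hashes differ, so a clean result here is not a substitute for verifying with the checked DFSR.EXE.

```python
import hashlib
import os


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_hashes(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 content hash."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            hashes[os.path.relpath(full, root)] = file_sha256(full)
    return hashes


def compare_trees(source: str, copy: str) -> dict:
    """Report files missing from, extra in, or different in the copy.

    Paths are compared case-sensitively, so on Windows a file whose name
    differs only in case is reported as both missing and extra.
    """
    src, dst = tree_hashes(source), tree_hashes(copy)
    common = src.keys() & dst.keys()
    return {
        "missing_in_copy": sorted(src.keys() - dst.keys()),
        "extra_in_copy": sorted(dst.keys() - src.keys()),
        "content_differs": sorted(p for p in common if src[p] != dst[p]),
    }
```

For example, `compare_trees(r"\\B\d$\dfsharet", r"\\C\d$\dfsharet")` (hypothetical share paths) returns three empty lists when every file's contents match.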