Tuesday, June 23, 2009

The effect of the Tcp window size on WCF web service performance in high-latency Windows networks

In my last post I looked at the effect of a bug in WCF on Windows XP and W2K3, in which the Tcp connection used by web services that use TransferMode "Streamed" or "StreamedRequest" is closed after every call. This forces subsequent calls to re-establish the Tcp and, if in use, SSL connection(s) at the start of every call, which can have serious performance consequences.


TCP Window Size and Performance on High-Latency Networks

As a follow-up, I'd like to look at the effect of adjusting the Tcp window size on performance. This and this do an excellent job of describing what the Tcp window is, how it affects Tcp traffic in high-latency, high-bandwidth environments, and how to set it. I won't try to duplicate all that here, except to quote two snippets from this MSDN article as a brief overview:
TCP window size

The TCP receive window size is the amount of receive data (in bytes) that can be buffered during a connection. The sending host can send only that amount of data before it must wait for an acknowledgment and window update from the receiving host. The Windows TCP/IP stack is designed to self-tune itself in most environments, and uses larger default window sizes than earlier versions.
....

Windows scaling

For more efficient use of high bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data and is limited to 2 bytes, or a window size of 65,535 bytes.

Since the size field cannot be expanded, a scaling factor is used. TCP window scale is an option used to increase the maximum window size from 65,535 bytes to 1 Gigabyte.

The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14.

To calculate the true window size, multiply the window size by 2^S, where S is the scale value. For example, if the window size is 65,535 bytes with a window scale factor of 3:

True window size = 65,535 x 2^3 = 524,280 bytes
Increasing this setting in high-latency, high-bandwidth environments can improve performance because it makes it less likely that the sender will exhaust the receiver's advertised window before any ACKs have had time to make the round trip. When that happens the sender has to pause until the ACKs start coming in, resulting in a herky-jerky, slow connection.


What the Tcp Window Size Affects

One thing I didn't understand about the Tcp window size setting until I conducted these tests is that it describes a receive buffer. When it's configured on machine A, it is advertised to any machine B that sends data to A, and B treats it as the limit on how much unacknowledged data it can have in flight to A. This was unintuitive to me. It also means machines A and B can use different Tcp window sizes during the same connection: machine A honors the window configured on machine B when machine A sends data to B, and machine B honors the window configured on machine A when machine B sends data to A.
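The same directionality shows up at the socket level in .NET, where the receive buffer is a property of the receiving socket, so each end of a connection sets its own. This is only an illustration of the principle; the tests below rely on the registry setting, not per-socket code, and the buffer value here is made up:

    using System.Net.Sockets;

    class ReceiveWindowIllustration
    {
        static void Main()
        {
            // The receive buffer belongs to the *receiving* side. Setting it here
            // only influences the window this machine advertises to its peers;
            // the peer's window for traffic flowing the other way is its own affair.
            Socket socket = new Socket(AddressFamily.InterNetwork,
                                       SocketType.Stream, ProtocolType.Tcp);
            socket.ReceiveBufferSize = 256 * 1024;   // illustrative value, not from the tests
            socket.Close();
        }
    }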

Testing the Effect of Various Tcp Window Sizes

The following tests were conducted with a WCF service host running on Windows XP and a Vista client. The Tcp window size can't be configured on Vista (or W2K8) the way it can on XP and W2K3; Microsoft has replaced it with Tcp Window Auto-tuning, which they believe renders the TcpWindowSize registry setting obsolete. We'll take a look at that as well. The tests were all conducted with 250 ms of latency simulated using a WAN emulator, on a wireless 802.11n network with approximately 30 Mbit/s of bandwidth.
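Before getting to the tests, a quick back-of-the-envelope calculation shows why the window matters at this latency. Treat the numbers as rough approximations based on the nominal 250 ms / 30 Mbit figures above, not measurements:

    using System;

    class WindowMath
    {
        static void Main()
        {
            // Nominal test conditions described above (approximations, not measurements).
            const double rttSeconds = 0.25;          // 250 ms round trip
            const double linkBitsPerSec = 30e6;      // ~30 Mbit/s wireless link
            const double windowBytes = 65535;        // default window, scale mode 0

            // At most one window of unacknowledged data can be in flight,
            // so throughput is capped at window / RTT.
            double capBytesPerSec = windowBytes / rttSeconds;        // ~262 KB/s (~2.1 Mbit/s)

            // Bandwidth-delay product: how much data the link itself can hold.
            double bdpBytes = (linkBitsPerSec / 8.0) * rttSeconds;   // ~937 KB

            Console.WriteLine("Throughput cap with a 64K window: {0:N0} bytes/sec", capBytesPerSec);
            Console.WriteLine("Bandwidth-delay product:          {0:N0} bytes", bdpBytes);
        }
    }

In other words, with the default 64K window the sender can never do better than roughly 2 Mbit/s at this latency, no matter how fat the pipe is; that theoretical ceiling is what the larger windows are meant to lift.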

The tests use essentially the same simple WCF client and host application that I used in my last post, except that I now have 3 web services instead of 1:
  • GetStringData: A simple web service that takes a string parameter and returns it in its return value. I run two tests with this service, one in which I transfer a small "Hi mom" string back and forth, and one in which I transfer a larger string, a 277KB JPEG of my daughter:
My daughter Kaitlin
  • GetStreamedData: A web service that gets a Stream from the server.
  • PutStreamedData: A web service that sends a Stream to the server.
For these tests I'm using GetStreamedData and PutStreamedData to transfer a 3.9 MB movie of a bear:


A bear

Both the picture of Kaitlin and the bear video are already compressed, so there's no worry about anything along the way compressing them again and skewing the results.
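For reference, the service contract behind these three operations looks roughly like the sketch below. The interface name is my own placeholder; only the operation names and parameter shapes come from the descriptions above:

    using System.IO;
    using System.ServiceModel;

    // Illustrative contract; "ITestService" is a made-up name.
    [ServiceContract]
    public interface ITestService
    {
        // Echoes the string payload back to the caller (the small "Hi mom"
        // string or the larger string built from the JPEG).
        [OperationContract]
        string GetStringData(string data);

        // Server-to-client stream transfer.
        [OperationContract]
        Stream GetStreamedData();

        // Client-to-server stream transfer.
        [OperationContract]
        void PutStreamedData(Stream data);
    }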


The Test WCF Host Application

As before, the WCF host, which is running on the XP machine, is a simple console app:
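In outline it does little more than open a ServiceHost and wait for a key press. A sketch along these lines (the service type name and base address are placeholders, not the original code):

    using System;
    using System.ServiceModel;

    class HostProgram
    {
        static void Main()
        {
            // TestService (sketched below) implements the ITestService contract.
            using (ServiceHost host = new ServiceHost(
                typeof(TestService),
                new Uri("http://localhost:8000/")))   // placeholder base address
            {
                host.Open();
                Console.WriteLine("Service host is open. Press Enter to exit.");
                Console.ReadLine();
            }
        }
    }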


Which hosts this service:
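The service class itself would look something like this sketch; the echo behavior follows the description above, and the file path is a placeholder rather than the one used in the tests:

    using System.IO;

    // Illustrative implementation of the ITestService contract sketched earlier.
    public class TestService : ITestService
    {
        public string GetStringData(string data)
        {
            return data;    // echo the payload back to the caller
        }

        public Stream GetStreamedData()
        {
            // Stream a large file back to the client; placeholder path.
            return File.OpenRead(@"C:\temp\bear.wmv");
        }

        public void PutStreamedData(Stream data)
        {
            // Drain the incoming stream; these tests only measure the transfer itself.
            byte[] buffer = new byte[64 * 1024];
            while (data.Read(buffer, 0, buffer.Length) > 0)
            {
            }
        }
    }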


That uses the following simple bindings:
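Defined in code, the buffered and streamed variants differ mainly in their TransferMode. A rough sketch of comparable bindings; the message-size limits here are stand-ins, not the values used in the tests:

    using System.ServiceModel;

    class BindingSketch
    {
        static void Main()
        {
            BasicHttpBinding bufferedHttp = new BasicHttpBinding();
            bufferedHttp.TransferMode = TransferMode.Buffered;        // the default
            bufferedHttp.MaxReceivedMessageSize = 64 * 1024 * 1024;   // stand-in limit

            BasicHttpBinding streamedHttp = new BasicHttpBinding();
            streamedHttp.TransferMode = TransferMode.Streamed;
            streamedHttp.MaxReceivedMessageSize = 64 * 1024 * 1024;

            NetTcpBinding streamedNetTcp = new NetTcpBinding();
            streamedNetTcp.TransferMode = TransferMode.Streamed;
            streamedNetTcp.MaxReceivedMessageSize = 64 * 1024 * 1024;
        }
    }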



The Test WCF Client Application

The client uses similar bindings:




And is also a simple console app:









Results of the Tests - XP

I exercised all of the web service calls against an XP machine that was configured to use Tcp Window scale modes of 0, 1, 2 and 3. A scale mode of "0" means Tcp1323Opts (the registry setting that enables Tcp Window sizes greater than 64K) is disabled, and the TcpWindowSize is the default value of roughly 64K. Scale mode 1 means TcpWindowSize is roughly 2^1 x 64K, scale mode 2 means TcpWindowSize is roughly 2^2 x 64K, etc.
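For reference, these scale modes come from two values under the Tcpip parameters registry key, roughly as shown below. The window value is just one example, corresponding to scale mode 3, and XP needs a reboot before a change takes effect:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
        Tcp1323Opts   REG_DWORD  1         ; bit 0 enables RFC 1323 window scaling
        TcpWindowSize REG_DWORD  0x7FFF8   ; 524280 = 65535 << 3, i.e. scale mode 3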

The results for the PutStreamedData call are as follows:

Seconds per PutStreamedData web service call, by TcpWindowSize scale mode. Click on the image to open it in a larger window.

This graph shows the amount of time each PutStreamedData web service call takes, based on the TcpWindowSize scale mode configured on the XP machine.

The three streamed transfer mode bindings perform the worst, with streamed NetTcp the worst of all. This matches what I found in my last post when I tested large string-based payloads (I was too lazy in that post to test web services that use Stream parameters). The streamed bindings improve significantly with a TcpWindowSize scale mode of 1, and after that there's no further improvement. The other binding types all start out at about the same performance level, except for buffered NetTcp, which starts out the best of all. But they separate from one another as the TcpWindowSize is increased: ultimately buffered BasicHttp and WS2007 perform the best, buffered NetTcp comes in second, and buffered BasicHttp with SSL comes in as the worst of the rest, though still much better than the streamed bindings. This differs from what I saw in my last post, where buffered NetTcp was the runaway performance king in all tests.

Remember that the graph above represents streamed data moving from a Vista machine to an XP machine. The TcpWindowSize has been configured as a registry setting on the XP machine. During the Tcp handshake, it has been passed from the XP machine to the Vista machine, which uses it to determine how many bytes it can send at a time, before receiving ACKs from the XP machine.


Results of the Tests - Vista

Now let's look at data sent in the other direction, from XP to Vista:

Seconds per GetStreamedData web service call, by TcpWindowSize scale mode on the XP machine

Here we see the bindings sorting themselves into two groups. The results are much more variable when sending data from XP to Vista than they were in the other direction, but they don't appear to vary based on the TcpWindowSize that was configured on the XP machine. This supports the idea that Vista is relying on Tcp Window Auto-tuning, and not the TcpWindowSize configured on the host machine, to optimize the connection.
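For anyone who wants to poke at this themselves, Vista's auto-tuning can be inspected or switched off from an elevated command prompt:

    netsh interface tcp show global
    netsh interface tcp set global autotuninglevel=disabled

The first command shows the current auto-tuning level; the second disables it, and setting the level back to "normal" restores the default behavior.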


The Effect of Increased Tcp Window Sizes in a Packet Trace

The fact that the Vista machine and XP machine are using different Tcp window sizes can easily be seen in this Wireshark packet trace, taken before any TcpWindowSize registry setting was made:

Packet trace of a GetStreamedData call showing the different Tcp window sizes requested by Vista and XP


Packet trace of a GetStreamedData call showing the window size adjusting upward as the call proceeds



Packet trace of a PutStreamedData call showing the default 64KB window size being used when moving data from Vista to XP

The alert reader may notice that the Tcp session didn't end between the two web service calls. That's because for this test I changed the client code to make both web service calls on the same connection, without closing it between calls.

In my testing the Tcp window requested by Vista always had a scale mode of 2. This MSDN blog post says that Vista determines the scale mode based on the bandwidth delay product (bandwidth times latency). Since my tests were all conducted with the same bandwidth and latency, it's not surprising that Vista chose the same scale mode throughout.

The effect of the smaller Tcp window size used by default on XP can be seen by running a few queries in Wireshark. The total number of bytes transferred by the GetStreamedData and PutStreamedData calls is nearly identical, 5,394,699 vs. 5,394,714; they are, after all, transmitting the same file across the network. But if you set the Wireshark filter to search for packets that arrived 0.2 seconds or more after the previous packet (frame.time_delta > 0.2), you see that only 3 such instances occurred during the GetStreamedData call (the one that used the larger Tcp window), while 20 occurred during the PutStreamedData call. Here's one of them:


Packet trace showing communication pausing due to a full buffer

These quarter-second pauses (equal to our round-trip latency) represent moments when the sender has completely filled the receiver's advertised window and must wait for an ACK before continuing. And the discrepancy in frame time deltas is even more widespread than these 20 incidents (PutStreamedData vs. GetStreamedData):
  • deltas greater than 0.1 seconds: 48 vs. 35
  • deltas greater than 0.05 seconds: 132 vs. 51
  • deltas greater than 0.01 seconds: 421 vs. 89
  • deltas greater than 0.001 seconds: 679 vs. 430
These pauses indicate delays in the communication caused by the sender momentarily filling the receiver's window.


Summary of Findings on Stream Parameter Types

So to list some take-aways from the two graphs above:

Based on the PutStreamedData graph (Vista to XP data transfer)
  • It would appear that WCF on XP has particularly poor performance when streamed transfer modes are used. The buffered transfer modes outperform their streamed counterparts by 50%-75%.
  • For buffered BasicHttp, SSL doesn't add any appreciable overhead in the default case, but when the Tcp window size is optimally configured, buffered BasicHttp is 50% faster than buffered BasicHttp with SSL.
  • Under the default configuration, buffered NetTcp performs best, but when optimally configured buffered BasicHttp and WS2007 perform about 40% better than buffered NetTcp.
Based on the GetStreamedData graph (XP to Vista data transfer)
  • WCF on Vista appears to have particularly poor performance when SSL is used. The two bindings that use SSL perform worse than their non-SSL counterparts by 70%-75%. This is for me the most surprising finding of this entire exercise, because there was no indication that this would be the case in the large string-based payload tests that I conducted in my last post.
  • Streamed NetTcp is the next worst-performing binding, about 50% worse than buffered NetTcp. No surprise there, as that binding has consistently performed terribly in all the testing I've done.
  • The other bindings perform roughly the same, about 50% better than the three worst-performing ones.

Mysteriously Poor Performance of SSL on Vista

As to why SSL-based bindings perform so poorly on Vista I'm at a loss to say, other than to note that I see similar-looking communications pauses in the SSL-based Wireshark traces. In one example, comparing a GetStreamedData call made by buffered BasicHttp and buffered BasicHttp with SSL:
  • 6554 packets transferred in the non-SSL case, 6704 for SSL
  • Pauses greater than 0.01 seconds: 194 in the non-SSL case, 225 for SSL
  • Pauses greater than 0.1 seconds: 17 in the non-SSL case, 157 for SSL
  • Pauses greater than 0.2 seconds: 5 in the non-SSL case, 38 for SSL
This certainly bears further investigation in a follow-up post, which I'll get to unless I get distracted by a bright, shiny object.


Findings for String Parameter Types

The performance chart for the large string-based payload test is as follows:

Seconds per web service call, large string-based payload, by Tcp window scale mode

With the exception of streamed NetTcp, there's a slight performance gain, ranging from about 15%-25%, in all of the bindings when larger Tcp window sizes are used. Overall, the results are the same as the findings of my last post: streamed NetTcp is the worst-performing binding, followed by the two streamed BasicHttp bindings, followed by buffered BasicHttp with SSL, followed by the rest, with buffered NetTcp the best performing of the bunch.

And the small string-based web service payload test gives results similar to those in my last post: the streamed BasicHttp bindings perform worst due to the WCF bug discussed there, buffered NetTcp performs best, and the rest are in the middle:


Seconds per web service call, small string-based payload, by Tcp window scale mode

In these tests the TcpWindowSize has no detectable effect on performance at all.


Conclusion

If any words of wisdom come out of these last two posts, it's that not all bindings are alike: if you want your WCF application to perform well, you had better give careful attention to the bindings you choose.
