Sunday, May 24, 2009

WCF Bug on W2K3 and XP: TCP session closes after every web service call when using HTTP transport with Streamed transfer mode

While profiling a WCF-based application recently I uncovered a bug in WCF, in which the WCF service host forcibly closes its TCP session with the client after each and every web service call. The client is helpless to prevent it, even if it keeps the client proxy open in the hope of reusing the session for subsequent calls. No exception happens when the client does make subsequent calls; instead the proxy silently negotiates a new TCP session and continues on its merry way.

A Microsoft representative I talked to about this confirmed that this is an issue in WCF services hosted on Windows 2003 and XP when they use HTTP transport with transferMode "Streamed" or "StreamedRequest." The bug doesn't affect transferMode "StreamedResponse," and it has been fixed on Vista, Windows 2008 and Windows 7.

This bug is potentially a big deal because, as I noted in my last post, initiating a new TCP session requires a synchronous round-trip to the server, which, thanks to this bug, becomes overhead added to each and every web service call. And if SSL is in use then at least one more synchronous round trip will be added to every call. These round trips are detrimental to performance, especially in high-latency environments.


Seeing the Bug in a Packet Trace

The bug is very easy to see in a network packet trace. For these traces I'm using Wireshark, though most any packet sniffer will do. See my last post for instructions on how to use Wireshark.


The Test Web Service

The service I'm using for this test is simply this:


WCF service that demonstrates the bug
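
The code in the screenshot isn't reproduced here, but a minimal sketch of the kind of contract and implementation I'm describing might look like this (ITestService, TestService and Echo are illustrative names, not necessarily the ones in the screenshot):

    using System.ServiceModel;

    [ServiceContract]
    public interface ITestService
    {
        // An ordinary request/reply operation: no Stream, Message or
        // IXmlSerializable parameter or return type, so it never actually streams.
        [OperationContract]
        string Echo(string text);
    }

    public class TestService : ITestService
    {
        public string Echo(string text)
        {
            return text;
        }
    }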

You'll notice that this web service method does not have a Stream, Message or IXmlSerializable return type or parameter, which according to this MSDN article is needed to make use of WCF streaming. This post therefore covers what happens to the performance of other (non-streaming) web service methods that are part of a web service that has been configured to support streaming. An evaluation of how the bug affects web service methods that do stream will be the topic of another post, unless, to paraphrase Shrek, I change my mind, or forget.


The Test Server

For the server I used a very unremarkable Console app as the WCF self-hosting service:


Console app used as the WCF service host
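
The host screenshot isn't reproduced either; a sketch of the sort of console host I mean, assuming the TestService class from the sketch above, with endpoints and bindings coming from App.config:

    using System;
    using System.ServiceModel;

    class Program
    {
        static void Main()
        {
            // Self-host the service; the endpoint and binding are read from App.config.
            using (var host = new ServiceHost(typeof(TestService)))
            {
                host.Open();
                Console.WriteLine("Service host running. Press Enter to exit.");
                Console.ReadLine();
            }
        }
    }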

With the following basicHttpBinding endpoint:

WCF service host's App.config endpoint configuration
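
The important part of that configuration is a basicHttpBinding whose transferMode is set to Streamed. A sketch of what such an App.config section might contain (the address, service name and binding name are illustrative):

    <system.serviceModel>
      <services>
        <service name="TestService">
          <host>
            <baseAddresses>
              <add baseAddress="http://localhost:8000/TestService" />
            </baseAddresses>
          </host>
          <endpoint address=""
                    binding="basicHttpBinding"
                    bindingConfiguration="httpStreamed"
                    contract="ITestService" />
        </service>
      </services>
      <bindings>
        <basicHttpBinding>
          <binding name="httpStreamed" transferMode="Streamed" />
        </basicHttpBinding>
      </bindings>
    </system.serviceModel>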


The Test Client

The client just calls the web service twice:

WCF client code used for testing
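
Again as a sketch rather than the actual screenshot, the client amounts to two calls on the same proxy (the endpoint name "testEndpoint" is illustrative and refers to the client configuration shown below):

    using System;
    using System.ServiceModel;

    class Client
    {
        static void Main()
        {
            // "testEndpoint" is defined in the client's App.config.
            var factory = new ChannelFactory<ITestService>("testEndpoint");
            ITestService proxy = factory.CreateChannel();

            // Two calls on the same open proxy; with a healthy binding both
            // calls should reuse a single TCP session.
            Console.WriteLine(proxy.Echo("hi mom"));
            Console.WriteLine(proxy.Echo("hi mom"));

            ((IClientChannel)proxy).Close();
            factory.Close();
        }
    }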

with Streaming enabled:

WCF client binding specified in its App.config
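
The client side mirrors the server's streamed binding; a sketch (address and names are illustrative):

    <system.serviceModel>
      <client>
        <endpoint name="testEndpoint"
                  address="http://servermachine:8000/TestService"
                  binding="basicHttpBinding"
                  bindingConfiguration="httpStreamed"
                  contract="ITestService" />
      </client>
      <bindings>
        <basicHttpBinding>
          <binding name="httpStreamed" transferMode="Streamed" />
        </basicHttpBinding>
      </bindings>
    </system.serviceModel>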


The Bug Exposed

The bug can be seen by comparing the network traffic produced by the calls shown above using an XP server, with the traffic produced by a Vista server:

Packet trace showing multiple web service calls using the same client proxy, using Vista as the server

Packet trace showing multiple web service calls using the same client proxy, using XP as the server

These packet traces show that when XP is used as the WCF service host, the TCP connection is closed and re-opened between web service calls. This is verified by the fact that on XP each call is made using a different client port, while on Vista the same port is used for each call.


Evaluating the effect of the Bug

In order to evaluate the effect that this bug has on WCF performance, I ran a set of tests, the results of which I'll describe below. The first test was simply to call the simple WCF service shown above in a loop, using different binding configurations, against both XP and Vista servers, and to note the performance of each binding.


The Test Bindings

The server bindings I used for this initial test are these:

Server bindings tested for performance
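
Judging by the result tables in the appendix, the set boils down to buffered and streamed variants of basicHttpBinding (with and without SSL), buffered and streamed netTcpBinding, and a ws2007HttpBinding. A sketch of the bindings section of such a configuration (the names are mine, and the https variants would also need an https base address and a service certificate, which I'm omitting):

    <bindings>
      <basicHttpBinding>
        <binding name="httpBuffered"  transferMode="Buffered" />
        <binding name="httpStreamed"  transferMode="Streamed" />
        <binding name="httpsBuffered" transferMode="Buffered">
          <security mode="Transport" />
        </binding>
        <binding name="httpsStreamed" transferMode="Streamed">
          <security mode="Transport" />
        </binding>
      </basicHttpBinding>
      <netTcpBinding>
        <binding name="tcpBuffered" transferMode="Buffered" />
        <binding name="tcpStreamed" transferMode="Streamed" />
      </netTcpBinding>
      <ws2007HttpBinding>
        <binding name="ws2007" />
      </ws2007HttpBinding>
    </bindings>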

I also tested custom bindings designed to simulate basicHttpBinding, just to verify that the issue affects any HTTP-based transport and not just basicHttpBinding, but I found that the timings for those custom bindings were identical to the timings for the basicHttpBindings. From here on out I'm dropping them from these results, to keep the clutter down.


The Test Client

For the client I used a simple console app that calls the web service using each of the client bindings in a loop for 10 seconds, and counts the number of web service calls it's able to make:



Client code used to test the performance of various WCF bindings
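
The harness isn't reproduced from the screenshot, but it amounts to something like this sketch, reusing the ITestService contract from earlier (the endpoint names are mine and match the binding names sketched above):

    using System;
    using System.Diagnostics;
    using System.ServiceModel;

    class PerfClient
    {
        static void Main()
        {
            // One client endpoint per binding under test, defined in App.config.
            string[] endpoints = { "httpBuffered", "httpStreamed", "httpsBuffered",
                                   "httpsStreamed", "tcpBuffered", "tcpStreamed", "ws2007" };

            foreach (string endpointName in endpoints)
            {
                var factory = new ChannelFactory<ITestService>(endpointName);
                ITestService proxy = factory.CreateChannel();

                // Call the service repeatedly for 10 seconds on the same proxy
                // and count how many calls complete.
                int calls = 0;
                var timer = Stopwatch.StartNew();
                while (timer.Elapsed < TimeSpan.FromSeconds(10))
                {
                    proxy.Echo("hi mom");
                    calls++;
                }

                Console.WriteLine("{0}: {1} calls, {2:F2} calls/sec",
                                  endpointName, calls, calls / timer.Elapsed.TotalSeconds);

                ((IClientChannel)proxy).Close();
                factory.Close();
            }
        }
    }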


The Bug on Low Latency Networks

Here are the results:

Web service calls per second on low latency network, using a small web service payload

This test was conducted on my home network using two computers, an XP machine and a Vista machine. They're connected by an 802.11n wireless network that has about 1 ms round-trip latency.

The first thing that stands out about the results is the remarkable performance of buffered NetTcp, which was able to move twice as many calls per second as the other bindings. The next thing that stands out is the fact that while all the other bindings perform about the same on Vista, on XP the basicHttp-streamed binding is about 25% slower than the others, and basicHttps-streamed is about 50% slower than the others. So even in this low latency test, the overhead of setting up new TCP and SSL sessions for every call has a measurable impact on performance.

Finally, even though the graphs above represent two different machines, the fact that the number of calls each machine is able to process per second is approximately the same gives me some confidence that I'll be able to compare absolute call times between them in the analyses that follow.


The Bug on High Latency Networks

The bug has a measurable effect even in low-latency networks, but the performance numbers shown above are still pretty good. An end user is not likely to notice the difference between an application that's making 50 web service calls per second and one that's making 250 calls per second. What is of more interest is how the bug behaves in high-latency networks.

To simulate different network latency times, I used the WANEmulator, a marvelous tool that literally takes minutes to set up, and which makes an excellent Christmas gift, especially since it's free. Running the same tests with 50ms, 100ms and 200ms round-trip times produced the following results:


Seconds per basicHttp web service call on Windows XP


Seconds per basicHttp web service call on Vista

The results are not surprising. With 200ms round-trip times, streaming added 200ms to the average web service call time, due to the need to make an extra round trip to establish the TCP session, and SSL added another 200ms per call for establishing the SSL session. Though it's not readily visible in the packet traces I've got right now, in other traces I have seen that WCF is able to establish SSL sessions in a single round trip, so again the result above is not surprising.

It's good to see in the results above that on Vista the problem really is fixed: all four bindings take the same amount of time, and the time they take matches the fastest of the XP times (e.g. 0.4 seconds per call at 200ms).

The bad news in the results above is that latency has a dramatic impact on even the best performance numbers above - from lightning-fast at 1ms latency to 0.4 seconds per call at 200ms latency. The reason for this is not entirely obvious. Remember that in the tests above we are averaging the call time across many calls, and (except for streamed bindings on XP, where the bug is restarting the TCP session every time) we are reusing the TCP and SSL sessions between calls, so the time needed to establish those sessions should hardly be visible in these results. Since my test web service is a two-way call (its operation contract doesn't set IsOneWay), each call will result in at least one synchronous round trip, but the source of any additional overhead is not obvious.
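
For reference, a one-way operation - not what my test service uses - would be declared something like this:

    [ServiceContract]
    public interface IFireAndForget
    {
        // IsOneWay = true means the client doesn't wait for a reply message,
        // so the call doesn't cost a synchronous round trip of its own.
        [OperationContract(IsOneWay = true)]
        void Notify(string text);
    }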

The further bad news is that when the bug is present, web service calls are again 25% slower in the streaming case, and 50% slower in the streaming+ssl case, than they should be.

Another interesting result appears if I add back the other bindings to the chart:


Seconds per call of various bindings on XP, with various network latency figures

While WS2007 and streamed NetTcp have nearly identical results to buffered basicHttp, buffered NetTcp again stands out as being twice as fast as all the rest. In fact, its call times are almost exactly equal to the round-trip time itself, which is the theoretical minimum.


The Bug with Large Web Service Payloads

Again, I'm not going to analyze the bug when streaming-based parameters are in use, for the very good reason that I don't feel like it right now, but even an unstreamed web service method can include a larger payload than the "hi mom" text I used above. To see the effect of the bug on those types of methods, I changed the client to send a gob of text that's about 277KB in length. To prevent anyone from compressing it, I serialized something that's already been compressed, a JPEG of my daughter:

My daughter Kaitlin

Like so:




Client program that sends a picture of Kaitlin to the server in a loop, using different bindings, and times the result
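
A sketch of that client, reusing the same timing loop as before (the file name and endpoint names are illustrative):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.ServiceModel;

    class LargePayloadClient
    {
        static void Main()
        {
            // Base64-encode an already-compressed JPEG so that nothing along the
            // way can shrink the roughly 277KB payload. The file name is made up.
            string payload = Convert.ToBase64String(File.ReadAllBytes("kaitlin.jpg"));

            string[] endpoints = { "httpBuffered", "httpStreamed", "httpsBuffered",
                                   "httpsStreamed", "tcpBuffered", "tcpStreamed", "ws2007" };

            foreach (string endpointName in endpoints)
            {
                var factory = new ChannelFactory<ITestService>(endpointName);
                ITestService proxy = factory.CreateChannel();

                // Same 10-second timing loop as the small-payload test, except
                // that each call now sends the picture instead of "hi mom".
                int calls = 0;
                var timer = Stopwatch.StartNew();
                while (timer.Elapsed < TimeSpan.FromSeconds(10))
                {
                    proxy.Echo(payload);
                    calls++;
                }

                Console.WriteLine("{0}: {1:F2} calls/sec",
                                  endpointName, calls / timer.Elapsed.TotalSeconds);

                ((IClientChannel)proxy).Close();
                factory.Close();
            }
        }
    }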

Naturally I had to increase some of the max size parameters in the client and server bindings to allow for larger messages:



Server bindings that allow for larger message sizes

Client bindings that allow for larger message sizes
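
The exact values in those screenshots aren't reproduced here, but the overrides amount to raising the default message size and reader quota limits on each binding, on both the server and the client. A sketch for the streamed basicHttpBinding case, with illustrative values just large enough for the ~277KB payload:

    <basicHttpBinding>
      <binding name="httpStreamed"
               transferMode="Streamed"
               maxReceivedMessageSize="10000000"
               maxBufferSize="10000000">
        <readerQuotas maxStringContentLength="10000000" maxArrayLength="10000000" />
      </binding>
    </basicHttpBinding>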

In the client bindings above I collapsed some of the duplicate settings to save space.


The Bug on a Low Latency Network with a Large Web Service Payload 

The results are as follows:

Web service calls per second on low latency network, using a large web service payload

The graph has the same general shape as it did with a small payload, though the absolute numbers are quite different - 7 or 8 calls per second at best, compared to 250 calls per second with the small payload. With a payload this large, network bandwidth is no doubt affecting these numbers, but since bandwidth is fixed throughout these tests, they should work fine for determining the relative effect of latency on WCF.

Again buffered NetTcp is the runaway performance winner, but this time streamed basicHttp performs about as well as the rest, and we have a new loser: streamed NetTcp.

So the bug is not measurable with large payloads in low latency environments. This makes good sense, because each web service call is taking so much more time just to transfer its payload that the TCP and SSL session initiation time is small by comparison.


The Bug on High Latency Networks with Large Web Service Payloads

Logically, in high-latency environments the TCP and SSL session initiation time should continue to be overwhelmed by the sheer amount of time taken to deliver the payload, but let's see what the testing shows:


Seconds per basicHttp web service call on Windows XP with a large web service payload


Seconds per basicHttp web service call on Vista, with a large web service payload

To our shock and horror, we see that the bug appears to affect performance in the same proportion as in the small payload case, only now instead of increasing call times from 0.4 seconds per call to 0.8 seconds per call, it's increasing call times from 2 seconds to 6. 

The good news (if anything about the above charts can be called "good") is that the bug remains more or less fixed on Vista, where call times are uniformly close to the best call times on XP, roughly 3 to 4 seconds per call. The Vista results appear to show that there is some additional overhead involved in streaming large payloads however, which amounts to about an additional 1 second per call in the 200ms latency case.

But even if we assume that, apart from the bug, there's 1 sec. worth of "normal" overhead involved in streaming large payloads in the 200ms latency case, that still leaves 1-2 seconds of overhead that we can only attribute to the bug. It would appear that closing and re-opening the TCP session carries some additional penalty beyond the round-trip TCP session establishment time. Something called "slow start" TCP session tuning may be the culprit (sure sounds like it, don't ya think?), but to date I haven't proven it, mainly due to laziness, since I have the packet traces in which must lie the smoking gun.


Performance of Other Bindings with Large Payloads

Though it has nothing to do with the stated purpose of this post, it's fun to add back the other binding types to the chart above:


Seconds per call of various bindings on Vista, with various network latency figures, using a large web service method payload

While WS2007 performs about as well as the best of the basicHttp methods, buffered NetTcp is again the speed king, coming in at a good 1/3 faster than the best of the rest. 

But what can we say about streamed NetTcp? It's taking 15 seconds per call in the 200 ms latency case, a good 4 times slower than all the rest! Only Dr. Seuss seems adequate: 
This mess is so big,
and so deep,
and so tall,
we cannot clean it up.
There is no way at all
Granted, I'm not making any attempt to tune these bindings; other than max message size settings I'm using defaults. But note that all these bindings are blindingly fast in low latency environments. If nothing else this shows the danger of failing to take latency into account when testing your WCF services.

And that would seem like a good stopping point for this particular post. This exercise produced many more fascinating results which the margin of this blog is too small to contain, but that will have to wait for another day. Unless I change my mind, or forget.


Appendix - the Data

The industrious among you can reproduce the above graphs and many others using the following data.

Vista Data - Small Payload










Calls/sec             1ms    50ms       100ms   200ms
NetTcp Buffered       258    18.04      9.62    4.8
BasicHttp Buffered    125    9.26       4.74    2.48
WS2007                117    9.52       4.72    2.48
BasicHttps Buffered   111    9.34       4.67    2.48
NetTcp Streamed       111    8.58       4.44    2.46
BasicHttp Streamed    113    8.98       4.62    2.48
BasicHttps Streamed   103    8.98       4.72    2.36


Vista Data - Large Payload











Calls/sec             1ms     50ms        100ms        200ms
NetTcp Buffered       8       1.844444    0.9166667    0.5083333
WS2007                3.7     1.147222    0.5972222    0.3111111
BasicHttp Buffered    3.74    1.144444    0.605555     0.3194444
BasicHttps Buffered   3.78    1.088889    0.5555556    0.3083333
BasicHttp Streamed    3.5     0.775       0.4222222    0.2444444
BasicHttps Streamed   3.4     0.8277778   0.44722222   0.24166667
NetTcp Streamed       1.88    0.2361111   0.1222222    0.06944445


XP - Small payload









Calls/sec             1ms    50ms        100ms   200ms
NetTcp Buffered       250    18.13333    9.48    4.86
BasicHttp Buffered    131    9.766666    4.8     2.5
WS2007                120    9.5         4.88    2.46
BasicHttps Buffered   113    9.233334    4.86    2.46
NetTcp Streamed       110    8.933333    4.66    2.48
BasicHttp Streamed    83     5.6         3.18    1.68
BasicHttps Streamed   51     4.533333    2.38    1.28


XP - Large Payload










Calls/sec             1ms     50ms          100ms         200ms
NetTcp Buffered       7.16    2.072222      1.125         0.6222222
WS2007                3.38    1.352778      0.7555556     0.3833333
BasicHttp Buffered    3.9     1.361111      0.7416667     0.4027778
BasicHttps Buffered   3.42    1.005556      0.5388889     0.2916667
BasicHttp Streamed    3.18    0.6444445     0.3277778     0.1861111
BasicHttps Streamed   3.42    0.5972222     0.3166667     0.1666667
NetTcp Streamed       1.74    0.233333333   0.12777778    0.06666667
