A Microsoft representative I talked to about this confirmed that this is an issue in WCF services hosted on Windows 2003 and XP if they use HTTP transport with transferMode "Streamed" or "StreamedRequest." The bug doesn't affect transferMode "StreamedResponse," and it has been fixed on Vista, Windows 2008 and Windows 7.
This bug is potentially a big deal because, as I noted in my last post, initiating a new TCP session requires a synchronous round-trip to the server, which, thanks to this bug, becomes overhead added to each and every web service call. And if SSL is in use then at least one more synchronous round trip will be added to every call. These round trips are detrimental to performance, especially in high-latency environments.
Seeing the Bug in a Packet Trace
The bug is very easy to see in a network packet trace. For these traces I'm using Wireshark, though most any packet sniffer will do. See my last post for instructions on how to use Wireshark.
The Test Web Service
The service I'm using for this test is simply this:
WCF service that demonstrates the bug
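A rough sketch of that kind of service, for reference; the type and method names here are illustrative rather than the exact code pictured:

```csharp
using System.ServiceModel;

// Illustrative sketch of the test service; names are placeholders.
[ServiceContract]
public interface ITestService
{
    // Plain string in and out -- no Stream, Message or IXmlSerializable types,
    // so this particular operation is not itself streamed.
    [OperationContract]
    string Echo(string text);
}

public class TestService : ITestService
{
    public string Echo(string text)
    {
        return text;
    }
}
```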
You'll notice that this web service method does not have a Stream, Message or IXmlSerializable return type or parameter, which according to this MSDN article is needed to make use of WCF streaming. This post therefore covers what happens to the performance of other (non-streaming) web service methods that are part of a web service that has been configured to support streaming. An evaluation of how the bug affects web service methods that do stream will be the topic of another post, unless, to paraphrase Shrek, I change my mind, or forget.
The Test Server
For the server I used a very unremarkable Console app as the WCF self-hosting service:
Console app used as the WCF service host
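Something along these lines, assuming the contract sketched above and an endpoint defined in App.config:

```csharp
using System;
using System.ServiceModel;

// Bare-bones self-hosting console app (sketch); assumes the TestService class
// from the contract above and an endpoint supplied by App.config.
class Program
{
    static void Main()
    {
        using (var host = new ServiceHost(typeof(TestService)))
        {
            host.Open();
            Console.WriteLine("Service is running. Press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```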
With the following basicHttpBinding endpoint:
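A minimal sketch of what such an endpoint and its streamed binding might look like; the address and configuration names are placeholders:

```xml
<!-- Sketch of the server-side App.config; address and names are placeholders. -->
<system.serviceModel>
  <services>
    <service name="WcfStreamingTest.TestService">
      <endpoint address="http://localhost:8000/TestService"
                binding="basicHttpBinding"
                bindingConfiguration="streamedHttp"
                contract="WcfStreamingTest.ITestService" />
    </service>
  </services>
  <bindings>
    <basicHttpBinding>
      <binding name="streamedHttp" transferMode="Streamed" />
    </basicHttpBinding>
  </bindings>
</system.serviceModel>
```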
The Test Client
The client just calls the web service twice:
WCF client code used for testing
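Roughly like this, where TestServiceClient stands in for the generated proxy:

```csharp
// Sketch of the test client: two calls on the same proxy, so a healthy HTTP stack
// should reuse the same TCP connection for both. TestServiceClient is a placeholder
// for the generated proxy class.
var client = new TestServiceClient();
client.Echo("hi mom");
client.Echo("hi mom");
client.Close();
```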
with Streaming enabled:
WCF client binding specified in its App.config
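That is, something along these lines (names are placeholders):

```xml
<!-- Sketch of the client-side binding with streaming turned on; names are placeholders. -->
<bindings>
  <basicHttpBinding>
    <binding name="streamedHttp" transferMode="Streamed" />
  </basicHttpBinding>
</bindings>
```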
The Bug Exposed
The bug can be seen by comparing the network traffic produced by the calls shown above using an XP server, with the traffic produced by a Vista server:
Packet trace showing multiple web service calls using the same client proxy, using Vista as the server - Click on the image to view it more clearly
Packet trace showing multiple web service calls using the same client proxy, using XP as the server - Click on the image to view it more clearly
These packet traces show that when XP is used as the WCF service host, the TCP connection is closed and re-opened between web service calls. This is verified by the fact that on XP each call is made using a different client port, while on Vista the same port is used for each call.
Evaluating the Effect of the Bug
In order to evaluate the effect that this bug has on WCF performance, I ran a set of tests, the results of which I'll describe below. The first test was simply to call the simple WCF service shown above in a loop against XP and Vista, using different binding configurations, and to note the performance of each binding.
The Test Bindings
Server bindings tested for performance
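For reference, here's a sketch of the kinds of binding configurations being compared; the names are placeholders, and everything other than transferMode and security is left at its default:

```xml
<!-- Sketch of the binding configurations compared; names are placeholders. -->
<bindings>
  <basicHttpBinding>
    <binding name="httpBuffered"  transferMode="Buffered" />
    <binding name="httpStreamed"  transferMode="Streamed" />
    <binding name="httpsBuffered" transferMode="Buffered">
      <security mode="Transport" />
    </binding>
    <binding name="httpsStreamed" transferMode="Streamed">
      <security mode="Transport" />
    </binding>
  </basicHttpBinding>
  <netTcpBinding>
    <binding name="tcpBuffered" transferMode="Buffered" />
    <binding name="tcpStreamed" transferMode="Streamed" />
  </netTcpBinding>
  <ws2007HttpBinding>
    <binding name="ws2007" />
  </ws2007HttpBinding>
</bindings>
```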
I tested custom bindings designed to simulate basicHttpBindings just to verify that the issue affects any http-based transport, and not just basicHttp, but I found that the timings for those bindings were identical to the timings for the basicHttpBindings. From here on out I'm dropping them from these results, to keep the clutter down.
The Test Client
For the client I used a simple console app that calls the web service using each of the client bindings in a loop for 10 seconds, and counts the number of web service calls it's able to make:
Client code used to test the performance of various WCF bindings
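In outline the timing loop looks something like this; the proxy and endpoint names are placeholders:

```csharp
using System;
using System.Diagnostics;

static class PerfTest
{
    // Sketch of the timing loop: hammer one endpoint on a single proxy for ~10 seconds
    // and report how many calls completed. TestServiceClient and endpointName are placeholders.
    static double MeasureCallsPerSecond(string endpointName)
    {
        var client = new TestServiceClient(endpointName);
        var stopwatch = Stopwatch.StartNew();
        int calls = 0;
        while (stopwatch.Elapsed < TimeSpan.FromSeconds(10))
        {
            client.Echo("hi mom");
            calls++;
        }
        client.Close();
        return calls / stopwatch.Elapsed.TotalSeconds;
    }
}
```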
The Bug on Low Latency Networks
Here are the results:
Web service calls per second on low latency network, using a small web service payload
This test was conducted on my home network using two computers, an XP machine and a Vista machine. They're connected by an 802.11n wireless network that has about 1 ms round-trip latency.
The first thing that stands out about the results is the remarkable performance of buffered NetTcp, which was able to move twice as many calls per second as the other bindings. The next thing that stands out is the fact that while all the other bindings perform about the same on Vista, on XP the basicHttp-streamed binding is about 25% slower than the others, and basicHttps-streamed is about 50% slower than the others. So even in this low latency test, the overhead of setting up new TCP and SSL sessions for every call has a measurable impact on performance.
Finally, even though the graphs above represent two different machines, the fact that the number of calls each machine is able to process per second is approximately the same gives me some confidence that I'll be able to compare absolute call times between them in the analyses that follow.
The Bug on High Latency Networks
The bug has a measurable effect even in low-latency networks, but the performance numbers shown above are still pretty good. An end-user is not likely to notice the difference between an application that's making 50 web service calls per second v. one that's making 250 calls per second. What is of more interest is how the bug behaves in high-latency networks.
To simulate different network latency times, I used the WANEmulator, a marvelous tool that literally takes minutes to set up, and which makes an excellent Christmas gift, especially since it's free. Running the same tests with 50ms, 100ms and 200ms round-trip times produced the following results:
Seconds per basicHttp web service call on Vista
The results are not surprising. With 200ms round-trip times, streaming added 200ms to the average web service call time, due to the need to make an extra round trip to establish the TCP session, and SSL added another 200ms to each call for establishing the SSL session. Though it's not readily visible in the packet traces I've got right now, in other traces I have seen that WCF is able to establish SSL sessions in a single round trip, so again the result above is not surprising.
It's good to see in the results above that on Vista the problem is really fixed: all four bindings take the same amount of time, and the amount of time they take matches the faster of the times taken on XP (e.g. 0.4 seconds per call at 200ms).
The bad news in the results above is that latency has a dramatic impact on even the best performance numbers above - from lightning-fast at 1ms latency to 0.4 seconds per call at 200 ms latency. The reason for this is not entirely obvious. Remember that in the tests above we are averaging the call time over many calls, and (except for streamed bindings on XP, where the bug is restarting the TCP session every time) we are reusing the TCP and SSL sessions between calls, so the time needed to establish those sessions should hardly be visible in these results. Since my test web service is a two-way call (it doesn't have the IsOneWay operation contract attribute), each call will result in at least one synchronous round trip, but the source of any additional overhead is not obvious.
The further bad news is that when the bug is present, web service calls are again 25% slower in the streaming case, and 50% slower in the streaming+ssl case, than they should be.
Another interesting result appears if I add back the other bindings to the chart:
While WS2007 and streamed NetTcp produce nearly identical results to buffered basicHttp, buffered NetTcp again stands out as being twice as fast as all the rest. In fact, its call times are almost exactly equal to the round-trip time itself, which is the theoretical minimum.
The Bug with Large Web Service Payloads
Again, I'm not going to analyze the bug when streaming-based parameters are in use, for the very good reason that I don't feel like it right now, but even an unstreamed web service method can include a larger payload than the "hi mom" text I used above. To see the effect of the bug on those types of methods, I changed the client to send a gob of text that's about 277KB in length. To prevent anyone from compressing it, I serialized something that's already been compressed, a JPEG of my daughter:
Like so:
Client program that sends a picture of Kaitlin to the server in a loop, using different bindings, and times the result
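The specifics below are mine rather than the original code, but the idea is just to turn an already-compressed file into a blob of text once and send that string on every call; the file path is a placeholder:

```csharp
using System;
using System.IO;

// Sketch: Base64-encode an already-compressed JPEG once (roughly 277KB of text),
// then reuse the timing loop shown earlier with this string as the payload.
// The file path is a placeholder.
string payload = Convert.ToBase64String(File.ReadAllBytes(@"C:\temp\kaitlin.jpg"));
// ... inside the loop from the earlier sketch:
//     client.Echo(payload);
```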
Naturally I had to increase some of the max size parameters in the client and server bindings to allow for larger messages:
Server bindings that allow for larger message sizes
In the client bindings above I collapsed some of the duplicate settings to save space.
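As a rough sketch of the kind of change involved, with illustrative values rather than the exact ones I used:

```xml
<!-- Sketch of the size-related settings raised for the large-payload test; values are illustrative. -->
<basicHttpBinding>
  <binding name="httpStreamedLarge"
           transferMode="Streamed"
           maxReceivedMessageSize="10485760"
           maxBufferSize="10485760">
    <readerQuotas maxStringContentLength="10485760" maxArrayLength="10485760" />
  </binding>
</basicHttpBinding>
```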
The Bug on a Low Latency Network with a Large Web Service Payload
The results are as follows:
The graph has the same general shape as it did with a small payload, though the absolute numbers are much different - 7 or 8 calls per second at best, compared to 250 calls per second with the small payload. With a payload this large, network bandwidth is no doubt affecting these numbers, but since bandwidth is fixed throughout these tests, they should work fine for determining the relative effect of latency on WCF.
Again buffered NetTcp is the runaway performance winner, but this time streamed basicHttp performs about as well as the rest, and we have a new loser: streamed NetTcp.
So the bug is not measurable with large payloads in low latency environments. This makes good sense, because each web service call is taking so much more time just to transfer its payload that the TCP and SSL session initiation time is small by comparison.
The Bug on High Latency Networks with Large Web Service Payloads
Logically, in high-latency environments the TCP and SSL session initiation time should continue to be overwhelmed by the sheer amount of time taken to deliver the payload, but let's see what the testing shows:
Seconds per basicHttp web service call on Windows XP with a large web service payload
To our shock and horror, we see that the bug appears to affect performance in the same proportion as in the small payload case, only now instead of increasing call times from 0.4 seconds per call to 0.8 seconds per call, it's increasing call times from 2 seconds to 6.
The good news (if anything about the above charts can be called "good") is that the bug remains more or less fixed on Vista, where call times are uniformly close to the best call times on XP, roughly 3 to 4 seconds per call. The Vista results do appear to show that there is some additional overhead involved in streaming large payloads, however, which amounts to about an additional 1 second per call in the 200ms latency case.
But even if we assume that, apart from the bug, there's 1 sec. worth of "normal" overhead involved in streaming large payloads in the 200ms latency case, that still leaves 1-2 seconds of overhead that we can only attribute to the bug. It would appear that closing and re-opening the TCP session carries some additional penalty beyond the round-trip TCP session establishment time. Something called "slow start" TCP session tuning may be the culprit (sure sounds like it, don't ya think?), but to date I haven't proven it, mainly due to laziness, since I have the packet traces in which must lie the smoking gun.
Performance of Other Bindings with Large Payloads
Though it has nothing to do with the stated purpose of this post, it's fun to add back the other binding types to the chart above:
Seconds per call of various bindings on Vista, with various network latency figures, using a large web service method payload
While WS2007 performs about as well as the best of the basicHttp methods, buffered NetTcp is again the speed king, coming in at a good 1/3 faster than the best of the rest.
But what can we say about streamed NetTcp? It's taking 15 seconds per call in the 200 ms latency case, a good 4 times slower than all the rest! Only Dr. Seuss seems adequate:
This mess is so big, and so deep, and so tall, we cannot clean it up. There is no way at all.
Granted, I'm not making any attempt to tune these bindings; other than max message size settings I'm using defaults. But note that all these bindings are blindingly fast in low latency environments. If nothing else this shows the danger of failing to take latency into account when testing your WCF services.
And that would seem like a good stopping point for this particular post. This exercise produced many more fascinating results which the margin of this blog is too small to contain, but that will have to wait for another day. Unless I change my mind, or forget.
Appendix - the Data
The industrious among you can reproduce the above graphs and many others using the following data.
Vista Data - Small Payload
Calls/sec | 1ms | 50ms | 100ms | 200ms |
NetTcp Buffered | 258 | 18.04 | 9.62 | 4.8 |
BasicHttp Buffered | 125 | 9.26 | 4.74 | 2.48 |
WS2007 | 117 | 9.52 | 4.72 | 2.48 |
BasicHttps Buffered | 111 | 9.34 | 4.67 | 2.48 |
NetTcp Streamed | 111 | 8.58 | 4.44 | 2.46 |
BasicHttp Streamed | 113 | 8.98 | 4.62 | 2.48 |
BasicHttps Streamed | 103 | 8.98 | 4.72 | 2.36 |
Vista Data - Large Payload
Calls/sec | 1ms | 50ms | 100ms | 200ms |
NetTcp Buffered | 8 | 1.844444 | 0.9166667 | 0.5083333 |
WS2007 | 3.7 | 1.147222 | 0.5972222 | 0.3111111 |
BasicHttp Buffered | 3.74 | 1.144444 | 0.605555 | 0.3194444 |
BasicHttps Buffered | 3.78 | 1.088889 | 0.5555556 | 0.3083333 |
BasicHttp Streamed | 3.5 | 0.775 | 0.4222222 | 0.2444444 |
BasicHttps Streamed | 3.4 | 0.8277778 | 0.44722222 | 0.24166667 |
NetTcp Streamed | 1.88 | 0.2361111 | 0.1222222 | 0.06944445 |
XP Data - Small Payload
Calls/sec | 1ms | 50ms | 100ms | 200ms |
NetTcp Buffered | 250 | 18.13333 | 9.48 | 4.86 |
BasicHttp Buffered | 131 | 9.766666 | 4.8 | 2.5 |
WS2007 | 120 | 9.5 | 4.88 | 2.46 |
BasicHttps Buffered | 113 | 9.233334 | 4.86 | 2.46 |
NetTcp Streamed | 110 | 8.933333 | 4.66 | 2.48 |
BasicHttp Streamed | 83 | 5.6 | 3.18 | 1.68 |
BasicHttps Streamed | 51 | 4.533333 | 2.38 | 1.28 |
XP Data - Large Payload
Calls/sec | 1ms | 50ms | 100ms | 200ms |
NetTcp Buffered | 7.16 | 2.072222 | 1.125 | 0.6222222 |
WS2007 | 3.38 | 1.352778 | 0.7555556 | 0.3833333 |
BasicHttp Buffered | 3.9 | 1.361111 | 0.7416667 | 0.4027778 |
BasicHttps Buffered | 3.42 | 1.005556 | 0.5388889 | 0.2916667 |
BasicHttp Streamed | 3.18 | 0.6444445 | 0.3277778 | 0.1861111 |
BasicHttps Streamed | 3.42 | 0.5972222 | 0.3166667 | 0.1666667 |
NetTcp Streamed | 1.74 | 0.233333333 | 0.12777778 | 0.06666667 |