Sunday, May 24, 2009

WCF Bug on W2K3 and XP: TCP session closes after every web service call when using HTTP transport with Streamed transfer mode

While profiling a WCF-based application recently I uncovered a bug in WCF, in which the WCF service host forcibly closes its TCP session with the client after each and every web service call. The client is helpless to prevent it, even if it keeps the client proxy open in the hope of reusing the session for subsequent calls. No exception happens when the client does make subsequent calls; instead the proxy silently negotiates a new TCP session and continues on its merry way.

A Microsoft representative I talked to about this confirmed that this is an issue in WCF services hosted on Windows Server 2003 and XP, if they use HTTP transport with transferMode "Streamed" or "StreamedRequest." The bug doesn't affect transferMode "StreamedResponse," and has been fixed on Vista, Windows Server 2008 and Windows 7.

This bug is potentially a big deal because, as I noted in my last post, initiating a new TCP session requires a synchronous round-trip to the server, which, thanks to this bug, becomes overhead added to each and every web service call. And if SSL is in use then at least one more synchronous round trip will be added to every call. These round trips are detrimental to performance, especially in high-latency environments.


Seeing the Bug in a Packet Trace

The bug is very easy to see in a network packet trace. For these traces I'm using Wireshark, though most any packet sniffer will do. See my last post for instructions on how to use Wireshark.


The Test Web Service

The service I'm using for this test is simply this:


WCF service that demonstrates the bug
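
In sketch form, a service of that shape is just an ordinary request/reply contract along these lines (a sketch with illustrative names, not the exact code in the screenshot above):

using System.ServiceModel;

[ServiceContract]
public interface IEchoService
{
    // A plain string in and a plain string out -- no Stream, Message or
    // IXmlSerializable types anywhere in the signature.
    [OperationContract]
    string Echo(string text);
}

public class EchoService : IEchoService
{
    public string Echo(string text)
    {
        return text;
    }
}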

You'll notice that this web service method does not have a Stream, Message or IXmlSerializable return type or parameter, which according to this MSDN article is needed to make use of WCF streaming. This post therefore covers what happens to the performance of other (non-streaming) web service methods that are part of a web service that has been configured to support streaming. An evaluation of how the bug affects web service methods that do stream will be the topic of another post, unless, to paraphrase Shrek, I change my mind, or forget.


The Test Server

For the server I used a very unremarkable Console app as the WCF self-hosting service:


Console app used as the WCF service host

With the following basicHttpBinding endpoint:

WCF service host's App.config endpoint configuration
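
Pieced together in code rather than App.config, the host and its streamed basicHttpBinding endpoint come out roughly like this (a sketch that reuses the IEchoService contract above; the address is illustrative):

using System;
using System.ServiceModel;

class HostProgram
{
    static void Main()
    {
        // The code equivalent of a basicHttpBinding configured with
        // transferMode="Streamed" in App.config.
        var binding = new BasicHttpBinding
        {
            TransferMode = TransferMode.Streamed
        };

        using (var host = new ServiceHost(typeof(EchoService)))
        {
            host.AddServiceEndpoint(typeof(IEchoService), binding,
                "http://localhost:8000/EchoService");
            host.Open();

            Console.WriteLine("Service is running. Press Enter to stop.");
            Console.ReadLine();
            host.Close();
        }
    }
}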


The Test Client

The client just calls the web service twice:

WCF client code used for testing

with Streaming enabled:

WCF client binding specified in its App.config
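
In code form the client amounts to something like this, here using a ChannelFactory in place of a generated proxy (again a sketch; the address is illustrative):

using System;
using System.ServiceModel;

class ClientProgram
{
    static void Main()
    {
        // Client-side binding with transferMode="Streamed", matching the App.config above.
        var binding = new BasicHttpBinding
        {
            TransferMode = TransferMode.Streamed
        };

        var factory = new ChannelFactory<IEchoService>(binding,
            new EndpointAddress("http://localhost:8000/EchoService"));
        IEchoService proxy = factory.CreateChannel();

        // Two calls on the same open proxy. On W2K3/XP hosts the packet trace
        // shows a fresh TCP session for each call; on Vista the session is reused.
        Console.WriteLine(proxy.Echo("hi mom"));
        Console.WriteLine(proxy.Echo("hi mom"));

        ((IClientChannel)proxy).Close();
        factory.Close();
    }
}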


The Bug Exposed

The bug can be seen by comparing the network traffic produced by the calls shown above using an XP server, with the traffic produced by a Vista server:

Packet trace showing multiple web service calls using the same client proxy, using Vista as the server - Click on the image to view it more clearly

Packet trace showing multiple web service calls using the same client proxy, using XP as the server  - Click on the image to view it more clearly

These packet traces show that when XP is used as the WCF service host, the TCP connection is closed and re-opened between web service calls. This is verified by the fact that on XP each call is made using a different client port, while on Vista the same port is used for each call.


Evaluating the Effect of the Bug

In order to evaluate the effect that this bug has on WCF performance, I ran a set of tests, the results of which I'll describe below. The first test was simply to call the simple WCF service shown above in a loop against XP and Vista, using different binding configurations, and to note the performance of each binding.


The Test Bindings

The server bindings I used for this initial test are these:

Server bindings tested for performance

I tested custom bindings designed to simulate basicHttpBindings just to verify that the issue affects any http-based transport, and not just basicHttp, but I found that the timings for those bindings were identical to the timings for the basicHttpBindings. From here on out I'm dropping them from these results, to keep the clutter down.


The Test Client

For the client I used a simple console app that calls the web service using each of the client bindings in a loop for 10 seconds, and counts the number of web service calls it's able to make:



Client code used to test the performance of various WCF bindings
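
The measurement itself needs nothing more elaborate than a stopwatch and a counter; here is a sketch of the loop, which the real test repeats for each of the bindings above:

using System;
using System.Diagnostics;
using System.ServiceModel;
using System.ServiceModel.Channels;

class BindingTimer
{
    // Calls the service as fast as it can for 10 seconds and returns the call count.
    static int CountCallsInTenSeconds(Binding binding, string address)
    {
        var factory = new ChannelFactory<IEchoService>(binding,
            new EndpointAddress(address));
        IEchoService proxy = factory.CreateChannel();

        int calls = 0;
        var timer = Stopwatch.StartNew();
        while (timer.Elapsed < TimeSpan.FromSeconds(10))
        {
            proxy.Echo("hi mom");
            calls++;
        }

        ((IClientChannel)proxy).Close();
        factory.Close();
        return calls;
    }
}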


The Bug on Low Latency Networks

Here are the results:

Web service calls per second on low latency network, using a small web service payload

This test was conducted on my home network using two computers, an XP machine and a Vista machine. They're connected by an 802.11n wireless network that has about 1 ms round-trip latency.

The first thing that stands out about the results is the remarkable performance of buffered NetTcp, which was able to move twice as many calls per second as the other bindings. The next thing that stands out is the fact that while all the other bindings perform about the same on Vista, on XP the basicHttp-streamed binding is about 25% slower than the others, and basicHttps-streamed is about 50% slower than the others. So even in this low latency test, the overhead of setting up new TCP and SSL sessions for every call has a measurable impact on performance.

Finally, even though the graphs above represent two different machines, the fact that the number of calls each machine is able to process per second is approximately the same gives me some confidence that I'll be able to compare absolute call times between them in the analyses that follow.


The Bug on High Latency Networks

The bug has a measurable effect even in low-latency networks, but the performance numbers shown above are still pretty good. An end-user is not likely to notice the difference between an application that's making 50 web service calls per second v. one that's making 250 calls per second. What is of more interest is how the bug behaves in high-latency networks. 

To simulate different network latency times, I used the WANEmulator, a marvelous tool that literally takes minutes to set up, and which makes an excellent Christmas gift, especially since it's free. Running the same tests with 50ms, 100ms and 200ms round-trip times produced the following results:


Seconds per basicHttp web service call on Windows XP


Seconds per basicHttp web service call on Vista

The results are not surprising. With 200ms round-trip times, streaming added 200ms to the average web service call time, due to the need to make an extra round trip to establish the TCP session. And SSL adds another 200ms to the call for establishing the SSL session. Though it's not readily visible in the packet traces I've got right now, in other traces I have seen that WCF is able to establish SSL sessions in a single round-trip, so again the result above is not surprising.

It's good to see in the results above that on Vista the problem really is fixed: all four bindings take the same amount of time, and the time they take matches the fastest of the times taken on XP (e.g. 0.4 seconds per call at 200ms).

The bad news in the results above is that latency has a dramatic impact on even the best performance numbers above - from lightning-fast at 1ms latency to 0.4 seconds per call at 200 ms latency. The reason for this is not entirely obvious. Remember that in the tests above we are averaging the call time over many calls, and (except for streamed bindings on XP, where the bug is restarting the TCP session every time) we are reusing the TCP and SSL sessions between calls, so the time needed to establish those sessions should hardly be visible in these results. Since my test web service is a two-way call (it doesn't have the IsOneWay operation contract attribute), each call will result in at least one synchronous round trip, but the source of any additional overhead is not obvious.

The further bad news is that when the bug is present, web service calls are again 25% slower in the streaming case, and 50% slower in the streaming+ssl case, than they should be.

Another interesting result appears if I add back the other bindings to the chart:


Seconds per call of various bindings on XP, with various network latency figures

While WS2007 and streamed NetTcp turn in nearly identical results to buffered basicHttp, buffered NetTcp again stands out as being twice as fast as all the rest. In fact, its call times are almost exactly equal to the round-trip time itself, which is the theoretical minimum.


The Bug with Large Web Service Payloads

Again, I'm not going to analyze the bug when streaming-based parameters are in use, for the very good reason that I don't feel like it right now, but even an unstreamed web service method can include a larger payload than the "hi mom" text I used above. To see the effect of the bug on those types of methods, I changed the client to send a gob of text that's about 277KB in length. To prevent anyone from compressing it, I serialized something that's already been compressed, a JPEG of my daughter:

My daughter Kaitlin

Like so:




Client program that sends a picture of Kaitlin to the server in a loop, using different bindings, and times the result
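
The loop is the same as before; only the payload changes. A sketch, reusing the IEchoService contract from earlier (the file name is illustrative):

using System;
using System.Diagnostics;
using System.IO;

class LargePayloadTest
{
    static int SendImageForTenSeconds(IEchoService proxy, string imagePath)
    {
        // Base64-encoding the JPEG yields a large chunk of text that won't
        // compress further, since the image data is already compressed.
        string payload = Convert.ToBase64String(File.ReadAllBytes(imagePath));

        int calls = 0;
        var timer = Stopwatch.StartNew();
        while (timer.Elapsed < TimeSpan.FromSeconds(10))
        {
            proxy.Echo(payload);
            calls++;
        }
        return calls;
    }
}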

Naturally I had to increase some of the max size parameters in the client and server bindings to allow for larger messages:



Server bindings that allow for larger message sizes

Client bindings that allow for larger message sizes

In the client bindings above I collapsed some of the duplicate settings to save space.
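
Expressed in code rather than config, the changes amount to raising a few quotas on the binding; here is a sketch (the exact limits are illustrative, chosen to be comfortably larger than the 277KB payload):

using System.ServiceModel;
using System.Xml;

class LargeMessageBindings
{
    static BasicHttpBinding CreateLargeMessageBinding()
    {
        var binding = new BasicHttpBinding
        {
            TransferMode = TransferMode.Streamed,
            MaxReceivedMessageSize = 10 * 1024 * 1024,  // allow messages up to ~10MB
            MaxBufferSize = 10 * 1024 * 1024
        };

        // The reader quotas also cap how large a single string element may be.
        binding.ReaderQuotas = new XmlDictionaryReaderQuotas
        {
            MaxStringContentLength = 10 * 1024 * 1024,
            MaxArrayLength = 10 * 1024 * 1024
        };
        return binding;
    }
}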


The Bug on a Low Latency Network with a Large Web Service Payload 

The results are as follows:

Web service calls per second on low latency network, using a large web service payload

The graph has the same general shape as it did with a small payload, though the absolute numbers are much different - 7 or 8 calls per second at best, compared to 250 calls per second with the small payload. With a payload this large network bandwidth is no doubt affecting these numbers, but since bandwidth is fixed throughout these tests, they should work fine for determining the relative effect of latency on WCF.

Again buffered NetTcp is the runaway performance winner, but this time streamed basicHttp performs about as well as the rest, and we have a new loser: streamed NetTcp.

So the bug is not measurable with large payloads in low latency environments. This makes good sense, because each web service call is taking so much more time just to transfer its payload that the TCP and SSL session initiation time is small by comparison.


The Bug on High Latency Networks with Large Web Service Payloads

Logically, in high-latency environments the TCP and SSL session initiation time should continue to be overwhelmed by the sheer amount of time taken to deliver the payload, but let's see what the testing shows:


Seconds per basicHttp web service call on Windows XP with a large web service payload


Seconds per basicHttp web service call on Vista, with a large web service payload

To our shock and horror, we see that the bug appears to affect performance in the same proportion as in the small payload case, only now instead of increasing call times from 0.4 seconds per call to 0.8 seconds per call, it's increasing call times from 2 seconds to 6. 

The good news (if anything about the above charts can be called "good") is that the bug remains more or less fixed on Vista, where call times are uniformly close to the best call times on XP, roughly 3 to 4 seconds per call. The Vista results appear to show that there is some additional overhead involved in streaming large payloads however, which amounts to about an additional 1 second per call in the 200ms latency case.

But even if we assume that, apart from the bug, there's 1 sec. worth of "normal" overhead involved in streaming large payloads in the 200ms latency case, that still leaves 1-2 seconds of overhead that we can only attribute to the bug. It would appear that closing and re-opening the TCP session carries some additional penalty beyond the round-trip TCP session establishment time. Something called "slow start" TCP session tuning may be the culprit (sure sounds like it, don't ya think?), but to date I haven't proven it, mainly due to laziness, since I have the packet traces in which must lie the smoking gun.


Performance of Other Bindings with Large Payloads

Though it has nothing to do with the stated purpose of this post, it's fun to add back the other binding types to the chart above:


Seconds per call of various bindings on Vista, with various network latency figures, using a large web service method payload

While WS2007 performs about as well as the best of the basicHttp methods, buffered NetTcp is again the speed king, coming in at a good 1/3 faster than the best of the rest. 

But what can we say about streamed NetTcp? It's taking 15 seconds per call in the 200 ms latency case, a good 4 times slower than all the rest! Only Dr. Seuss seems adequate: 
This mess is so big,
and so deep,
and so tall,
we cannot clean it up.
There is no way at all.

Granted, I'm not making any attempt to tune these bindings; other than max message size settings I'm using defaults. But note that all these bindings are blindingly fast in low latency environments. If nothing else this shows the danger of failing to take latency into account when testing your WCF services.

And that would seem like a good stopping point for this particular post. This exercise produced many more fascinating results which the margin of this blog is too small to contain, but that will have to wait for another day. Unless I change my mind, or forget.


Appendix - the Data

The industrious among you can reproduce the above graphs and many others using the following data.

Vista Data - Small Payload

Calls/sec              1ms     50ms     100ms    200ms
NetTcp Buffered        258     18.04    9.62     4.8
BasicHttp Buffered     125     9.26     4.74     2.48
WS2007                 117     9.52     4.72     2.48
BasicHttps Buffered    111     9.34     4.67     2.48
NetTcp Streamed        111     8.58     4.44     2.46
BasicHttp Streamed     113     8.98     4.62     2.48
BasicHttps Streamed    103     8.98     4.72     2.36


Vista Data - Large Payload

Calls/sec              1ms     50ms        100ms        200ms
NetTcp Buffered        8       1.844444    0.9166667    0.5083333
WS2007                 3.7     1.147222    0.5972222    0.3111111
BasicHttp Buffered     3.74    1.144444    0.605555     0.3194444
BasicHttps Buffered    3.78    1.088889    0.5555556    0.3083333
BasicHttp Streamed     3.5     0.775       0.4222222    0.2444444
BasicHttps Streamed    3.4     0.8277778   0.44722222   0.24166667
NetTcp Streamed        1.88    0.2361111   0.1222222    0.06944445


XP - Small Payload

Calls/sec              1ms     50ms        100ms    200ms
NetTcp Buffered        250     18.13333    9.48     4.86
BasicHttp Buffered     131     9.766666    4.8      2.5
WS2007                 120     9.5         4.88     2.46
BasicHttps Buffered    113     9.233334    4.86     2.46
NetTcp Streamed        110     8.933333    4.66     2.48
BasicHttp Streamed     83      5.6         3.18     1.68
BasicHttps Streamed    51      4.533333    2.38     1.28


XP - Large Payload

Calls/sec              1ms     50ms          100ms         200ms
NetTcp Buffered        7.16    2.072222      1.125         0.6222222
WS2007                 3.38    1.352778      0.7555556     0.3833333
BasicHttp Buffered     3.9     1.361111      0.7416667     0.4027778
BasicHttps Buffered    3.42    1.005556      0.5388889     0.2916667
BasicHttp Streamed     3.18    0.6444445     0.3277778     0.1861111
BasicHttps Streamed    3.42    0.5972222     0.3166667     0.1666667
NetTcp Streamed        1.74    0.233333333   0.12777778    0.06666667

Sunday, May 10, 2009

SSL Handshaking Overview

The goal of this post is to review what happens at the TCP/IP level when an SSL connection is initiated, paying particular attention to the parts of the handshake protocol that are synchronous, i.e., that result in the client or the server waiting for a response from the other before proceeding. Most of this post comes from the SSL 3.0 Specification listed on this web site and the TLS 1.0 Specification listed here.

Initiating a TCP Session

When a client such as a web browser first contacts a web site via SSL, the first thing that happens is a new TCP/IP session is initiated between the selected client port and the port handling SSL connections on the server, usually port 443:




This is a diagram of the standard 3-way handshake that initiates a TCP connection.

The TCP protocol provides reliable packet delivery by embedding a sequence number in every packet. Whenever an endpoint receives a TCP packet it responds with an acknowledgment (abbreviated as an “ACK”) indicating the highest sequence number it has successfully received. If the sender doesn’t receive an ACK for a packet it’s sent within a certain time period, it assumes the packet was lost, and re-sends it.

Walking through the handshake: the client issues a synchronization request (abbreviated as “SYN”), in which it indicates the client port number for the conversation and the initial sequence number for packets originating from the client.

The server responds by ACKing the client’s request, and with a SYN of its own, which contains its initial sequence number for packets originating from the server.

In the diagram, packets originating from the client are shown in blue, and packets originating from the server are in green. Dark-colored lines indicate synchronous conversations. E.g., the client sends its initial SYN and waits for the server to respond. The server responds, and the synchronous part of the conversation is completed. The client ACKs the server’s response, and then continues immediately with the next conversation, which is the client/server hello:

The Client and Server Hellos




In this diagram the client begins by sending the SSL Client Hello message, which contains, among other things, the highest version of SSL supported by the client, the list of encryption algorithms supported by the client, and a random number that will be used as a seed value for generating the SSL master secret encryption key.

This is a good time to note that SSL is based on both PKI encryption (public-key infrastructure) and on symmetric encryption. PKI encryption is based on a pair of keys, a public key and a private key. The properties of PKI encryption that make it so useful are these:
  • Anything encrypted with the public key can only be decrypted with the private key.
  • Anything encrypted with the private key can only be decrypted with the public key.
The server can therefore broadcast its public key to the world, and the client can use it to encrypt its messages without fear that anyone other than the holder of the private key can decrypt it. But PKI encryption is slow, so SSL uses it only to encrypt a seed value that is used to generate a symmetric key, which is used by the client and server to encrypt the rest of the conversation. Symmetric key encryption algorithms use a single key for both encryption and decryption. These algorithms are faster than PKI, but presuppose that the participants in the conversation have a secure way to exchange the encryption key.
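
The division of labor is easy to see in miniature. The sketch below uses the .NET crypto classes to do what SSL does conceptually: the slow PKI step protects only a small symmetric key, and that key then encrypts the bulk of the data (illustrative only; this is not the actual SSL key exchange):

using System;
using System.Security.Cryptography;
using System.Text;

class HybridEncryptionSketch
{
    static void Main()
    {
        using (var rsa = new RSACryptoServiceProvider(2048))
        using (var aes = Aes.Create())
        {
            // The slow PKI step: only the holder of the RSA private key
            // can recover the small symmetric key.
            byte[] encryptedKey = rsa.Encrypt(aes.Key, false);

            // The fast symmetric step: the bulk of the conversation is
            // encrypted with that key.
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] plaintext = Encoding.UTF8.GetBytes("the rest of the conversation");
                byte[] ciphertext = encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length);
                Console.WriteLine("{0} key bytes via PKI, {1} data bytes via symmetric encryption",
                    encryptedKey.Length, ciphertext.Length);
            }
        }
    }
}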

In SSL, the conversation’s symmetric key is called the “master secret.” The key itself is not sent over the wire. Instead, three seed values are sent across the wire that the client and server can use to generate the actual symmetric key. The seed values include a random number generated on the client and sent to the server in the clear (unencrypted), a random number generated on the server and sent to the client in the clear, and another random number generated on the client and sent to the server encrypted via PKI.
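
For reference, the TLS 1.0 specification derives the key from those seeds as master_secret = PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random), truncated to 48 bytes; SSL 3.0 uses an equivalent construction built from MD5 and SHA-1.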

Once the client sends the Client Hello it must wait for the server to respond with a set of messages that begin with a Server Hello message and end with a Server Hello Done. The messages sent by the server in between the Server Hello and the Server Hello Done depend on whether the server authenticates itself using digital certificates, and whether it requires clients to authenticate themselves with digital certificates.

The diagram shows the most common case, in which the server authenticates itself to the client with a digital certificate, but does not require the client to authenticate itself.

In that scenario, the server selects the strongest encryption algorithm supported by both client and server, and returns its selection in the Server Hello, along with a random number generated on the server for use in creating the symmetric encryption key. It sends its digital certificate in a Server Certificate message, and then sends a Server Hello Done message.

As with the previous diagram, the synchronous part of the conversation uses dark colors. The ACKs are in lighter colors and are not synchronous. Which is to say, the client sends the Client Hello, and then waits for the stream of messages ending with the Server Hello Done. Neither side waits for the ACKs.

The client is now ready to send the information the server will need to generate the symmetric encryption key, and to instruct the server to switch to the chosen symmetric encryption algorithm:

The Client and Server Key Exchange, Change Cipher Specs and Finished Messages



In this diagram the client generates the information necessary for the client and server to generate the “master secret,” which is the symmetric encryption key. The exact information sent differs based on the encryption algorithm used. The client sends this in an SSL Client Key Exchange message.

The client then sends a Change Cipher Specs message, which is just a flag message that contains no data payload, and then a Finished message, which is the first message encrypted using the symmetric key. The payload of the Finished message is a digest of the entire handshake conversation up to the Change Cipher Specs message, which the recipient decrypts and verifies to confirm that encryption is working properly.

The server must wait to receive the client key information before proceeding with its part of the conversation, but the client doesn’t have to wait for the server to do so before it starts to send encrypted content under the session, so the synchronous part of the SSL setup has effectively concluded.

The server does reciprocate with its own Change Cipher Specs and Finished messages however, the latter of which again contains an encrypted digest of the entire handshaking exchange between itself and the client. The client is likewise required to decrypt the Finished message payload and verify that encryption is working properly.

At this point the SSL connection is established, and the client and server may exchange encrypted content.
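
If you want to trigger exactly this handshake from code while watching the packets, a few lines of SslStream are enough (a sketch; the host name is illustrative):

using System;
using System.Net.Security;
using System.Net.Sockets;

class SslHandshakeDemo
{
    static void Main()
    {
        // Opening the TcpClient performs the SYN / SYN-ACK / ACK exchange;
        // AuthenticateAsClient drives the hellos, key exchange, change cipher
        // spec and finished messages described above.
        using (var tcp = new TcpClient("www.example.com", 443))
        using (var ssl = new SslStream(tcp.GetStream()))
        {
            ssl.AuthenticateAsClient("www.example.com");
            Console.WriteLine("Negotiated {0} using cipher {1}",
                ssl.SslProtocol, ssl.CipherAlgorithm);
        }
    }
}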


Viewing the Packets in Wireshark

All this becomes more concrete when the packets are viewed in a packet sniffer such as Wireshark. Wireshark is very easy to use. Just download, install and run the application, and select Capture\Interfaces... in order to instruct Wireshark to start capturing packets on one of your network devices. If several devices are listed and you're not certain which to capture, just leave the dialog open for a moment while you browse the Internet. The number in the "Packets" column in the dialog will increase on the network device being used by the browser.

Next, find a web site that uses HTTPS, and load a web page from that site in your browser. You'll want to filter out all the network traffic except packets that are going to or coming from that site. You should be able to figure out the IP address of the site you're connecting to by viewing the packet stream in Wireshark as you refresh the browser. In my case I'm using a site with IP address 72.21.210.250. Once you've figured that out, add a filter of this form to Wireshark's Filter text box: ip.dst == 72.21.210.250 or ip.src == 72.21.210.250.

To get a clear picture of an entire SSL session, it's best to wait a few minutes until the server you've connected to terminates your TCP connection. You'll see that happen in Wireshark when the server sends a TCP "FIN" message to your machine, which Wireshark highlights in dark gray by default. Then select Capture\Restart in Wireshark, and refresh the HTTPS page in your browser. You'll see something like this:


Initiating the TCP Session

The first packet is the client's TCP SYN, requesting a new session; it contains the client port, 51805, and the starting packet sequence number, zero. The server ACKs the client's SYN and responds with a SYN of its own in the next message:



Wireshark helpfully colors the two SYN packets in dark gray to highlight the fact that they represent the beginning of a new TCP session, and includes a "SEQ/ACK" analysis feature in its Packet Details pane that shows that the ACK in this packet is acknowledging receipt of the packet sent in frame 5 above.

At this point we have a valid TCP session. The client ACKs the receipt of the server SYN, and then immediately proceeds to the Client Hello:

Client and Server Hellos



Wireshark notes that this is a TLS 1.0 (Transport Layer Security) message by identifying the protocol as TLSv1. TLS 1.0 is basically a newer variant of SSL 3.0. The aspects of the handshaking protocol that I'm reviewing in this blog post are the same in both, with one exception that I will note below.

The server responds with two packets. The first is this:



Wireshark flags this as a "TCP segment of a reassembled PDU" (Protocol Data Unit). This is just a way of saying that a single higher-level protocol's "data unit" (a TLSv1 record in this case) has been broken up into two TCP packets for transmission.

Expanding the Transmission Control Protocol node in the Packet Details pane, we can also see that this packet contains an ACK for the packet the client last sent.

Moving on to frame 10 we can see that the server has packaged three SSL messages into a single data unit: a Server Hello, the Certificate, and the Server Hello Done:




In the Packet Details pane, Wireshark helpfully tells us that this single TLSv1 data unit has been reassembled from two TCP packets, which are shown in frames 9 and 10. Looking at the Secure Socket Layer info in the packet, we see the three messages. If we then expand the Server Hello details, we can see that the server selected the RSA PKI algorithm and the RC4 symmetric encryption algorithm using a 128-bit key, and the MD5 algorithm for hashing:



Client and Server Key Exchanges

In frame 11 the client ACKs the server hello messages, and in frame 12 it immediately proceeds to the Client Key Exchange, Change Cipher Specs, and Finished message (which Wireshark calls an "Encrypted Handshake Message"), all of which are contained in a single TCP packet:



Frame 13 is a server ACK, and in frame 14 we get the server's Change Cipher Specs and Finished messages:



And in Frame 15 we see the start of the actual SSL-encrypted content of the web page.

Another Synchronous Handshake in TLS 1.0

One interesting thing to note about these Wireshark results is that, even though the client is supposed to be able to begin sending data as soon as it completes its client Finished message, in the results above it appears to wait until the server completes its server Finished message.

This appears to be the only difference that we have encountered in this post between the SSL 3.0 spec and the TLS 1.0 spec. The SSL 3.0 spec contains the phrase "No acknowledgment of the finished message is required" in its description of the client Finished message. That phrase has been removed from the TLS 1.0 spec, which is otherwise in many places a word-for-word copy of the SSL 3.0 spec. Also the flow diagram used by TLS 1.0 has been changed to make the client and server key exchange conversations synchronous.

Therefore it would appear that for TLS 1.0 the key exchange diagram should look like this:



to reflect the fact that the key exchange messages have been made synchronous in the newer protocol.
