Saturday, September 26, 2009

Hannibal the Cannibal on Class and Method Design

Everything I know about class and method design I learned from Hannibal Lecter.

OK that's not exactly true, but the first, most basic, and in many ways most important thing that I learned about class and method design I learned from the good doctor. Before there was IoC, dependency injection, MVC, MVP, the Open/Closed Principle, or the Liskov Substitution Principle. Before there was even encapsulation and loose coupling, there was Hannibal Lecter's First Principle:


As software developers we live in a world of unbounded complexity. Most of the design principles that we have developed over the years represent, at their heart, our effort to beat down and shackle that complexity: to build walls around complex systems and declare, "Beyond this point none shall pass!" (encapsulation), to clearly define responsibilities, declaring "A shall do thus, and B shall do thus, and never shall the one take up the task of the other" (abstraction, loose coupling, MVC, MVP). We keep these ideas ever in our minds as we develop the classes and methods that implement the solutions that earn us our daily bread.

But the advice we should turn to most often is that of Hannibal Lecter. His words should ring in our ears throughout each day: each thing that we develop, each method, each class -- what is it? Step back from the problem you are trying to solve for a moment, and look at each one of our little creations in isolation. Each is deserving of its own identity, its own purpose, its own place in the cosmos. Each deserves to have a life of its own, unique and independent of the lives of its brothers and sisters. Software is complex enough without having to remember the particular problem that the author of a class or method was working on at the time he or she wrote it. As Abraham Lincoln once said, "the world will little note nor long remember the context in which the method was written, but the method itself will live on."

Sometimes it's not easy to step back and focus on the individual pieces of our solution in isolation from the rest. It takes a change of gears, a certain adjustment of perspective. In the following clip we see Hannibal guide Clarice in finding the true essence of the class she's working on. As with many of the good doctor's creations, it may be best that we don't try to guess its exact purpose, though it does appear to be a subclass of a class called "Man":


As always, the good doctor gets straight at the heart of the matter. The purpose of this class is not to be found in the particular use to which it's currently being put. It has a more basic, more fundamental nature.

At this point it may be best not to try to draw concrete examples out of the problem Hannibal and Clarice are working on. Instead, I'll give three brief examples from some code I reviewed just last week. The names of the methods and the classes will of course be changed to protect the innocent: these types of problems disturb the good doctor greatly, and it's best not to upset him.

namespace HannibalsFirstPrinciple
{
    public class ViolationNumberOne
    {
        // Called before running the batch process
        public void Foo()
        {
            // Does some stuff
        }
    }
}


Any time you see a comment like the one on the public method "Foo" shown here, alarm bells should start ringing in your head. You don't have to read the method body to know that Hannibal's First Principle has been violated. This is a public method on a public class. Who's to say that Foo will always be called before running the batch process? The client can call Foo whenever it likes! And what the heck is "the batch process" anyway? It sounds like some piece of some particular problem the author of this class was working on at the time he or she created it. Clearly the author wrote this with a single-minded focus on one particular problem that he or she was working on, and never took the time to think about what this class is in and of itself.
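One possible remedy is simply to give the method a name that says what it does, and let the batch process (or any other client) decide when to call it. Since the method body above is elided, the name below is purely illustrative:

namespace HannibalsFirstPrinciple
{
    public class RemedyNumberOne
    {
        // The name now describes what the method does, not when its author
        // happened to call it. "ResetCounters" is an invented, illustrative name.
        public void ResetCounters()
        {
            // Does some stuff
        }
    }
}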

This programmer would not be producing code of this sort today if he or she had studied under doctor Lecter. Here's another example:

using System;

namespace HannibalsFirstPrinciple
{
    public class Account
    {
        public int Amount { get; set; }
    }

    public class ViolationNumberTwo
    {
        private const int MAX_ACCOUNT_AMOUNT = 10000;

        public void ValidateAccounts()
        {
            var account = new Account() { Amount = 5000 };
            Console.WriteLine(IsAccountOverTheLimit(account.Amount, MAX_ACCOUNT_AMOUNT));
        }

        private bool IsAccountOverTheLimit(int value1, int value2)
        {
            return value1 > value2;
        }
    }
}


Consider the method IsAccountOverTheLimit. In the context of the entire program, it seems reasonable enough: it checks the amount in an account against the maximum value allowed. But what does it really do? In fact, IsAccountOverTheLimit itself has nothing to do with accounts or limits at all. It simply compares two integers; it's the client that gives these particular meanings to the method. Had the author studied under doctor Lecter (and survived), he or she would have written the method this way:

        private bool IsAccountOverTheLimit(Account account)
        {
            return account.Amount > MAX_ACCOUNT_AMOUNT;
        }


Or perhaps this way:

        public static bool IsGreaterThan(this int thisNumber, int numberToCompare)
        {
            return thisNumber > numberToCompare;
        }
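With the extension-method version (which, like all extension methods, would need to live in a static class), the client from the earlier example could then spell out both meanings itself:

            Console.WriteLine(account.Amount.IsGreaterThan(MAX_ACCOUNT_AMOUNT));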


Either make the method a full-fledged citizen of the world by giving it all the rights and responsibilities implied by its name, or change its name so that it can declare its true purpose in life with honesty and dignity.

And the final example gets to the heart of understanding the essence of a class, and of maintaining its simplicity:

namespace HannibalsFirstPrinciple
{
    public class Account
    {
        private int _amount;
        private readonly bool _isInterestBearing;

        public Account() { }

        public Account(bool isInterestBearing)
        {
            _isInterestBearing = isInterestBearing;
        }

        public int Amount
        {
            get
            {
                return _isInterestBearing ? (int)(1.1 * _amount) : _amount;
            }
            set { _amount = value; }
        }
    }
}


The programmer here has cluttered up a perfectly good Account class with a lot of extra nonsense that many clients of the class probably don't care about. If you have a special use for a class, specialize it! Don't introduce complexity when simplicity will do:


namespace HannibalsFirstPrinciple
{
    public class Account
    {
        public virtual int Amount { get; set; }
    }

    public class InterestBearingAccount : Account
    {
        private int _amount;

        public override int Amount
        {
            get
            {
                return (int)(1.1 * _amount);
            }
            set { _amount = value; }
        }
    }
}


These examples seem silly and simplistic because they've been constructed to shine the light of day on these problems, but in the real world these anti-patterns can easily hide amid the complexity of our code -- remember that I saw examples like these in real production code just this week. These types of problems can bite any of us unless we remain vigilant, unless we keep doctor Lecter's words of wisdom ever ringing in our ears.

Or if doctor Lecter has used too many words for you to remember, then at least remember this exhortation by one of his students, who has clearly grasped the essence and urgency of his lesson:


Safe and happy coding all!


Tuesday, June 23, 2009

Effect on WCF web service performance of the Tcp window size in high-latency Windows networks

In my last post I looked at the effect of a bug in WCF on Windows XP and W2K3, in which the Tcp connection used by web services that use TransferMode "Streamed" or "StreamedRequest" is closed after every call. This forces subsequent calls to re-establish the Tcp and, if in use, SSL connection(s) at the start of every call, which can have serious performance consequences.


TCP Window Size and Performance on High-Latency Networks

As a follow-up, I'd like to look at the effect of adjusting the Tcp window size on performance. This and this do an excellent job of describing what the Tcp window is, how it affects Tcp traffic in high-latency, high-bandwidth environments, and how to set it. I won't try to duplicate all that here, except to quote as a brief overview two snippets from this MSDN article on it:
TCP window size

The TCP receive window size is the amount of receive data (in bytes) that can be buffered during a connection. The sending host can send only that amount of data before it must wait for an acknowledgment and window update from the receiving host. The Windows TCP/IP stack is designed to self-tune itself in most environments, and uses larger default window sizes than earlier versions.
....

Windows scaling

For more efficient use of high bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data and is limited to 2 bytes, or a window size of 65,535 bytes.

Since the size field cannot be expanded, a scaling factor is used. TCP window scale is an option used to increase the maximum window size from 65,535 bytes to 1 Gigabyte.

The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14.

To calculate the true window size, multiply the window size by 2^S where S is the scale value.
For Example:

If the window size is 65,535 bytes with a window scale factor of 3.
True window size = 65535*2^3
True window size = 524280
Increasing this setting in high-latency, high-bandwidth environments can improve performance because it keeps the sender from exhausting the receiver's advertised window before the receiver's ACKs have a chance to arrive. When that happens the sender has to pause until the ACKs start coming in, resulting in a herky-jerky, slow connection.
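To make the arithmetic concrete, here's a small C# sketch that computes the effective window size for each scale mode and a rough bandwidth-delay product for the 30 Mbit/s, 250ms round-trip network used in these tests. The numbers are just the ones quoted above, and nothing here is WCF-specific:

using System;

class TcpWindowMath
{
    static void Main()
    {
        // Effective window = 16-bit window field left-shifted by the scale value.
        const long baseWindow = 65535;
        for (int scale = 0; scale <= 3; scale++)
        {
            Console.WriteLine("Scale mode {0}: {1:N0} bytes", scale, baseWindow << scale);
        }

        // Rough bandwidth-delay product: how many bytes must be "in flight"
        // to keep the pipe full at this bandwidth and latency.
        const double bandwidthBitsPerSecond = 30000000;  // ~30 Mbit/s
        const double roundTripSeconds = 0.25;            // 250ms simulated latency
        double bytesInFlight = bandwidthBitsPerSecond / 8 * roundTripSeconds;
        Console.WriteLine("Bandwidth-delay product: {0:N0} bytes", bytesInFlight);
    }
}

For this network the bandwidth-delay product works out to roughly 937,500 bytes, comfortably larger than the default 64K window.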


What the Tcp Window Size Affects

One thing I didn't understand about the Tcp window size setting until I conducted these tests is that it describes a receive buffer. When configured on machine A, it tells any potential sending machine B how much data B may have in flight, unacknowledged, when sending to machine A. This was unintuitive to me. Also, machines A and B can use different Tcp window sizes during the same connection: machine A honors the window size configured on machine B when machine A sends data to B, and machine B honors the window size configured on machine A when machine B sends data to A.

Testing the Effect of Various Tcp Window Sizes

The following tests were conducted with a WCF service host running on Windows XP, with a Vista client. The Tcp window size can't be configured on Vista (or W2K8) the way it can on XP and W2K3; Microsoft has replaced it with Tcp Window Auto-tuning, which they believe renders the TcpWindowSize registry setting obsolete. We'll take a look at that as well. The tests were all conducted with 250ms of latency simulated using the WANemulator, on a wireless 802.11n network with approximately 30 Mbit/s of bandwidth.

The tests use essentially the same simple WCF client and host application that I used in my last post, except that I now have 3 web services instead of 1:
  • GetStringData: A simple web service that takes a string parameter and returns it in its return value. I run two tests with this service, one in which I transfer a small "Hi mom" string back and forth, and one in which I transfer a larger string, a 277KB JPEG of my daughter:
My daughter Kaitlin
  • GetStreamedData: A web service that gets a Stream from the server.
  • PutStreamedData: A web service that sends a Stream to the server.
For these tests I'm using GetStreamedData and PutStreamedData to transfer a 3.9 MB movie of a bear:


A bear

Both the picture of Kaitlin and the bear video are already compressed, so there's no worry about anyone else compressing them again and skewing the results.


The Test WCF Host Application

As before, the WCF host, which is running on the XP machine, is a simple console app:


Which hosts this service:
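A minimal sketch of a contract along these lines, using the three operation names from the list above (the interface name, file path, and implementation details are assumptions, not the actual test code):

using System.IO;
using System.ServiceModel;

[ServiceContract]
public interface IPerfTestService
{
    [OperationContract]
    string GetStringData(string data);

    [OperationContract]
    Stream GetStreamedData();

    [OperationContract]
    void PutStreamedData(Stream data);
}

public class PerfTestService : IPerfTestService
{
    // Echoes the string parameter back in the return value, per the description above.
    public string GetStringData(string data)
    {
        return data;
    }

    // Streams a file back to the caller. The path is hypothetical.
    public Stream GetStreamedData()
    {
        return File.OpenRead(@"C:\TestData\bear.wmv");
    }

    // Reads and discards the incoming stream.
    public void PutStreamedData(Stream data)
    {
        var buffer = new byte[64 * 1024];
        while (data.Read(buffer, 0, buffer.Length) > 0)
        {
        }
    }
}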


That uses the following, simple bindings:
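As a rough, programmatic sketch of the kinds of endpoints being compared, reusing the IPerfTestService contract sketched above (the addresses, endpoint names, and message sizes are assumptions, not the configuration actually used in these tests):

using System;
using System.ServiceModel;

class HostProgram
{
    static void Main()
    {
        var host = new ServiceHost(typeof(PerfTestService),
            new Uri("http://localhost:8000/PerfTest"),
            new Uri("net.tcp://localhost:8001/PerfTest"));

        // Buffered and streamed basicHttp endpoints, plus a buffered net.tcp endpoint,
        // roughly matching the binding variations compared in this post.
        var bufferedHttp = new BasicHttpBinding { TransferMode = TransferMode.Buffered };

        var streamedHttp = new BasicHttpBinding
        {
            TransferMode = TransferMode.Streamed,
            MaxReceivedMessageSize = 10 * 1024 * 1024   // room for the 3.9 MB movie
        };

        var bufferedTcp = new NetTcpBinding { TransferMode = TransferMode.Buffered };

        host.AddServiceEndpoint(typeof(IPerfTestService), bufferedHttp, "buffered");
        host.AddServiceEndpoint(typeof(IPerfTestService), streamedHttp, "streamed");
        host.AddServiceEndpoint(typeof(IPerfTestService), bufferedTcp, "");

        host.Open();
        Console.WriteLine("Host running; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}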



The Test WCF Client Application

The client uses similar bindings:




And is also a simple console app:









Results of the Tests - XP

I exercised all of the web service calls against an XP machine that was configured to use Tcp Window scale modes of 0, 1, 2 and 3. A scale mode of "0" means Tcp1323Opts (the registry setting that enables Tcp Window sizes greater than 64K) is disabled, and the TcpWindowSize is the default value of roughly 64K. Scale mode 1 means TcpWindowSize is roughly 2^1 x 64K, scale mode 2 means TcpWindowSize is roughly 2^2 x 64K, etc.

The results for the PutStreamedData call are as follows:

Seconds per PutStreamedData web service call, by TcpWindowSize scale mode. Click on the image to open it in a larger window.

This graph shows the amount of time each PutStreamedData web service call takes, based on the TcpWindowSize scale mode configured on the XP machine.

The three streamed transfer mode bindings perform the worst, with streamed NetTcp the worst of all. This matches what I found in my last post when I tested large string-based payloads (I was too lazy in that post to test web services that use Stream parameters). Their performance improves significantly with a TcpWindowSize scale mode of 1, and after that there's no further improvement. The other binding types all start out at about the same performance level, except for buffered NetTcp, which is the best of all. But they separate from one another as the TcpWindowSize is increased, and ultimately buffered BasicHttp and WS2007 perform the best, buffered NetTcp comes in second, and buffered BasicHttp with SSL comes in as the worst of the rest, but still much better than the streamed bindings. This differs from what I saw in my last post, where buffered NetTcp was the runaway performance king in all tests.

Remember that the graph above represents streamed data moving from a Vista machine to an XP machine. The TcpWindowSize has been configured as a registry setting on the XP machine. During the Tcp handshake, it has been passed from the XP machine to the Vista machine, which uses it to determine how many bytes it can send at a time, before receiving ACKs from the XP machine.


Results of the Test - Vista

Now let's look at how data sent from XP to Vista fares:

Seconds per GetStreamedData web service call, by TcpWindowSize scale mode on the XP machine

Here we see the bindings sorting themselves into two groups. The results are much more variable when sending data from XP to Vista than they were in the other direction, but they don't appear to vary based on the TcpWindowSize that was configured on the XP machine. This validates the idea that Vista is relying on Tcp Window Auto-tuning, and not the TcpWindowSize configured on the host machine, to optimize the connection.


The Effect of Increased Tcp Window Sizes in a Packet Trace

The fact that the Vista machine and XP machine are using different Tcp window sizes can easily be seen in this Wireshark packet trace, taken before any TcpWindowSize registry setting was made:

Packet trace of a GetStreamedData call showing the different Tcp window sizes requested by Vista and XP


Packet trace of a GetStreamedData call showing the window size adjusting upward as the call proceeds



Packet trace of a PutStreamedData call showing the default 64KB window size being used when moving data from Vista to XP

The alert reader may notice that the Tcp session didn't end between the two web service calls. That's because for this test I changed the client code to call both web service calls on the same connection, without closing the connection between calls.

In my testing the Tcp window requested by Vista always had a scale mode of 2. This MSDN blog post says that Vista determines the scale mode based on the bandwidth delay product (bandwidth times latency). Since my tests were all conducted with the same bandwidth and latency, it's not surprising that Vista chose the same scale mode throughout.

The effect of the smaller Tcp window size used by default on XP can be seen by running a few queries in Wireshark. The total numbers of bytes transferred by the GetStreamedData and PutStreamedData calls are nearly identical, 5394699 v. 5394714. They are, after all, transmitting the same file across the network. But if you set the Wireshark filter to search for packets that arrived 0.2 seconds or more after the previous packet (frame.time_delta > 0.2), you see that only 3 such instances occurred during the GetStreamedData call (the one that used the larger Tcp window), while 20 occurred during the PutStreamedData call. Here's one of them:


Packet trace showing communication pausing due to a full buffer

These quarter-second pauses (equal to our round trip latency time) represent moments when the sender has completely filled its buffer and must wait for an ACK before continuing. And the discrepancy in frame time deltas is even more widespread than these 20 incidents: there are 48 deltas greater than 0.1 in the PutStreamedData communication to only 35 for GetStreamedData; 132 greater than 0.05 for PutStreamedData to 51 for GetStreamedData; 421 greater than 0.01 for PutStreamedData to 89 for GetStreamedData, and 679 greater than 0.001 for PutStreamedData to 430 for GetStreamedData. These pauses indicate delays in the communication caused by the sender's buffer momentarily filling up.


Summary of Findings on Stream Parameter Types

So to list some take-aways from the two graphs above:

Based on the PutStreamedData graph (Vista to XP data transfer)
  • It would appear that WCF on XP has particularly poor performance when streamed transfer modes are used. The buffered transfer modes out-perform their streamed counterparts by 50%-75%
  • For buffered BasicHttp, SSL doesn't add any appreciable overhead in the default case, but when the Tcp window size is optimally configured buffered BasicHttp is 50% faster than buffered BasicHttp with SSL
  • Under the default configuration, buffered NetTcp performs best, but when optimally configured buffered BasicHttp and WS2007 perform about 40% better than buffered NetTcp.
Based on the GetStreamedData graph (XP to Vista data transfer)
  • WCF on Vista appears to have particularly poor performance when SSL is used. The two bindings that use SSL perform worse than their non-SSL counterparts by 70%-75%. This is for me the most surprising finding of this entire exercise, because there was no indication that this would be the case in the large string-based payload tests that I conducted in my last post.
  • Streamed NetTcp is the next worst-performing binding, about 50% worse than buffered NetTcp. No surprise there, as that binding has consistently performed terribly in all the testing I've done.
  • The other bindings perform roughly the same, about 50% better than the three worst-performing ones.

Mysteriously Poor Performance of SSL on Vista

As to why SSL-based bindings perform so poorly on Vista I'm at a loss to say, other than to note that I see similar-looking communications pauses in the SSL-based Wireshark traces. In one example, comparing a GetStreamedData call made by buffered BasicHttp and buffered BasicHttp with SSL:
  • 6554 packets transferred in the non-SSL case, 6704 for SSL
  • Pauses greater than 0.01 seconds: 194 in the non-SSL case, 225 for SSL
  • Pauses greater than 0.1 seconds: 17 in the non-SSL case, 157 for SSL
  • Pauses greater than 0.2 seconds: 5 in the non-SSL case, 38 for SSL
This certainly bears further investigation in a following post, which I'll get to unless I get distracted by a bright, shiny object.


Findings for String Parameter Types

The performance chart for the large string-based payload test is as follows:

Seconds per web service call, large string-based payload, by Tcp window scale mode

With the exception of streamed NetTcp, there's a slight performance gain seen in all of the bindings, ranging from about 15% - 25%, when larger Tcp window sizes are used. Overall, the results are the same as the findings of my last post: streamed NetTcp is the worst-performing binding, followed by the two streamed BasicHttp bindings, followed by buffered BasicHttp with SSL, followed by the rest, with buffered NetTcp the best performing of the bunch.

And the small string-based web service payload test gives similar results to the ones in my last post: streamed BasicHttp bindings perform worst due to the WCF bug that I talked about in my last post, buffered NetTcp performs best, and the rest are in the middle:


Seconds per web service call, small string-based payload, by Tcp window scale mode

In these tests the TcpWindowSize has no detectable effect on performance at all.


Conclusion

If any words of wisdom come out of these last two posts, it's that not all bindings are alike, and if you want your WCF application to perform well, you had better give careful attention to the bindings you use in your application.

Sunday, May 24, 2009

WCF Bug on W2K3 and XP: TCP session closes after every web service call when using HTTP transport with Streamed transfer mode

While profiling a WCF-based application recently I uncovered a bug in WCF, in which the WCF service host forcibly closes its TCP session with the client after each and every web service call. The client is helpless to prevent it, even if it keeps the client proxy open in the hope of reusing the session for subsequent calls. No exception happens when the client does make subsequent calls; instead the proxy silently negotiates a new TCP session and continues on its merry way.

A Microsoft representative that I talked to about this confirms that this is an issue in WCF services hosted on Windows 2003 and XP, if they use HTTP transport with transferMode "Streamed" or "StreamedRequest." The bug doesn't affect transferMode "StreamedResponse," and has been fixed on Vista, Windows 2008 and Windows 7.

This bug is potentially a big deal because, as I noted in my last post, initiating a new TCP session requires a synchronous round-trip to the server, which, thanks to this bug, becomes overhead added to each and every web service call. And if SSL is in use then at least one more synchronous round trip will be added to every call. These round trips are detrimental to performance, especially in high-latency environments.


Seeing the Bug in a Packet Trace

The bug is very easy to see in a network packet trace. For these traces I'm using Wireshark, though most any packet sniffer will do. See my last post for instructions on how to use Wireshark.


The Test Web Service

The service I'm using for this test is simply this:


WCF service that demonstrates the bug

You'll notice that this web service method does not have a Stream, Message or IXmlSerializable return type or parameter, which according to this MSDN article is needed to make use of WCF streaming. This post therefore covers what happens to the performance of other (non-streaming) web service methods that are part of a web service that has been configured to support streaming. An evaluation of how the bug affects web service methods that do stream will be the topic of another post, unless, to paraphrase Shrek, I change my mind, or forget.


The Test Server

For the server I used a very unremarkable Console app as the WCF self-hosting service:


Console app used as the WCF service host

With the following basicHttpBinding endpoint:

WCF service host's App.config endpoint configuration


The Test Client

The client just calls the web service twice:

WCF client code used for testing

with Streaming enabled:

WCF client binding specified in its App.config


The Bug Exposed

The bug can be seen by comparing the network traffic produced by the calls shown above using an XP server, with the traffic produced by a Vista server:

Packet trace showing multiple web service calls using the same client proxy, using Vista as the server - Click on the image to view it more clearly

Packet trace showing multiple web service calls using the same client proxy, using XP as the server  - Click on the image to view it more clearly

These packet traces show that when XP is used as the WCF service host, the TCP connection is closed and re-opened between web service calls. This is verified by the fact that on XP each call is made using a different client port, while on Vista the same port is used for each call.


Evaluating the effect of the Bug

In order to evaluate the effect that this bug has on WCF performance, I ran a set of tests, the results of which I'll describe below. The first test was simply to run the simple WCF service shown above using different binding configurations in a loop against XP and Vista, and to note the performance of each binding.


The Test Bindings

The server bindings I used for this initial test are these:

Server bindings tested for performance

I tested custom bindings designed to simulate basicHttpBindings just to verify that the issue affects any http-based transport, and not just basicHttp, but I found that the timings for those bindings were identical to the timings for the basicHttpBindings. From here on out I'm dropping them from these results, to keep the clutter down.


The Test Client

For the client I used a simple console app that calls the web service using each of the client bindings in a loop for 10 seconds, and counts the number of web service calls it's able to make:



Client code used to test the performance of various WCF bindings
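A minimal sketch of that kind of timing loop follows; the contract, addresses, and binding list are assumptions for illustration, not the actual test code:

using System;
using System.Diagnostics;
using System.ServiceModel;

// A throwaway contract for illustration; the real test service just echoes a string.
[ServiceContract]
public interface ITestService
{
    [OperationContract]
    string GetData(string value);
}

class ClientProgram
{
    static void Main()
    {
        // One call per binding being compared; the addresses are assumptions.
        TimeCalls("BasicHttp Buffered", new BasicHttpBinding { TransferMode = TransferMode.Buffered }, "http://server:8000/Test");
        TimeCalls("BasicHttp Streamed", new BasicHttpBinding { TransferMode = TransferMode.Streamed }, "http://server:8000/Test");
        TimeCalls("NetTcp Buffered", new NetTcpBinding(), "net.tcp://server:8001/Test");
    }

    static void TimeCalls(string name, System.ServiceModel.Channels.Binding binding, string address)
    {
        var factory = new ChannelFactory<ITestService>(binding, new EndpointAddress(address));
        ITestService proxy = factory.CreateChannel();

        // Call the service in a loop for 10 seconds and count how many calls complete.
        var stopwatch = Stopwatch.StartNew();
        int calls = 0;
        while (stopwatch.Elapsed < TimeSpan.FromSeconds(10))
        {
            proxy.GetData("hi mom");
            calls++;
        }

        Console.WriteLine("{0}: {1} calls in 10 seconds", name, calls);
        ((IClientChannel)proxy).Close();
        factory.Close();
    }
}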


The Bug on Low Latency Networks

Here are the results:

Web service calls per second on low latency network, using a small web service payload

This test was conducted on my home network using two computers, an XP machine and a Vista machine. They're connected by an 802.11n wireless network that has about 1 ms round-trip latency.

The first thing that stands out about the results is the remarkable performance of buffered NetTcp, which was able to move twice as many calls per second as the other bindings. The next thing that stands out is the fact that while all the other bindings perform about the same on Vista, on XP the basicHttp-streamed binding is about 25% slower than the others, and basicHttps-streamed is about 50% slower than the others. So even in this low latency test, the overhead of setting up new TCP and SSL sessions for every call has a measurable impact on performance.

Finally, even though the graphs above represent two different machines, the fact that the number of calls each machine is able to process per second is approximately the same gives me some confidence that I'll be able to compare absolute call times between them in the analyses that follow.


The Bug on High Latency Networks

The bug has a measurable effect even in low-latency networks, but the performance numbers shown above are still pretty good. An end-user is not likely to notice the difference between an application that's making 50 web service calls per second v. one that's making 250 calls per second. What is of more interest is how the bug behaves in high-latency networks. 

To simulate different network latency times, I used the WANEmulator, a marvelous tool that literally takes minutes to set up, and which makes an excellent Christmas gift, especially since it's free. Running the same tests with 50ms, 100ms and 200ms round-trip times produced the following results:


Seconds per basicHttp web service call on Windows XP


Seconds per basicHttp web service call on Vista

The results are not surprising. With 200ms round-trip times, streaming added 200ms to the average web service call time, due to the need to make an extra round trip to establish the TCP session. And SSL adds another 200ms to the call for establishing the SSL session. Though it's not readily visible in the packet traces I've got right now, in other traces I have seen that WCF is able to establish SSL sessions in a single round-trip, so again the result above is not surprising.

It's good to see in the results above that on Vista the problem is really fixed: all four bindings take the same amount of time, and the time they take matches the fastest of the times taken on XP (e.g. 0.4 seconds per call at 200ms).

The bad news in the results above is that latency has a dramatic impact on even the best performance numbers above - from lightning-fast at 1ms latency to 0.4 seconds per call at 200 ms latency. The reason for this is not entirely obvious. Remember that in the tests above, we are averaging the call time over many calls, and (except for streamed bindings on XP where the bug is restarting the TCP session every time) we are reusing the TCP and the SSL sessions between calls, so the time needed to establish those sessions should hardly be visible in these results. Since my test web service is a two-way call (it doesn't have the IsOneWay operation contract attribute), each call will result in at least one synchronous round trip, but the source of any additional overhead is not obvious.

The further bad news is that when the bug is present, web service calls are again 25% slower in the streaming case, and 50% slower in the streaming+ssl case, than they should be.

Another interesting result appears if I add back the other bindings to the chart:


Seconds per call of various bindings on XP, with various network latency figures

While WS2007 and streamed NetTcp have nearly identical results to buffered basicHttp, buffered NetTcp again stands out as being twice as fast as all the rest. In fact, its call times are almost exactly equal to the round-trip time itself, which is the theoretical minimum.


The Bug with Large Web Service Payloads

Again, I'm not going to analyze the bug when streaming-based parameters are in use, for the very good reason that I don't feel like it right now, but even an unstreamed web service method can include a larger payload than the "hi mom" text I used above. To see the effect of the bug on those types of methods, I changed the client to send a gob of text that's about 277KB in length. To prevent anyone from compressing it, I serialized something that's already been compressed, a JPEG of my daughter:

My daughter Kaitlin

Like so:




Client program that sends a picture of Kaitlin to the server in a loop, using different bindings, and times the result

Naturally I had to increase some of the max size parameters in the client and server bindings to allow for larger messages:



Server bindings that allow for larger message sizes

Client bindings that allow for larger message sizes

In the client bindings above I collapsed some of the duplicate settings to save space.
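As a rough sketch of the kinds of settings involved, in programmatic form (the specific values below are assumptions, not the ones actually used in the configs above), the changes amount to raising the message size and reader-quota limits on each binding:

using System.ServiceModel;

class BindingSizeExample
{
    // Returns a basicHttp binding with limits raised enough for the ~277KB serialized JPEG.
    public static BasicHttpBinding CreateLargeMessageBinding()
    {
        var binding = new BasicHttpBinding
        {
            MaxReceivedMessageSize = 1024 * 1024,   // 1 MB, up from the 64KB default
            MaxBufferSize = 1024 * 1024
        };
        binding.ReaderQuotas.MaxStringContentLength = 1024 * 1024;
        binding.ReaderQuotas.MaxArrayLength = 1024 * 1024;
        return binding;
    }
}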


The Bug on a Low Latency Network with a Large Web Service Payload 

The results are as follows:

Web service calls per second on low latency network, using a large web service payload

The graph has the same general shape as it did with a small payload, though the absolute numbers are much different - 7 or 8 calls per second at best, compared to 250 calls per second with the small payload. With a payload this large network bandwidth is no doubt affecting these numbers, but since bandwidth is fixed throughout these tests, they should work fine for determining the relative effect of latency on WCF.

Again buffered NetTcp is the runaway performance winner, but this time streamed basicHttp performs about as well as the rest, and we have a new loser: streamed NetTcp.

So the bug is not measurable with large payloads in low latency environments. This makes good sense, because each web service call is taking so much more time just to transfer its payload that the TCP and SSL session initiation time is small by comparison.


The Bug on High Latency Networks with Large Web Service Payloads

Logically, in high-latency environments the TCP and SSL session initiation time should continue to be overwhelmed by the sheer amount of time taken to deliver the payload, but let's see what the testing shows:


Seconds per basicHttp web service call on Windows XP with a large web service payload


Seconds per basicHttp web service call on Vista, with a large web service payload

To our shock and horror, we see that the bug appears to affect performance in the same proportion as in the small payload case, only now instead of increasing call times from 0.4 seconds per call to 0.8 seconds per call, it's increasing call times from 2 seconds to 6. 

The good news (if anything about the above charts can be called "good") is that the bug remains more or less fixed on Vista, where call times are uniformly close to the best call times on XP, roughly 3 to 4 seconds per call. The Vista results appear to show that there is some additional overhead involved in streaming large payloads however, which amounts to about an additional 1 second per call in the 200ms latency case.

But even if we assume that, apart from the bug, there's 1 sec. worth of "normal" overhead involved in streaming large payloads in the 200ms latency case, that still leaves 1-2 seconds of overhead that we can only attribute to the bug. It would appear that closing and re-opening the TCP session carries some additional penalty beyond the round-trip TCP session establishment time. Something called "slow start" TCP session tuning may be the culprit (sure sounds like it, don't ya think?), but to date I haven't proven it, mainly due to laziness, since I have the packet traces in which must lie the smoking gun.


Performance of Other Bindings with Large Payloads

Though it has nothing to do with the stated purpose of this post, it's fun to add back the other binding types to the chart above:


Seconds per call of various bindings on Vista, with various network latency figures, using a large web service method payload

While WS2007 performs about as well as the best of the basicHttp methods, buffered NetTcp is again the speed king, coming in at a good 1/3 faster than the best of the rest. 

But what can we say about streamed NetTcp? It's taking 15 seconds per call in the 200 ms latency case, a good 4 times slower than all the rest! Only Dr. Seuss seems adequate: 
This mess is so big,
and so deep,
and so tall,
we cannot clean it up.
There is no way at all
Granted, I'm not making any attempt to tune these bindings; other than max message size settings I'm using defaults. But note that all these bindings are blindingly fast in low latency environments. If nothing else this shows the danger of failing to take latency into account when testing your WCF services.

And that would seem like a good stopping point for this particular post. This exercise produced many more fascinating results which the margin of this blog is too small to contain, but that will have to wait for another day. Unless I change my mind, or forget.


Appendix - the Data

The industrious among you can reproduce the above graphs and many others using the following data.

Vista Data - Small Payload










Calls/sec              1ms      50ms       100ms      200ms
NetTcp Buffered        258      18.04      9.62       4.8
BasicHttp Buffered     125      9.26       4.74       2.48
WS2007                 117      9.52       4.72       2.48
BasicHttps Buffered    111      9.34       4.67       2.48
NetTcp Streamed        111      8.58       4.44       2.46
BasicHttp Streamed     113      8.98       4.62       2.48
BasicHttps Streamed    103      8.98       4.72       2.36


Vista Data - Large Payload











Calls/sec              1ms      50ms        100ms        200ms
NetTcp Buffered        8        1.844444    0.9166667    0.5083333
WS2007                 3.7      1.147222    0.5972222    0.3111111
BasicHttp Buffered     3.74     1.144444    0.605555     0.3194444
BasicHttps Buffered    3.78     1.088889    0.5555556    0.3083333
BasicHttp Streamed     3.5      0.775       0.4222222    0.2444444
BasicHttps Streamed    3.4      0.8277778   0.44722222   0.24166667
NetTcp Streamed        1.88     0.2361111   0.1222222    0.06944445


XP - Small payload









Calls/sec              1ms      50ms        100ms     200ms
NetTcp Buffered        250      18.13333    9.48      4.86
BasicHttp Buffered     131      9.766666    4.8       2.5
WS2007                 120      9.5         4.88      2.46
BasicHttps Buffered    113      9.233334    4.86      2.46
NetTcp Streamed        110      8.933333    4.66      2.48
BasicHttp Streamed     83       5.6         3.18      1.68
BasicHttps Streamed    51       4.533333    2.38      1.28


XP - Large Payload










Calls/sec              1ms      50ms          100ms        200ms
NetTcp Buffered        7.16     2.072222      1.125        0.6222222
WS2007                 3.38     1.352778      0.7555556    0.3833333
BasicHttp Buffered     3.9      1.361111      0.7416667    0.4027778
BasicHttps Buffered    3.42     1.005556      0.5388889    0.2916667
BasicHttp Streamed     3.18     0.6444445     0.3277778    0.1861111
BasicHttps Streamed    3.42     0.5972222     0.3166667    0.1666667
NetTcp Streamed        1.74     0.233333333   0.12777778   0.06666667
