Friday, February 6, 2009

Fast enumeration to string conversion in C#

Code

Calling ToString() on an enumeration value in .NET is convenient: it returns the member's name exactly as we see it in code, as this simple console application shows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace FastEnumToString
{
    class Program
    {
        public enum Color
        {
            Red,
            White,
            Blue,
            Green
        }

        static void Main(string[] args)
        {
            Console.WriteLine(Color.Red);
            Console.WriteLine(Color.White);
            Console.WriteLine(Color.Blue);
            Console.WriteLine(Color.Green);
            Console.ReadLine();
        }
    }
}

which yields this output:

Red
White
Blue
Green

But Microsoft has this warning about Enum.ToString:

Because this method searches metadata tables, it heavily uses system resources and can impede performance.

When .ToString() is called, .NET uses reflection to look up the name that corresponds to the enumeration value. This simple test application measures the cost of the call on my laptop:

using System;

namespace FastEnumToString
{
    class Program
    {
        public enum Color
        {
            Red,
            White,
            Blue,
            Green
        }

        static void Main(string[] args)
        {
            var start = DateTime.Now;
            var rand = new Random();
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.ToString();
            }
            TimeSpan span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("ToString() took " + span.ToString());
            Console.ReadLine();
        }
    }
}

Which yields:

ToString() took 00:00:02.7460000

So it took 2.746 seconds to convert an enum to a string a million times. This may not seem too bad, but in a recent profiling of the application I work on, we found that calling Enum.ToString() was one of the most significant performance problems in the system. We were calling Enum.ToString() 50,000 times just to bring up our login screen! And that was just the tip of the iceberg: Enum.ToString() is called literally millions of times in any given user session, adding noticeable lag to many operations in the system.
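A side note on the measurements in this post: on some systems DateTime.Now only advances in steps of around 15 milliseconds, which is coarse next to the sub-100-millisecond timings reported further down. A sketch of the same ToString() loop timed with System.Diagnostics.Stopwatch, which uses a high-resolution counter where the hardware provides one, might look like this; the Timing class and its TimeIt helper are hypothetical names, not part of the original code.

```csharp
using System;
using System.Diagnostics;

namespace FastEnumToString
{
    public enum Color { Red, White, Blue, Green }

    public static class Timing
    {
        // Hypothetical helper: runs `action` `iterations` times and returns the
        // elapsed time as measured by Stopwatch.
        public static TimeSpan TimeIt(int iterations, Action<int> action)
        {
            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                action(i);
            }
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var rand = new Random();
            string last = null;

            // Exactly one million calls (the loops in this post start at 1,
            // so they run 999,999 times).
            TimeSpan span = Timing.TimeIt(1000000, i =>
            {
                var color = (Color)rand.Next(4);
                last = color.ToString(); // keep the result so the call has a visible effect
            });

            Console.WriteLine("ToString() took " + span);
            Console.WriteLine("Last value: " + last);
        }
    }
}
```

The delegate call adds a small, constant overhead per iteration, so this is better suited to comparing the two approaches against each other than to measuring either one in absolute terms.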


So what to do about it? The first thing that comes to mind, from traditional object-oriented programming, is to override the ToString() method of the System.Enum class (the C# enum keyword being mere syntactic sugar on top of System.Enum), but attempts to do so are greeted with the compilation error "Cannot derive from special class System.Enum".

Another simple solution would be to use .NET 3.5 extension methods:

using System;

namespace FastEnumToString
{
    public enum Color
    {
        Red,
        White,
        Blue,
        Green
    }

    static class Extensions
    {
        // Hand-written lookup: no reflection, but every member must be listed by hand.
        public static string FastToString(this Color color)
        {
            switch (color)
            {
                case Color.Red: return "Red";
                case Color.Green: return "Green";
                case Color.Blue: return "Blue";
                case Color.White: return "White";
                default: return "Undefined";
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var start = DateTime.Now;
            var rand = new Random();
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.ToString();
            }
            TimeSpan span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("ToString() took " + span.ToString());

            start = DateTime.Now;
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.FastToString();
            }
            span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("FastToString() took " + span.ToString());
            Console.ReadLine();
        }
    }
}

The result is much faster:

ToString() took 00:00:02.5550000
FastToString() took 00:00:00.0680000

That's 0.068 seconds for a million FastToString() calls, faster by a factor of about 38. This is great, but the extension method code written above is tedious and error-prone. When the next developer comes along and adds another color to the enumeration, he or she will have to remember to add it to the extension method. And any typo in the enumeration strings could produce difficult-to-find bugs down the road, especially if the code uses Enum.Parse to convert the string back into a Color.
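One way to soften the missing-member risk in the hand-written switch is to have the default case fall back to the slow but always-correct Enum.ToString() instead of returning a made-up string. A member added later then still gets its proper name, just without the speedup. This is a sketch; the method name FastToStringOrFallback is hypothetical.

```csharp
using System;

namespace FastEnumToString
{
    public enum Color { Red, White, Blue, Green }

    static class Extensions
    {
        // Hypothetical variant of the hand-written switch: known members come from
        // string literals; anything else (a member added later, or an undefined
        // value such as (Color)42) falls back to the reflection-based ToString().
        public static string FastToStringOrFallback(this Color color)
        {
            switch (color)
            {
                case Color.Red: return "Red";
                case Color.White: return "White";
                case Color.Blue: return "Blue";
                case Color.Green: return "Green";
                default: return color.ToString();
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Color.Blue.FastToStringOrFallback());  // Blue
            Console.WriteLine(((Color)42).FastToStringOrFallback()); // 42
        }
    }
}
```

This does nothing about typos in the literals themselves, so it reduces the maintenance burden rather than removing it.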

This would be a good use of CodeDOM or some other code generation technique to automatically generate the extension methods, but here's another solution that can be coded directly:

using System;
using System.Globalization;

namespace FastEnumToString
{
    public enum Color
    {
        Red,
        White,
        Blue,
        Green
    }

    static class Extensions
    {
        // One instance of _strings exists per enumeration type T.
        private static class EnumStrings<T>
        {
            private static string[] _strings;

            // Runs once per T, the first time EnumStrings<T> is used.
            static EnumStrings()
            {
                if (typeof(T).IsEnum)
                {
                    // Assumes the values are 0, 1, 2, ... so they can be used as indexes.
                    _strings = new string[Enum.GetValues(typeof(T)).Length];
                    foreach (System.Enum value in Enum.GetValues(typeof(T)))
                    {
                        _strings[((IConvertible)value).ToInt32(CultureInfo.InvariantCulture)] = value.ToString();
                    }
                }
                else
                {
                    throw new Exception("Generic type must be an enumeration");
                }
            }

            public static string GetEnumString(int enumValue)
            {
                return _strings[enumValue];
            }
        }

        public static string FastToString(this Color color)
        {
            return EnumStrings<Color>.GetEnumString((int)color);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var start = DateTime.Now;
            var rand = new Random();
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.ToString();
            }
            TimeSpan span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("ToString() took " + span.ToString());

            start = DateTime.Now;
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.FastToString();
            }
            span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("FastToString() took " + span.ToString());
            Console.ReadLine();
        }
    }
}

The results are quite good, roughly the same as the hardcoded solution:

ToString() took 00:00:02.6770000
FastToString() took 00:00:00.0580000

The idea is simple: the first time an enumeration type is used, we store the names of its values in an array indexed by each value's integer representation, and from then on our FastToString extension method just reads them out of that array. The array lives in a static class that builds it in its static constructor. We make the class generic because that's a convenient way to get a separate array for each enumeration type: the runtime gives every closed generic type, such as EnumStrings<Color>, its own copy of the static fields.
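A minimal sketch of that per-type behavior, using a hypothetical PerTypeCache<T> class that exists only for illustration: each closed type runs its own static constructor once and holds its own array.

```csharp
using System;

namespace FastEnumToString
{
    public enum Color { Red, White, Blue, Green }
    public enum State { Oregon, Washington, California, Idaho }

    // Hypothetical illustration class, not part of the approach above.
    static class PerTypeCache<T>
    {
        // One copy of each static field exists per closed type:
        // PerTypeCache<Color> and PerTypeCache<State> do not share them.
        public static readonly string[] Names;
        public static int ConstructorRuns;

        // Runs once per closed type, the first time that type's statics are touched.
        static PerTypeCache()
        {
            Names = Enum.GetNames(typeof(T));
            ConstructorRuns++;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(PerTypeCache<Color>.Names[0]);        // Red
            Console.WriteLine(PerTypeCache<State>.Names[0]);        // Oregon
            Console.WriteLine(PerTypeCache<Color>.ConstructorRuns); // 1
            Console.WriteLine(PerTypeCache<State>.ConstructorRuns); // 1
        }
    }
}
```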

Note the "if (typeof(T).IsEnum)" line in the constructor. We would dearly have liked to restrict the generic type with a where T : System.Enum constraint. That would have allowed us to write a single generic extension method for all enumerations:

        public static string FastToString<T>(this T value) where T : System.Enum
        {
            return EnumStrings<T>.GetEnumString((int)value);
        }

but when we try it we're greeted again by our friendly compiler error, "Constraint cannot be special class 'System.Enum'". Because of this I'm forced to enforce the generic type with a runtime exception (and it is a runtime exception: try replacing <Color> with <string> in the FastToString method and see what happens; the exception thrown from the static constructor surfaces as a TypeInitializationException). This exception makes EnumStrings<T> a dangerous class to use, which is why I've made it a rare example of a private, nested class. It is essentially a type-unsafe class, because the compiler is unable to enforce its type-safe use.

So each developer that creates an enumeration will need to add a corresponding FastToString overload for their enumeration, but at least they don't have to add or maintain the error-prone switch statement.

One more thing to mention before continuing: we can't remove the "where T : System.Enum" constraint in the FastToString overload above because there's no guarantee that the input parameter "value" can be cast to an int, but we can do any of these things:

        public static string FastToString<T>(this T value) where T : IConvertible
        {
            return EnumStrings<T>.GetEnumString(((IConvertible)value).ToInt32(CultureInfo.InvariantCulture));
        }

        public static string FastToString<T>(this System.Enum value)
        {
            return EnumStrings<T>.GetEnumString(((IConvertible)value).ToInt32(CultureInfo.InvariantCulture));
        }

        public static string FastToString<T>(this int value)
        {
            return EnumStrings<T>.GetEnumString(value);
        }

The client calls for these look like this:

string str = color.FastToString<Color>();
string str = ((Enum)color).FastToString<Color>();
string str = ((int)color).FastToString<Color>();

Which result in the following timings:

ToString() took 00:00:02.6130000
FastToString() took 00:00:00.0700000
FastToString<T>(this T value) took 00:00:00.3820000
FastToString<T>(this Enum value) took 00:00:00.3690000
FastToString<T>(this int value) took 00:00:00.0710000

The two overloads that involve IConvertible are about 5 times slower than the one that doesn't. This is most likely due to the boxing that takes place when the enum value is cast to IConvertible. They are, however, still about 7 times faster than ToString().

But I wouldn't recommend using any of the overloads that take the generic parameter, because they expose EnumStrings<T>'s type-unsafety to the caller. Because of .NET's current refusal to accept System.Enum as a constraint on T, there's no way for us to prevent the client from calling FastToString<Color> on some other enumeration type, or from making a (5).FastToString<Color>() call, or even from calling FastToString<string>.
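For readers on a newer compiler: C# 7.3 (released in 2018, long after this post was written) accepts System.Enum as a constraint, so the single generic method wished for above can now be written. The sketch below assumes C# 7.3 or later, and the names EnumExtensions, EnumNameCache<T> and FastToStringGeneric are hypothetical. It keys a Dictionary<T, string> on the enum type itself instead of converting to int, which avoids the cast problem and any dependence on how the values are numbered, at the cost of a hash lookup instead of an array index.

```csharp
using System;
using System.Collections.Generic;

namespace FastEnumToString
{
    public enum Color { Red, White, Blue, Green }

    static class EnumExtensions
    {
        // Requires C# 7.3+: the compiler now enforces that T is an enum type,
        // so no runtime IsEnum check is needed.
        private static class EnumNameCache<T> where T : struct, Enum
        {
            public static readonly Dictionary<T, string> Names = BuildNames();

            private static Dictionary<T, string> BuildNames()
            {
                var names = new Dictionary<T, string>();
                foreach (T value in Enum.GetValues(typeof(T)))
                {
                    // Aliased members have equal values, so repeats are skipped;
                    // the stored name is whatever Enum.ToString() returns for that value.
                    if (!names.ContainsKey(value))
                        names.Add(value, value.ToString());
                }
                return names;
            }
        }

        // Hypothetical name; one method now covers every enumeration type.
        public static string FastToStringGeneric<T>(this T value) where T : struct, Enum
        {
            return EnumNameCache<T>.Names.TryGetValue(value, out string name)
                ? name
                : value.ToString(); // undefined values fall back to the slow path
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Color.Blue.FastToStringGeneric());  // Blue
            Console.WriteLine(((Color)42).FastToStringGeneric()); // 42
        }
    }
}
```

With the constraint in place, calls such as FastToStringGeneric on a string or an int no longer compile, which is exactly the type safety the runtime check could not provide.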

The final thing to note is that the above code won't work on enumerations whose values don't follow the default mapping of members to the integers 0, 1, 2, and so on, e.g.

    public enum Color
    {
        Red = 2,
        White = 4,
        Blue = 8,
        Green = 16
    }

Even this is legal:

    public enum Color
    {
        Red = -1,
        White = -1,
        Blue = -1,
        Green = -1
    }

One way to handle these cases is to use a Dictionary to store the enumeration-string mapping rather than an array. We would expect dictionary lookups to be slower than array lookups, so we'll continue to use arrays for the "standard" enumerations.

using System;
using System.Globalization;
using System.Collections.Generic;

namespace FastEnumToString
{
    public enum Color
    {
        Red = -1,
        White = -1,
        Blue = -1,
        Green = -1
    }

    public enum State
    {
        Oregon,
        Washington,
        California,
        Idaho
    }

    static class Extensions
    {
        private static class EnumStrings<T>
        {
            private static string[] _strings = null;
            private static Dictionary<int, string> _stringDictionary = null;

            // True when the sorted values are exactly 0, 1, 2, ..., n - 1,
            // i.e. when they can be used directly as array indexes.
            private static bool IsStandardSequence(Array values)
            {
                List<Enum> valuesList = new List<Enum>();
                foreach (Enum value in values)
                {
                    valuesList.Add(value);
                }
                valuesList.Sort();
                for (int i = 0; i < valuesList.Count; i++)
                {
                    if (((IConvertible)valuesList[i]).ToInt32(CultureInfo.InvariantCulture) != i)
                    {
                        return false;
                    }
                }
                return true;
            }

            static EnumStrings()
            {
                if (typeof(T).IsEnum)
                {
                    if (IsStandardSequence(Enum.GetValues(typeof(T))))
                    {
                        // Contiguous values: index an array by the integer value.
                        _strings = new string[Enum.GetValues(typeof(T)).Length];
                        foreach (System.Enum value in Enum.GetValues(typeof(T)))
                        {
                            _strings[((IConvertible)value).ToInt32(CultureInfo.InvariantCulture)] = value.ToString();
                        }
                    }
                    else
                    {
                        // Any other layout: use a dictionary keyed by the integer value.
                        _stringDictionary = new Dictionary<int, string>();
                        foreach (System.Enum value in Enum.GetValues(typeof(T)))
                        {
                            int valueAsInt = ((IConvertible)value).ToInt32(CultureInfo.InvariantCulture);
                            if (!_stringDictionary.ContainsKey(valueAsInt))
                                _stringDictionary.Add(valueAsInt, value.ToString());
                        }
                    }
                }
                else
                {
                    throw new Exception("Generic type must be an enumeration");
                }
            }

            public static string GetEnumString(int value)
            {
                string description;
                if (_strings != null)
                {
                    description = _strings[value];
                }
                else
                {
                    // Leaves description null if value is not a defined member.
                    _stringDictionary.TryGetValue(value, out description);
                }
                return description;
            }
        }

        public static string FastToString(this Color color)
        {
            return EnumStrings<Color>.GetEnumString((int)color);
        }

        public static string FastToString(this State state)
        {
            return EnumStrings<State>.GetEnumString((int)state);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var start = DateTime.Now;
            var rand = new Random();
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.ToString();
            }
            TimeSpan span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("ToString() took " + span.ToString());

            start = DateTime.Now;
            for (int i = 1; i < 1000000; i++)
            {
                var color = (Color)rand.Next(4);
                string str = color.FastToString();
            }
            span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("FastToString() using dictionary took " + span.ToString());

            start = DateTime.Now;
            for (int i = 1; i < 1000000; i++)
            {
                var state = (State)rand.Next(4);
                string str = state.FastToString();
            }
            span = new TimeSpan(DateTime.Now.Ticks - start.Ticks);
            Console.WriteLine("FastToString() using array took " + span.ToString());
            Console.ReadLine();
        }
    }
}

Here are the results:

ToString() took 00:00:02.986000
FastToString() using dictionary took 00:00:00.0900000
FastToString() using array took 00:00:00.0640000

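So the dictionary lookup (0.090 seconds) doesn't appear to be dramatically slower than the array lookup (0.064 seconds), but that may be skewed by the fact that this dictionary contains only one element, and that, since every Color member is -1, the random values 0 through 3 used in the loop are never found in it.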

Speaking of which, if you're looking at this line

                            if (!_stringDictionary.ContainsKey(valueAsInt))
                                _stringDictionary.Add(valueAsInt, value.ToString());

and thinking, "Aha! A bug: in the example above only one color ever gets added to the dictionary, so FastToString() called on any color returns the same value!", you are correct, but this simply mirrors the behavior of .NET enums. When enumeration members share the same underlying integral value, ToString() on any of them produces the same result:

            Console.WriteLine("Color.Red.ToString() = " + Color.Red.ToString());
            Console.WriteLine("Color.White.ToString() = " + Color.White.ToString());
            Console.WriteLine("Color.Blue.ToString() = " + Color.Blue.ToString());
            Console.WriteLine("Color.Green.ToString() = " + Color.Green.ToString());
            Console.WriteLine("Color.Red.FastToString() = " + Color.Red.FastToString());
            Console.WriteLine("Color.White.FastToString() = " + Color.White.FastToString());
            Console.WriteLine("Color.Blue.FastToString() = " + Color.Blue.FastToString());
            Console.WriteLine("Color.Green.FastToString() = " + Color.Green.FastToString());
            Console.ReadLine();

results in this output, which is surprising:

Color.Red.ToString() = White
Color.White.ToString() = White
Color.Blue.ToString() = White
Color.Green.ToString() = White
Color.Red.FastToString() = White
Color.White.FastToString() = White
Color.Blue.FastToString() = White
Color.Green.FastToString() = White
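The reason only one entry ever reaches the dictionary can be seen from Enum.GetNames and Enum.GetValues: for the all -1 Color enumeration they return four names and four values, but the four values are the same number, so the ContainsKey check keeps only the first. A minimal check, using System.Linq only to count distinct values:

```csharp
using System;
using System.Linq;

namespace FastEnumToString
{
    public enum Color { Red = -1, White = -1, Blue = -1, Green = -1 }

    class Program
    {
        static void Main(string[] args)
        {
            string[] names = Enum.GetNames(typeof(Color));
            int[] values = Enum.GetValues(typeof(Color)).Cast<Color>().Select(c => (int)c).ToArray();

            Console.WriteLine(names.Length);              // 4: every declared name is listed
            Console.WriteLine(values.Length);             // 4: one value per name...
            Console.WriteLine(values.Distinct().Count()); // 1: ...but they are all -1
        }
    }
}
```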

So it is possible to come up with a viable workaround for the incredible slowness of Enum.ToString() in .NET, and for all of the other obstacles to extending System.Enum that .NET throws in its way, though it would have been much easier if .NET had simply made Enum.ToString() fast in the first place.


1 comment:

  1. Great post, one thing I changed in the first simple switch statement, is the "default", in case extra values were added to the enum, I used the Enum.ToString(). So it would handle any missing values, but at a slower cost.
