Monday, December 31, 2007

Regular expressions - Tools to assist the developer at heart.

I've been using regular expressions for a number of years in the workplace but am always keeping my eyes open for helpful tools (you can never have too many!). I recently picked up a Mac as a second system at home and stumbled upon a cool little utility. It's called RegExhibit. Screenshots speak louder than words, so take a look at the following:



What I found really useful with this tool was the immediate feedback as you typed your regular expressions. Another huge seller for me was the help files for this software, as it had a decently detailed collection all in one place (saving you the need to hunt down regex basics on the net).




And of course, this software is completely free, so it's hard to complain :). For those of you Mac-less, another free web-based regular expression tool I typically turn to in the workplace is: http://www.regextester.com.

Saturday, December 8, 2007

C# Generics - Performance Gain? Not so fast!

UPDATE (1/16/08): A reader recently pointed out that I should not have used string as the data type for my tests, because it is a reference type; a value type such as int would have been the better choice. I would recommend referring to my new post located at http://strainthebrain.blogspot.com/2008/01/c-generics-performance-gain-benchmarks.html for a more accurate benchmarking of generics versus non-generics :).

A friend of mine at work has been referencing some of the more spiffy features of the .NET 2.0 framework, one of which is generics. I thought they sounded interesting, so I spent some time playing around with them and reading up on all of the fine-grained details. One particular paragraph in Microsoft's MSDN documentation (http://msdn2.microsoft.com/en-us/library/ms379564(VS.80).aspx), comparing the non-generics approach with the generics approach, stood out to me:

When using value types, you have to box them in order to push and store them, and unbox the value types when popping them off the stack. Boxing and unboxing incurs a significant performance penalty in their own right, but it also increases the pressure on the managed heap, resulting in more garbage collections, which is not great for performance either. Even when using reference types instead of value types, there is still a performance penalty because you have to cast from an Object to the actual type you interact with and incur the casting cost.
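To make the quoted paragraph concrete, here is a minimal sketch of where the boxing and the cast happen. The class and method names (BoxingSketch, RoundTripNonGeneric, RoundTripGeneric) are made up for illustration and are not taken from the MSDN article.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingSketch
{
    // Non-generic Stack stores object, so the int is boxed on Push
    // and has to be cast (unboxed) again on Pop.
    public static int RoundTripNonGeneric(int value)
    {
        Stack stack = new Stack();
        stack.Push(value);        // boxing: int -> object
        return (int)stack.Pop();  // unboxing: object -> int
    }

    // Generic Stack<int> stores the int directly: no boxing, no cast.
    public static int RoundTripGeneric(int value)
    {
        Stack<int> stack = new Stack<int>();
        stack.Push(value);
        return stack.Pop();
    }

    static void Main()
    {
        // A reference type such as string is never boxed;
        // only the cast back from object remains.
        Stack stack = new Stack();
        stack.Push("test");
        string text = (string)stack.Pop();

        Console.WriteLine("{0} {1} {2}",
            RoundTripNonGeneric(42), RoundTripGeneric(42), text);
    }
}
```

Note that with string, as in the last few lines of Main, only the cast applies; boxing never enters the picture.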

This statement sounded logical at the time, but I, being the geek that I am, wanted to put this statement to the test to see if it held water. Unfortunately it not only didn't hold water, but it dried up like the Sahara Desert. Take the following sample code pieces I put together:

using System;
using System.Collections.Generic; // Stack<T>

namespace GenericsPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            string stringVal = "";
            DateTime start, finish;
            TimeSpan mySpan;

            //Generics Test-------------------------------------------------------
            start = DateTime.Now;

            // Push one million strings onto the generic stack, then pop them all.
            Stack<string> genStack = new Stack<string>();
            for (int i = 0; i < 1000000; i++)
                genStack.Push("test" + i);
            for (int i = 0; i < 1000000; i++)
                stringVal = genStack.Pop(); // no cast needed

            finish = DateTime.Now;

            // Report wall-clock start, finish, and elapsed milliseconds.
            mySpan = finish.Subtract(start);
            Console.WriteLine("Gen Start: {0}\t Finish: {1}\t Timespan: {2}", start.ToLongTimeString(), finish.ToLongTimeString(), mySpan.TotalMilliseconds.ToString());
            //--------------------------------------------------------------------

            Console.ReadLine();
        }
    }
}

Now compare the above code with the below code:

using System;
using System.Collections; // non-generic Stack

namespace NonGenericsPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            string stringVal = "";
            DateTime start, finish;
            TimeSpan mySpan;

            //Non-generics Test---------------------------------------------------
            start = DateTime.Now;

            // Push one million strings onto the non-generic stack, then pop them all.
            Stack nonGenStack = new Stack();
            for (int i = 0; i < 1000000; i++)
                nonGenStack.Push("test" + i);
            for (int i = 0; i < 1000000; i++)
                stringVal = (string)nonGenStack.Pop(); // cast back from object

            finish = DateTime.Now;

            // Report wall-clock start, finish, and elapsed milliseconds.
            mySpan = finish.Subtract(start);
            Console.WriteLine("Non Start: {0}\t Finish: {1}\t Timespan: {2}", start.ToLongTimeString(), finish.ToLongTimeString(), mySpan.TotalMilliseconds.ToString());
            //--------------------------------------------------------------------

            Console.ReadLine();
        }
    }
}


I ran each program three times; here are the results (timespans in milliseconds):

Gen Start: 3:45:01 PM Finish: 3:45:03 PM Timespan: 1659.3544
Gen Start: 3:45:20 PM Finish: 3:45:22 PM Timespan: 1639.3874
Gen Start: 3:46:02 PM Finish: 3:46:03 PM Timespan: 1626.9005

Non Start: 3:43:25 PM Finish: 3:43:27 PM Timespan: 1601.653
Non Start: 3:43:51 PM Finish: 3:43:53 PM Timespan: 1620.604
Non Start: 3:44:21 PM Finish: 3:44:22 PM Timespan: 1604.9792

What's interesting to see here is that the generics approach consistently underperforms the non-generics approach, by roughly 2% on average across these runs, while performing the exact same actions. While this may vary from system to system, I found it quite surprising, since it directly contradicts two of the main arguments in favor of using generics.

Bearing all of this in mind, assuming my ultimate need is not utmost performance, I prefer making use of generics simply because it helps in clarifying content being held within data structures and also simplifies the data retrieval process (no need for type-casting).

Ultimately, the point to keep in mind here is that eliminating boxing/unboxing and typecasting by way of generics may or may not gain you much in the way of performance. Your mileage may vary.
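For anyone who wants to repeat the experiment, here is a rough sketch of a variation on the tests above. It rests on two assumptions of mine: it uses int (a value type, as suggested in the update at the top of this post) so that boxing actually comes into play, and it times with System.Diagnostics.Stopwatch instead of DateTime.Now, whose resolution can be fairly coarse on some systems. The class and method names are made up for illustration, and I am not claiming any particular numbers for it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

class StopwatchSketch
{
    // Same iteration count as the tests above.
    const int Count = 1000000;

    // Push and pop ints on the generic stack; no boxing is involved.
    public static long TimeGeneric()
    {
        Stopwatch watch = Stopwatch.StartNew();
        Stack<int> stack = new Stack<int>();
        long sum = 0;
        for (int i = 0; i < Count; i++)
            stack.Push(i);
        for (int i = 0; i < Count; i++)
            sum += stack.Pop();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Push and pop ints on the non-generic stack; each Push boxes
    // the int and each Pop unboxes it through the cast.
    public static long TimeNonGeneric()
    {
        Stopwatch watch = Stopwatch.StartNew();
        Stack stack = new Stack();
        long sum = 0;
        for (int i = 0; i < Count; i++)
            stack.Push(i);
        for (int i = 0; i < Count; i++)
            sum += (int)stack.Pop();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("Gen: {0} ms", TimeGeneric());
        Console.WriteLine("Non: {0} ms", TimeNonGeneric());
    }
}
```

Running each method several times and ignoring the first run would help keep JIT warm-up out of the comparison.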