Concatenating Strings – You’re still doing it wrong.

A common anti-pattern I see from novice programmers is the tendency to read a coding tip somewhere, assume it to be a universal truth, and immediately start applying it everywhere in their code without fully understanding it. Usually the coding tip relates to optimization and is interpreted by the coder as “X is faster than Y, so always do Y instead of X.” This fallacy is particularly rampant with respect to the different approaches for concatenating strings.

Car with Wing

But it works on race cars....

I’m not writing this article to chastise the fledgling programmer who has fallen into this trap, nor is this intended as a how-to article on optimizing your code. Heaven knows there the internet is lousy with articles about the most efficient way to cram strings together. I will address the problems associated with some of the mythology about string concatenation, but my primary goal will be to encourage critical thinking and healthy skepticism for silver bullet programming techniques.

The Symptom

Though I possess no paranormal super-powers, I do believe I can read the mind of another human when I see code that looks like this:

    StringBuilder WelcomeMessage = new StringBuilder();
    WelcomeMessage.Append("Hello ");
    WelcomeMessage.Append(firstName);
    WelcomeMessage.Append(" ");
    WelcomeMessage.Append(LastName);
    WelcomeMessage.Append("!\n")

My spirit guide informs me that that programmer responsible for this code remembers reading somewhere that <insert programming language> is really inefficient at concatenating strings, but you can overcome that limitation by using the StringBuilder class. Based on this information he/she replaces every string concatenation operator with this clever technique leaving a scent trail through the code that experienced programmers can smell from miles away.

Problem 1: Premature Optimization is Procrastination

Sure, you want ALL of your code to perform well, but experienced programmers understand that their time is valuable and best spent on activities that deliver actual business value. Premature optimization and its ugly cousin Micro Optimization are almost always a waste of time. I understand how tempting it is to justify in your head that you could squeeze a little bit more performance out of your app by re-factoring the whole thing you learned in the blog post you read today, especially since it can be like a mini-vacation for your mind from the really complex issues you really should be working on, but be strong and resist!

As a rule of thumb: If it isn’t worth creating a jig to profile the performance gains you expect to get from optimization re-factoring, then it isn’t worth the time to do the re-factoring in the first place, plus it is risky because you won’t notice that your supposed “optimization” actually hurt performance.

More on that later.

Problem 2: Cookie Cutter Optimizations Assume the Compiler is Stupid

If you could universally make string concatenation faster by applying a simple formula then the compiler would probably already be applying the transformation anyway. Granted, I think this point is lost on some novice programmers who only have experience in higher level languages.

For them, I’ll clarify with this point.

Your program is running the compiler’s interpretation of your code. Not your actual code.

With that in mind, to think that the StringBuilder approach always runs faster would require you to believe that the people who wrote the programming language were smart enough to make string concatenation fast when they created the StringBuilder class, but forgot how to do it when they built the concatenation operator.

Pop Quiz: Does this code give you any heartburn? Why?

    string querySQL = "SELECT * " +
                      "FROM myTable " +
                      "WHERE (ID=5)";

If you said yes because it isn’t worth incurring the cost of concatenation for code readability then you aren’t giving the compiler enough credit.

Here is the MSIL output for the above statement:

ldstr      "SELECT * FROM myTable WHERE (ID=5)"

Amazing, huh?

Compilers are written to do the complex task of reading your code and interpreting what it means. Figuring out that a series of constant strings can be combined is child’s play.

Problem 3: If you don’t understand it, you’ll do it wrong

Cargo Cult programming is a derisive term for doing things in your program because you think you need to, but don’t understand (or have a vague notion of)  the underlying reason. It is really bad practice to adopt a technique without asking enough “why” questions to grasp the reason using it is desirable.

As an example, let’s dissect the premise that string concatenation using operators is slow and should be replaced by StringBuilders.

Q: Why do some claim that string concatenation with operators is slow?
In many garbage collected languages (Java/.NET) string objects are  immutable, meaning you can’t change them. So when you append more content into an existing string the program must internally create a new string and copy the old and new contents into it. The extra effort to create, destroy and garbage collect the extra string objects has the potential to create more work for your program and can degrade performance if done excessively.

Q: How does the StringBuilder help?
The StringBuilder class is implemented as a mutable memory buffer that typically has extra unused space allocated so that concatenations can be made in place without the need to create extra objects to juggle the data.

Q: How much extra space does it reserve? What happens if I append more content than will fit in the unused space?
By default (in .NET) 16 characters, unless you specify differently in the constructor. If you append more data than there is space, the StringBuilder will behave much like a String object creating a new StringBuilder object with double the existing capacity then copy over the data.

You: Wait, what?

You mean that you have been using StringBuilder with the default constructor and then appending more than 16 characters to it?

Yeah, well if you are lucky you’ll be no worse off than if you just used the “evil” string concatenation operators. However, due to that neat capacity doubling side-effect, your program might actually be locking up unnecessarily large chunks of memory on top of the additional work required to wrangle all the intermediary objects. Perhaps it is worth investigating and setting the initial capacity of that StringBuilder to avoid such nastiness.

Bonus: Now that you understand the potential performance benefit is based (at least partly) on mutability, you will see that other string optimization opportunities may exist whenever existing strings need to be modified, not just appended to.

Final Thoughts

Again, the point of all this isn’t about strings, or optimization, or any of that. It is about taking the time to understand what you are doing to avoid falling prey to the potentially harmful myths that are enthusiastically passed around by programmers (see also “The database sorts by the clustered index if you don’t specify an Ordering“).

In any event, I’m curious as to how many of my readers actually have at one time subscribed to the cargo cult programing meme of “StringBuilder is always better”. Please let me know in the comments.

Advertisements

A Manager’s Retrospective on the C# versus VB.NET decision

There is no shortage of discussion or flame wars weighing the relative merits of the various Microsoft .NET programming languages, nor is there an outcry for another opinion on the subject. This article will instead discuss the process of making a sound business decision on a controversial issue without inciting mutiny among the developers. More importantly, I’ll revisit some of the key considerations in that decision through the razor sharp focus of hindsight. It is my hope that my experiences from the trenches will provide guidance to software development managers who have not yet made the jump to .NET, are bootstrapping a new shop, or are considering a switch from another platform.

It’s been just over six years since I managed the transition of my development team to Microsoft’s .NET platform and made the call, for better or worse, to standardize on VB.NET. As a developer myself, I was keenly aware of the how the programming language of a software development shop defines its culture and similarly shapes the professional identity of many programmers. The polemic among developers created a real risk of breaking down the team dynamic through second-guessing and deflated morale regardless of my choice. As is the case with implementing any politically sensitive change, the prudent course of action was to aggressively seek buy-in by actively soliciting input of each team member, remaining visibly objective, and maximizing the transparency of the issues that factored into the final decision. Despite the lack of unanimity requiring over 30% of the team to acquiesce on this hot-button standardization issue, the team dynamic remained strong after my final decision.

The Players

Although the list of languages that are currently supported on the .NET framework is more diverse than most would probably imagine, the options were far more limited back in 2003.

C#.NET: A hybrid of C and Java that attempted to lure defectors from the Java camp to the .NET platform. It also provides an easy transition for C++ refugees who are sick of the annoyances of manual memory management and having 9 incompatible ways to implement a simple string.

J#.NET: Most of the quirks of Java and none of the cross-platform support and about as useful as non-alcoholic beer. Enables developers to pretend they still hate Microsoft, but participate in their market share. Also allows Microsoft to pretend that C# wasn’t a blatant attempt to crush the upstart Java language that threatened to weaken their dominance of the OS market. The technology pundits have moved on to other issues enough for Microsoft to silently drop support for J# from Visual Studio.

VB.NET: The next generation of the popular Visual Basic language, all grown up and fully OOP capable. It is unfortunate that Microsoft didn’t take this opportunity to rename the language to remove the stigma associated with programmers using this language that only resembles its forebears in its verbosity. I assume they kept the name to provide an unambiguous upgrade path for the unwashed masses of VB6 programmers many of which ironically and belligerently dug in their heels and refused to upgrade.

C++.NET: For a while I assumed that I must have been missing something with respect to C++.NET. The juxtaposition of the venerable I’ll-roll-my-own-thank-you-very-much C++ and the don’t-worry-your-pretty-little-head-about-cleaning-up-that-memory .NET framework felt even more awkward than “Java.NET.” I took a course to absolve me of my silly notions, but only managed to reinforce them. C++.NET is neither fish nor fowl. It is like a halfway house for recovering masochist programmers who really want to like automated garbage collection, but want to take it one day at a time. Just one more hit of STL, I promise I’ll quit tomorrow! In one of the rare parallels with the VB camp, many of the Microsoft Visual C++ developers found little motivation to abandon Visual Studio 6.

The Contenders

Let’s face it. Aside from C#, VB.NET, and their ugly cousin C++.NET, none of the remaining .NET languages were used for much more than  Microsoft demos to prove that they wanted to support open standards. This was even truer back in 2003 when I was weighing the options. The team divided itself naturally into a C# and VB.NET camp and no one even hinted at seriously considering a third option.

The Process

It is extremely important when soliciting input from employees to be very clear about how the final decision is going to be made, especially when it is not the purely democratic process that they might otherwise assume. Recognizing this, I was careful to communicate that while everyone’s input was valued and would be carefully considered, I’d be making the final decision based on both the advice of the team and the interests of the business. I promised complete transparency on my decision and a chance to appeal it on factual, but not subjective grounds. As it happened, no such appeals were proffered. The C# proponents were slightly disappointed, but were also thankfully supportive the decision to go with VB.NET.

My solicitation for input included the following:

  • Provide as many supporting arguments for your preference as apply to our environment.
  • Arguments based on business objectives (reduced cost, greater efficiency) will be weighed more heavily.
  • Performance comparisons between the languages had been researched and dismissed already, don’t bother including them.
  • It is perfectly valid to support your preference in terms of your personal goals (money, prestige, taste, etc.).
  • Regardless of whether you can support your preference, please indicate it in your response. Morale impact is a key consideration.

Facts, Factors and Opinion

After compiling my own research, business case analysis, and the collective wisdom and opinion of my team, the following key themes emerged.

  • Functionality (TIE): Functional differences between the capabilities of these languages were primarily cosmetic/syntactic/irrelevant(Appleman/Atwood), MSDN.  There were a few IDE features that one language or the other had the edge on, but MS was already promising IDE parity in future releases.
  • Learning Curve (VB.NET): Because our previous standard was VBScript for ASP pages, and VB6 for the middle tier object layers, VB.NET seemed like the quickest path to productivity. I was particularly concerned about the impact of forcing the developers to make the jump to a completely different language at the same time they would be also required to get up to speed on the .NET framework and ASP.NET.
  • Existing Code (VB.NET): We were maintaining a significant amount of existing VB6 and VBScript code that would need to be ported to .net. Visual Studio provided tools to automate most of the conversion from to VB.NET, but considerable extra time would have been required to go to C#.
  • Product Road maps for .NET languages (VB.NET): An MSDN magazine article comparing the road maps for the two languages indicated that VB.NET was going to continue to be optimized for rapid application development while C# would follow the tradition of C++ and trend towards power-user features. The direction of VB.NET was more relevant to our development efforts which were mostly on CRUD type applications.
  • Developer Preferences (VB.NET): Despite the tendency of most to gravitate towards the familiar in platform wars, C# had a good showing. The balance was tipped by a few developers who had already started down the VB.NET track of their MCSD certifications. I hate to reinforce the stereotype, but it was interesting to note that the more senior developers were mostly in the C# camp.
  • Developer “Street Creds” (C#): Almost all of the developers subscribed at least weakly to the popular notion that C# experience would command a higher premium than VB.NET by riding the coattails of C++ as a more respected language. Whether justified or not, it appears that this rumor has become fact.
  • Product “Street Creds” (C#): There was also some concern that the “taint of VB” would hurt the perceived professionalism of our software among our customers. I have not yet seen evidence of this, however, we don’t advertise the language we use, and our customers don’t generally ask or care.
  • Recruiting (TIE C#): There was a lot of discussion that MS was strongly favoring C# as the language of choice for .NET and that most developers were following suit. I was somewhat concerned about the possibility of VB.NET becoming marginalized and creating recruiting hurdles. That is, if VB.NET became uncool, it might be difficult to get people to respond to our job postings even if we didn’t discriminate against people with only experience in other .NET languages. On the other hand, I reasoned that we might be able to avoid paying a “C# premium” that didn’t guarantee any associated additional value add over a competent VB.NET programmer. I think that predictions of VB.NET’s demise were very premature as it continues to have a solid following. However, in retrospect, I think I underestimated the impact of the decision on recruiting. I suspect that some of our recruiting difficulty (even with very competitive salaries) can be attributed to the fact that VB.NET developers readily apply to C# jobs, but the reverse is not as common.
  • Language Obsolescence (Non-factor): A small contingent was even predicting that Microsoft planned to phase out VB.NET in favor of C#. I just didn’t buy into this notion given my experience with how slowly Microsoft has retired product lines in the past. It took them 27 years, for example, to retire FoxPro after acquiring it, and salvaging it for parts to include in their competing database package, MS Access. So far it appears my skepticism was warranted.

Recommendations

As I have already mentioned, our team went with VB.NET. I still think it was probably the right decision given the skills of my existing team and our considerable investment in legacy code in VB based languages, despite the minor difficulties it has created for recruiting. Absent either of these preconditions, however, I definitely would suggest a preference for C# for any manager going through this decision today. That said, this article is no substitute for your own research. As is the case with any heated topic, there is an abundance of misinformation, some of which may even be repeated by members of your own team. One resource that I have come to trust over my years in this industry is Dan Appleman, who has published a highly regarded e-book, Visual Basic .NET or C#, Which to Choose? (VS2005 edition).

The lesson that can be drawn from my experience, however, is not so much about a preference of one language over the other, but instead the importance of involving the whole team in the change management process. This decision has major ramifications on their professional careers that extend beyond the walls of your organization and has the potential to create significant anxiety among them. Manage that anxiety by including them in the process, making it clear that their individual interests are being carefully considered, and the making the decision objectively and transparently as possible.

Links to the Fray