A common anti-pattern I see from novice programmers is the tendency to read a coding tip somewhere, assume it to be a universal truth, and immediately start applying it everywhere in their code without fully understanding it. Usually the coding tip relates to optimization and is interpreted by the coder as “X is faster than Y, so always do X instead of Y.” This fallacy is particularly rampant with respect to the different approaches for concatenating strings.
I’m not writing this article to chastise the fledgling programmer who has fallen into this trap, nor is this intended as a how-to article on optimizing your code. Heaven knows the internet is lousy with articles about the most efficient way to cram strings together. I will address the problems associated with some of the mythology about string concatenation, but my primary goal is to encourage critical thinking and healthy skepticism toward silver bullet programming techniques.
Though I possess no paranormal super-powers, I do believe I can read the mind of another human when I see code that looks like this:
    StringBuilder welcomeMessage = new StringBuilder();
    welcomeMessage.Append("Hello ");
    welcomeMessage.Append(firstName);
    welcomeMessage.Append(" ");
    welcomeMessage.Append(lastName);
    welcomeMessage.Append("!\n");
My spirit guide informs me that the programmer responsible for this code remembers reading somewhere that <insert programming language> is really inefficient at concatenating strings, but that you can overcome that limitation by using the StringBuilder class. Based on this information, he/she replaces every string concatenation operator with this clever technique, leaving a scent trail through the code that experienced programmers can smell from miles away.
Problem 1: Premature Optimization is Procrastination
Sure, you want ALL of your code to perform well, but experienced programmers understand that their time is valuable and best spent on activities that deliver actual business value. Premature optimization and its ugly cousin Micro Optimization are almost always a waste of time. I understand how tempting it is to justify in your head that you could squeeze a little bit more performance out of your app by re-factoring the whole thing around the trick you learned in the blog post you read today, especially since it can be like a mini-vacation for your mind from the really complex issues you should be working on, but be strong and resist!
As a rule of thumb: If it isn’t worth creating a jig to profile the performance gains you expect to get from an optimization re-factoring, then it isn’t worth the time to do the re-factoring in the first place. Skipping the measurement is also risky, because without it you won’t notice when your supposed “optimization” actually hurt performance.
More on that later.
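A jig does not have to be elaborate. Here is a minimal sketch of one, assuming the greeting example from earlier; the class name ConcatJig, the method names, and the iteration count are all made up for illustration. Treat the numbers it prints as a rough hint only, since timings vary with runtime, hardware, and build settings, and a serious measurement would use a proper benchmarking harness.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical jig: compares two ways of building the same greeting.
public static class ConcatJig
{
    // Builds the greeting with the plain + operator.
    public static string WithOperator(string firstName, string lastName)
    {
        return "Hello " + firstName + " " + lastName + "!\n";
    }

    // Builds the same greeting with a default-constructed StringBuilder.
    public static string WithBuilder(string firstName, string lastName)
    {
        var sb = new StringBuilder();
        sb.Append("Hello ");
        sb.Append(firstName);
        sb.Append(" ");
        sb.Append(lastName);
        sb.Append("!\n");
        return sb.ToString();
    }

    // Runs `build` the given number of times and returns elapsed milliseconds.
    public static long Time(Func<string, string, string> build, int iterations)
    {
        build("warm", "up"); // let the JIT compile the method before timing
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            build("Ada", "Lovelace");
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 1000000; // arbitrary; pick what suits your case
        Console.WriteLine("operator:      " + Time(WithOperator, iterations) + " ms");
        Console.WriteLine("StringBuilder: " + Time(WithBuilder, iterations) + " ms");
    }
}
```

If writing even this much feels like more effort than the optimization deserves, that is your answer.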
Problem 2: Cookie Cutter Optimizations Assume the Compiler is Stupid
If you could universally make string concatenation faster by applying a simple formula then the compiler would probably already be applying the transformation anyway. Granted, I think this point is lost on some novice programmers who only have experience in higher level languages.
For them, I’ll clarify with this point.
Your program is running the compiler’s interpretation of your code. Not your actual code.
With that in mind, to think that the StringBuilder approach always runs faster would require you to believe that the people who wrote the programming language were smart enough to make string concatenation fast when they created the StringBuilder class, but forgot how to do it when they built the concatenation operator.
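To make that concrete for C#: as far as I know, the compiler does not evaluate a chain of + operators one pair at a time. It lowers the whole expression into a single concatenation call that produces one result string. The sketch below puts the two forms side by side; the class and method names are invented for illustration, and the exact overload the compiler picks depends on the compiler version.

```csharp
using System;

// Hypothetical demo: what you write versus roughly what the compiler emits.
public static class GreetingDemo
{
    // What you write: one expression with several + operators.
    public static string Written(string firstName, string lastName)
    {
        return "Hello " + firstName + " " + lastName + "!\n";
    }

    // Roughly what it is lowered to: one Concat call, one result string.
    public static string Lowered(string firstName, string lastName)
    {
        return string.Concat(new[] { "Hello ", firstName, " ", lastName, "!\n" });
    }

    public static void Main()
    {
        // Both forms produce the same text.
        Console.WriteLine(Written("Ada", "Lovelace") == Lowered("Ada", "Lovelace")); // prints True
    }
}
```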
Pop Quiz: Does this code give you any heartburn? Why?
    string querySQL = "SELECT * " +
                      "FROM myTable " +
                      "WHERE (ID=5)";
If you said yes because the improved readability isn’t worth the cost of concatenating the pieces at run time, then you aren’t giving the compiler enough credit.
Here is the MSIL output for the above statement:
ldstr "SELECT * FROM myTable WHERE (ID=5)"
Compilers are written to do the complex task of reading your code and interpreting what it means. Figuring out that a series of constant strings can be combined is child’s play.
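You can check this without reading MSIL. A C# const must be initialized with a compile-time constant, so the sketch below only compiles because the compiler has already glued the pieces together. The class and member names are mine, not part of any API.

```csharp
using System;

// Hypothetical demo: concatenating literals is a compile-time operation.
public static class ConstantFoldingDemo
{
    // Legal only because every operand is a constant; if any operand were
    // a variable, the compiler would reject this declaration.
    public const string QuerySql = "SELECT * " +
                                   "FROM myTable " +
                                   "WHERE (ID=5)";

    public static void Main()
    {
        Console.WriteLine(QuerySql == "SELECT * FROM myTable WHERE (ID=5)"); // prints True
    }
}
```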
Problem 3: If you don’t understand it, you’ll do it wrong
Cargo Cult programming is a derisive term for doing things in your program because you think you need to, while not understanding (or having only a vague notion of) the underlying reason. It is really bad practice to adopt a technique without asking enough “why” questions to grasp the reason it is desirable.
As an example, let’s dissect the premise that string concatenation using operators is slow and should be replaced by StringBuilders.
Q: Why do some claim that string concatenation with operators is slow?
In many garbage collected languages (Java/.NET) string objects are immutable, meaning you can’t change them. So when you append more content into an existing string the program must internally create a new string and copy the old and new contents into it. The extra effort to create, destroy and garbage collect the extra string objects has the potential to create more work for your program and can degrade performance if done excessively.
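Here is a small sketch of what immutability means in practice, with names I made up for the occasion. The first method shows that += hands you a different object rather than changing the one you had; the other two show the loop pattern where the extra objects pile up, next to the mutable-buffer alternative.

```csharp
using System;
using System.Text;

// Hypothetical demo of string immutability.
public static class ImmutabilityDemo
{
    // Returns true if += produced a brand new string object.
    public static bool AppendCreatesNewObject()
    {
        string original = "Hello";
        string alias = original;   // second reference to the same object
        original += " world";      // cannot modify in place, so a new string is created
        return !object.ReferenceEquals(alias, original) && alias == "Hello";
    }

    // Each pass allocates a new, longer string and copies the old contents into it.
    public static string JoinWithOperator(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // Appends into one mutable buffer; the final string is created once at the end.
    public static string JoinWithBuilder(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(AppendCreatesNewObject()); // prints True
        string[] parts = { "a", "b", "c" };
        Console.WriteLine(JoinWithOperator(parts) == JoinWithBuilder(parts)); // prints True
    }
}
```

Note the word “excessively” above: three appends in a loop are not a problem, while thousands might be. Measure before you decide which case you are in.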
Q: How does the StringBuilder help?
The StringBuilder class is implemented as a mutable memory buffer that typically has extra unused space allocated so that concatenations can be made in place without the need to create extra objects to juggle the data.
Q: How much extra space does it reserve? What happens if I append more content than will fit in the unused space?
By default (in .NET) 16 characters, unless you specify differently in the constructor. If you append more data than there is space, the StringBuilder behaves much like a String object: it allocates a new internal buffer, roughly doubling the existing capacity, and in the classic implementation copies the existing data over. The exact growth strategy is an implementation detail and has changed between runtime versions.
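Rather than take my word for it, you can watch the capacity change on your own runtime. This is a minimal sketch with invented names; I deliberately give no expected output, because the sequence of capacities you see depends on the .NET version you run it on.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: records how a default StringBuilder grows.
public static class CapacityWatcher
{
    // Appends one character at a time and records each distinct Capacity value.
    public static List<int> ObserveCapacities(int charsToAppend)
    {
        var sb = new StringBuilder();            // default constructor, default capacity
        var seen = new List<int> { sb.Capacity };
        for (int i = 0; i < charsToAppend; i++)
        {
            sb.Append('x');
            if (sb.Capacity != seen[seen.Count - 1])
            {
                seen.Add(sb.Capacity);           // the builder had to grow
            }
        }
        return seen;
    }

    public static void Main()
    {
        // Prints the capacities in the order they were observed.
        Console.WriteLine(string.Join(" -> ", ObserveCapacities(200)));
    }
}
```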
You: Wait, what?
You mean that you have been using StringBuilder with the default constructor and then appending more than 16 characters to it?
Yeah, well if you are lucky you’ll be no worse off than if you just used the “evil” string concatenation operators. However, due to that neat capacity doubling side-effect, your program might actually be locking up unnecessarily large chunks of memory on top of the additional work required to wrangle all the intermediary objects. Perhaps it is worth investigating and setting the initial capacity of that StringBuilder to avoid such nastiness.
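If you do reach for a StringBuilder, setting the capacity is a one-argument change. Here is a sketch for the greeting example, assuming you can compute the final length up front; the class and method names are hypothetical, and when you can only estimate the length, a reasonable guess is still likely better than the default.

```csharp
using System;
using System.Text;

// Hypothetical example: a StringBuilder sized for its final contents.
public static class PresizedGreeting
{
    public static string Build(string firstName, string lastName)
    {
        // Fixed text: "Hello " (6) + " " (1) + "!\n" (2) = 9 characters.
        int capacity = 9 + firstName.Length + lastName.Length;

        var sb = new StringBuilder(capacity); // room for everything, so no growth needed
        sb.Append("Hello ");
        sb.Append(firstName);
        sb.Append(" ");
        sb.Append(lastName);
        sb.Append("!\n");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(Build("Ada", "Lovelace"));
    }
}
```

Whether this beats plain concatenation for a message this small is exactly the sort of question the profiling jig is for.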
Bonus: Now that you understand the potential performance benefit is based (at least partly) on mutability, you will see that other string optimization opportunities may exist whenever existing strings need to be modified, not just appended to.
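As one illustration of modifying rather than appending, here is a sketch that masks digits in place through the StringBuilder indexer. The name DigitMasker and the masking rule are invented for this example, and whether it actually outperforms the alternatives in your program is, as always, something to measure rather than assume.

```csharp
using System;
using System.Text;

// Hypothetical example: editing characters in a mutable buffer.
public static class DigitMasker
{
    // Replaces every digit with '*' without creating intermediate strings.
    public static string MaskDigits(string input)
    {
        var sb = new StringBuilder(input); // start from a mutable copy of the text
        for (int i = 0; i < sb.Length; i++)
        {
            if (char.IsDigit(sb[i]))
            {
                sb[i] = '*';               // overwrite in place
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(MaskDigits("card 1234")); // prints card ****
    }
}
```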
Again, the point of all this isn’t about strings, or optimization, or any of that. It is about taking the time to understand what you are doing to avoid falling prey to the potentially harmful myths that are enthusiastically passed around by programmers (see also “The database sorts by the clustered index if you don’t specify an Ordering”).
In any event, I’m curious as to how many of my readers have at one time subscribed to the cargo cult programming meme of “StringBuilder is always better”. Please let me know in the comments.