Ran across one of those odd-syntax behavior anomalies that tend to send programmers on a one way trip down Wild Goose Chase Lane.
All three of the following commands will give the counter-intuitive result of ‘equal’
if '' = ' ' print 'equal' else print 'not equal' if ' ' = ' ' print 'equal' else print 'not equal' if 'abc' = 'abc ' print 'equal' else print 'not equal'
I have confirmed through experimentation that this odd behavior only applies to trailing spaces. The following example gives the expected result of ‘not equal’
if 'abc' = ' abc' print 'equal' else print 'not equal'
Yikes! Scary isn’t it? Makes you wonder if you have code floating out there that depends on the premise that strings with different lengths are never equal.
What is happening here?
According to Microsoft KB Article 316626, this quirky behavior is prescribed by the ANSI-SQL 92 specification.
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings ‘abc‘ and ‘abc ‘ to be equivalent for most comparison operations.
What to do about it
Although it is a mess when it comes to writing readable code, appending a character on both sides of the comparison seems to be the approach with the least potential for nasty side effects, especially when used with Unicode data.
if @stringA + 'x' = @stringB + 'x' ...
If you are going to do this, for God’s sake please comment the reason why, or better yet wrap it in a string comparison user defined function.
Some other ideas are presented in this blog article by Anthony Bloesch, including:
- Convert the strings to varbinary values and then test for equality. (Has issues with Unicode combining marks)
- Replace all the space characters with an unlikely character. (Difficult to pick character that won’t occur naturally)
- Use the like operator to do the equality check. (May cause strings to test equal if they match because of wildcards in the second string)
- Use XQuery’s string comparison. (Has issues with Unicode combining marks)
Doing my part
I was shocked to find nary a mention of this quirky, albeit standards-compliant, behavior in the SQL Server BOL article on the = operator despite the fact that every developer I demonstrated this behavior to received it with the same befuddled surprise.
I’ve added a community entry to the SQL BOL article on this operator in the hopes of saving someone a lot of headache tracking down an obscure logic bug someday.
Stack Overflow: SQL Server Empty String vs Space
SQL Server Books Online: = (Equals) (Transact-SQL)