Beware of this trap when comparing strings in T-SQL with trailing spaces


Ran across one of those odd-syntax behavior anomalies that tend to send programmers on a one way trip down Wild Goose Chase Lane.

The Trap

All three of the following commands will give the counter-intuitive result of  ‘equal’

if '' = '        '    print 'equal' else  print 'not equal'
if '   ' = '     '    print 'equal' else print 'not equal'
if 'abc' = 'abc  '    print 'equal' else print 'not equal'

I have confirmed through experimentation that this odd behavior only applies to trailing spaces. The following example gives the expected result of ‘not equal’

if 'abc' = '   abc'    print 'equal' else print 'not equal'

Yikes! Scary isn’t it? Makes you wonder if you have code floating out there that depends on the premise that  strings with different lengths are never equal.

What is happening here?

According to Microsoft KB Article 316626, this quirky behavior is prescribed by the ANSI-SQL 92 specification.

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings ‘abc‘ and ‘abc ‘ to be equivalent for most comparison operations.

What to do about it

Although it is a mess when it comes to writing readable code, appending a character on both sides of the comparison seems to be the approach with the least potential for nasty side effects, especially when used with Unicode data.

if @stringA + 'x' = @stringB + 'x' ...

If you are going to do this, for God’s sake please comment the reason why, or better yet wrap it in a string comparison user defined function.

Some other ideas are presented in this blog article by Anthony Bloesch, including:

  1. Convert the strings to varbinary values and then test for equality.  (Has issues with Unicode combining marks)
  2. Replace all the space characters with an unlikely character.  (Difficult to pick character that won’t occur naturally)
  3. Use the like operator to do the equality check. (May cause strings to test equal if they match because of wildcards in the second string)
  4. Use XQuery’s string comparison. (Has issues with Unicode combining marks)

Doing my part

I was shocked to find nary a mention of this quirky, albeit standards-compliant, behavior in the SQL Server BOL article on the = operator despite the fact that every developer I demonstrated this behavior to received it with the same befuddled surprise.

I’ve added a community entry to the SQL BOL article on this operator in the hopes of saving someone a lot of headache tracking down an obscure logic bug someday.

More Information

Stack Overflow: SQL Server Empty String vs Space

Microsoft KB: How SQL Server Compares Strings with Trailing Spaces (316626)

SQL Server Books Online: = (Equals) (Transact-SQL)

Advertisements

8 Responses

  1. I’m intimately familiar with this scenario. I’ve been building a .NET front end for older FoxPro tables, and FoxPro always pads any varchar with spaces until it fills all available space of the field. I prefer to pad each value to the max length when doing comparison, as stated in the KB article. I think it’s more readable and intuitive.

  2. I’ve run into this before, but have forgotten about it. Thanks for the reminder.

    In situations that might present padded spaces, I tend to use RTRIM(), as they are typically of no use.

  3. […] A fellow blogger provides more information about this. […]

  4. The reason no one mentions it is because it is the way the human mind works. Strings of different lengths are always unequal? How is that useful? This rule is a victory of common sense and usability over geeky, computer-science-academic nonsense.

    • Maybe common sense needs an upgrade, because I don’t think it is geeky nonsense that x + y > x when y>0.

  5. OMG!
    Why not just try:

    if LTRIM(RTRIM(‘abc’)) = LTRIM(RTRIM(‘abc ‘)) print ‘equal’ else print ‘not equal’

    • The point is not to ignore trailing spaces, that’s already happening by default. The point is to treat differently padded strings as being different. There are some use cases where you want to know if two strings are literally different, spaces included.

  6. Use DataLength
    Please see: http://stackoverflow.com/questions/1399844/sql-server-2008-empty-string-vs-space

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: