How to diff two folders from a Windows command prompt

In most cases where I need to compare two folders recursively on a Windows system, I use my go-to tool, Beyond Compare. It is an excellent utility, and one that I think should be among the first any developer installs on a new machine.

However, today I was doing a reconciliation as part of a very large file migration project that required comparing two folders that each contained hundreds of millions of files spread across thousands of sub-folders. BC was having a lot of trouble and choked on many of my comparisons. It just wasn’t the tool for today’s job. I needed another solution.

Necessity mothered some invention and I found an inventive way to use a combination of command switches on RoboCopy to perform the comparison. If you are not familiar with RoboCopy, and you do a lot of mass copying of files, you need to stop what you are doing and learn about it pronto. It is a supercharged version of XCopy that has been included with Windows since Vista. It has a ton of great features, such as multi-threaded file copying, selective copying of changed files, and resumable copies, that make it a must, especially for big file copy jobs over flaky network connections.

Diff Command Using RoboCopy

So here’s the command to perform a basic comparison of two folders and write a log file listing the differences.

ROBOCOPY "\\FileShare\SourceFolder" "\\FileShare\ComparisonFolder" /e /l /ns /njs /njh /ndl /fp /log:reconcile.txt

Explanation of the command switches used above:

/e  Recurse through sub-directories (including empty ones)
/l  Don’t modify or copy files, log differences only
/fp  Include the full path of files in log (only necessary if you omit /ndl)
/ns  Don’t include file sizes in log
/ndl  Don’t include folders in log
/njs   Don’t include Job Summary
/njh   Don’t include Job Header
/log:reconcile.txt   Write log to reconcile.txt (Recreate if exists)
/log+:reconcile.txt   (Optional variant) Write log to reconcile.txt (Append if exists)

Usage Notes and Warnings Regarding the /NDL Option

The /NDL option is a handy way to keep every folder checked (regardless of whether it contains differences) out of the log, but because of the way it works it is not a good idea in all circumstances. Consider the following before you use /NDL.

  • Folders that exist only on source or destination are not logged unless at least one mismatched file is present or a source file is missing on destination.
  • Folders that exist only on the destination are not logged at all regardless of contents.

If you omit the /NDL option, it is necessary to include the /FP option if you want full paths listed for each file.

Example Output

(with /NDL option)

*EXTRA File         c:\dest\log.txt
New File              c:\source\newfolder\Blah.txt
Newer                 c:\source\Files\CONCORD.DAT
New File              c:\source\Files\COWCO.DAT

(without /NDL option)

 c:\work\test\source\    (extraneous folder listing)
*EXTRA Dir      c:\dest\newfolderdest\
*EXTRA Dir      c:\dest\newfolderrestempty\
*EXTRA File     c:\dest\log.txt
New Dir           c:\source\newfolder\
New File          c:\source\newfolder\Blah.txt
New Dir           c:\source\newfolderempty\
c:\source\Files\   (extraneous folder listing)
Newer             c:\source\Files\CONCORD.DAT
New File          c:\source\Files\COWCO.DAT
c:\test\source\FilesSame\   (Included despite no diffs)
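When the log runs to thousands of lines, it can help to tally the differences by type. Here is a minimal Python sketch of that idea. It assumes log lines shaped like the sample output above (a status tag, some whitespace, then a path) and only recognizes the tags that appear in that sample; the function name summarize_robocopy_log is my own invention, not part of RoboCopy.

```python
import re
from collections import defaultdict

# Status tags taken from the sample output above. A real log may contain
# others, which this sketch simply ignores.
KNOWN_TAGS = ("*EXTRA File", "*EXTRA Dir", "New File", "New Dir", "Newer")

# A tag, then whitespace, then the path (which may itself contain spaces).
LINE_PATTERN = re.compile(
    r"^\s*(" + "|".join(re.escape(tag) for tag in KNOWN_TAGS) + r")\s+(\S.*?)\s*$"
)

def summarize_robocopy_log(lines):
    """Group the logged paths by their status tag."""
    summary = defaultdict(list)
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            tag, path = match.groups()
            summary[tag].append(path)
    return dict(summary)

if __name__ == "__main__":
    sample = [
        r"*EXTRA File         c:\dest\log.txt",
        r"New File              c:\source\newfolder\Blah.txt",
        r"Newer                 c:\source\Files\CONCORD.DAT",
        r"New File              c:\source\Files\COWCO.DAT",
    ]
    for tag, paths in summarize_robocopy_log(sample).items():
        print(f"{tag}: {len(paths)}")
```

To run it against a real log, pass the open file object for reconcile.txt instead of the sample list. Plain folder listings and any line without a known tag are skipped.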

5 Best Practices for Commenting Your Code

One of the first things you learn to do incorrectly as a programmer is commenting your code. My experience with student and recently graduated programmers tells me that college is a really good place to learn really bad code commenting techniques. This is just one of those areas where in-theory and in-practice don’t align well.

There are two factors working against you learning good commenting technique in college.

  1.  Unlike the real world, you do a lot of small one-off projects as a solo developer.  There’s no one out there fantasizing about dropping a boulder on you for making them decipher your coding atrocity.
  2.  That commenting style you are emulating from your textbook is only a good practice when the comments are intended for a student learning to program. It is downright annoying to professional programmers.

These tips are primarily intended for upstart programmers who are transitioning into the real world of programming, and hopefully will prevent a few from looking quite so n00bish during their first code review. Code Review? Oh yeah, that’s something else they didn’t teach you in school, but that’s a whole other article; I’ll defer to Jason Cohen on that one.

So let’s get started…

(1) Comments are not subtitles

It’s easy to project your own worldview that code is a foreign language understood only by computers, and that you are doing the reader a service by explaining what each line does in some form of human language. Or perhaps you are doing it for the benefit of that non-programmer manager who will certainly want to read your code (Spoiler: He won’t).

Look, in the not too distant future, you will be able to read code almost as easily as your native language, and everyone else who will even glance at it almost certainly already can. By then you will realize how silly it is to write comments like these:

// Loop through all bananas in the bunch
foreach(banana b in bunch) {
    monkey.eat(b);  //make the monkey eat one banana
}

You may have been taught to program by first writing pseudo-code comments and then writing the real code into that wire-frame. This is a perfectly reasonable approach for a novice programmer. Just be sure to replace the comments with the code, and don’t leave them in there.

Computer: Enemy is matching velocity.
Gwen DeMarco: The enemy is matching velocity!
Sir Alexander Dane: We heard it the first time!
Gwen DeMarco: Gosh, I’m doing it. I’m repeating the darn computer!

-Galaxy Quest

Exceptions:

  • Code examples used to teach a concept or new programming language.
  • Programming languages that aren’t remotely human readable (Assembly, Perl)

(2) Comments are not an art project

This is a bad habit propagated by code samples in programming books and open source copyright notices that are desperate to make you pay attention to them.

/*
   _     _      _     _      _     _      _     _      _     _      _     _
  (c).-.(c)    (c).-.(c)    (c).-.(c)    (c).-.(c)    (c).-.(c)    (c).-.(c)
   / ._. \      / ._. \      / ._. \      / ._. \      / ._. \      / ._. \
 __\( Y )/__  __\( Y )/__  __\( Y )/__  __\( Y )/__  __\( Y )/__  __\( Y )/__
(_.-/'-'\-._)(_.-/'-'\-._)(_.-/'-'\-._)(_.-/'-'\-._)(_.-/'-'\-._)(_.-/'-'\-._)
   || M ||      || O ||      || N ||      || K ||      || E ||      || Y ||
 _.' `-' '._  _.' `-' '._  _.' `-' '._  _.' `-' '._  _.' `-' '._  _.' `-' '._
(.-./`-'\.-.)(.-./`-'\.-.)(.-./`-'\.-.)(.-./`-'\.-.)(.-./`-'\.-.)(.-./`-'\.-.)
 `-'     `-'  `-'     `-'  `-'     `-'  `-'     `-'  `-'     `-'  `-'     `-'

                 -It's Monkey Business Time! (Version 1.5)
*/

Why, that’s silly. You’d never do something so silly in your comments.

ORLY? Does this look familiar?

   +------------------------------------------------------------+
   | Module Name: classMonkey                                   |
   | Module Purpose: emulate a monkey                           |
   | Inputs: Bananas                                              |
   | Outputs: Grunts                                            |
   | Throws: Poop                                               |
   +------------------------------------------------------------+

Programmers love to go “touch up” their code to make it look good when their brain hurts and they want something easy to do for a while. It may be a waste of time, but at least they are wasting it during periods where they wouldn’t be productive anyway.

The trouble is that it creates a time-wasting maintenance tax imposed on anyone working with the code in the future just to keep the pretty little box intact when the text ruins the symmetry of it. Even programmers who hate these header blocks tend to take the time to maintain them because they like consistency and every other method in the project has one.

How much is it bugging you that the right border on that block is misaligned? Yeah. That’s the point.

(3) Header Blocks: Nuisance or Menace?

This one is going to be controversial, but I’m holding my ground. I don’t like blocks of header comments at the top of every file, method or class.

Not in a boat, not with a goat.
Why? Well let me tell you, George McFly…

They are enablers for badly named objects/methods – Of course, header blocks aren’t the cause of badly named identifiers, but they are an easy excuse not to put in the work to come up with meaningful names, an often deceptively difficult task. They make it too easy to assume the consumer can just read the “inline documentation” to solve the mystery of what the DoTheMonkeyThing method is all about.

JohnFx’s Commandment:
The consumer of thy code should never have to see its source code to use it, not even the comments.

They never get updated: We all know that methods are supposed to remain short and sweet, but real life gets in the way and before you know it you have a 4K line class and the header block is scrolled off of the screen in the IDE 83% of the time. Out of sight, out of mind, never updated.

The bad news is that they are usually out of date. The good news is that people rarely read them so the opportunity for confusion is mitigated somewhat. Either way, why waste your time on something that is more likely to hurt than help?

JohnFx’s Maxim of Plagiarized Ideas:
Bad documentation is worse than no documentation.

Exception: Some languages (Java/C#) have tools that can digest specially formatted header block comments into documentation or Intellisense/Autocomplete hints. Still, remember rule (2) and stick to the minimum required by the tool and draw the line at creating any form of ASCII art.

(4) Comments are not source control

This issue is so common that I have to assume that programmers (a) don’t know how to use source control; or  (b) don’t trust it.

Archetype 1: “The Historian”

     // method name: pityTheFoo (included for the sake of redundancy)
     // created: Feb 18, 2009 11:33PM
     // Author: Bob
     // Revisions: Sue (2/19/2009) - Lengthened monkey's arms
     //            Bob (2/20/2009) - Solved drooling issue

     void pityTheFoo() {
          ...
     }

The programmers involved in the evolution of this method probably checked this code into a source control system designed to track the change history of every file, but decided to clutter up the code anyway. These same programmers more than likely always leave the Check-In Comments box empty on their commits.

I think I hate this type of comment most of all, because it imposes a duty on other programmers to keep up the tradition of duplicating effort and wasting time maintaining this chaff. I almost always delete this mess from any code I touch without an ounce of guilt. I encourage you to do the same.

Archetype 2: “The Code Hoarder”


     void monkeyShines() {
          if (monkeysOnBed(Jumping).count > max) {
             monkeysOnBed.breakHead();

             // code removed, smoothie shop closed.
             // leaving it in case a new one opens.
             // monkeysOnBed.Drink(BananaSmoothie);
          }
     }

Another feature of any tool that has any right to call itself an SCM is the ability to recover old versions of code, including the parts you removed. If you want to be triple super extra sure, create a branch to help you with your trust issues.

(5) Comments are a code smell

Comments are little signposts in your code explaining it to future archaeologists that desperately need to understand how 21st century man sorted lists of purchase orders.

Unfortunately, as Donald Norman explained so brilliantly in The Design of Everyday Things, things generally need signs because their affordances have failed. In plain English, when you add a comment you are admitting that you have written code that doesn’t communicate its purpose well.

Sign: “This is a mop sink.” Why would that be necces… oh.

Despite what your prof told you in college, a high comment to code ratio is not a good thing.  I’m not saying to avoid them completely, but if you have a 1-1 or even a 5-1 ratio of LOC to comments, you are probably overdoing it. The need for excessive comments is a good indicator that your code needs refactoring.

Whenever you think, “This code needs a comment,” follow that thought with, “How could I modify the code so its purpose is obvious?”
Talk with your code, not your comments.

Technique 1: Use meaningful identifiers and constants (even if they are single use)

     // Before
     // Calculate monkey's arm length
     // using its height and the magic monkey arm ratio
     double length = h * 1.845; //magic numbers are EVIL!

    // After - No comment required
    double armLength = height * MONKEY_ARM_HEIGHT_RATIO;

Technique 2: Use strongly typed input and output parameters

      // Before
      // input parameter monkeysToFeed:
      // DataSet with one table containing two columns
      //     MonkeyID (int) the monkey to feed
      //     MonkeyDiet (string) that monkey's diet
      void feedMonkeys(DataSet monkeysToFeed) {
      }

     //  After: No comment required
     void feedMonkeys(Monkeys monkeysToFeed) {
     }

Technique 3: Extract commented blocks into another method

      // Before

      // Begin: flip the "hairy" bit on monkeys
      foreach(monkey m in theseMonkeys) {
          // 5-6 steps to flip bit.
      }
      // End: flip the "hairy" bit on monkeys

     // After No comment required
     flipHairyBit(theseMonkeys);

As an added bonus, technique 3 will tend to reduce the size of your methods and minimize the nesting depth (see also “Flattening Arrow Code”), both of which help eliminate the need for commenting the closing braces of blocks like this:

            } // ... if see evil
         } // ... while monkey do.
      } // ... if monkey see.
    } // ... class monkey
  } // ... namespace primate

Acknowledgments

Several of the ideas presented here, and a good deal of the fundamental things I know about programming as part of a team, I learned from the book Code Complete by Steve McConnell. If you are a working programmer and have not read this book yet, stop what you are doing and read it before you write another line of code.

Quick Tip: Comparing a .NET string to multiple values

The Problem

Do you ever wish you could use the SQL IN operator in your C# code to make your conditional blocks more concise and your code easier to read?

Perhaps it’s just my persnickety nature, but I believe that line-wrapped conditional expressions like this are a code smell.

if (animal.Equals("Cow") ||
   animal.Equals("Horse") ||
   animal.Equals("Hen"))
{
   Console.WriteLine("We must be on the farm.");
}

This would be so much cleaner…

if (animal.CompareMultiple(StringComparison.Ordinal, "Cow", "Horse", "Hen"))
{
   Console.WriteLine("We must be on the farm.");
}

The Code

With a simple extension class you can upgrade your string classes to do this very thing.

Step 1: Create an extension class as demonstrated here.

C#

using System;

namespace extenders.strings
{
  public static class StringExtender {

    // Returns true if data equals any of compareValues under the given comparison type.
    public static bool CompareMultiple(this string data, StringComparison compareType, params string[] compareValues) {
      foreach (string s in compareValues) {
        if (data.Equals(s, compareType)) {
          return true;
        }
      }
      return false;
    }
  }
}

VB.NET

Imports System.Runtime.CompilerServices

Namespace Extenders.Strings

  Public Module StringExtender

    <Extension()> _
    Public Function CompareMultiple(ByVal this As String, compareType As StringComparison, ParamArray compareValues As String()) As Boolean

       Dim s As String

       For Each s In compareValues
         If (this.Equals(s, compareType)) Then
            Return True
         End If
       Next s

       Return False
    End Function

  End Module

End Namespace

Step 2: Add a reference to the extension namespace and use it.

C#

using System;
using extenders.strings;

namespace MyProgram
{
  static class program {
    static void Main() {

      string foodItem = "Bacon";

      if (foodItem.CompareMultiple(StringComparison.CurrentCultureIgnoreCase, "bacon", "eggs", "biscuit")) {
         System.Console.WriteLine("Breakfast!");
      }
      else {
         System.Console.Write("Dinner");
      }

    }
  }
}

VB.NET

Imports StringExtenderExampleVB.Extenders.Strings

Module Program

  Sub Main()
    Dim foodItem As String = "Bacon"

    If (foodItem.CompareMultiple(System.StringComparison.CurrentCultureIgnoreCase, "bacon", "eggs", "biscuit")) Then
      System.Console.WriteLine("Breakfast!")
    Else
      System.Console.Write("Dinner")
    End If

    System.Console.ReadLine()
  End Sub

End Module

How to join on memo fields in Microsoft Access

Rambling Intro, Nostalgia, and Crankiness

This week I got a request to troubleshoot a legacy Microsoft Access application that has been floating around our company for ages, but still gets used daily because, dang it, it does the job and always has. It seems like most companies that have standardized on MS Office have a few of these lurking out on the network.

Earlier in my career I did a ton of work in MS Access and have garnered a reputation within my company for being an expert in this oft maligned platform so I got the call to look into the problem. It had been quite a while since I’d done any real work on MS Access and I’d forgotten about how quirky it could be. Also, I am more than a little disappointed at how Microsoft has mangled the UI of my old friend Access in the 2007 version. It is almost painful to work with it as a power-user in the current incarnation.

The Problem

So anyway, the issue turned out to be that someone increased the length of a field in the underlying SQL Server table linked into the Access application. They increased it past the magical border (255 characters) between what Access considers a text and a memo field, which imposed new limits on how it could be used. In particular, Access doesn’t allow either end of a join in a query to be a memo field.

Can't join on memo fields

This won't fly, McFly

The Solution

The solution is painfully simple. So much so that I have to wonder why Access doesn’t just do it behind the scenes. Perhaps it is just trying to discourage you from building databases that link on big text fields for your own good (see “The Caveat” below).

The trick is to move the join into the WHERE clause of the query like so:

SELECT Table1.*, Table2.* FROM Table1, Table2 WHERE (Table1.MemoField=Table2.MemoField);

Here is the same query in the query builder for those of you who prefer it to the SQL view:

Graphical display of query

Remove the join between the tables and add a criterion

Access will raise nary a complaint if you run this query which is logically equivalent to the one it abhorred. That’s all there is to it.

The Caveat

A final note. It is a definite database smell for an application to be joining tables on long text fields, and it will likely be the source of some performance issues in a database of non-trivial size. However, as was the case for the application I was tweaking, joining on long text fields is sometimes necessary in queries used for data clean-up, validation, or replication. Still, use this type of join with caution, avoiding it whenever possible.

Tip: Use delayed service starting to speed up the boot process of your development machine

One of the minor aggravations in my life, right behind lyricists who want to “hold me tight” despite all the homeless adverbs in the world, is the time required to get my Windows development machine from a cold brick to a state where I can do productive work.

This is exacerbated by the fact that I, as a developer, tend to restart my machine more often than a typical computer user. Also, I’ve got a number of heavyweight services that run on my development machine, including web and database servers, so I can work on my projects when disconnected from the mothership.

I’ve been doing my best to set the services that I rarely use to manual and then start them only when I am ready to use them, to minimize boot time. But today I found another option. While I was in the Services manager starting up a local SQL instance, I noticed an unfamiliar value in the Startup Type column: Automatic (Delayed Start).

Automatic (Delayed Start) Service Startup Type

Delayed Start? What's that all about?

After some quick research, I discovered that this new startup option was introduced in Vista to expedite the boot sequence by de-prioritizing services that need to be launched at startup, but for which there is no hurry to get them spun up.

The gist of it is this: Services with this setting will be launched at the end of the start-up process and the initial thread is given a priority of THREAD_PRIORITY_LOWEST to avoid sacrificing UI responsiveness during the start-up sequence just to get things like “Google Updater” running immediately.

Some candidates for delayed start-up immediately come to mind:

  • Local development  instances of Database or Web Servers.
  • Updaters: Windows Update, Google, Windows Search, Any type of indexer.
  • Any of the crapware  from Apple or Adobe that they insist are so important they must run at all times.
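If you have more than a couple of services to switch over, the same setting can be applied from a script rather than by clicking through the Services manager. The Python below is a rough sketch, not something from my original tinkering: it builds the `sc config <service> start= delayed-auto` command and can run it. The helper names are mine, "MSSQLSERVER" is only a placeholder service name, and actually running the command needs an elevated prompt.

```python
import subprocess

def delayed_start_command(service_name):
    """Build the sc.exe command that sets a service to Automatic (Delayed Start)."""
    # sc.exe's documented syntax puts a space after "start=", so the value
    # is passed as its own argument.
    return ["sc", "config", service_name, "start=", "delayed-auto"]

def set_delayed_start(service_name):
    """Run the command. Windows only, and it requires administrator rights."""
    subprocess.run(delayed_start_command(service_name), check=True)

if __name__ == "__main__":
    # "MSSQLSERVER" is only an example service name; substitute your own.
    print(" ".join(delayed_start_command("MSSQLSERVER")))
```

Keeping the command builder separate from the function that executes it makes it easy to print and eyeball the commands before letting anything touch the service configuration.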

Maybe I’m late to the party discovering this feature, but like many companies, mine completely ignored Vista and is just now getting around to Windows 7, and in the process is discovering a lot of nice “new” features that may have technically been around a while.

I’m not proud. I’m willing to admit my extended ignorance of this feature if it can benefit another developer out there.

Did you guys know about this? Anyone know of other nice goodies in Windows 7/Vista  that are especially handy for tweaking development machines?

Let me know in the comments!

Shortcuts for registering COM objects or .NET assemblies

Here are a few shortcuts to simplify working with assemblies (either COM or .NET) that I have found especially useful over my years as a developer.

Tip 1: Associate DLL and OCX file extensions with RegSvr32

If you are still working with COM style objects regularly, associating the DLL and/or OCX extensions with RegSvr32.exe (in the Windows\system32 directory) allows you to register or re-register them quickly by double-clicking the file in Explorer.

Associate RegSvr32 with DLL Extension - Step 1

Step 1: Right-click a DLL file and choose the "Open With..." command.

Step 2: Browse to Windows\System32\Regsvr32.exe

For those of you who work with managed assemblies more often, but still do a lot of interop work, an alternative is to associate DLL files with Regasm.exe or Regsvcs.exe.

Tip 2: Use Windows “Send To..” folder for quick access to unregister a DLL

So now you can register components easily, what about un-registering them? That’s a snap too.

Just create a shortcut in the C:\Documents and Settings\{profilename}\SendTo directory (on Vista and later, the folder is %APPDATA%\Microsoft\Windows\SendTo) with the target set to %windir%\system32\REGSVR32.EXE /u and name the shortcut “Unregister DLL”.

Shortcut to Unregister a DLL

Here it is in action
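For batch work, the same register and unregister calls can be scripted instead of clicked. Here is a small Python sketch, assuming regsvr32.exe is on the PATH. The function names and the DLL path are made up for illustration. The /u switch unregisters, and /s suppresses the confirmation message boxes.

```python
import subprocess

def regsvr32_command(dll_path, unregister=False, silent=True):
    """Build a regsvr32 command line for registering or unregistering a COM DLL."""
    command = ["regsvr32"]
    if unregister:
        command.append("/u")  # unregister instead of register
    if silent:
        command.append("/s")  # suppress the message boxes
    command.append(dll_path)
    return command

def run_regsvr32(dll_path, unregister=False):
    """Run regsvr32. Windows only, and it usually needs an elevated prompt."""
    subprocess.run(regsvr32_command(dll_path, unregister), check=True)

if __name__ == "__main__":
    # Hypothetical path, shown only to illustrate the resulting command.
    print(regsvr32_command(r"C:\MyApp\MyComponent.dll", unregister=True))
```

Silent mode is the default here because a loop over a folder of DLLs would otherwise stop at a dialog box after every file.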

 

Windows Console Tricks and Shortcuts

Over the past few days I have been doing a higher than usual amount of work using the Windows console interface. I’ve been at this computing thing for quite some time, easily long enough to be old friends with the venerable DOS style command line, but not so much that I have yet figured out its cryptic emerald emoticon C:\>.

Today, my task involved a considerable amount of copying and pasting to and from the command line. As I am sure you are also painfully aware, the keyboard shortcut Ctrl-V (paste) doesn’t work at the command line unless you are looking for a shorthand way to enter the two-character combination “^V” with only the press of two keys. For the first 20 times or so, I simply resorted to my standard approach for cut/paste operations: using the menu from the application icon in the top left corner of the console window, as shown here.

Tried and True

This approach is incredibly slow, but I don’t usually do very many clipboard operations at a command line, so I have always just sucked it up and lived with it. Today, however, it was getting extremely tedious and frustrating.

Fed up, I went looking for a way to use shortcut keys in the console interface for accessing the clipboard. I am sorry to report that I didn’t find one, but I did find something almost as good.

Mousing the C prompt? Who Knew?

I don’t know how I didn’t discover this sooner, but you can right-click anywhere on the console window and get a shortcut menu. I suppose the concept of using a mouse at a command line just didn’t occur to me. In any event, using this technique is definitely quicker than my old method because it requires less menu navigation to get to the useful commands.

The copy/cut commands are still a little nonstandard and clumsy given that you must go into “mark” mode before you select the text. Also, marking multiline information is a bit quirky.

Right Clicking in the Console Window Opens a Shortcut Menu

The “Find” command on this menu is another nice little feature that I had overlooked before. It has two interesting quirks:

  • After an unsuccessful find, it emits an extremely loud beep from the built in computer speaker.
  • After a successful find it turns on “Mark” mode to prepare for a copy.

Command History

Most are familiar with using the up/down arrow keys to scroll through previous commands, but how many of you knew you could get a menu of recently used commands by pressing F7? You can reset this history by hitting Alt-F7.

Pressing F7 Opens a Menu of Recently Used Commands

Here are a few more useful keyboard shortcuts that are available in the Windows command shell:

  • Up/Down Arrow Keys: Iterate back/forward through command history
  • F1: Pastes the previous command one character at a time.
  • F2 (then enter a character): Pastes the previous command up to, but not including, the first occurrence of the provided character.
  • F3: Pastes the previous command in its entirety.

Use Wild cards with the CD command

You are probably used to using wild cards with delete and copy commands, but you may not think about using them to speed up directory navigation. It can be a real time saver to type “cd doc*” instead of “CD documents and settings.”

You can even use this technique when navigating down more than one level, for example “CD win*\system3*” to replace “CD Windows\System32.”

Note: You can’t use a wildcard as the first character with CD for some reason.

Am I the only one who didn’t know about this?

I started to wonder if I was just dense and had missed this feature that everyone in the world knew about except me. I did a quick poll of my peers and discovered that about half of them knew about it, and the other half didn’t know about using my old technique. Out of curiosity, I’d like to see what my readers knew before reading this article.

Do you have any cool command-line tricks to add?

Put them in the comments and I’ll add the good ones to this post.

Further Reading