navigation
 Friday, January 27, 2006

Being able to compare the results of two different SQL queries and verify that they match 100% is a useful ability for any database application developer. I've encountered two main reasons for wanting to compare query results in the course of development:

  1. After importing data from an older system, verifying that equivalent reports in the old system and the new one agree
  2. When refactoring a complex query, to verify that the changes made haven't altered the output

In both cases, I find it convenient to run data comparisons of queries run in SQL Server Query Analyzer by using MS Excel. The technique is simple, but the payoff is huge: 100% confidence that every row and every column matches. Here are the steps I use:

  1. Create a new 3-page workbook and rename the tabs "Old", "New", and "Compare" 
  2. Hand-enter column names in the top row of each worksheet. I like to make them bold and underlined so Excel can tell that they're headers. 
     
  3. Run the "old" query in Query Analyzer. Once the results come back, click in the result pane and select all using Ctrl+A or Edit | Select All
  4. In the "Old" worksheet, select the leftmost cell in the row under the header and paste the results.
  5. Repeat the same steps for the "New" query and worksheet.
  6. If the queries don't explicitly set their own sort order, I recommend sorting both the "Old" and "New" data in a way that should guarantee that they are ordered the same on the two worksheets.
  7. In the "Compare" worksheet, enter the following formula in the leftmost cell under the header row: =Old!A2=New!A2. This formula will return TRUE if the cells match, and FALSE if they don't. 
  8. To make mismatches easier to spot, I like to apply some conditional formatting as well.

    1. Select the cell and then select Format | Conditional Formatting...
    2. Set up the condition so that it reads "Cell Value Is" "equal to" "FALSE"
    3. Click the Format... button. On the Patterns tab, select a nice bright color that will stand out, like red. Click OK.
    4. Click OK on the Conditional Formatting dialog
  9. Select the span from this initial cell to the rightmost column in the result set and press Ctrl+R or Edit | Fill | Right.
  10. Check the "Old" and "New" worksheets and note the number of the last row containing query results.
  11. In the "Compare" worksheet, select all the cells that contain data in the same row and column on the other two sheets and press Ctrl+D or Edit | Fill | Down.
  12. Differences in data will result in a value of FALSE (optionally highlighted if you used conditional formatting). The values can be scanned for visually, or you can use the Find dialog to search for them. When using the Find dialog, make sure to click the Options button and change "Look in" from "Formulas" to "Values" 
  13. Once differences are found, the old and new data can be examined to determine the cause for the difference.