navigation
 Friday, January 20, 2006

I love working with MS SQL Server. Oh sure, Oracle is supposedly faster and all that jazz, but when it comes to rapid development, MSSQL's ease of use just can't be beat. However, there are still some things to watch out for. Today I want to warn you all off of using UNION.

Conceptually, there's nothing wrong with UNION. It takes two result sets with congruent columns and returns them as a single result set. There are two potential performance issues, however, one of which can be quite serious.

The first problem is that the UNION operator automatically performs a DISTINCT operation on the combined result set. If it's unnecessary, or worse, if it is necessary, but you've also used SELECT DISTINCT in one or both of the UNIONed queries, then UNION is causing a loss of performance for nothing. The workaround to this is simple; use UNION ALL instead.

The second problem can be far worse. I've run into situations many times where the UNION (or UNION ALL) operator caused inexplicably poor performance. In one recent scenario, I used UNION ALL to merge the results of two queries, each of which ran in about 7 seconds. When merged using UNION ALL, query execution time leapt up to 1:20. Neither the estimated execution plan nor the actual one reflected this; the total subtree cost was exactly the same as the sum of the two queries' costs when run separately. The fix is a little more complex, but extremely effective.

My solution to the problem is to use a table variable. In general, I recommend the use of table variables over temp tables unless you absolutely need a feature that temp tables support but table variables do not.

Example

The example below is not intended to demonstrate good database design; it merely serves to illustrate the solution. Given the original query:

SELECT FirstName, LastName, Salary
FROM Employees
UNION ALL
SELECT FirstName, LastName, Salary
FROM Managers

First declare a table variable with columns that match that of the result set:

DECLARE @Result TABLE (
  FirstName varchar(50),
  LastName varchar(50),
  Salary money,
  -- Example of how to declare a PK within a table variable
  PRIMARY KEY ( LastName, FirstName )
)

Next, insert each query into the result table:

INSERT @Result
SELECT FirstName, LastName, Salary
FROM Employee

INSERT @Result
SELECT FirstName, LastName, Salary
FROM Manager

Finally, select the results:

SELECT *
FROM @Result

This workaround will not only avoid potential performance problems with UNION, but will actually allow you to improve join performance through the definition of a good primary key, if the UNION result is joined to other data.