navigation
 Thursday, October 04, 2007

I see this pattern all the time: you need aggregated data from one table, but from multiple different ranges. A very common example of this is a table that holds some kind of date-related data, and a report that displays current month and year to date. For the purposes of this article, let's create a simple, denormalized table that holds accounting data:

create table Account (
  AccountNum int
)

create table AccountActivity (
  AccountNum int,
  Date datetime,
  Amount money
)

To generate the kind of results I described above, I often see queries that look something like this:

create procedure ActivityReport
  @MonthStart datetime -- Assume the day part is 1 for simplicity
as
select AccountNum,
  isnull( ca.Amount, 0 ) as AmountCurrent,
  isnull( ya.Amount, 0 ) as AmountYTD
from Account a
left join (
  select AccountNum, sum( Amount ) as Amount
  from AccountActivity
  where Date >= @MonthStart
  and Date < dateadd( m, 1, @MonthStart )
  group by AccountNum
) ca on a.AccountNum = ca.AccountNum
left join (
  select AccountNum, sum( Amount ) as Amount
  from AccountActivity
  where Date >= dateadd( m, 1 - month( @MonthStart ), @MonthStart )
  and Date < dateadd( m, 1, @MonthStart )
  group by AccountNum
) ya on a.AccountNum = ya.AccountNum

In plain English, there is a subquery for each date range. This results in a table access for each subquery, which is redundant and slow. Every query that follows this pattern can be rewritten by moving the date ranges in the WHERE clauses to CASE expressions in a single SELECT that encompasses the entire date range. In the above example, the earliest date desired is the beginning of year, and the latest date desired is the first of the month after @MonthStart, so the new replacement subquery will encompass that entire range:

select *
from AccountActivity
where Date >= dateadd( m, 1 - month( @MonthStart ), @MonthStart )
and Date < dateadd( m, 1, @MonthStart )

This query will return all the data necessary to compute the current month's activity as well as the year to date activity, but since it returns all the data in a single derived table, the data will have to be filtered in order to produce the desired results. This can be accomplished by using CASE expressions in the SELECT clause, like this (important parts are in italic):

create procedure ActivityReport
  @MonthStart datetime -- Assume the day part is 1 for simplicity
as
select AccountNum,
  sum( isnull(
    case when aa.Date >= @MonthStart
         and aa.Date < dateadd( m, 1, @MonthStart )
         then aa.Amount
         else 0 end, 0 ) ) as AmountCurrent,
  sum( isnull(
    case when aa.Date >= dateadd( m, 1 - month( @MonthStart ), @MonthStart)
         and aa.Date < dateadd( m, 1, @MonthStart )
         then aa.Amount
         else 0 end, 0 ) ) as AmountYTD
from Account a
left join AccountActivity aa on a.AccountNum = aa.AccountNum
where Date >= dateadd( m, 1 - month( @MonthStart ), @MonthStart )
and Date < dateadd( m, 1, @MonthStart )
group by a.AccountNum

Notice how the conditions that were in the WHERE clauses above move directly into the CASE expressions. It's ugly, but it will also run much faster thanks to reduced table access. In many cases, you can reduce the redundancy and complexity of the CASE expressions by taking the WHERE conditions into consideration as well. The simplest example of this is the CASE expression for the YTDAmount; it is 100% redundant with the conditions of the WHERE clause, so the query can be simplified by removing the redundant expressions:

create procedure ActivityReport
  @MonthStart datetime -- Assume the day part is 1 for simplicity
as
select AccountNum,
  sum( isnull(
    case when aa.Date >= @MonthStart
         and aa.Date < dateadd( m, 1, @MonthStart )
         then aa.Amount
         else 0 end, 0 ) ) as AmountCurrent,
  sum( isnull( aa.Amount, 0 ) ) as AmountYTD
from Account a
left join AccountActivity aa on a.AccountNum = aa.AccountNum
where Date >= dateadd( m, 1 - month( @MonthStart ), @MonthStart )
and Date < dateadd( m, 1, @MonthStart )
group by a.AccountNum

The other CASE statement has one condition redundant with the WHERE clause: the upper bound 

dateadd( m, 1, @MonthStart )

So it can be removed as well:

create procedure ActivityReport
  @MonthStart datetime -- Assume the day part is 1 for simplicity
as
select AccountNum,
  sum( isnull( case when aa.Date >= @MonthStart then aa.Amount else 0 end, 0 ) ) as AmountCurrent,
  sum( isnull( aa.Amount, 0 ) ) as AmountYTD
from Account a
left join AccountActivity aa on a.AccountNum = aa.AccountNum
where Date >= dateadd( m, 1 - month( @MonthStart ), @MonthStart )
and Date < dateadd( m, 1, @MonthStart )
group by a.AccountNum

Now, if you needed to add the amount of the prior month or next month to this query, all you would need is another CASE expression in the SELECT clause instead of a whole new subquery. This approach performs better right away, and also scales much better if more date ranges are added.

Wednesday, October 17, 2007 11:33:41 PM UTC
Excellent tip!
John Waters
Friday, November 16, 2007 1:51:29 PM UTC
thanks a lot for good SQL code on datetime()function..
Rakesh Singh
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, i, strike, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview