I think everyone will agree that code reuse is a good thing. In T-SQL, one way to write reusable code is with user-defined functions (UDFs). UDFs come in three flavors:
All three of these have their uses, but today's topic is all about a common mistake people make when they write UDFs, specifically scalar UDFs. Let's look at a simple example. Consider that you have a typical customer orders database, and you've decided to encapsulate the logic of computing a the sum of a customer's orders. Of course, in real life, this is a trivial thing probably not worth abstracting to its own dedicated UDF, but let's pretend that it's something really clever and significant. A scalar UDF would look something like this:
CREATE FUNCTION dbo.Customer_Get_OrderTotal( @CustomerID int ) RETURNS money AS BEGIN declare @result money select @result = sum( Quantity * Price ) from Orders where CustomerID = @CustomerID RETURN @result END
And it would be used like this:
select Name, dbo.Customer_Get_OrderTotal( CustomerID ) as OrderTotal from Customers
Can you see the performance problem here? The problem isn't just that a scalar function is being used; it's that the scalar function performs its own table access once for each row in the calling statement. If there are 100 customers in the database, that will be 100 select statements executed on the Orders table; if there are 100,000 customers, that's 100,000 select statements on Orders! This is a very inefficient way to access data, and it will show in the query's performance as the Customer table grows. As a general rule, scalar functions should not access other tables. Now, let's see how to rewrite the above function as a table-valued function that will access the Orders table only once, regardless of the number of rows in Customers.
CREATE FUNCTION dbo.Customer_Get_OrderTotals() RETURNS @result ( CustomerID int, OrderTotal money, primary key ( CustomerID ) ) AS BEGIN insert @result select CustomerID, sum( Quantity * Price ) from Orders group by CustomerID RETURN END
Here, OrderTotal is computed exactly the same way as before, but now instead of taking CustomerID as a parameter as before, the function computes the order total for every customer and returns the result as a table. This function would get used like this:
select c.Name, ot.OrderTotal from Customer c join cbo.Customer_Get_OrderTotals() ot on ot.CustomerID = c.CustomerID
In this scenario, there is only one table access done on the Orders table regardless of how many rows are in the Customer table. But wait, there's a performance problem with this solution, too. What if you only wanted to select a single row with this query, like this:
select c.Name, ot.OrderTotal from Customer c join cbo.Customer_Get_OrderTotals() ot on ot.CustomerID = c.CustomerID where c.CustomerID = @CustomerID
Now we're in a pickle! The table-valued UDF is going to compute the order total for each customer, even though in the end only one of those totals will be used. Thankfully, there is a solution that will not involve trying to predict how many Customer rows will be needed and then deciding on the fly whether to use a scalar or table-valued function. Inline functions to the rescue!
Both table-valued functions and inline functions return tables as results. The difference is that table-valued functions are allowed to contain multiple statements, while inline functions are only allowed to contain a single statement returning a result table. In return, what you get is SQL Server treating the inline function as if it's part of the calling statement and only evaluating the rows needed by the calling statement. You can think of it as if SQL Server replaces the inline function call with the body of the function and evaluates the resulting statement; thus the name "inline" function. Due to the restrictions on inline functions, you won't always be able to rewrite table-valued functions as inline ones, and you'll have to resort to passing parameters that the table-valued function can use to restrict the rows returned. Fortunately for us, this function is a natural fit for inline functions. This is how it would be written:
CREATE FUNCTION dbo.Customer_Get_OrderTotals() RETURNS TABLE AS RETURN select CustomerID, sum( Quantity * Price ) from Orders group by CustomerID
It would be called exactly the same way as before. The difference is that now, if only a single customer row is selected, the function will only evaluate for the since CustomerID, but if the entire customer table is selected, the function will evaluate all of the necessary CustomerIDs in a single table access.
In closing:
Remember Me
a@href@title, i, strike, u
Copyright © 2003-2008 Falafel Software Inc.
Subscribe to Falafel Blogs
The opinions expressed herein are Falafel's employees own personal opinions and do not represent Falafel Software's view in any way in case they go bananas!