The Four Deadly Sins of LINQ Data Access: Part 4–Too Many Rows

By March 30, 2014News

Introduction

So far in this series, I’ve written about Overly Chatty QueriesToo Many Columns, and Insufficiently Chatty Queries. There is one more big one that I’ve left until last, because in my view, it’s the most obvious of all the sins. However, with all of the others out of the way, let’s take a moment to talk about the Sin… of Too Many Rows.

Confession

As a consultant, I probably see more than my fair share of poor-quality code; the projects that are struggling are the ones that need our help. Even considering that, I have seen code like this alarmingly often:

Do you see the problem here? Let’s have a look at the generated SQL:

Something missing? There is no WHERE clause! That means that the entire Customers table just got returned from the DB into memory and the filtering was done there. For a small table like Northwind Customers, this is not really a problem, but for large tables in production databases, it is a huge problem. Like all other sins, it can consume inordinate resources on the database server, on the network, and on the middle tier server.

Repentance

The way to avoid committing this sin is quite simple: don’t materialize the query until the absolute last moment before you actually need it and/or before the DB context is disposed. In the above example, it is as simple as either completely eliminating the call to ToList(), or at least moving it to the final statement:

Generated SQL:

Now the WHERE clause is being sent to the database and only matching rows are returned and materialized as objects in the middle tier. That’s really all there is to it when it comes to the code, but perhaps it’s worth taking a moment to explain what the difference is.

Indulgence

A quick aside: I already made one complaint about LINQ to Entities, which is that it really doesn’t generate very good SQL. Here’s another one pertinent to this sin: I think LINQ to Entities does too good of a job at hiding what it’s doing (until it doesn’t and you must become intimately familiar with the internals), making it too easy for naive developers to write wasteful queries without even realizing what’s going on. It all boils down to two important interfaces: IEnumerable and IQueryable.

When you write a data access query using LINQ (whether by keywords or by methods with lambda expressions), what is really happening is that you are calling the LINQ extension methods of the IQueryable interface. As long as the code doesn’t try to access the contents of the IQueryable either explicitly (e.g. calling a method like ToList()) or implicitly (e.g. by attempting to iterate through the results in a foreach loop), all of these method calls simply build up a query expression but don’t execute it. Only once the code actually tries to access the data does the query execute, and at this point the results of the query are mapped into a collection in-memory objects. This collection (typically a generic List) implements the IEnumerable interface, which has exactly the same LINQ extension methods as IQueryable. The difference is that these LINQ methods execute immediately on objects already in memory. Be aware that there is a difference and make every effort to defer execution of the queries until all row and column restrictions (via WHERE and SELECT operations) have been defined.

Summary

The Deadly Sins of LINQ Data Access all have a unifying underlying principle: Select only as much data as you actually need, in as few calls as possible. Don’t query one row at a time; don’t return a whole object when you only needed a handful of fields; don’t try to do too much with a single query, and in the name all that is good, don’t select too many rows and then filter them afterwards! Now, go forth and sin no more! (*At least in regards to querying databases with LINQ)

The following two tabs change content below.