Last time in this series, I showed how a common and easy-to-use method of Entity Framework would result in more columns than necessary being returned from the database, and before that, I described the cause and solution for Overly Chatty Queries. At the end of the last post, I said that yes, there is such a thing as an “Underly” chatty query, and I suspect that statement might have sounded odd to some. After all, the first post was all about how bad it is to have a chatty query because issuing new connections and queries to the database is expensive. However, it really is possible not to be chatty enough, for at least two reasons that I can think of.
Let’s look at a variation on the snippet from last time:
As I already covered last time, this query is returning too many columns. However, let’s pretend for this instance that you really do need the entire Order and Customer objects, so restricting the columns returned isn’t an option. This query is still returning too much data! See, the results you see in C# make it look like a nice object hierarchy with no redundancy. If you run that query in LINQPad, this is a part of the visualization you’ll get (Extra customer columns omitted to save horizontal space; pretend they are all there)
So, LINQ to Entities makes it look like you selected 13 distinct Customers and their child Orders, because that’s how it is represented by the objects that Entity Framework materializes. However, if you take the generated SQL and run it, this is what you will see (extra customer columns again omitted to save space):
The customer row is repeated for every customer order! Accessing the data in this way consumes valuable network bandwidth, which means that it will take fewer users to saturate your network, and that is a bad thing. Fortunately, there is an easy way to load related objects for the low, low price of one additional query per entity type.
To repent of the sin of an Insufficiently Chatty Query, break that single table above into two: one for Customers, and another for Orders, without including any redundant columns except for the key column(s) that relate the entity types. Here’s an example of how to accomplish this:
This approach generates tight, efficient queries and datasets. The secret sauce is in the ToLookup method. This method groups a collection by the key selector and returns a Dictionary-like object whose Key is the key returned by the key selector and whose Value is a collection of items with that key value. Unlike a Dictionary, however, it will return an empty collection instead of throwing an exception if you access a key that does not exist. That’s why it’s safe to iterate through the parent collection and perform the lookup without checking if the key exists first.
So that’s one way a query can be insufficiently chatty: by trying to do too much in a single query, too much extra data can be transmitted between database and middle tier. There is another way that queries can be insufficiently chatty, but this is a little more subjective than the easily demonstrated case above. The straight truth is that Entity Framework writes SQL like a junior developer. The SQL it generates is declaratively an accurate representation of the LINQ that it translated, but it is often sub-optimal. With simple LINQ statements, there isn’t much that can go wrong, and this is where Entity Framework is at its best: making simple things simpler. However, as a LINQ statement gets more complex, the resulting SQL does too, and the lack of optimization really starts to show. For a particularly egregious example, see the SQL generated by a model with Table-Per-Type Inheritance. If you must implement all your data access in LINQ to Entities, I would advise you to try to keep the data access queries as simple as possible, focusing on getting the data that you need. Then, do the complicated stuff in LINQ to Objects once the data is in memory. Even better would be to write the SQL yourself as a stored procedure or view, but that is not always possible, either because of project requirements or because the team doesn’t possess the skills.
To sum up this lesson, there are two major points:
- Be mindful of the fact that while materialized objects appear as a graph with no redundancies, SQL databases return two-dimensional result sets; avoid returning more data than you need, including redundant columns from parent tables.
- Be wary of the SQL Entity Framework generates. Watch it using SQL Server Profiler, log it with ToTraceString(), or run LINQ statements in a tool like LINQPad to see what is actually being generated and run on the database. Don’t be afraid to resort to hand-tuning either by simplifying the data access portion of the LINQ or moving complex queries into views or stored procedures as needed.
Now, go forth and chat it up with the database! Just don’t overdo it.
Latest posts by Adam Anderson (see all)
- Azure HDInsight – What the Heck Is a Hadoop? - May 21, 2015
- What Is the Best Windows Git Client? - May 15, 2015
- I Love QueryableExtensions - April 30, 2015
- Make a Kendo MVVM calculated field depend on DataSource data - April 12, 2015
- Azure DocumentDB Overview - March 25, 2015