asp:Feature
LANGUAGES: C#
ASP.NET VERSIONS: 3.5
LINQ Deep Dive
Part II: More Tips for Delving in to LINQ
By Zeeshan Hirani
We discussed in the first part of this series some
of the different behaviors of LINQ queries in different domains, such as LINQ to
Objects and LINQ to SQL (see Part I in the April issue of asp.netPRO). LINQ to SQL interprets the query differently and
sometimes returns different results as compared to when you execute the query
using LINQ to Objects. The behaviors of a LINQ query in different domains can
significantly impact performance, output, and memory consumption.
We discussed in Part I how to apply joins and
filter on a LINQ to Objects query in an efficient way to yield maximum
performance. We also learned how to delay load a particular column in a table
and fetch the column only when it is necessary so you don t incur the cost of
bringing a column with huge content, such as a binary or varchar(max) column.
We ll continue our exploration in this second installment by introducing users
to the caching feature offered by the object tracking service and examining on
which scenarios caching is used and how it ensures object reference
consistency. We ll also discuss how LINQ to SQL automatically maps results
returned to entities from dynamic SQL, as well as how a non-default constructor
can prevent composing queries from other queries. We ll also learn to
understand and identify if a query is being sent to a database for execution or
if it is being applied in memory.
Tip One: Query Based on a Primary Key Optimized for Database Roundtrips
LINQ to SQL has its own caching mechanism that it
uses to ensure optimistic concurrency. If you retrieve an object from the DataContext,
the LINQ to SQL object tracking service starts tracking the object. The
tracking service caches the object based on a primary key. The caching
mechanism is not actually there to improve performance of the query, but rather
to provide consistency with the original results you obtained; for instance, if
you retrieved a customer object from a DataContext and modified a few
properties on the object. This makes the customer object dirty in memory, and
marked for update. However, the update has not been sent to the database yet.
If you were to query for the same customer again on the same DataContext, you
should get the object you modified, not a fresh copy from the database (which
will not have your changes). If LINQ to SQL did not maintain a cached version
of the object, you could end up losing the changes if you queried for the
object again.
Not only does caching behavior offer consistency
with the results, but also you get consistency in terms of object references.
For example, when you retrieve the customer initially, the object reference
given by the DataContext would be the same no matter how many times you query
the DataContext for the same customer again. Getting the same object back
offers a clean programming model and helps avoid bugs introduced in the
application because of object comparisons that would not be valid if LINQ to
SQL did not give us back the same entity on which we originally worked.
Does that mean we do not incur the cost of a database
roundtrip if we query for an object that initially had been fetched from the
same DataContext? The answer is, it depends on how you are querying. If you
query based on any column other than the primary key, regardless of which LINQ
query operator you use, LINQ to SQL will make a database call. When it comes to
giving you back the object, LINQ to SQL will check in the object tracking
service to see if it is tracking the object that was returned from the query
executed based on the primary key. If it finds an object in the tracking
service that has the primary key as the one returned from the query sent to the
database, LINQ to SQL will give you the object from the tracking repository.
This means you ll still incur the database hit, regardless of whether LINQ to
SQL has fetched the object initially and is tracking it in its tracking
repository. However, if you query using a Single or First operator using a primary
key, LINQ to SQL will first check in its object tracking service to see if it
can find an object that has the same primary key on which you are searching. If
a match is found, LINQ to SQL will immediately return the object without
bothering to go to the database.
Figure 1 shows I used the Single operator to
retrieve the ALFKI customer based on the primary key column. The next time I
query for the ALFKI customer using the Single operator based on the primary
key, LINQ to SQL won t go to the database, but simply will return the object
from the tracking service. The First operator used with a primary key column
offers the same benefit; it, too, simply returns the object from the tracking
service. The last query in Figure 1 is what makes the second database call.
Despite the fact that we are using the Single operator, LINQ to SQL makes a
call to the database. Because the contact name of Maria Anders belongs to the customer
ID of ALFKI, and object services was tracking the ALFKI customer, you end up
with the same customer reference. This confirms our behavior that if you don t
query based on a primary key, LINQ to SQL will always make a call to the
database. On our output window we are printing the test to see if all four
queries return the same object reference; indeed, the result confirms that all
queries got back the same object.
Figure 1: Single and First operators do not go back
to the database if the object has been initially fetched
Tip Two: Order Lazy-loaded Child Collections on the Database
If you ve worked with other OR mappers on the
market, you must ve come across this issue several times. One of the
constraints I ve encountered in some OR mappers is that there is no clean way
to define how to sort child collections based on a certain column. For example,
if I have a customer instance in my hand and I want to get access to its orders,
I simply can navigate to the Orders property of the customer. But what if I
want those orders to be sorted by ShipCity? In LINQ, you can apply the Order By
operator on the Orders collection for the customer. But does that Order By
operation get executed on SQL Server? No. As soon as we access the Orders
property of the customer, LINQ to SQL fetches all the orders for the customer in
memory. From there on, any operations you perform will get executed in memory.
Figure 2 shows that orders for the ALFKI customer
got sorted based on ShipCity. However, looking at the SQL capture, it s
apparent that the order operation was done in memory because our SQL statement
does not include any Order By operations. By accessing the Orders collection in
this fashion, you don t get a chance to give any hints to LINQ to SQL that you
want the orders to be sorted by the ship city on SQL Server.
Figure 2: An Order By operator on the Orders
collection gets applied in memory and is not very efficient
Filtering child collections in LINQ to SQL can be
accomplished by using the AssociateWith method of the DataLoadOptions class.
The AssociateWith method takes in a lambda statement that not only can define
how the child collection needs to be filtered, but also specify how the child
collection needs to be sorted. Figure 3 uses the AssociateWith method to sort
the orders for a given customer by ShipCity. To confirm the sort, ShipCity was
applied at the database level; the SQL capture is also recorded, which includes
the Order By clause for ShipCity.
Figure 3: Confirms the sort order gets executed by
the database
The AssociateWith method is only available in LINQ
to SQL. If you are going to be using Entity Framework, you can use the CreateSourceQuery
method available on Entity Collection and Entity Reference. CreateSourceQuery
returns an ObjectQuery, which gets converted to SQL and is sent to the
database. If you want your child collections to be loaded differently, such as
sorted in a different order or apply a filter, you must access the ObjectQuery
instance that is responsible for loading the child subcollection, then modify
the query to add ordering before ObjectQuery gets executed. Figure 4 shows an
example of applying to a child collection a sort operation using Entity
Framework.
Figure 4: Applying a sort on the Orders collection
for the ALFKI customer using Entity Framework
Tip Three: Avoid Non-default Constructors in LINQ to SQL
Queries
You must use an object initializer if you are going
to be composing queries from other queries and want the entire query to be
executed on the database. LINQ to SQL cannot convert non-default constructors
into SQL that SQL Server can understand. Therefore, it is preferred that you
use non-default constructors to ensure that the entire query is sent to the
database and no part of the query is performed in memory. If the query uses non-default
constructors in projection, when you try to apply further composition such as
adding a where clause, you ll get a runtime error saying the translation is not
supported.
Figure 5 shows I joined products, categories, and
suppliers and projected the output to the ProductSummary class. Notice I used a
non-default constructor of the ProductSummary class that takes CategoryID and
SupplierID as the constructor parameters. Further, I filtered the projected
results by CategoryId of 1. Upon iterating the final query, we get an exception
saying the query is not supported. The reason, as we discussed earlier, is the
use of a non-default constructor, which cannot be translated in SQL syntax by LINQ
to SQL.
Figure 5: Using non-default constructors raises an exception
if you try to further compose queries from the original query
What if you get an API that has no default
constructor and you are forced to use the class with its non-default
constructor? In that case, you can force the rest of the query to be executed
in memory by using the AsEnumerable operator available in LINQ. As shown in Figure
6, I m using the AsEnumerable operator before I apply the CategoryId filter.
This hint tells the compiler that any query operation performed after
AsEnumerable must be done in memory, thus avoiding the non-default constructor
exception.
Figure 6: Using the AsEnumerable operation ensures
we don t get an exception in our LINQ query
If you want your entire query to be sent to SQL Server,
including your category filter, you must avoid using a non-default constructor.
Figure 7 shows the correct version of the query, which, in its entirety, gets
executed by SQL Server (as confirmed by the SQL capture). Notice we are not
calling any constructor that defaults to using a default constructor with no
parameters.
Figure 7: Entire query gets executed in SQL Server
because we are using a default constructor
Tip Four: Eager Load a Self-referencing Table Using LINQ to SQL
There are many places where you need to represent
your table records in a hierarchy. A real-world example would be where an
employee has a manager. Both the employee and the manager are stored in the
same table. However, to find out the manager for an employee, you would have an
extra column, like ManagerId, that represents the ID of the employee to whom a particular
employee reports.
Another example would be an e-commerce database
with categories that each have subcategories. To identify the parent for a
category, you would have a column, like ParentCategoryId, which represents the parent
category for a category. If ParentCategoryId is null, then that category is
considered the top-level category. If you have a situation where you need to
load a category and all its subcategories any level deep in one single database
call, there is no way to get that working in LINQ in a clean fashion. You
cannot use DataLoadOptions to eagerly load a self-referencing table. If you
were to try, you would get an exception saying cycles not allowed.
Figure 8 shows the database diagram for the
Category and Products tables. Notice I have a category relationship pointing
back to itself, which merely describes that each category has subcategories
within it. The Category table also has a one-to-many relationship with Products
because products belong to a particular category. In Figure 8, we retrieve top-level
categories by applying a filter where the parent category instance is null. The
C# syntax of comparing objects to null references is translated by LINQ to SQL
to a null comparison of a foreign key column.
Figure 8: Eagerly loading a self-referencing table
raises an exception because LINQ to SQL does not allow loading cycles
To eagerly load the first-level subcategories of a
category, I used DataLoadOptions and passed in the lambda expression to load
the Categories navigation property. I got an exception saying LINQ to SQL does
not support loading cycles when I execute the LINQ query. What it essentially
boils down to is you cannot eagerly load a self-referencing table. From a
functionality point of view, I can understand why LINQ to SQL cannot
arbitrarily load any level-deep hierarchy. But most of the time, in real-world
scenarios, we are aware of the depth we d like to go, and it would be nice to
allow recursive queries with a depth level specified.
Because LINQ to SQL has limitations where it won t
allow you to eagerly load a self-referencing table, developers are required to
manually load recursive queries by explicitly traveling each tree of the
category. Figure 9 shows a large query for eagerly loading two-level-deep subcategories
for each parent category. I obtained my top-level category by only retrieving
categories that don t have a parent. To load children for the category, I accessed
the navigation property, Categories, for each category. For instance, I obtain
my first-level subcategories by cat0.Categories, second level by
cat1.SubCategories, and so on. This kind of query can potentially get complex
if you have a deep hierarchy. The query produces one large SQL statement that
may be very expensive to execute, so use it with caution and use it only when
you want to eagerly load child entities for a self-referencing table.
Figure 9: Manually traveling two levels deep for
each parent category
Entity Framework supports the Include method,
which, when passed in QueryPath allows eager loading to a predefined depth. For
instance, in Figure 10, to load two-level-deep categories, the Include method
was called on Categories with a QueryPath of Categories.Categories. QueryPath
is comprised of navigation properties; the number of times the navigation
properties are traversed indicates the desired depth.
Figure 10: Using Include to specify the depth in
our LINQ to SQL queries
Tip Five: Use AsQueryable with LINQ to Objects and LINQ to SQL
The AsQueryable method allows a query to be
converted to an instance of IQueryable. When you use the AsQueryable operator
on an existing query and apply further transformations, such as applying a
filter or a sort order, those lambda statements are converted to expression
trees. Depending on the provider you are using, expression trees will be
converted to the domain-specific syntax, then executed. In the case of LINQ to
SQL, the expression tree would be converted to SQL and executed on SQL Server.
However, if you use the AsQueryable operator on a query that does not implement
IQueryable<T> and only implements IEnumerable<T>, then any
transformations you apply on the query will automatically fall back on the IEnumerable<T>
specification. What this means is that by tagging a query with AsQueryable, you
get the benefits of both LINQ to SQL and LINQ to Objects implementation. If
your existing query happens to implement IQueryable, the query is converted to
SQL by LINQ to SQL; otherwise, the query is executed in memory in the form of IL
code.
This offers excellent benefits in real-world
scenarios where you have certain methods on an entity that returns an
IQueryable of T and some methods return List<T>. But then you have a
business rule filter that must be applied on the entire collection, regardless of
whether the collection is returned as IQueryable of T or IEnumerable of T. From
a performance standpoint, you really want to leverage executing the business
filter on the database if the collection implements IQueryable; otherwise, fall
back to apply the business filter in memory using the LINQ to Objects
implementation of delegates.
Figure 11 shows the TopSellingProducts method,
which returns an IQueryable of products. I defined that all the products that have
been ordered more than 50 times are considered top products. Because the
TopSellingProducts method returns an IQueryable, I have not executed the query;
I simply declared the query of how to get top-selling products.
Figure 11: Using AsQueryable to apply a filter
either in memory or a database
Another method, MostProfitableProducts, returns a list
of products that have a UnitPrice greater than 60. I arbitrarily came up with
this rule and stated that all products that have a UnitPrice of more than 60
dollars are considered profitable to the company.
Looking further in Figure 11, the DisplayProducts method
is responsible for displaying products. One of the business rules states that
we can only display products that have either quantity greater than 0 or the
product is not a discontinued product. To apply our business filter, I sent the
results of both TopSellingProducts and MostProfitableProducts to our filter
method, FilterNonDisplayableProducts. FilterNonDisplayableProducts simply
removes the products that violate our business rules. Notice that FilterNonDisplayableProducts
takes in an IEnumerable because this is the lowest common denominator with
which we can work and is implemented by all generic collections in one form or
another. Before applying the business filter I convert the IEnumerable of T
collection passed in as a parameter to IQueryable of T using the AsQueryable
operator. We do this because we want the filter to be applied on the database
for collections that implement IQueryable. If the collection does not implement
IQueryable, the filter is applied in memory. This way you can apply the
business rule to both IQueryable and IEnumerable and get the benefits of both
worlds.
Figure 12 shows the SQL capture when I execute the
code from Figure 11. Based on the SQL query sent, we can confirm that our
business filter was applied on the database for top-selling products because it
was using IQueryable, whereas, for the MostProfitableProducts collection, the
business filter was applied in memory because it implemented only IEnumerable
of T.
Figure 12: Confirms that the business filter gets
executed on the database for top-selling products
Tip Six: Use ExecuteQuery to Execute Dynamic SQL in LINQ to
SQL
If you find a particular query is difficult to
represent as a LINQ query, you can consider other options, such as writing your
own SQL statements inside a stored procedure or a function, then calling the
stored procedure and function using LINQ to SQL. You also can write dynamic SQL
statements using the helper methods provided in DataContext. Some of you may be
thinking this is new or different from ADO.NET, which allowed executing dynamic
SQL and getting a datareader or dataset back. You can achieve the same results
by making use of the ExecuteQuery<T> method available on the DataContext.
The ExecuteQuery method does a lot more than simply
executing dynamic SQL. First, it supports the concept of parameterized queries.
You provide parameters like you provide parameters to string.format using
placeholders, which LINQ to SQL converts to a parameterized query. When you
execute dynamic SQL in traditional ADO.NET, you either get a datareader or
dataset. It always had been left to the developer to convert the datareader to a
customized business object. However, ExecuteQuery takes a generic type that tells
LINQ to SQL that a particular SQL statement would return a strongly typed
instance of the class passed in as a generic type. You no longer need to worry
about converting a datareader to an object.
There are some restrictions you must follow to
accomplish proper mapping of your columns to properties on your object. First,
your query columns must contain the columns that are defined as the primary
key. Second, the column names returned by the query should match the property
names defined on your entity class. If the column names do not match, you can
use an alias in T-SQL to match the property name defined on your entity.
Furthermore, LINQ to SQL does not require that you return all the columns that
map to all the properties on your entity. If your query is missing some column,
LINQ to SQL will assign a default value to your property. Beware that because
you are loading an entity from the database, an object tracking service will
start tracking the object you just obtained. And, if you ve partially populated
your object, the next time you try to fetch the same object from the DataContext,
you ll end up with a partially filled object unless you refresh the object from
the database or use a fresh DataContext.
Figure 13 shows a dynamic SQL statement that returns
all the customers who placed at least one order that was shipped in the city of
Seattle. I used placeholder syntax to pass parameters to my SQL query. The
placeholders get converted to parameterized queries, which will free the
developer from worrying about a SQL injection attack. I then pass my SQL
statement to the ExecuteQuery method on the DataContext, with a generic type of
Customer to return a collection of customers. Because my query only returns
CustomerID and CompanyName, I m partially populating my customer object so any
fields that I m not populating get assigned a default value. If my query did
not return CustomerID, then LINQ to SQL would throw a runtime exception
complaining that the query must at least have a CustomerID column, as it is the
primary column that LINQ to SQL uses to track the object.
Figure 13: Executing dynamic SQL to return a strongly
typed customer object
Tip Seven: Execute Your LINQ Query on the Database
Depending on the approach you take when writing a
LINQ query, you may be surprised that some of the operations are getting
performed in memory instead of the call being translated to SQL and executed on
the database. When you apply aggregate operators on association relationships
on an entity in the context of query syntax, the aggregation is performed on
the database. However, when executed outside the query syntax, the same syntax
would force the entire association relationship to be brought from the database
and the aggregate operation performed in memory.
In Figure 14 shows two versions of the same query
that returns the customer ID, total number of orders a customer has placed, and
total amount they ve spent. In the first query I used the Orders association
relationship available on each customer to get the total orders placed. To get
the total amount spent, I first used the association relationship Orders on the
customer, then applied the SelectMany operator to get a flattened list of all
the order details for all the orders for a customer.
Figure 14: Aggregate operators are applied on the
database if used inside a LINQ query
After obtaining all the order details, I used the
sum operator to calculate the total spent by each customer. Because I used an
association relationship inside a LINQ query, the entire query is converted to
SQL to send to the database for execution.
The second query in Figure 14 uses the same
association relationship on the customer entity to perform calculations.
Because the calculations are not part of an existing query, LINQ to SQL must
bring all the orders for the customer and for each order bring down all its
OrderDetails to calculate the Sum and Count operations. Bringing all the orders
and order details for a customer is an expensive operation that need not be
performed if all you want to do is get the count and sum. Those operations
easily can be done on the database. As a developer, it is important to
understand the trade-offs and know which option may be better suited for your
scenario. For instance, if you have the Orders and OrderDetails for a customer
in memory, it may be more efficient to perform these operations in memory
instead of making a database call to the server.
Conclusion
In this article we explored how LINQ uses a
tracking service to ensure the same object on which the user worked is being
returned. When searching for an object based on the primary key, LINQ to SQL
saves the database roundtrip if it has initially fetched the object and is
available in the tracking repository. If the query searches on any columns
other than the primary key, the LINQ to SQL engine will make a database
roundtrip, but will then check the tracking service to see if the object
retrieved from a query is being tracked. If the object is available in the
tracking service, it will return that object. These behaviors ensure that
consecutive retrievals will not wipe out the changes you made on an existing
object.
We also discussed using the AssociateWith operator
to apply sorting on the database for child entities. We went through the
limitations of loading self-referencing tables using LINQ to SQL. We also saw
an easier syntax available in Entity Framework that allows us to navigate the
depth of the child collection using the Include method. We then discussed how
using the AsQueryable operator lets you leverage existing IQueryable queries to
perform further filtering on the database, and if the collection only
implements IEnumerable, query processing falls back to the IEnumerable
implementation of LINQ to Objects.
We also saw a few examples of using ExecuteQuery to
run our dynamic SQL statements and get objects back. ExecuteQuery is better
than its predecessors because its uses placeholders that get converted to
parameterized queries, saving you from having to worry about SQL injection
attacks. Using ExecuteQuery you no longer have to convert readers to entities,
and ExecuteQuery does not require you to fully populate your entire entity.
Finally, we discussed how aggregate operators are applied on an association
relationship of an entity. Aggregate operators used in the context of an
existing query will be applied on the database. Outside the context of
IQueryable, the aggregate operators force the entire collection to be brought
on the client and calculations performed in memory.
Source code accompanying this article is available for download.
Zeeshan Hirani (zeeshanjhirani@gmail.com)
is a senior developer at CheaperThanDirt.com.
He specializes mainly in ASP.NET, AJAX, and leveraging OR-mapping solutions
like LINQ and Entity Framework in business applications.