asp:Feature
Understanding Parallel LINQ (PLINQ)
By Joydip Kanjilal
Parallel LINQ (PLINQ) is a concurrency execution
engine for executing Language-Integrated Query (LINQ) queries. PLINQ is part of
the Parallel Extensions library (previously known as Parallel Framework
Extensions, or PFX), a managed concurrency library that comprises two parts:
the Task Parallel Library (TPL) and PLINQ. The former is a task parallelism
component, and the latter is a concurrency execution engine built on top of the
CLR. This article takes a look at PLINQ and its features.
PLINQ Prerequisites
To work with PLINQ, you should have one of the
following installed on your system:
Visual Studio 2008 with the Parallel Extensions Library
Visual Studio 2010 Beta 1 or later
You should also have a good understanding of LINQ
and how to write LINQ queries.
What Is PLINQ?
Simply put, PLINQ is a parallel execution engine
for executing your LINQ queries on multicore systems. The MSDN article, "Parallel
LINQ: Running Queries On Multi-Core Processors," states: "PLINQ
is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML
query and automatically utilizes multiple processors or cores for execution
when they are available."
PLINQ is a programming model that you can use to
build applications that take advantage of parallel hardware for improved
performance and scalability, without having to deal with the low-level details
of data parallelism and how it works. The key to PLINQ is executing a query on
multiple threads concurrently. (A thread is a path of execution within a
process and the smallest unit of execution within a process.) PLINQ's query
operators are implemented as extension methods in the
System.Linq.ParallelEnumerable class, so an existing LINQ query can take
advantage of the multiple processors in your system with very few changes.
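Because PLINQ is built on extension methods, a sequential LINQ query and its parallel counterpart can differ by a single AsParallel() call. The sketch below assumes the released .NET Framework 4 (or later) PLINQ API and C# 9 top-level statements rather than the preview builds; the query itself (multiples of 3, doubled, then summed) is an arbitrary illustration.

```csharp
using System;
using System.Linq;

var numbers = Enumerable.Range(1, 1_000);

// Sequential LINQ to Objects: Where/Select/Sum bind to System.Linq.Enumerable.
int sequentialSum = numbers
    .Where(n => n % 3 == 0)
    .Select(n => n * 2)
    .Sum();

// PLINQ: after AsParallel(), the same operators bind to
// System.Linq.ParallelEnumerable and may run on several threads.
int parallelSum = numbers
    .AsParallel()
    .Where(n => n % 3 == 0)
    .Select(n => n * 2)
    .Sum();

Console.WriteLine($"Sequential: {sequentialSum}, parallel: {parallelSum}");
```

Since Sum does not depend on the order in which elements are processed, both queries produce the same result.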
Parallelizing Your LINQ Queries
To parallelize your LINQ queries, you reference the
System.Concurrency.dll assembly at compilation time and then call the
System.Linq.ParallelEnumerable.AsParallel extension method on your data source.
(In the .NET Framework 4, PLINQ ships in System.Core.dll instead.)
Consider the following code:
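using System;
using System.Linq;

var integerList = Enumerable.Range(1, 100);
// AsParallel() opts the query into PLINQ.
var data = from x in integerList.AsParallel()
           where x <= 25
           select x;
// The results are not guaranteed to come back in source order.
foreach (var v in data)
{
    Console.WriteLine(v);
}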
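Notice the call to the AsParallel() extension method.
It returns an object of type ParallelQuery<int>, so the where and select
clauses that follow bind to PLINQ's parallel operators instead of the
sequential LINQ to Objects operators.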
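In the .NET Framework 4, the AsParallel extension method is declared in the
System.Linq.ParallelEnumerable class as follows (earlier previews returned
IParallelEnumerable<T> rather than ParallelQuery<T>):
namespace System.Linq
{
    public static class ParallelEnumerable
    {
        // Declaration only; the implementation is omitted.
        public static ParallelQuery<TSource> AsParallel<TSource>(
            this IEnumerable<TSource> source);
        // Other standard query operators...
    }
}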
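To see what the declaration means in practice, the following sketch (assuming the .NET Framework 4 or later API; the variable names are illustrative only) checks the static type returned by AsParallel and shows that the extension-method call to Where is just a call to the static ParallelEnumerable.Where method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> source = Enumerable.Range(1, 10);

// AsParallel wraps the sequence in a ParallelQuery<int>.
ParallelQuery<int> parallelSource = source.AsParallel();

// ParallelQuery<int> is more specific than IEnumerable<int>, so the compiler
// binds .Where to ParallelEnumerable.Where; the result is still a ParallelQuery<int>.
ParallelQuery<int> evens = parallelSource.Where(n => n % 2 == 0);

// The extension-method call above is equivalent to this static call.
ParallelQuery<int> evensExplicit = ParallelEnumerable.Where(parallelSource, n => n % 2 == 0);

// OrderBy is used only so the printed sequence is deterministic.
int[] sortedEvens = evens.OrderBy(n => n).ToArray();
Console.WriteLine(string.Join(", ", sortedEvens));
```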
In the Parallel Extensions previews, the AsParallel method is overloaded and
can also accept an integer argument and a ParallelQueryOptions enumeration
value. The integer argument specifies the degree of parallelism, that is, the
number of threads the query uses to process the data. ParallelQueryOptions is
an enumeration with two values, None and PreserveOrdering; PreserveOrdering
tells PLINQ to preserve the order of the source elements in the results. In the
released .NET Framework 4, these settings are applied with the
WithDegreeOfParallelism and AsOrdered operators instead.
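The sketch below shows the .NET Framework 4 equivalents of those two options, assuming that API rather than the preview overloads. WithDegreeOfParallelism(2) caps the query at two concurrently executing tasks, and AsOrdered() plays the role of PreserveOrdering; the odd-squares query is only an illustration.

```csharp
using System;
using System.Linq;

var source = Enumerable.Range(1, 20);

int[] orderedSquares = source
    .AsParallel()
    .AsOrdered()                 // keep results in source order (like PreserveOrdering)
    .WithDegreeOfParallelism(2)  // use at most 2 concurrently executing tasks
    .Where(n => n % 2 == 1)
    .Select(n => n * n)
    .ToArray();

// prints 1, 9, 25, 49, 81, 121, 169, 225, 289, 361
Console.WriteLine(string.Join(", ", orderedSquares));
```

Without AsOrdered(), the same query could return the squares in any order; ordering costs some performance, so it is worth requesting only when the order matters.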
Under the Covers
Any PLINQ query that can be parallelized relies on
partitioning: PLINQ breaks the input data into pieces and distributes them to
the processing cores on your system. PLINQ uses the following types of
partitioning:
range partitioning
chunk partitioning
striped partitioning
hash partitioning
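PLINQ normally chooses a partitioning strategy on its own, but the .NET Framework 4 (and later) also lets you supply a partitioner explicitly. As a rough illustration of range partitioning, the sketch below uses System.Collections.Concurrent.Partitioner.Create to split the indexes 0 to 99 into fixed ranges of 25 and hands those ranges to PLINQ. This shows the idea of range partitioning, not how PLINQ partitions data internally.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Each element is a Tuple<int, int> holding (fromInclusive, toExclusive):
// (0,25), (25,50), (50,75), (75,100).
var rangePartitioner = Partitioner.Create(0, 100, 25);

// PLINQ can consume a Partitioner<T> directly. Each worker sums the
// integers in every range it receives; the partial sums are then added up.
long total = rangePartitioner
    .AsParallel()
    .Select(range =>
    {
        long partial = 0;
        for (int i = range.Item1; i < range.Item2; i++)
            partial += i;
        return partial;
    })
    .Sum();

// Count how many ranges the partitioner produced.
int rangeCount = Partitioner.Create(0, 100, 25).AsParallel().Count();

Console.WriteLine($"Sum of 0..99 = {total} across {rangeCount} ranges");
```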
Processing the Data in Parallel
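The Task Parallel Library lets you process the items in a collection in
parallel with the Parallel.For and Parallel.ForEach loops. PLINQ offers a
similar capability through the ForAll() method, which invokes a delegate on
each element of a query's results in parallel. Here is an example that
illustrates how you can use ForAll() to process items: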
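IEnumerable<int> integerList = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var data = from i in integerList.AsParallel()
           where i <= 5
           select i;
// ForAll invokes the delegate on each element in parallel,
// so the numbers may be printed in any order.
data.ForAll(i => Console.WriteLine(i));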
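To contrast the two approaches, here is a minimal sketch (assuming the .NET Framework 4 or later API) that filters the same array once with the TPL's Parallel.ForEach and once with PLINQ's ForAll. Both delegates may run on several threads at once, so the results are collected in a thread-safe ConcurrentBag<int> and sorted before printing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

int[] integerList = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

// TPL: Parallel.ForEach runs the loop body for each element, possibly concurrently.
var viaForEach = new ConcurrentBag<int>();
Parallel.ForEach(integerList, i =>
{
    if (i <= 5)
        viaForEach.Add(i);
});

// PLINQ: ForAll runs the delegate on each query result in parallel,
// without first merging the results back onto the calling thread.
var viaForAll = new ConcurrentBag<int>();
integerList.AsParallel()
           .Where(i => i <= 5)
           .ForAll(i => viaForAll.Add(i));

// Neither approach guarantees an order, so sort before printing.
Console.WriteLine(string.Join(", ", viaForEach.OrderBy(i => i)));
Console.WriteLine(string.Join(", ", viaForAll.OrderBy(i => i)));
```

ForAll is a natural fit when each result is consumed independently; when the results must be gathered in order, enumerating the query (for example, with foreach or ToArray on an AsOrdered query) is the simpler choice.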
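Here is another example, which measures how long a PLINQ query takes to
execute. Because PLINQ queries use deferred execution, the query must be forced
to run (here, with ToArray()) inside the timed region:
using System;
using System.Diagnostics;
using System.Linq;

int[] myList = new int[90000];
Random randomListInstance = new Random();
for (int i = 0; i < myList.Length; i++)
    myList[i] = randomListInstance.Next(90000);

Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
// ToArray() forces the deferred query to execute, so the stopwatch
// measures the actual work rather than just building the query.
var results = (from n in myList.AsParallel() select n).ToArray();
stopWatch.Stop();
// ElapsedMilliseconds is the total elapsed time; Elapsed.Milliseconds would
// return only the milliseconds component of the TimeSpan.
Console.WriteLine("Time elapsed: " + stopWatch.ElapsedMilliseconds + " milliseconds");
Console.Read();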
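A query that merely copies elements does too little work per element to benefit from parallelism. The sketch below, assuming the .NET Framework 4 or later API, times a sequential and a parallel version of a deliberately CPU-bound query (counting primes up to 1,000,000 by trial division, an illustrative workload chosen here). The timings depend entirely on your hardware, so treat them as indicative only; the two counts, however, must match.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// CPU-bound predicate: trial-division primality test.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

var numbers = Enumerable.Range(1, 1_000_000);

var sw = Stopwatch.StartNew();
int sequentialCount = numbers.Count(IsPrime);   // LINQ to Objects
sw.Stop();
long sequentialMs = sw.ElapsedMilliseconds;

sw.Restart();
int parallelCount = numbers.AsParallel().Count(IsPrime);   // PLINQ
sw.Stop();
long parallelMs = sw.ElapsedMilliseconds;

Console.WriteLine($"Sequential: {sequentialCount} primes in {sequentialMs} ms");
Console.WriteLine($"Parallel:   {parallelCount} primes in {parallelMs} ms");
```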
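You can also handle exceptions thrown by your PLINQ
queries. PLINQ wraps the exceptions thrown on its worker threads in an
AggregateException (System.Threading.AggregateException in the Parallel
Extensions previews, System.AggregateException in the .NET Framework 4), so
that is the type to catch. You can retrieve the individual exceptions through
its InnerExceptions collection; the InnerException property returns only the
first of them.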
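Here is a minimal sketch of that pattern, assuming the .NET Framework 4 or later API. The Select delegate deliberately throws for one element (the value 50 is an arbitrary choice); the exception surfaces when the query executes at ToArray(), wrapped in an AggregateException whose InnerExceptions are then inspected.

```csharp
using System;
using System.Linq;

var source = Enumerable.Range(1, 100);

bool caught = false;
int innerCount = 0;
try
{
    int[] results = source.AsParallel()
                          .Select(n =>
                          {
                              if (n == 50)
                                  throw new InvalidOperationException($"Cannot process {n}");
                              return n * 2;
                          })
                          .ToArray();   // the query (and the exception) runs here
    Console.WriteLine(results.Length);
}
catch (AggregateException ex)
{
    caught = true;
    // InnerExceptions holds every exception thrown by the worker threads.
    foreach (Exception inner in ex.InnerExceptions)
    {
        innerCount++;
        Console.WriteLine($"{inner.GetType().Name}: {inner.Message}");
    }
}
```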
Suggested Reading
Here are a few links for further reading on this topic:
http://blogs.msdn.com/pfxteam/
http://msdn.microsoft.com/en-us/magazine/cc163329.aspx
http://blogs.msdn.com/pfxteam/archive/2009/05/28/9648672.aspx
http://channel9.msdn.com/posts/DanielMoth/Parallel-LINQ-PLINQ/
You can find more information about PLINQ in my
upcoming book, Teach Yourself LINQ in 24
Hours, by Sams Publishing. Happy reading!
Joydip Kanjilal is lead
architect for a company in Hyderabad, India, and is a Microsoft MVP in ASP.NET.
He has authored Entity Framework Tutorial
(Packt Publishing) and many other books and articles. Joydip blogs at aspadvice.com/blogs/joydip.