By now, you've likely heard of the "manycore shift." Hardware manufacturers that previously delivered performance improvements by increasing clock speed have run up against physical limits that prevent such scaling from continuing. Instead, the silicon that would otherwise have gone toward faster clocks is being used to increase the number of cores and processors available in commodity hardware. It's now uncommon in the United States to find single-core desktop machines for sale; dual-core and quad-core machines are the norm, and that norm is set to rise to eight cores, sixteen cores, and beyond in the not-too-distant future.
Back in 2005, Herb Sutter wrote about this phenomenon in his now-classic essay, "The Free Lunch Is Over." Herb's thesis is simple: Developers need to start coding their applications to take advantage of parallelism so that these programs scale automatically as more and more logical processors are introduced into mainstream hardware. Historically, this has been a difficult task, and the increased concept count, boilerplate code, and complexity necessary to develop parallel implementations have relegated the art to a select few people who specialize in the practice. To truly enable the manycore shift, new support is necessary to make developing for manycore easier.
The Microsoft .NET Framework 4 and Visual Studio 2010 strive to do just that. In the core of the .NET Framework 4 and built into Visual Studio 2010 are new features that help developers express, efficiently execute, debug, and tune parallelism in their applications. This article provides an overview of that support, starting with work done in the depths of the CLR and moving up the stack into programming models and tooling capabilities.
Minimizing the Overhead of Parallelism
The improvements to support parallel computing within the .NET Framework 4 begin within the bowels of the Framework. To parallelize a single operation, you must partition that operation into multiple pieces, each of which can then be executed asynchronously, and therefore concurrently with the other pieces. The more logical processors a machine has, the more partitions are necessary: if there are fewer partitions than processors, at least one processor will remain unused in the operation. Of course, executing a piece of work asynchronously has some overhead, even if it's as small as putting a delegate into a data structure that enables background threads to find and execute that work.
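As a rough illustration of the partitioning described above, the following sketch splits the summing of an array into contiguous ranges and queues one delegate per range to the ThreadPool. The names PartitionSketch and ParallelSum are hypothetical and exist only for this example; the approach (one work item per partition, coordinated with a CountdownEvent) is one simple possibility under these assumptions, not the way the Framework itself partitions work.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of the .NET Framework.
static class PartitionSketch
{
    // Sums data by splitting it into partitionCount contiguous ranges and
    // queuing one work item per range to the ThreadPool.
    public static long ParallelSum(int[] data, int partitionCount)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (partitionCount < 1) throw new ArgumentOutOfRangeException("partitionCount");

        long total = 0;
        // Ceiling division so that every element falls into some range.
        int chunk = (data.Length + partitionCount - 1) / partitionCount;

        using (var done = new CountdownEvent(partitionCount))
        {
            for (int p = 0; p < partitionCount; p++)
            {
                // Declared inside the loop so each delegate captures its own range.
                int start = Math.Min(p * chunk, data.Length);
                int end = Math.Min(start + chunk, data.Length);

                // The overhead the article mentions: each call places a
                // delegate where a pool thread can find and run it.
                ThreadPool.QueueUserWorkItem(state =>
                {
                    long local = 0;
                    for (int i = start; i < end; i++) local += data[i];
                    Interlocked.Add(ref total, local); // combine partial sums safely
                    done.Signal();
                });
            }
            done.Wait(); // block until every partition has finished
        }
        return total;
    }
}
```

Calling PartitionSketch.ParallelSum on an array holding the integers 1 through 1000 with Environment.ProcessorCount partitions returns 500500, the same result a sequential loop would produce. Using one partition per logical processor is the minimum needed to keep every processor busy.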
Typically, the overhead associated with executing work asynchronously is largely independent of the amount of real work in that work item. As a result, the more processors you need to target, the more partitions you must break a piece of work into, and the smaller each partition becomes. The ratio of per-partition overhead to per-partition work therefore grows, and you spend more of your time in overhead instead of in processing real work. Thus, to support parallelism efficiently, you need to minimize the overhead associated with executing a work item. Toward this end, the ThreadPool in .NET 4 has been modified from earlier versions in several key ways.
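To make the arithmetic concrete, here is a small sketch that computes the ratio of per-partition overhead to per-partition work. It assumes a simplified model in which the work divides evenly and every work item costs the same fixed overhead; the OverheadSketch class, its method, and the figures in the usage note are hypothetical, chosen only to show the trend.

```csharp
using System;

// Hypothetical helper for illustration only; models a simplified case.
static class OverheadSketch
{
    // Ratio of the fixed per-item overhead to the real work in one partition,
    // when totalWork is split evenly across the given number of partitions.
    // totalWork and overheadPerItem must use the same unit of time.
    public static double OverheadToWorkRatio(double totalWork, double overheadPerItem, int partitions)
    {
        if (totalWork <= 0) throw new ArgumentOutOfRangeException("totalWork");
        if (overheadPerItem < 0) throw new ArgumentOutOfRangeException("overheadPerItem");
        if (partitions < 1) throw new ArgumentOutOfRangeException("partitions");

        double workPerPartition = totalWork / partitions;
        return overheadPerItem / workPerPartition;
    }
}
```

With made-up figures of 1000 units of total work and 5 units of overhead per work item, two partitions give a ratio of 0.01, while 64 partitions give 0.32. The overhead per item has not changed; only the amount of work it is amortized over has shrunk.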
The first improvement concerns the data structure mentioned earlier for handing off work to background processing threads. In .NET 3.5, that data structure (referred to as the ThreadPool's "global queue") is a linked-list-based queue protected by a Monitor. The Monitor ensures that the queue remains consistent even when multiple threads attempt to mutate it (add and remove work items) concurrently. In .NET 4, that data structure has been replaced with a thread-safe queue implemented in a "lock-free" manner. This data structure has less overhead and scales better than its .NET 3.5 counterpart. In addition, the general design and algorithms used for this data structure have been encapsulated into the new, public ConcurrentQueue<T> collection, available in the System.Collections.Concurrent namespace, along with several other new thread-safe and highly scalable collections.
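The public ConcurrentQueue<T> type can be used directly from application code. The sketch below has several threads enqueue items at the same time with no explicit lock, then drains the queue with TryDequeue. The QueueSketch class and its method are hypothetical names for this example, and the code shows only the public collection, not the ThreadPool's internal queue.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper for illustration only.
static class QueueSketch
{
    // Starts producerCount threads that each enqueue itemsPerProducer integers,
    // waits for them to finish, then drains the queue on the calling thread
    // and returns the number of items removed.
    public static int ProduceAndDrain(int producerCount, int itemsPerProducer)
    {
        var queue = new ConcurrentQueue<int>();
        var producers = new Thread[producerCount];

        for (int p = 0; p < producerCount; p++)
        {
            producers[p] = new Thread(() =>
            {
                // Enqueue is safe to call from many threads concurrently;
                // no Monitor or lock statement is needed here.
                for (int i = 0; i < itemsPerProducer; i++) queue.Enqueue(i);
            });
            producers[p].Start();
        }
        foreach (Thread t in producers) t.Join();

        int drained = 0;
        int item;
        // TryDequeue returns false once the queue is empty.
        while (queue.TryDequeue(out item)) drained++;
        return drained;
    }
}
```

Calling QueueSketch.ProduceAndDrain(4, 1000) returns 4000: every item enqueued by the four producers is still in the queue when it is drained, even though no thread took a lock around its calls.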