Daniel Vaughan has a very cool project up on CodeProject.com. He's built an awesome grid computing "platform" using Silverlight for his compute nodes. Think SETI@Home style.
CodeProject article: http://www.codeproject.com/KB/silverlight/GridComputing.aspx
Online Demo: http://www.orpius.com/Silverlight/Legion/
Make sure you check out the administration console in the demo!
After not writing anything about HPC or Windows Compute Cluster for a while, I figured it's about time I wrote *something* about it, since I've been working with it so much recently!
Download the Sample Project (Visual Studio 2008, Beta 2)
Windows Compute Cluster Server 2003 is Microsoft's relatively new implementation of cluster computing on Windows. Microsoft is starting behind in this game, since Unix has been used for grid computing for a long time. Even so, the .NET development environment and the ease of administration make CCS a very compelling environment. The first version of CCS is not very feature-rich, but it provides the core components required to build distributed applications. Version 2, named Windows HPC Server 2008 and currently in beta, offers a much wider range of functionality. It should also bring some new ideas and a competitive edge to the Windows grid computing world.
In my current project, I have seen quite a bit of what CCS has to offer, and I have been building on top of it. One common requirement we've run across is running "map-reduce" style financial models against the cluster. In these models, an input parameter specifies a wide range, and that range is split into segments of work that can be executed in parallel on the cluster. The segments generally produce a significant amount of output data. A reduce process must then summarize that data to produce a "meaningful" output. In some cases, the summarized output is fed into another map, and the process continues.

Running a simple process like the one shown above is an important capability of CCS. However, CCS is designed to be a general-purpose tool rather than a complete solution, and it has no built-in support for "map reduce". In CCS terms, each step outlined in the graph (the split, the map segments and the reduce) would be a "task", and all of these tasks would be grouped into one logical "job".
To handle this map/reduce scenario, you can use a feature of CCS jobs called task dependencies. Task dependencies ensure that the tasks run in the proper order, so the reduce does not start before all the segments have completed. The sample code for this project includes three console apps: one does the split, one runs a single map, and one does the reduce.
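The ordering rule can be made concrete with a small local simulation of a job whose tasks declare dependencies. This is a sketch only, and it does not use the CCS scheduler API. The function name `run_in_dependency_order` and the task names are hypothetical. Tasks run one at a time here, whereas CCS would run the independent maps in parallel. The sketch also assumes the whole task graph is known up front, while in the sample the split adds its tasks at run time.

```python
from collections import deque


def run_in_dependency_order(tasks):
    """Return task names in an order that respects their dependencies.

    `tasks` maps each task name to the set of task names it depends on.
    A task becomes runnable only once all of its dependencies have finished.
    This sketch runs tasks one at a time; a cluster would run ready tasks in parallel.
    """
    remaining = {name: set(deps) for name, deps in tasks.items()}
    dependents = {name: [] for name in tasks}
    for name, deps in tasks.items():
        for dep in deps:
            dependents[dep].append(name)

    ready = deque(name for name, deps in remaining.items() if not deps)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)  # stand-in for actually running the task
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks can never start")
    return order


# Hypothetical job: one split, three maps, one reduce.
job = {
    "split": set(),
    "map-0": {"split"},
    "map-1": {"split"},
    "map-2": {"split"},
    "reduce": {"map-0", "map-1", "map-2"},
}
print(run_in_dependency_order(job))
# ['split', 'map-0', 'map-1', 'map-2', 'reduce']
```

The reduce is released only after its last dependency finishes. That is the guarantee the sample relies on.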
Split:
The split adds tasks to the job it is currently a member of so that they can run as maps. Each map gets its own input file so that it knows which segment it is responsible for. Each map task also "depends" on the split task, meaning it waits for the split to complete before it starts.
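The core calculation in the split is dividing the requested range into one segment per map. The sketch below assumes the range is a half-open interval of integers `[start, end)` and balances the segments so their sizes differ by at most one. The function name `split_range` is hypothetical. Each returned tuple stands in for what would be written to one map's input file.

```python
def split_range(start, end, segments):
    """Divide the half-open range [start, end) into `segments` contiguous pieces.

    Piece sizes differ by at most one, so the work is balanced across map tasks.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    base, extra = divmod(end - start, segments)
    pieces = []
    lo = start
    for i in range(segments):
        # The first `extra` pieces absorb the remainder, one element each.
        hi = lo + base + (1 if i < extra else 0)
        pieces.append((lo, hi))
        lo = hi
    return pieces


print(split_range(0, 100, 3))
# [(0, 34), (34, 67), (67, 100)]
```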
The split will also schedule a reduce task, and ensure that that task depends on all the maps, so it does not start until all maps are complete.
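The dependency wiring the split sets up can be sketched as plain data: every map depends on the split, and the reduce depends on every map. This is not the CCS API. The `Task` class, `build_map_reduce_tasks`, and the executable names `MapTask.exe` and `ReduceTask.exe` are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    command: str  # hypothetical command line the task would run
    depends_on: list = field(default_factory=list)


def build_map_reduce_tasks(split_name, segment_files):
    """Create one map task per segment input file plus a single reduce task.

    Each map waits on the split; the reduce waits on every map, so it
    cannot start until all segments have finished.
    """
    maps = [
        Task(f"map-{i}", f"MapTask.exe {path}", [split_name])
        for i, path in enumerate(segment_files)
    ]
    reduce_task = Task("reduce", "ReduceTask.exe", [m.name for m in maps])
    return maps + [reduce_task]


tasks = build_map_reduce_tasks("split", ["seg0.xml", "seg1.xml"])
for t in tasks:
    print(t.name, "<-", t.depends_on)
# map-0 <- ['split']
# map-1 <- ['split']
# reduce <- ['map-0', 'map-1']
```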
Map:
Each map/segment runs with the input parameters the split wrote for it and writes another file containing its results.
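A single map can be sketched as follows, assuming each segment's job is to find the primes in its sub-range. The names `is_prime` and `run_map` are hypothetical, and the returned dictionary stands in for the map's output file. The map reports a count and a sum rather than its own average, so the reduce can combine segments of different sizes correctly.

```python
def is_prime(n):
    """Trial division; adequate for a demo-sized range."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def run_map(lo, hi):
    """Find the primes in [lo, hi) and return a partial result for the reduce."""
    primes = [n for n in range(lo, hi) if is_prime(n)]
    return {"lo": lo, "hi": hi, "count": len(primes), "sum": sum(primes)}


print(run_map(0, 20))
# {'lo': 0, 'hi': 20, 'count': 8, 'sum': 77}
```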
Reduce:
The reduce step takes all the map outputs and averages their values to produce a "summarized" value. In this case, that value is the average of the primes that were requested.
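The reduce can be sketched as follows, assuming each map reports how many primes it found and their sum. The function name `run_reduce` is hypothetical. The reduce divides the combined sum by the combined count. Simply averaging each segment's own average would give a different, wrong answer whenever segments find different numbers of primes.

```python
def run_reduce(partials):
    """Combine map outputs into the mean of all primes found across segments."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    if count == 0:
        raise ValueError("no primes were found in any segment")
    return total / count


partials = [
    {"count": 8, "sum": 77},  # primes in [0, 20)
    {"count": 2, "sum": 52},  # primes in [20, 30): 23 and 29
]
print(run_reduce(partials))
# 12.9
```

For these two segments, averaging the per-segment means would give (9.625 + 26) / 2 = 17.8125. That result over-weights the segment that found only two primes.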
How To Run This Project:
To run this model, use the CCS job console, the command-line interface, or your own custom-built interface to submit a job with the following characteristics:
- The only task specified in the job is the split task. Its first command-line parameter is the input XML file you want to use, and that file must deserialize into a PrimeFinderInput object. This task is responsible for creating the rest of the required tasks.
- The "minimum processor count" is set to the number of processors you wish to parallelize across.
- Your input parameters are well formed, and the Processors property of the input is set to the number of segments you want to run. Ideally this is equal to or greater than the "minimum processor count".
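Before submitting, it can help to sanity-check the input file against these requirements. The sketch below assumes an element-per-property XML layout, which is what .NET's XmlSerializer produces by default. Only `Processors` comes from the sample; the `Start` and `End` elements and both helper functions are hypothetical. The check only flags fewer segments than reserved processors, because that case leaves processors idle.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PrimeFinderInput:
    start: int       # hypothetical: first number of the range
    end: int         # hypothetical: end of the range (exclusive)
    processors: int  # number of segments to run (from the sample)


def load_input(xml_text):
    """Parse an input document; raises if a required element is missing."""
    root = ET.fromstring(xml_text)
    return PrimeFinderInput(
        start=int(root.findtext("Start")),
        end=int(root.findtext("End")),
        processors=int(root.findtext("Processors")),
    )


def check_submission(inp, min_processor_count):
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if inp.end <= inp.start:
        problems.append("End must be greater than Start")
    if inp.processors < 1:
        problems.append("Processors must be at least 1")
    elif inp.processors < min_processor_count:
        problems.append("fewer segments than reserved processors; some will sit idle")
    return problems


sample = """<PrimeFinderInput>
  <Start>0</Start>
  <End>1000000</End>
  <Processors>8</Processors>
</PrimeFinderInput>"""
inp = load_input(sample)
print(check_submission(inp, min_processor_count=8))
# []
```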
There you have it. The basics to a simple map/reduce job in CCS!