On Software
Time and time again noise builds around the (re)introduction of thin client computing, the death of the desktop OS and the emergence of a single, large networked computer that we all work on. A super mainframe that is omnipresent across all devices and networks.
But not since the days of the drab green screens of the mainframes have we seen a significant uptake in thin clients over heavy metal. This is now ripe for change.
The stars are aligning in the cloud computing movement, and this time, it won't be another 3Com web computing device. The traditional barriers to the explosion of the web device market have been torn down - bandwidth is ubiquitous, the price of technology has come down so dramatically that these devices are now comparatively cheap against their full-blown competitors, and most computing services that people have come to expect have an online incarnation.
But the most important change is cultural. People are demanding cheap alternatives to desktops. The telltale signs are there - the netbook craze, the rich web phone interfaces (iPhone, Android), and even the Kindle are all thin clients that are heavily in demand. People have begun to see the value add for a consumer of not having to maintain a desktop OS - they expect it to work like their TV - managed centrally, and pushed down. Zero maintenance time. With a wide range of people now spending most of their computing time in web browsers, and the need for desktop apps marginalized, consumer demand has shifted towards less expensive, internet-oriented devices.
Even with the netbook fire sale, the market for these devices still hasn't been properly defined. Current netbooks are essentially small laptops with desktop OSes. While these machines hit the price point consumers expect in thin clients, some of the real value adds a thin client can provide to consumers have not been realized in netbooks. The OS running on these devices is unnecessarily large, boot times are slow, and features like cloud storage are still not available out of the box. Couple this with the poor form factors and tiny keyboards, and there is still a lot to be done in this market. Consumers know they don't need full-blown desktop machines, but so far have only been exposed to "small laptops".
Despite the demand for netbooks, consumers still don't know what these web devices are - there is still an opportunity for proper market definition. There are very few companies who can really make the idea of web devices sit with consumers; excellent marketing coupled with good software and hardware is critical to open the market up. With so many people having lost their shirts in the web device game over the past decade, chances are slim that this company will emerge as a startup. Apple is clearly an industry favorite for defining new markets like this. With the marketing engine to hit the concept home with consumers, the technical understanding to make it happen, and recent rumors of a "big iPhone" or "Apple Tablet", it's beginning to look like the market definition will start there. But there are other companies that look like they will emerge as players. TechCrunch has the "Crunch Tablet", Kindle is looking more and more like a web device daily, and Nokia has expressed serious interest in expanding into netbooks.
While heavy metal desktops will never become fully eclipsed by web devices, the desktop OS market looks to be in terminal decline.
Update: ... Or maybe Google will be the one to define the market.
Today I had to help someone with a problem with iTunes on Windows. For those of you that have not had the pleasure of using iTunes on Windows, it sucks, like most software written by Apple for Windows. That being said, this problem really had nothing to do with iTunes. The problem was that iTunes was 'losing' music. As it turns out, the music that the user saw as 'lost' had been moved from its original location (the desktop) to the 'Music' folder. iTunes uses absolute paths in its media library to reference each file. When the file moves, the reference is broken, and thus the music appears 'lost'. Without iTunes open all the time (and due to the occasionally unreliable file system watcher), it is simply not possible to track these files.
But then I started wondering... Surely Apple wouldn't let this happen on OSX. Details like this tend to be a place where Apple shines. So what in the world does iTunes do if I move something from the 'Music' folder to my 'Desktop' in OSX? I decided to try. First I closed iTunes, then I copied a music file, went back to iTunes, and hit play. Lo and behold, it played, no problem. Not only did this leave me confused, but it also left me interested, so I decided to do a little exploring...
From what I can tell, Apple foresaw this problem with their 'aliases' early on in their original Mac OS, so rather than referencing files by path, you can reference files by a unique ID (if they are on a partition that uses a supported file system).
I'm no expert on Cocoa, but from what I can tell, the details of this can be found in the File Manager APIs and FSRef.
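The idea of following a file by identity rather than by path can be sketched outside of the Mac APIs. The Python below is only an illustration under stated assumptions, not how iTunes or aliases are actually implemented: it treats the (device, inode) pair reported by os.stat as the file's identity, moves the file within one volume, and then finds it again by scanning for that identity. The helper names file_id, find_by_id and demo are hypothetical.

```python
import os
import tempfile

def file_id(path):
    """Return an identity for the file that does not depend on its path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def find_by_id(root, wanted):
    """Walk root and return the first path whose identity matches, else None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            if file_id(candidate) == wanted:
                return candidate
    return None

def demo():
    """Move a file between two folders and relocate it by identity."""
    with tempfile.TemporaryDirectory() as root:
        desktop = os.path.join(root, "Desktop")
        music = os.path.join(root, "Music")
        os.mkdir(desktop)
        os.mkdir(music)
        original = os.path.join(desktop, "song.mp3")
        with open(original, "wb") as f:
            f.write(b"not really audio")
        remembered = file_id(original)
        moved = os.path.join(music, "song.mp3")
        os.rename(original, moved)  # same volume, so the identity is kept
        found = find_by_id(root, remembered)
        return os.path.exists(original), os.path.relpath(found, root)
```

A library that stored only the path would now hold a dead reference, while the stored identity still leads to the file. Scanning a whole tree is slow, which is presumably why a file system that can look a file up by ID directly matters.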
And for some more interesting reading on the topic.
So next time you get the dreaded 'cannot find the target to this shortcut, do you want to try to find the file?' prompt, just think about those 1980s Apple employees and how they solved this problem...
Why do consumers love the browser?
One of the things I still have not fully understood is the consumer's affection for the browser. Unlike desktop applications, consumers seem to feel more comfortable and safe in a browser.
Perhaps it is the familiarity: always knowing where you are with a URL, the 'safety' of not messing up your computer, the security of having data stored in a cloud where it doesn't need backup, or the easy way to get away from where you are with the back button. Add to this the familiarity of an interactive 'page' rather than a bunch of forms, dialog boxes and popup windows, and the element of design and simplicity on the web: web developers were quicker to embrace a "less is more" approach, and web pages are more often thought through and designed by professional designers than desktop applications are.
When you take all of these things into account, it makes a bit of sense why consumers have shown a preference for web-based applications, and it is no surprise that most of the WPF applications we've seen emerge have followed the same basic browser concepts (back button, page-like UI, navigation at top, hyperlinks, etc).
For someone who tries to stay away from web development, my recent past has been riddled with all the latest buzzwords on the web - Silverlight, Flash, AIR and JavaScript are all tools that I've had to engage with. I've built trading systems in Flash and my most recent application is an RIA in HTML + JavaScript... None of these platforms have had the allure of WPF for me, the developer, but it's what clients have asked for, and clearly what consumers are interested in.
Who is winning the war in RIA (Flex, Silverlight, HTML+JavaScript)?
Why is it that we feel the need to place browser chrome around everything, and when we do, what platform has emerged as dominant for the new breed of complex applications (Google Maps, docs, etc) that we've come to expect in the browser?
Without a doubt, HTML and JavaScript have been the hands-down winner. Why is it so hard to name a single site that is a pure Flash or Silverlight application? HTML + JS is certainly not easier to program, and Flash is now ubiquitous, so the concern of requiring a plugin is not there. Is there something about interacting with the native browser, not a plugin, that has some appeal?
I have become a firm believer that as a non-software company, it is in your interest to leverage Microsoft-based development technologies as much as possible. The ability to quickly get things done in .NET and its surrounding technologies is unparalleled. When you are a software company, though, it is much harder to justify. Tying yourself to Microsoft, which likely is or will become a competitor in your space, is risky. Hiring good web people (if you're a web startup) is harder because the .NET development culture tends to do things "the Microsoft way". And your acquisition story becomes limited (Google, eBay, Facebook, etc are less likely to buy a .NET based system because it doesn't fit in their infrastructure). For this reason, it may be clear why Silverlight is not, and will not become, the dominant application platform on the web any time soon. Perhaps many web companies have the same fear of Adobe Flash. By contrast, HTML and JavaScript are "open" and are not dominated by any particular entity.
The decision of most web companies to stick with HTML, despite the rich features of other platforms, is simply a business decision. HTML will always be there, is relatively reliable, relatively backwards compatible (Adobe is pretty good at breaking builds of older code with newer versions of Flex), and does not tie you to the whims of any one company or platform.
JavaScript as Byte Code - C#, Java, etc to JavaScript Compilers and Cross-Compilers
HTML is great for document layout. It's fast, clean and everyone knows it. Sprinkle a bit of JavaScript on that, and you have a nice interactive document. But HTML was never designed to be the basis for an application platform, and really is a pretty poor one. Yet despite the features of other platforms, and the number of times HTML and JavaScript have been pronounced dead, something breathes life back into them.
It started with Firefox, and continued on with the awfully impressive Google Maps. Google has a large interest in keeping the browser alive and relevant. The longer it does so, the longer it has a platform on which to compete with Microsoft that is already deployed to 99% of the world's user base. A "Google Flash" may not be feasible, because of the deployment nightmare involved, but as HTML evolves with offline capabilities, richer CSS and faster JavaScript engines (see Google Chrome), it becomes a platform that can largely compete with the rest.
And for this reason, Google has cleverly turned the browser into their own RIA platform. The "Google Virtual Machine", as you may call it. To do this, they've leveraged the existing tools that Java has to offer, and created Google Web Toolkit - a Java compiler that can compile Java into cross-browser JavaScript. And on the client side, they've added Google Chrome. You can easily consider Google Web Toolkit and Google Chrome to be a full RIA platform, with JavaScript as the byte code transport. And not only that, but it's an RIA platform fully backwards compatible with the existing deployed base of browsers.
Perhaps the most interesting thing is that Google Web Toolkit is not the only JS or cross-compiler. A variety of compilers have been built for this purpose. Microsoft Research has Script#, which compiles C# to JavaScript, and a bunch of projects use JSC, a cross compiler that compiles MSIL (and therefore any .NET language) into JavaScript. With a whole variety of tools out there by different vendors to compile to JavaScript, it appears that JavaScript has become the new universal byte code. It can be processed by a whole variety of browsers, and produced by a variety of tools - one of the only computing platforms for which that is true.
And compiled JavaScript can also be pretty fast and make applications that are rich and look good. While it doesn't compare to the offerings of Cocoa, WPF or Silverlight, the quality of application that can be achieved is good enough for most consumers. In the case of GWT, it is not only fast, but also efficient - and best of all, code can be shared amongst client and server.
I don't like HTML as an application platform, and I am not very fond of running every application inside the browser chrome. But as business strategy and deployability have become increasingly important to me in my recent venture, an RIA application based around the native browser has become my platform of choice. Cross compilation, in my experience, has proven an effective way to make a very rich client that can even handle push-data.
Early in the web development world, scripting languages such as ASP or PHP were used to compose pages. Although this proved great for relatively static pages, the dynamic web, filled with rich applications, called for a more powerful framework. Thus, frameworks like ASP.NET were born.
ASP.NET solved a good number of problem spaces, but has made creating simple pages (such as a resume or menu, or other primitive list of data) more cumbersome. With COM development becoming less common and less preferable, a scripting language is needed to fill the gap left by VBScript/ASP. PowerShell scripting has filled the gap left by the demise of VBScript, but nothing has come along to replace ASP.
PowerShell Pages is an ASP-like language, based on the PowerShell runtime. Using a simple HTTP Handler, ASP.NET can render pages scripted using PowerShell script (including cmdlets, and CLR/.NET objects) to the web. Simple, fast and intuitive programming for simple pages that just need to display some content.
The PowerShell Pages project is an open source project that I am starting. Its implementation will be based on ASP.NET using a simple handler capable of consuming PowerShell HTML (PSH) scripts and writing HTML. Because the script is hosted in ASP.NET, the ASP.NET HttpContext and the other components of the object model are available. PSH scripts can work side-by-side with ASPX pages.
Ready to see what PowerShell Pages look like?
Sample Page: http://decav.com/psp/resume.psh
Sample Page Code: http://decav.com/psp/viewsource.psh?page=resume.psh
View Source Code: http://decav.com/psp/viewsource.psh?page=viewsource.psh
Join the Project - Visit the PowerShell Pages CodePlex project workspace
After hosting Gatsb.com for a year, I have decided to open source the project. I haven't had time to promote it the way I wanted to promote it, and it therefore hasn't caught on with too many users. I've been busy with other projects, other (grander) ideas, and think that this one should now belong to the community.
You can download the source code here: http://code.google.com/p/gatsb/
If you want to contribute to the project, please feel free to contact me at andreblog@decav.com
This project is a great case study of a bunch of new Microsoft technologies. It includes Workflow Foundation, .NET 3.0, Microsoft Virtual Earth and ASP.NET Ajax.
One of the coolest features is how it keeps open a "session" with an SMS client. Workflow Foundation is used to persist the state, and when a new SMS comes in, it checks if there's an existing session for that sender. If there is, then it will revive the workflow and continue the session. Cool stuff!
If you ever wondered how to create a database of geotagged entries, use ASP.NET Ajax to make a snazzier site, or establish a user interface with any mobile device, I suggest checking out the code!
Here's a useful snippet of code I wrote as a utility and thought I'd post up for everyone. Very often when using a command line, you need to repeat a set of commands. For example, copy several files:
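The revive-or-start decision can be sketched in a few lines. This Python is a rough illustration only and does not reflect the actual Gatsb code or Workflow Foundation: a plain dictionary stands in for the persistence store, JSON stands in for the serialized workflow state, and the SessionStore class and its handle_sms method are hypothetical names.

```python
import json

class SessionStore:
    """Keeps one serialized conversation state per phone number."""

    def __init__(self):
        self._saved = {}  # phone number -> JSON string (stand-in for a database)

    def handle_sms(self, phone, text):
        """Revive the sender's session if one exists, otherwise start one."""
        raw = self._saved.get(phone)
        resumed = raw is not None
        state = json.loads(raw) if resumed else {"messages": []}
        state["messages"].append(text)
        self._saved[phone] = json.dumps(state)  # persist before going idle
        return resumed, len(state["messages"])
```

Because the state is written back after every message, nothing needs to stay in memory between texts, which is the property the persisted workflow gives the real project.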
copy "c:\blah.txt" "c:\otherPlace\blah.txt"
copy "c:\dev\blah.txt" "c:\otherPlace\blah2.txt"
History is okay for this, but if you have a set of commands you need to run, you need to run them each individually. Also, if you go to do something else, they'll eventually lose their spot in the command history.
To solve this, I created a "scratch pad". Nothing special, but definitely useful.
new-scratch "MyFirstScratch"
copy "c:\blah.txt" "c:\otherPlace\blah.txt"
scratch
copy "c:\dev\blah.txt" "c:\otherplace\blah2.txt"
scratch
## Do some other stuff here...
get-scratch # prints out the contents of the scratch.
invoke-scratch
The scratch pad holds a set of commands that you want to rerun later. You can then call invoke-scratch to run the scratch pad. You can have multiple scratch pads, and use them by just adding the name to the end of the function (get-scratch "MyFirstScratch"). By default, all scratch pad commands use the last used scratch pad.
The example below shows how to shorten the sample above:
new-scratch "MyFirstScratch"
copy "c:\blah.txt" "c:\otherPlace\blah.txt"
copy "c:\dev\blah.txt" "c:\otherplace\blah2.txt"
scratch 2 # Saves the last 2 items to the scratch pad
This can be very useful for writing scripts too. The Save-Scratch command lets you export your scratch to a file.
Here's the code. Add it to your profile and go. Enjoy!
# Creates a new, empty scratch pad and makes it the current one.
function New-Scratch($name) {
    if ($global:scratchPads -eq $null) {
        $global:scratchPads = @{}
    }
    $pad = new-object System.Collections.ArrayList
    $global:scratchPads.Add($name, $pad)
    $global:currentScratchPad = $pad
    $global:currentScratchPadName = $name
}

# Returns the current scratch pad, switching to $name first if one is given.
function Get-Scratch($name=$null) {
    if ($name -ne $null) {
        $global:currentScratchPad = $global:scratchPads[$name]
        $global:currentScratchPadName = $name
    }
    return $global:currentScratchPad
}

# Appends the last $count commands from the history to the scratch pad.
function Scratch([int]$count=1,$name=$null) {
    if ($name -ne $null) {
        $global:currentScratchPad = $global:scratchPads[$name]
        $global:currentScratchPadName = $name
    }
    # @() keeps the result an array even when the history holds one entry.
    $hist = @(get-history)
    if ($name -eq $null) {
        $name = $global:currentScratchPadName
    }
    for ($i=$hist.Length-$count; $i -lt $hist.Length; $i++) {
        $global:currentScratchPad.Add($hist[$i].CommandLine) | out-null
    }
}

# Echoes and runs every command stored in the scratch pad, in order.
function Invoke-Scratch($name=$null) {
    if ($name -ne $null) {
        $global:currentScratchPad = $global:scratchPads[$name]
        $global:currentScratchPadName = $name
    }
    foreach ($item in $global:currentScratchPad) {
        write-host ">> $item"
        invoke-expression $item
    }
}

# Writes the scratch pad to a file, one command per line.
function Save-Scratch($path, $name=$null) {
    if ($name -ne $null) {
        $global:currentScratchPad = $global:scratchPads[$name]
        $global:currentScratchPadName = $name
    }
    $global:currentScratchPad | out-file $path
}
If you want to run your PowerShell scripts from a piece of code you're writing, and would prefer not to use Process.Start, you can easily use the PowerShell runtime classes to run those scripts for you, completely in-process.
The code snippet below shows how you can create a Runspace and pass it some simple commands.
using System.Collections.Generic;
using System.Management.Automation.Runspaces;

List<Command> commands = new List<Command>();
// The second argument marks the text as a script rather than a bare command name.
commands.Add(new Command("set-location c:\\", true));
commands.Add(new Command("./MyScript.ps1 'myParameter'", true));

using (Runspace runspace = RunspaceFactory.CreateRunspace())
{
    runspace.Open();
    using (Pipeline pipeline = runspace.CreatePipeline())
    {
        foreach (Command cmd in commands)
        {
            pipeline.Commands.Add(cmd);
        }
        pipeline.Invoke();
    }
    runspace.Close();
}
One thing to note here is you do not have a PSHost, so if you have a write-host anywhere, it will fail with the following exception:
CmdletInvocationException: Cannot invoke this function because the current host does not implement it.
To get around this, you can remove write-host from your scripts and just have them write single lines, or you can implement a simple PSHost (although this is a bit harder than it sounds).
If you've been using System.DirectoryServices.DirectoryEntry class, or the newer System.DirectoryServices.AccountManagement namespace to access your LDAP or Active Directory server, you may have experienced the following error:
COMException: "Unknown error (0x80005000)"
This can happen for numerous reasons, but one of the most frustrating and overlooked reasons I've found for this problem is a malformed LDAP connection string. One of the most common malformations is in the LDAP:// component, which is case sensitive. For example, LDAP://myServer/cn=users,dc=myserver,dc=com is a valid connection string, however ldap://myServer/cn=users,dc=myserver,dc=com is not.
If you use the Uri or UriBuilder classes, the builder may lowercase your scheme. Always make sure to recapitalize the scheme when passing it into DirectoryEntry or any other API.
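The same trap exists in other URL libraries, so here is a small sketch of the "recapitalize the scheme" step. It is written in Python rather than .NET purely for illustration: Python's urlsplit also reports the scheme in lower case, which makes it a reasonable stand-in for the Uri behaviour described above. The function name ldap_path is hypothetical.

```python
from urllib.parse import urlsplit

def ldap_path(url):
    """Return url with its scheme forced to upper-case 'LDAP'."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":  # urlsplit reports the scheme in lower case
        raise ValueError("not an LDAP URL: " + url)
    # Re-attach everything after the scheme unchanged, so host and DN keep their case.
    return "LDAP:" + url[len(parts.scheme) + 1:]
```

Slicing the original string, rather than rebuilding it from parsed parts, avoids any other normalization the URL parser might apply to the rest of the connection string.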
The Message Passing Interface (MPI) standard, and its .NET implementation, MPI.NET have been some of the cornerstones of development on compute clusters. The standard supplies a simple yet primitive way of both sending and receiving data between running compute processes.
The large advantage of MPI has been a mix of its simplicity and speed. A call to MPI Send on one node and MPI Receive on another blocks both callers until the operation is complete. Some more complex calls, such as MPI Scatter and MPI Gather, allow a single node to distribute data to a set of nodes or retrieve it from a set of nodes. An MPI Barrier makes all nodes stop until they have all reached the agreed-upon place in code, and then lets them continue. Such primitives allow a distributed set of processes to communicate, do some work, and then share the values that each needs to continue with each other. Because this is all done with some low level, bare metal socket tricks and/or shared memory, the result is blazingly fast communication.
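To make the scatter, barrier and gather pattern concrete, here is a toy sketch. It uses Python threads in a single process rather than MPI itself, so it only mirrors the shape of the primitives and says nothing about their performance; the function name scatter_sum_gather is hypothetical.

```python
import threading

def scatter_sum_gather(values, workers):
    """Split values across worker threads, sum each part, combine the sums."""
    parts = [values[rank::workers] for rank in range(workers)]  # "scatter"
    partial = [None] * workers
    all_done_after_barrier = [False] * workers
    barrier = threading.Barrier(workers)

    def work(rank):
        partial[rank] = sum(parts[rank])
        barrier.wait()  # block until every worker has stored its result
        all_done_after_barrier[rank] = all(p is not None for p in partial)

    threads = [threading.Thread(target=work, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial), all(all_done_after_barrier)  # "gather"
```

The second return value records that, once past the barrier, every worker could see every other worker's partial result, which is the guarantee a barrier is there to provide.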
With this simplicity however, comes a trade off. MPI has been a standard for nearly 20 years and has changed very little since its inception. The way we program today has changed drastically, especially with managed languages such as C#. No longer do we tend to worry about memory allocation, or dealing with raw memory. Today, most languages have a concept of automatic memory allocations, garbage collection and type safety. Although the primitives in MPI are unparalleled in simplicity for allowing multiple processors to communicate about a shared set of work, some striking limitations are found once we dig a bit deeper.
When I started with MPI.NET, I found the interface very simple. An Mpi.Send<T>(obj) would send an object to a waiting client. Mpi.Receive<T>() would give you back that object. Nothing could be simpler. In my example, however, T happened to be a class that contained a byte[] of an undetermined size. Once run on the cluster, the size of the byte[] I was passing increased dramatically, and an unexpected exception occurred:
AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory has been corrupted.
After lengthy investigation, I found that MPI.NET was attempting to pin some memory in the .NET GC heap and pass that memory location as a buffer to the underlying MSMPI stack. In doing so, it did not allocate enough memory for my large byte[], causing the write to run past the buffer into the rest of the GC heap, thus throwing the exception. In my case, to solve this, I created a large enough buffer and passed it into an overload of Mpi.Receive<byte>(byte[]). This overload pins the entire array that was passed in on the GC heap, and then passes that to the MSMPI stack. On the send side, I manually serialized my class, checked the length (to ensure I would not overflow the receive buffer) and sent the byte[] instead of the raw class. This solution does not take into account messages larger than my expected buffer. For that, I would have needed to chunk down the data.
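The chunking I did not get around to could look roughly like the sketch below. It is Python rather than C#, it is not part of MPI.NET, and the 4-byte length prefix and the names to_chunks and from_chunks are assumptions made for the illustration: the sender frames the serialized bytes with their length and cuts them into pieces no larger than the receive buffer, and the receiver rejects anything that would overflow.

```python
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix

def to_chunks(payload, buffer_size):
    """Prefix payload with its length and cut it into buffer-sized pieces."""
    framed = HEADER.pack(len(payload)) + payload
    return [framed[i:i + buffer_size] for i in range(0, len(framed), buffer_size)]

def from_chunks(chunks, buffer_size):
    """Reassemble chunks, refusing any that would overflow the buffer."""
    data = bytearray()
    for chunk in chunks:
        if len(chunk) > buffer_size:
            raise ValueError("chunk larger than receive buffer")
        data += chunk
    (length,) = HEADER.unpack_from(data)
    body = bytes(data[HEADER.size:])
    if len(body) != length:
        raise ValueError("message truncated or padded")
    return body
```

Each element of the list returned by to_chunks would be one send into a fixed-size receive buffer; the length prefix lets the receiver know when the message is complete.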
The moral of the story here is sending primitives, arrays of primitives or fixed-sized structs over MPI.NET (which is the most common scenario) is a great use of a very fast messaging interface. Once your demands get more complex, the MPI stack gets less favorable, not because of its inability to send more complex messages, but because of the manual labor involved in serializing and chunking down data.
It is no wonder that the HPC community is moving away from the traditional methods of MPI and communication across a set of processors to a Service Oriented (SOA) model. The benefits of using existing components, such as WCF and its NetTcpBinding, the threading models, serialization and transport models, and other features already provided by these frameworks outweigh the possible performance penalty. Problems such as the one explained above simply do not happen with frameworks like WCF. Furthermore, although the underlying concepts of MPI and its simple messaging model are very simple and appealing, the overall development, maintenance and debugging of a SOA application is much simpler than that of an MPI application. The amount of code complexity and custom code drops when compared to an MPI implementation.
The general industry trend seems to be towards SOA models. Microsoft Windows HPC Server 2008 is a great example of this. HPC Server uses WCF to distribute load across the cluster, and can even dynamically scale resources depending on demand of a particular service. Platform, another industry competitor, has been building with a SOA model for some time now.
I'm looking forward to playing with HPC Server 2008 and WCF more as time progresses. I think that the WCF model will solve a whole bunch of headaches that one incurs when trying to communicate over perhaps overly simple primitives such as MPI. Many models and workloads simply do not require the type of communication MPI provides, and using MPI can be like fitting a square peg into a round hole. This is not to say MPI does not have its place; many complex processes do require constant communication between a set of workers. However, I believe many of the problems we use HPC for today can be distributed using SOA in a much simpler fashion.
I recently got to playing with Ruby, something that some colleagues at Lab49 have been big fans of for some time. I've never been a big fan of scripting languages, but have grown more of an appreciation for functional programming over the past several months and thought I would give it a try.
Ruby is a very smart language, and I can certainly see why it has some appeal. The "don't repeat yourself" and "convention over configuration" aspects of the language and its framework are really nice for cranking out simple applications.
ORM and Inference of Properties from database
I am a big fan of the objects that automatically get their properties from the database (a Customer object will automatically be linked with the Customers table when pulled through the ORM). Things like this make it dirt simple to crank out web projects of moderate size without writing tons of redundant code (as is the case with a classic OO approach, using adapters, abstractions, etc).
Even though this is a nice feature, and great for many projects, I am concerned that this is simply not sufficient for larger enterprise apps. Very often your app layer and database layer should be different, as one does not always properly map to the other. I suppose that you could achieve this nicely with views, however for large applications, the lack of abstraction seems a little brittle to me. At the same time, if you did change something in your database, you'd still have to change your abstraction logic, so maybe this isn't all that bad.
Syntax
The syntax of Ruby is very clean and straightforward. I like the bare-bones structure; nothing more than needed is written. Reading into it a bit more, I found it interesting that various things can be written in several ways, for example:
while i < 3
print(i)
end
can also be written
print(i) while i < 3
While conceptually this is nice (the latter "reads" better), it is something I have a really hard time accepting. Yes, it is nice not to have to write things like "end"; however, syntactical differences like this (and others, such as the optional parentheses around method parameters) concern me. While it's great that you can develop your own style, and do what feels comfortable to you, it may be very confusing to other developers. An argument can be made either way; after all, you can do this:
if (i < 3) Console.WriteLine(i);
in C#. Nothing stops you. At the end of the day, sloppy code is sloppy code, and it's really a matter of having a well-trained developer, not a strict language.
Naming Conventions
Here's something that threw me off. Ruby does a very good job of making names intuitive, however some of its names seem to break its own concepts. First of all, names like to_s are just sloppy. If you're making a language that's supposed to be readable, why write cryptic names like this? Also, what's the deal with count and length being synonyms? Why have both? It seems relatively dangerous to me to have methods that mean the same thing on objects. A developer who sees count in one place and length in another may think they are different, something that certainly doesn't help code legibility.
Overall, Ruby has been fun to learn (although I'm still a novice), and I can certainly see its value. I'm not sure if I agree with some of its "friendly" tendencies -- being able to write whatever you feel and have it probably compile is not necessarily good. Just because code compiles doesn't mean it's good code -- some bugs a compiler might have caught now become runtime bugs. On the other hand, things like the ORM make it very easy to build rich apps with little code, something that .NET still comes up short on (just because designers generate LINQ classes doesn't mean that the code isn't there). Looking forward to playing more!
Here's a cool demo that Ronald Lintag and I threw together for Lab49. The application is supposed to demo a basic GUI that shows the status of nodes and jobs on a compute cluster. We're showing off the use of WPF 3D and styles to create a cool looking console for a sys-admin. You can imagine how something developed off this base could turn into a real product.
This demo is a bit rough around the edges, and definitely needs polish, but it's still worth a look.
- Press J to bring up the job list.
- Use the dropdown at the bottom of the job list to add some jobs.
- Click on a job to view details about its tasks.
- The nodes in the background vary from gray to green to red depending on their utilization.
Click here to launch the application (.NET 3.5 required)
Here are a few screenshots for those of you who are not adventurous enough for the ClickOnce...


Daniel Vaughan has a very cool project up on CodeProject.com. He's built an awesome grid computing "platform" using Silverlight for his compute nodes. Think SETI@Home style.
CodeProject article: http://www.codeproject.com/KB/silverlight/GridComputing.aspx
Online Demo: http://www.orpius.com/Silverlight/Legion/
Make sure you checkout the administration console in the demo!
After not writing anything about HPC or Windows Compute Cluster for a while, I figured it's about time I write *something* about it because I've been working with it so much recently!
Download the Sample Project (Visual Studio 2008, Beta 2)
Windows Compute Cluster Server 2003 is Microsoft's relatively new implementation of cluster computing running on Windows. Although Microsoft is starting behind in this game (Unix has been in grid computing for a long time now), the .NET development environment and ease of administration make CCS a very compelling environment. The first version of CCS is not very feature rich, but provides the core components required to build distributed applications. Version 2 (named Windows HPC Server 2008), which is currently in beta, offers a much wider range of functionality and should add some new ideas and competitive edge to the Windows grid computing world.
In my current project, I have gotten to experience quite a bit of what CCS has to offer, and have been building on top of the CCS system. One of the common tasks that we've run across is running "map-reduce" style financial models against the cluster. These types of models require a wide range specified by an input parameter to be split into segments of work that can be executed in parallel against the cluster. The segments generally create a significant amount of output data, which then must be summarized by a reduce process to produce a "meaningful" output. In some cases, the summarized output will then be fed into another map, and the process continues.

The ability to run a simple process like the one shown above is an important capability of CCS, however CCS is designed to be a tool, not a means to the goal, and does not have built in support for a "map reduce". In the world of CCS, each step outlined in the graph (the map, segments and reduce) would be a "task", and all of them would be grouped into one logical "job".
To handle this map/reduce scenario, a function of CCS jobs, task dependencies, can be used to ensure that the tasks run in the proper order (and that the reduce does not occur before all the segments have completed). In the sample code included with this project, three console apps are included. One does the split, one runs a single map, and one does the reduce.
Split:
The split will add tasks to the job that it is currently a member of so that they can run as maps. The maps will each get their own input file so that they know which segment they are responsible for. Each map task also "depends" on the split task, meaning that they will wait for the split to complete before they start.
The split will also schedule a reduce task, and ensure that that task depends on all the maps, so it does not start until all maps are complete.
Map:
Each map/segment will run given its input parameters (written by the split) and output another file containing its results.
Reduce:
The reduce step will take all the map outputs and run an average on the values to come up with a "summarized" value, in this case, the average of the primes that were requested.
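The three steps can be sketched in a single process, without the CCS scheduler. This is only an illustration: the names (MapReduceSketch, Split, Map, Reduce, Run) are hypothetical and are not taken from the sample project or the CCS API, and the sketch assumes "the average of the primes" means the average of the primes found in a numeric range. In the real job each segment is a separate task with its own input and output file, and task dependencies enforce the ordering that the sequential code below gets for free.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceSketch
{
    // Split: cut [start, end] into at most `segments` contiguous ranges.
    // Assumes segments >= 1 and start <= end.
    public static List<Tuple<int, int>> Split(int start, int end, int segments)
    {
        var ranges = new List<Tuple<int, int>>();
        int total = end - start + 1;
        int size = (total + segments - 1) / segments; // ceiling division
        for (int lo = start; lo <= end; lo += size)
        {
            ranges.Add(Tuple.Create(lo, Math.Min(lo + size - 1, end)));
        }
        return ranges;
    }

    // Map: find the primes in one segment by trial division.
    public static List<int> Map(Tuple<int, int> range)
    {
        var primes = new List<int>();
        for (int n = Math.Max(range.Item1, 2); n <= range.Item2; n++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) primes.Add(n);
        }
        return primes;
    }

    // Reduce: summarize every map output as a single average.
    // Throws InvalidOperationException if no primes were found at all.
    public static double Reduce(IEnumerable<List<int>> mapOutputs)
    {
        return mapOutputs.SelectMany(p => p).Average();
    }

    // The reduce runs only after every map has produced its output,
    // which is what the task dependencies guarantee on the cluster.
    public static double Run(int start, int end, int segments)
    {
        var outputs = Split(start, end, segments).Select(r => Map(r)).ToList();
        return Reduce(outputs);
    }
}
```

For example, MapReduceSketch.Run(1, 20, 4) splits the range into four segments of five numbers each and averages the eight primes up to 19.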
How To Run This Project:
When you run this model, it is expected that you use a system such as the CCS job console, the command line interface or your own custom-built interface to submit a job with the following characteristics:
- The only task specified to run within the job is the split task, where the first command line parameter is the input XML file you want to use. This input file must deserialize into a PrimeFinderInput object. The split task is responsible for creating the rest of the required tasks.
- The "minimum processor count" is set to the number of processors you wish to parallelize across.
- Your input parameters are well formed, and the Processors property of the input is set to the number of segments you want to run. Ideally this is equal to (or greater than) the "minimum processor count".
There you have it. The basics to a simple map/reduce job in CCS!
Sample code available here that exemplifies this bug (requires Visual Studio 2008 Beta 2 and SQL Server).
So here is an incredibly unintuitive problem I ran into with anonymous delegates in C# (the same happens with lambda expressions) while using them inside loops. It also happens in a few other circumstances (for example, in LINQ), as I will explain below.
Symptom:
When using an anonymous delegate or lambda expression inside a loop, the results of the delegate's execution (outside of the loop) are unexpected. For example:
- Your expression is not evaluated as you expect and returns an unexpected value.
- Your LINQ or DLINQ (LINQ2SQL) execution creates SQL or a query result which is not consistent with your assumptions based on how your loop was constructed.
This problem occurs when you create an anonymous delegate inside a loop and use the loop's variable within the delegate, or when you change a variable after creating an anonymous delegate that uses it. The behavior is by design and makes sense, but it is pretty misleading, especially the first time you see it. Take the example below, which was sent to me by a Microsoft employee after I filed a bug with them. He says this is by design, and conceptually I agree with him. Can you guess what the statements below print?
1: const int count = 10;
2: Predicate<int>[] predicates = new Predicate<int>[count];
3:
4: for (int i = 0; i < count; i++)
5: {
6:     predicates[i] = delegate(int j) { return i == j; };
7: }
8:
9: Console.WriteLine(predicates[0](0)); // False
10: Console.WriteLine(predicates[0](count)); // True
When the first predicate (predicates[0]) is called above, one might expect it to return true, because i == 0 when it was created. This is where it gets confusing. In the closure of the anonymous delegate, the variable i is held by reference, not by value. Because i is declared once for the whole for loop rather than once per iteration, the i inside every delegate's closure is incremented along with the loop. Once we leave the loop, i holds its final value, count (the value that made the loop condition fail), so every predicate returns true only for count. Not what you were expecting? Me neither.
The next example shows the "correct" way of dealing with this, such that line 9 of the code above returns true.
for (int k = 0; k < count; k++)
{
    // Copy the loop variable into a local that is created fresh on every iteration.
    int l = k;
    predicates[k] = delegate(int m) { return l == m; };
}

Console.WriteLine(predicates[0](0)); // True
In this case, because variable l is created within the for loop, holding reference to l in the delegate is okay, because the value of l will never change, and therefore we get our expected result. Personally, I was originally expecting it to hold the value of the variable, not the reference to the variable.
This brings us to another interesting example (my original, whose source code you can download below), where I construct a DLINQ where clause in a loop. Look at the where clause constructed below.
// Create a few new items that we can test with.
TestEntity[] testData = new TestEntity[]
{
    new TestEntity(1, "One"),
    new TestEntity(2, "Two"),
    new TestEntity(3, "Three")
};

// Add all our test data and submit it to the server
foreach (TestEntity item in testData)
{
    context.TestEntities.Add(item);
    Console.WriteLine(item.Name);
}

context.SubmitChanges();

var result = from item in context.TestEntities
             select item;

// Go through all our local items and append "WHERE" clauses to the
// result statement. This is kind of like doing a "NOT IN"
foreach (TestEntity localItem in testData)
{
    // See note in FixedTest for diagnosis.
    result = result.Where(sqlItem => sqlItem.Name != localItem.Name);
}
One would expect the WHERE clauses to exclude all the items in the testData array ("One", "Two" and "Three" would not be selected). However, because localItem is held by reference, every clause compares against the last item, so only the third item ("Three") is excluded. The code below will print "One" and "Two" on separate lines:
// The WHERE clauses above should have effectively cancelled out all
// the items in the database and left us selecting nothing.
foreach (var item in result)
{
    Console.WriteLine(item.Name);
}
One would have expected nothing to be printed to the console by simply reading the code.
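The fix is the same as in the predicate example: copy the loop variable into a local declared inside the loop body. The sketch below is an assumption-laden stand-in, not the downloadable sample: it uses LINQ to Objects over a plain string array instead of the DLINQ context so that it runs without a database, and the names (WhereClosureSketch, Broken, Fixed) are hypothetical. The broken version declares its captured variable outside the loop on purpose. C# 5 and later compilers give foreach a fresh variable on every iteration, so on those compilers a lambda that captures the foreach variable directly no longer shows the problem.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereClosureSketch
{
    static readonly string[] Names = { "One", "Two", "Three" };

    // Every Where lambda captures the same variable, and the query is not
    // executed until ToList, after the loop has left "Three" in it.
    public static List<string> Broken()
    {
        IEnumerable<string> result = Names;
        string shared = null; // one variable for the whole loop
        foreach (string localItem in Names)
        {
            shared = localItem;
            result = result.Where(name => name != shared);
        }
        return result.ToList(); // "One", "Two"
    }

    // Each Where lambda captures its own copy, so all three names are excluded.
    public static List<string> Fixed()
    {
        IEnumerable<string> result = Names;
        foreach (string localItem in Names)
        {
            string copy = localItem; // fresh variable per iteration
            result = result.Where(name => name != copy);
        }
        return result.ToList(); // empty
    }
}
```

Broken() mirrors the behavior described above, where only the last item is filtered out, while Fixed() returns the empty result one would expect from reading the loop.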
You can download some sample code here for this bug (Visual Studio 2008 Beta 2). Note that you will need a SQL Server database at localhost to run this properly.
Thank you to Colin Meek of Microsoft for contributing some of the sample code above, and helping clarify that this functionality is by design in C#.
Update:
To clarify, I'm not saying that this behavior doesn't make sense, but rather that it's not expected. I think this should at least produce a compiler warning (I believe VB does this) to tell you that you could be confusing something in your logic. Because of LINQ's deferred execution, this problem becomes more obvious in the "Where" example.
Trying to decide how to write some unit test helper classes, I ran into an interesting question. What happens when you use the new keyword along with .NET generics? The example below shows my dilemma:
[TestMethod]
public void DoBase()
{
    DoT<Base>(new Base());
}

[TestMethod]
public void DoDerived()
{
    DoT<Derived>(new Derived());
}

public void DoT<T>(T item)
    where T : Base
{
    item.Blah();
}

public class Base
{
    public Base()
    {
    }

    public void Blah()
    {
        Console.WriteLine("Hello");
    }
}

public class Derived : Base
{
    // "new" hides Base.Blah instead of overriding it.
    public new void Blah()
    {
        Console.WriteLine("Bye");
    }
}
The question is what happens when you call DoDerived()? Because the class is typed as T (which in this case is the Derived class), you would expect "Bye" to be written to the console. How would that work though, since the compiler has no concept of Derived.Blah, and therefore cannot generate the IL to access it at compile time of the generic?
Sure enough, "Hello" will be written from Base.Blah, as if we were cast to Base at the time of invocation.
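For comparison, here is a minimal sketch of the same shape using virtual and override instead of new. The names (VirtualBase, VirtualDerived, HidingDerived, Describe) are made up for this illustration, and the methods return strings rather than writing to the console so the result is easy to check. The call inside the generic method is still compiled against the base type named in the constraint, but because it goes through a virtual method, the override is picked at runtime. A method declared with new is not an override, so it is never reached this way.

```csharp
using System;

public class VirtualBase
{
    public virtual string Blah() { return "Hello"; }
}

public class VirtualDerived : VirtualBase
{
    // Overrides the virtual method, so calls made through the base type land here.
    public override string Blah() { return "Bye"; }
}

public class HidingDerived : VirtualBase
{
    // Hides the base method; only visible when the static type is HidingDerived.
    public new string Blah() { return "Bye"; }
}

public static class GenericDispatchSketch
{
    // The compiler binds item.Blah() to VirtualBase.Blah, the only Blah it
    // knows about from the constraint.
    public static string Describe<T>(T item) where T : VirtualBase
    {
        return item.Blah();
    }
}
```

GenericDispatchSketch.Describe(new VirtualDerived()) returns "Bye", while GenericDispatchSketch.Describe(new HidingDerived()) returns "Hello", matching the behavior of the new keyword described above.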