Andre de Cavaignac

Let's blog it out...

WTF: "Problems" with Anonymous Delegates, LINQ, Lambdas within "foreach" or "for" Loops

Sample code available here that exemplifies this bug (requires Visual Studio 2008 Beta 2 and SQL Server). 

So here is an incredibly unintuitive problem I ran into with anonymous delegates (same will happen for lambda expressions) in C# while using them inside loops.  This will also happen in a few other circumstances (for example, in LINQ) , as I will explain in a minute below.

Symptom:
When using an anonymous delegate or lambda expression inside a loop, the results of the delegates execution (outside of the loop) are unexpected.  For example:

  • Your expression is not evaluated as you expect and returns an unexpected value.
  • Your LINQ or DLINQ (LINQ2SQL) execution creates SQL or a query result which is not consistant with your assumptions based on how your loop was constructed.

This problem exists when you create an anonymous delegate inside a loop, using the loop's variable within the delegate, or when you change a variable after creating an anonymous delegate.  The result is sensibly by design but pretty misleading, especially the first time you see it.  Take the example below.  This example was sent to me by a Microsoft employee after I filed a bug with them.  He claims this is by design, and conceptually I agree with him.  Can you guess what would be returned with the statement below?

   1:  const int count = 10;
   2:  Predicate<int>[] predicates = new Predicate<int>[count];
   3:   
   4:  for (int i = 0; i < count; i++)
   5:  {
   6:      predicates[ i ] = delegate(int j) { return i == j; };
   7:  }
   8:   
   9:  Console.WriteLine(predicates[0](0)); // False
  10:  Console.WriteLine(predicates[0](count-1)); // True

When the first predicate (predicates[0]) is called in above, one may expect it to return true because when it was created i == 0.  This is where it gets confusing.  In the closure of the anonymous delegate, the variable i is held by reference, not by value.  Because int i is declared outside of the loop, i within the delegate's closure will increment along with the for loop.  Once we leave the for loop, i is the max value of i throughout the loop (or count-1).  Not what you were expecting?  Me either.

The next example shows the "correct" way of dealing with this, such that line 9 of the code above returns true.

   1:  for (int k = 0; k < count; k++)
   2:  {
   3:      int l = k;
   4:      predicates[k] = delegate(int m) { return l == m; };
   5:  }
   6:   
   7:  Console.WriteLine(predicates[0](0)); // True

In this case, because variable l is created within the for loop, holding reference to l in the delegate is okay, because the value of l will never change, and therefore we get our expected result.  Personally, I was originally expecting it to hold the value of the variable, not the reference to the variable.

This brings us to another interesting example (my original, which you may download the source code to below), where I construct a DLINQ where clause in a loop.  Look at the where clause constructed below.

   1:  // Create a few new items that we can test with.
   2:  TestEntity[] testData = new TestEntity[] {
   3:          new TestEntity(1, "One"),
   4:          new TestEntity(2, "Two"),
   5:          new TestEntity(3, "Three") };
   6:   
   7:  // Add all our test data and submit it to the server
   8:  foreach (TestEntity item in testData)
   9:  {
  10:      context.TestEntities.Add(item);
  11:      Console.WriteLine(item.Name);
  12:  }
  13:   
  14:  context.SubmitChanges();
  15:   
  16:  var result = from item in context.TestEntities
  17:               select item;
  18:   
  19:  // Go through all our local items and append "WHERE" clauses to the
  20:  // result statement.  This is kind of like doing a "NOT IN"
  21:  foreach (TestEntity localItem in testData)
  22:  {
  23:      // See note in FixedTest for diagnosis.
  24:      result = result.Where(sqlItem => sqlItem.Name != localItem.Name);
  25:  }

One would expect that the WHERE clause would exclude all the items in the testData array ("one", "two" and "three" would not be selected).  However, because localItem is held by reference, only the third item ("three") is not selected.  The code below will print "one" and "two" on seperate lines:

   1:  // The WHERE clauses above should have effectively cancelled out all
   2:  // the items in the database and left us selecting nothing.
   3:  foreach (var item in result)
   4:  {
   5:      Console.WriteLine(item.Name);
   6:  }

One would have expected nothing to be printed to the console by simply reading the code.

You can download some sample code here for this bug (Visual Studio 2008 Beta 2).  Note that you will need a SQL Server database at localhost to run this properly.

Thank you to Colin Meek of Microsoft for contributing some of the sample code above, and helping clarify that this functionality is by design in C#.

Update:

To clarify, I'm not saying that this behavior doesn't make sense, but more that its not expected.  I think this should at least provide a compiler warning (I believe VB does this) to tell you that you could possibly be confusing something in your logic.  Because of LINQ's delay-execute, this problem becomes more obvious in the "Where" example.

Comments

Josh Einstein said:

I think it would be far less intuitive if external variables used by the delegate were "frozen" at the time the delegate was instantiated. By being promoted to fields, it makes it easy to deal with repeated or deferred execution against a delegate that depends on external state like that.

However, I think the real problem is that C# and the new language extensions are promoting a mechanism that is too automatic. It's very prone to bugs if you don't understand the mechanism very very well. And you and I are both really skilled with C# and obviously we both missed this.

Personally I think it'd be safer to require that any external state be passed into the anonymous delegate via parameters. For the same reason that pointers can be convenient, but dangerous, I think this blurring of the boundaries of anonymous delegates is just as dangerous.

It's also worth pointing out that the reason your SQL example doesn't behave as described is because LINQ isn't evaluating the expression until you start to enumerate the result. If it was something as simple as an array or list being copied, it should work as expected.

# November 19, 2007 8:22 PM
Leave a Comment

(required) 

(required) 

(optional)

(required)