Wednesday, November 5, 2008

Advanced Topics in Inefficiency: Anonymous Methods

This is the first in what may be a series of posts on various theoretical (and not so theoretical) corner cases in common code. Despite being obtuse, these issues are useful to explore both to avoid inefficiencies and to better understand what's happening behind the code. Today's topic: anonymous methods.
class Foo()
{
    void Bar()
    {
        var thing1 = new Thing();
        var thing2 = new Thing();

        DoSomeStuff (() => thing1.Shimmy());
        DoOtherStuff(() => thing2.Shake());
    }
}
There is a potential memory problem with this method. It's not obvious from looking at the code, but both things must be garbage collected together. As long as there is an active reference to one, the other will live on as well. If 'Thing' is a heavy type, this could keep significant memory from being reclaimed on the heap. To better understand, let us look at how the C# compiler handles anonymous methods.

The above example demonstrates "local variable capture." This means local variables from the enclosing method body can be used inside closures (such as the two lambdas above). To accomplish this, the C# compiler shunts the values of the local variables to an object. The type of the object is generated by the compiler. In essence, the compiler turns the above code into this:
class Foo()
{
    class GeneratedTypeForMethodBar
    {
        public Thing thing1;
        public Thing thing2;

        public void AnonymousMethod1()
        {
            thing1.Shimmy();
        }

        public void AnonymousMethod2()
        {
            thing2.Shake();
        }
    }

    void Bar()
    {
        var closure_object =
            new GeneratedTypeForMethodBar();
        closure_object.thing1 = new Thing();
        closure_object.thing2 = new Thing();

        DoSomeStuff(closure_object.AnonymousMethod1);
        DoOtherStuff(closure_object.AnonymousMethod2);
    }
}
This is actually a very clever way of achieving local variable capture since it makes use of the CLI's pre-existing garbage collector to clean up the captured variables. The problem is, the compiler shunts all local variables to a single object. In our above example, the two anonymous methods do not reference any of the same local variables, but both local variables are stored in the same object. This can lead some captured variables to become prisoner variables: they are no longer needed, but they cannot be garbage collected. Suppose that our 'DoSomeStuff' method just invokes the delegate and returns. No problem. But now suppose that our 'DoOtherStuff' method holds on to the delegate, perhaps planing to invoke it later. Or suppose we were to return the second lambda, allowing the caller to hold the delegate as long as they please. That delegate holds a reference to the 'closure_object' which holds a reference to both Things, even though that delegate just needs 'thing2'. There is no way for any code to reach 'thing1' but it won't be garbage collected until we're done with 'thing2'.

Solution?

Well, we could modify the compiler to generate a type for each set of local variables that appear in only one anonymous method, like so:
class Foo()
{
    class GeneratedTypeForMethodBar1
    {
        public Thing thing1;

        public void AnonymousMethod()
        {
            thing1.Shimmy();
        }
    }

    class GeneratedTypeForMethodBar2
    {
        public Thing thing2;

        public void AnonymousMethod()
        {
            thing2.Shake();
        }
    }

    void Bar()
    {
        var closure_object1 =
            new GeneratedTypeForMethodBar1();
        closure_object1.thing1 = new Thing();

        var closure_object2 =
            new GeneratedTypeForMethodBar2();
        closure_object2.thing2 = new Thing();

        DoSomeStuff(closure_object1.AnonymousMethod);
        DoOtherStuff(closure_object2.AnonymousMethod);
    }
}
This poses problems as well. First of all, we are instantiating two (or more) generated-type objects rather than one. Object instantiation is not cheap and that could potentially slow down certain code. Also, this approach cannot be used to optimize more complex scenarios. Suppose we have five anonymous delegates, each referencing some of seven local variables like so: a {1 2} b {2 3} c {3 4 5} d {1 5 6} e {6 7}. In these situations we must default to the one-compiler-generated-type-for-everything approach.

Ultimately, the lesson here is just to be aware of these potential issues. If you find via profiling that objects are not being garbage collected and you make heavy use of anonymous methods, you might want to examine your closures to make sure this isn't causing the problem.

And what do people think about modifying the compiler as proposed above? Also, anyone who comes up with a better mechanism for local variable capture gets cool points. Double points if your solution doesn't require VM changes.

P.S. Thanks to Michael for help with this post.

16 comments:

Anonymous said...

I'm completely ignorant, so please excuse my question :). In the case of the five anonymous delegates, why does this not work:

var closure_object_a = new GeneratedTypeForMethodBarA();
closure_object_a.thing1 = new Thing();
closure_object_a.thing2 = new Thing();

var closure_object_b = new GeneratedTypeForMethodBarB();
closure_object_b.thing2 = closure_object_a.thing2;
closure_object_b.thing3 = new Thing();

// ...

DoSomeStuff(closure_object_a.AnonymousMethod);
DoOtherStuff(closure_object_b.AnonymousMethod);
// ...

Scott said...

The problem with that is, if you assign another value to the variable, you have to update all of the closure objects. That gets messy fast.

Anonymous said...

There's another interesting example with closures too - imagine what happens if you combine them with the Using statement:

Action getDeferredAction(object o)
{
Action of IDisposable ret; // cant use "tags" in Blogger
using(disp = o.ReturnsADisposable()) {
ret = (() => disp.SomeFunc());
}

return ret;
}

var a = getDeferredAction(some_var);
var b = a();

This function is interesting in that a closure and 'Using' do opposite things to the scope of a variable - in this case, you'd end up calling SomeFunc() on an object that has already been disposed!

Barry Kelly said...

I'm going to make some definitions clear so there's no misunderstanding:

stack frame - the usual meaning, an activation record allocated on the CPU stack containing the parameters, return address, local variables etc.

frame instance - an object that contains things you normally find in a stack frame, but modelled as an object to make discussion of the anonymous method technique outlined in this post amenable to more familiar OO descriptions.

frame type - similarly, the type of the frame object

frame variable - a location of the frame type that contains a reference to the frame variable

Now, location capture must extend storage lifetime as it is visible to the programmer. On the other hand, many uses of location capture do not in fact rely on extending storage lifetime. So, in cases where location capture can be statically verified not to be required (by analysis of the receiver of the delegate), creation of a frame instance can be omitted.

This can be modelled as assuming the frame type to be a value type of an appropriate size and type to map to the minimum spanning range of captured locations, and the anonymous method body to be a method on this value type, but it requires that delegate construction treat the stack-allocated type as if it were heap-allocated for the purpose of passing the 'this' reference to the method; this is unverifiable without the JIT also knowing about the receiver's composition. Of course, if the anonymous method doesn't modify captured locations, copying may be cheaper and avoids the need for JIT modification.

Unverifiably, the above approach can be implemented solely in the compiler.

Another strategy that can be applied in the same situation is familiar from functional programming: deforestation. In this case, you inline the receiver of the delegate, and then inline the body of the anonymous method inside the inlined receiver. Depending on the relative sizes and usages of things, this may be better or worse than the previous approach. It is also implementable entirely in the compiler.

If actual lifetime extension cannot be avoided, dynamic memory allocation is absolutely necessary and the problem becomes one of GC efficiency more than anonymous methods. It comes down to the degree of smartness in the JIT & GC wrt liveness analysis.

An obvious liveness implementation approach that comes to mind is that of reference counting captured locations.

Assuming that the compiler can verify that there is no cycle in the graph of anonymous_method -> frame_instance -> captured_location (e.g. as there would be by storing one of the anonymous method delegates in a captured location), then reference counting support in the GC can kick in when it determines that a delegate is eligible for collection; it decrements the count associated with each captured location specific to that delegate's anonymous method, and disconnects those locations' subgraphs when the refcounts reach zero.

The above approach would involve a fair amount of JIT & GC cooperation, but it can be generalized somewhat beyond the above specifics, even to eliminating reference counting. Consider a normal tracing GC that treats delegates specially: objects that are kept alive solely by delegate links have their object data traceability edges processed selectively, based on the field usage of the union of the transitive closure of the calling graph for all the methods contained in live delegates pointing to that object.

I imagine that would entail a fair amount of overhead, whether the sets were stored in metadata or calculated heuristically on-need by scanning CIL etc., but if the pattern - heavy object kept alive only by multiple delegates that themselves have different lifetimes - then it may be a win. You'd have to measure actual occurrence rates of the painful pattern to figure out the size of the potential win.

Barry Kelly said...

PS (checking for replies to my comment): I don't think you meant that the features your are talking about are obtuse (i.e. not sharp), but rather abstruse :)

Scott said...

Here is a solution that requires no VM changes. It's not pretty: http://docs.google.com/Doc?id=ah4grsjk7k4c_1050hj2qwjgm

Anonymous said...

Why not using structs instead of classes? It would be cheaper.

Anonymous said...

Why not have smart compiler which would track the method dependencies:
public class Foo()
{
    public void Bar()
    {
        var thing1 = new Thing();
        var thing2 = new Thing();

        DoSomeStuff (() => thing1.Shimmy());
        DoOtherStuff(() => thing2.Shake());
    }
}
=>
public class Foo() {
  private class _Bar__0() {
    public Thing thing1;
    public void call0() {
      thing1.Shimmy();
    }
  }
  private class _Bar__1() {
    public Thing thing2;
    public void call1() {
      thing2.Shake()
    }
  }
  public void Bar() {
    var _bar_0 = new _Bar_0();
    var _bar_1 = new _Bar_1();
    
    _bar_0.thing1 = new Thing();
    _bar_1.thing2 = new Thing();
    
    DoSomeStuff(_bar_0.call0);
    DoOtherStuff(_bar_1.call1);
  }
}


What goes where IMHO could be determined from SSA.

Anonymous said...

I'd call this behavior a bug, for three reasons:

1) you could be keeping a lot of unreachable objects in memory this way (possibly also memory leak)

2) it probably INCREASES garbage collection time, because the garbage collector has to traverse (and potentially copy) the unused data

3) object instantiation is cheap if you have a good garbage collector

The right solution is to create a different class for each closure with a reference to each local variable that it needs (reference to support assignment).

This means that you have to allocate one object per closure and one per local variable (for the refence). The references can be eliminated if you don't assign to the variable.

BTW how do you implement this:

c = (x) => (() => ++x);
a = c.call(2);
b = c.call(3);
a.call(); // 3
a.call(); // 4
a.call(); // 5
b.call(); // 4
a.call(); // 6

I don't know how to write this in C#, but I hope it's clear what it's intended to do.
c is a closure that when called with an argument x creates a closure that increments x and returns x.
The comments indicate the return values.
Note that a and b should be independent: incrementing a's x shouldn't change b's x and vice versa.

And how do you implement this (assume that Pair is a class with two fields, first and second):

c = (x) => new Pair(
() => ++x,
(y) => (() => x -= y)
)

c is a closure that takes an intial value x. It returns a pair of closures. The first one can be used to increment x, the second can be used to create a closure that when called with an argument y, creates a closure that you can use to subtract y from x (yeah...).

You could use it like this:

a = c.call(3); // x=3
a.first.call(); // x=4
b = a.second.call(2);
b.call(); // x=2
b.call(); // x=0
a.first.call(); // x=1

And again, you could have v = c.call(2) and w = c.call(3) and they should be indepentent (changes to v's x should not affect w's x).

Barry Kelly said...

Scott - the approach you outline in the document, adding the extra indirection, is pretty much the same approach as the refcounted one I outlined, but using tracing GC techniques instead (i.e. an actual object graph edge to represent each live reference).

I have to admit, I have refcounting on the brain though, since I relatively recently implemented anonymous methods for Delphi, a non-GC language, so refcounting had to be used.

Maciej - your approach is in fact an approach outlined by Scott in the main post, but it breaks down when you have some, but not all, locations shared between different anonymous methods.

Jules - Scott's linked approach - the indirection - and my approach - refcounting - are both more efficient wrt allocation than your suggestion, as Scott's approach only needs an extra allocation per each (anonymous method, capture) pair, while you also require an extra allocation per closure; also, my approach of interacting with GC for refcounting avoids the need for extra allocations, at the cost of more intelligence in the GC.

With respect to your question, i.e.:

  Func<int,Func<int>> c = x => () => ++x;

... it is handled trivially: anonymous methods nested inside other anonymous methods are just like anonymous methods inside any other method, and handled recursively as such. The capture here is the variable, and a new capture is created every time you call c. The only twist with nested anonymous methods is that parent linking is required if the nested anonymous method captures a location in one of its outer scopes.

Application of that approach to your other example is then obvious, and the non-interference you expect is indeed respected, etc.

Anonymous said...

I didn't notice Scott's new proposal. It seems to be exactly the same as what I'm proposing.

Because of parent linking, all data in a parent closure will not be garbage collected until the inner closure becomes unreachable?

Can you explain if and why your method is faster than Scotts method + elimination of Local instances if the local variables aren't assigned to?

Barry Kelly said...

Jules - I don't make any assertions about my method being faster, as it depends on the GC implementation having extra smarts - though smarts that would technically be extendable to other areas. All GC approaches can generally be boiled down to some combination of reference counting and edge traversal, for more information see this paper; a direct link is here.

WRT your question about parents, yes, the same issue comes up there, and it can be handled in a similar way.

Sandro Magi said...

Lifting all locals into a single anonymous object, regardless of variable capture, is a bug, period. The compiler is making assumptions about my code by tying the lifetime of two locals together, despite how they are actually captured and used. This is very bad, and simply wrong.

I don't care how much more expensive it is to instantiate two objects instead of one. I'm already implicitly stating that's what I want by creating two lambdas and capturing separate locals. The compiler simply ignoring me is a bug.

I recommend the C# compiler devs read papers on "safe for space" closure conversion:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.300

http://www.cs.purdue.edu/homes/suresh/502-Fall2008/papers/closureRep.pdf

Scott said...

"Lifting all locals into a single anonymous object, regardless of variable capture, is a bug, period."

Well, variables which are not used in anonymous methods are not hoisted into the object.

Anonymous said...

Who knows where to download XRumer 5.0 Palladium?
Help, please. All recommend this program to effectively advertise on the Internet, this is the best program!

Unknown said...

Weak pointers at local variables could solve this, right?