Common Genius

The online technical home of David Nelson

Variable Irony

A commentary on technical issues ranging far and wide.

LINQ and Duck Typing

A while back I wrote about Collection Initializers and Duck Typing, and I wasn't particularly upbeat about the feature. I am not a big fan of duck typing; I believe that class designers should have to declare what abilities their classes have, not leave it up to individual developers (who don't have access to the internals of the class) to figure out whether a class has a particular ability or not. And no, just having a method of a particular signature does not mean that a class has a particular capability which happens to require an identical method.

What does this have to do with LINQ? A lot, actually. It turns out that LINQ uses duck typing in a very similar manner to collection initializers. Let's look at an example:

IEnumerable<int> ints = new int[] { 1, 2, 3, 4, 5 };
IEnumerable<int> evens = from i in ints where i % 2 == 0 select i;

The compiler turns the query expression into the equivalent of this:

ints.Where<int>(i => i % 2 == 0)

Now, "ints" is an IEnumerable<int>, which doesn't define a Where method. Where is actually an extension method, defined in the new System.Linq.Enumerable class. What is interesting about this is that the compiler doesn't decide directly which Where method to call (i.e. it doesn't bind directly to System.Linq.Enumerable.Where just because it is expanding a LINQ query). Instead, it expands the LINQ query into a Where method call, and then lets normal method call resolution rules take over (it's slightly more complicated than that, but conceptually it's an accurate description of what happens). This means that the compiler is essentially duck typing the target of the query (in this case "ints"), counting on the standard query operators being defined in some way, either as members or as extension methods. If they are not defined, the compiler reports an error; however, if any members matching the required signatures are defined, they are assumed to be appropriate standard query operators. This is what allows querying over an instance of IQueryable to have a completely different result than querying over an instance of IEnumerable.
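To make that last point concrete, here is a minimal sketch in which the same query text is written twice, once over IEnumerable<int> and once over IQueryable<int>. The class and method names (BindingDemo, OverEnumerable, OverQueryable) are made up for illustration; only the System.Linq types are real. The second method inspects the resulting expression tree to show which Where the compiler bound to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical demo class; the names are illustrative only.
public static class BindingDemo
{
    // Over IEnumerable<int> the expanded call binds to Enumerable.Where,
    // and the lambda is compiled to a Func<int, bool> delegate.
    public static List<int> OverEnumerable()
    {
        IEnumerable<int> ints = new int[] { 1, 2, 3, 4, 5 };
        IEnumerable<int> evens = from i in ints where i % 2 == 0 select i;
        return evens.ToList();
    }

    // Over IQueryable<int> the identical query text binds to Queryable.Where,
    // and the lambda is captured as an Expression<Func<int, bool>> tree.
    public static string OverQueryable()
    {
        IQueryable<int> ints = new int[] { 1, 2, 3, 4, 5 }.AsQueryable();
        IQueryable<int> evens = from i in ints where i % 2 == 0 select i;

        // The outermost node of the query's expression tree is the Where call.
        var call = (MethodCallExpression)evens.Expression;
        return call.Method.DeclaringType.Name + "." + call.Method.Name;
    }
}
```

Calling BindingDemo.OverEnumerable() from a Main method yields the even numbers, while BindingDemo.OverQueryable() reports the method the second query was bound to. Nothing in the query text itself changes between the two; only the static type of the source does.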

Why is duck typing used, instead of using an interface (IQueryable would seem to be appropriate, except that it was co-opted for another purpose)? One could argue that it allows some data sources to implement some of the standard query operators but not all of them, without having to resort to throwing runtime NotSupportedExceptions. This is valid, but could be fixed by defining atomic interfaces instead of a catch-all interface, which is good practice anyway. The real reason why duck typing is used is essentially the same reason that it is used in collection initializers: it is the only way to make the feature work with IEnumerable and IEnumerable<T>, which was clearly a major design goal of the feature. And it is very useful to have; I have found that I can use LINQ queries to perform operations on in-memory sets of data in a much cleaner and clearer way than writing out equivalent algorithms.
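For what it's worth, the "atomic interfaces" idea might look something like the sketch below. Every name here (IFilterable, IProjectable, FilterOnlySource, ContractDemo) is hypothetical; none of them exist in .NET, and this is not how LINQ is actually designed. It only illustrates a source that declares support for filtering and nothing else, which, as noted above, still would not help with plain IEnumerable<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contracts, one per query operator. These are NOT part of .NET.
public interface IFilterable<T>
{
    IFilterable<T> Where(Func<T, bool> predicate);
}

public interface IProjectable<T>
{
    IProjectable<TResult> Select<TResult>(Func<T, TResult> selector);
}

// A hypothetical source that explicitly declares it can filter, and nothing more.
public sealed class FilterOnlySource<T> : IFilterable<T>
{
    private readonly List<T> _items;

    public FilterOnlySource(IEnumerable<T> items) { _items = new List<T>(items); }

    public int Count { get { return _items.Count; } }

    public IFilterable<T> Where(Func<T, bool> predicate)
    {
        // Keep only the items the predicate accepts.
        return new FilterOnlySource<T>(_items.FindAll(x => predicate(x)));
    }
}

public static class ContractDemo
{
    // The parameter type is the contract: only sources that have declared
    // themselves filterable can be passed in.
    public static IFilterable<int> Evens(IFilterable<int> source)
    {
        return source.Where(i => i % 2 == 0);
    }
}
```

A source that cannot project simply does not implement IProjectable<T>, so a misuse would be caught at compile time rather than with a NotSupportedException at runtime.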

So what do I think about this approach? It suffers from the same flawed assumption as all duck typing: it assumes that, if a member (or extension method) is defined which happens to match the signature of a standard query operator, then that method must in fact be a standard query operator. The compiler is essentially assigning semantic meaning to a given method signature, without an associated contract of any kind (i.e. interface). This is a dangerous assumption, as I described in my collection initializers post, and I do not like seeing the increasing use of duck typing in the C# language.
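Here is a contrived sketch of the kind of collision I mean. Blocklist is a hypothetical type, invented for this example, whose Where method registers a blocking rule rather than filtering anything. It implements no interface and is not even enumerable, yet the query below compiles because the method shape matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: "Where" here means "block values where the rule matches".
// It is a fluent configuration method, not a query operator.
public sealed class Blocklist
{
    private readonly List<Func<int, bool>> _rules = new List<Func<int, bool>>();

    // Happens to match the shape the compiler looks for in a where clause.
    public Blocklist Where(Func<int, bool> rule)
    {
        _rules.Add(rule);
        return this;
    }

    public bool IsBlocked(int value)
    {
        foreach (var rule in _rules)
        {
            if (rule(value)) return true;
        }
        return false;
    }
}

public static class SemanticMismatchDemo
{
    public static Blocklist Run()
    {
        var blocklist = new Blocklist();

        // Reads like "give me the negative entries", but it expands to
        // blocklist.Where(n => n < 0), which mutates blocklist and returns it.
        Blocklist result = from n in blocklist where n < 0 select n;
        return result;
    }
}
```

The compiler has no way to know that this Where was never meant to be a standard query operator; the signature matched, so the query was accepted.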

However, my reaction to duck typing in LINQ is not quite as severe as my reaction to collection initializers, for two reasons. First, the standard query operators have fairly unusual signatures, making it far less likely that a method with a matching signature will be defined for a completely different purpose. I would still rather a contract be used, but overall the chances of a collision are very low. Second, LINQ is an enormously useful feature, many times more useful than collection initializers. Therefore, I am willing to put up with a little bit of language corruption for the sake of clearer code and greater productivity.

Is there a better way, a way to get this useful functionality without resorting to duck typing? I don't know. I haven't thought of one yet, and I'm sure the C# team put plenty of thought into it before deciding on the approach that they did. Regardless, it is too late now, since any change making the query rules more strict would by definition be a massive breaking change which the C# team would never allow. So I guess this is just one of those things the language purist in me will have to learn to live with.

Published Friday, March 07, 2008 11:17 AM by dnelson

Comments

 

Peter Ritchie said:

I think it depends on the type of LINQ provider as to how/when Where is called.  Sometimes the query translates into emitted IL that makes the call, sometimes into an expression.

But, could you simply say that any method call is duck typing?  The compiler is simply matching a method based upon its signature.  How would that be any different than what you've described with the Where extension method, at least in the case where IL to call Where is emitted?

The problem with collection initializers is that it was added at version 3.  It's unreasonable to modify already deployed types simply to add an interface to the derivatives list.  Had collection initializers been there from the start, appropriate collection types might be able to implement an appropriate interface.  But, even then, how do you support use of initializers like the following with just an interface?

       private static Dictionary<int, string> dictionary =
           new Dictionary<int, string>()
       {
           {
               1,
               "One"
           },
       };

March 14, 2008 10:29 AM
 

dnelson said:

@Peter

Thanks for the comment. You are correct that depending on the definition of the Where method, the criteria might be transformed into an anonymous delegate or it might be parsed into an expression. However, that is not the point of this post.

"But, could you simply say that any method call is duck typing?  The compiler is simply matching a method based upon it's signature."

Actually, that is an incomplete description. In most statically typed languages, including C#, normal method calls are matched not only by their signature, but also on the type on which the method is defined. For example:

Form f = new Form();

f.Show();

The method binding for this code is not looking for just any Show method; it's looking for the Show method that is defined on the Form class. As the developer of this code, it is my responsibility to know what that Show method does when I call it. Hopefully I have good documentation to tell me what the method does; otherwise I may have to resort to experimentation. But either way, I know exactly what method I am calling, and what I expect it to do. Even in a situation where I am calling an overridden method on a derived class, there is a foundational principle in OO programming that an overridden method will adhere to the preconditions and postconditions of the base method (even if that principle is not always followed).

With duck typing however, that extra condition (the type on which the method is defined) is removed. The method call is bound to any matching method on the instance provided to the call. The caller has no opportunity to verify ahead of time that the behavior of the method that will be called matches the behavior that is expected.

This is similar to what happens with LINQ. The where clause of a LINQ query is expecting certain behavior from the Where method to which it is bound; but because it does not require any given type (i.e. there is no required contract), it cannot verify that the Where method even _attempts_ to implement the expected behavior.

Note that the word "attempts" in the previous sentence is significant. No type can ever know exactly what behavior a method on another type implements. But in a static typing system, the type itself acts as a contract. If class X implements interface IBar, then it is declaring that it implements the behavior required by the contract of that interface. Maybe it really doesn't, but at least there is an established trust relationship between components. With dynamic typing, there is no opportunity to even form that relationship, since there is no type to act as the contract on which the relationship is based.

Since where (and every other LINQ clause) does not require any particular type, and therefore cannot use that type as the basis for the relationship, it must simply call whatever Where method exists and hope for the best. Now, since C# is still a statically typed language, and the expanded LINQ query will be compile-time bound, the developer can still theoretically verify the contract himself. However, since he will never see the expanded query (without reflecting on the compiled code), that verification can be extremely difficult to perform.

Sorry if the metaphor is a little overdone. Hopefully it gets the point across.

March 27, 2008 9:18 PM
 

dnelson said:

@Peter,

Also, regarding your point about collection initializers, I am aware of the problems with trying to add this feature at this point in the lifecycle of the platform. But "we need to hack together a new language feature which violates the core principles of the language because we are unwilling to update the core libraries which the language targets to support new functionality" doesn't strike me as a particularly effective justification.

"It's unreasonable to modify already deployed types simply to add an interface to the derivatives list."

Why? Why is it unreasonable to add interfaces to the implementation list in order to be more explicit about the capabilities of a type? The functionality of the type wouldn't have to change, only the degree to which that functionality is specified. Why is that unreasonable?

March 27, 2008 9:33 PM
