Common Genius

The online technical home of David Nelson

Variable Irony

A commentary on technical issues ranging far and wide.

  • Magic Booleans

    Hopefully somewhere in your programming career you reached a point where you could recognize the following as a Bad Idea:

    Foo.Bar(42)

    What does 42 represent? Why are we passing it into the Bar method? Why not 41 or 43 instead? Presumably 42 has some particular meaning in the context of the method call, but it is not obvious from the usage what that meaning is. Numeric constants which are used directly in expressions like this are called Magic Numbers, and most of us are taught to recognize them and eliminate them by creating named constants:

    const int AnswerToLifeTheUniverseAndEverything = 42;
    ...
    Foo.Bar(AnswerToLifeTheUniverseAndEverything);

    Now it is clear why we are using 42: because it is The Answer to Life, the Universe, and Everything. This makes the code much more readable. It also means that should we ever need to change the value of the constant, we only have to change it in one place, instead of hunting for the value 42 everywhere in our code and changing it to 43.

    The odd thing is, though, that the same mental alarm that goes off in my head at the sight of numeric constants in expressions is curiously silent for boolean constants. That is:

    Control.Invalidate(true);

    For some reason, I don't immediately see anything wrong with this; but I should, because this "magic boolean" suffers from the same problems as the magic number example. What does true represent here? How does it affect the behavior of the method call, compared to passing false instead? Or, even worse:

    Control c = ...;

    c.SelectNextControl(c, true, true, false, true);

    Yes, that is a real method. Now, I have some pretty significant experience in WinForms programming, but I can state for a fact that there is no way I can remember what each of those boolean parameters does. If I am reading through code and come across this call, and I need to know what it does, I HAVE to stop and look it up in the documentation. Yes, IntelliSense might tell me, but that means that I have to stop reading, position the cursor, invoke IntelliSense, and in some cases find the right overload, which isn't always trivial. That's a lot of hassle.
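
    To make the readability gap concrete, here is a minimal sketch that applies the same named-constant treatment used for magic numbers to a boolean argument. The Widget class, its Invalidate(bool) method, and the IncludeChildren constant are hypothetical stand-ins (this is not the real WinForms Control), chosen only so the example compiles on its own. The two calls do exactly the same thing; only the second says what the argument means.

```csharp
using System;

// Hypothetical stand-in for a UI control; not the real WinForms Control class.
public class Widget
{
    public string LastAction = "";

    public void Invalidate(bool invalidateChildren)
    {
        LastAction = invalidateChildren
            ? "invalidate self and children"
            : "invalidate self only";
    }
}

public static class MagicBooleanDemo
{
    // Named constant, by analogy with the magic-number fix shown earlier.
    private const bool IncludeChildren = true;

    public static string MagicCall()
    {
        Widget w = new Widget();
        w.Invalidate(true);            // what does true mean here?
        return w.LastAction;
    }

    public static string NamedCall()
    {
        Widget w = new Widget();
        w.Invalidate(IncludeChildren); // the intent is visible at the call site
        return w.LastAction;
    }

    public static void Main()
    {
        Console.WriteLine(MagicCall());
        Console.WriteLine(NamedCall());
    }
}
```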

    Do you have a mental alarm for magic booleans? If not, why not? Why do we as programmers tend to distinguish them from magic numbers? In the next post we will look at options for eliminating magic booleans.

  • Dynamic Typing and a Tale of Two Interviews

    A manager needed to hire a new employee, so he posted an ad on a job search website. He soon had a flood of responses. The manager carefully examined the resume of each applicant, looking for specific skills that he knew the job required. Not surprisingly, most of the respondents didn't fit the bill. The manager selected a few that seemed like they might have the right qualifications, and brought them in for an interview. He gave each of them a set of tests to complete that would exercise the skills he knew he would need. Several of the candidates couldn't finish the test. After reviewing the results, the manager selected one of those who completed the tests for the position.

    Another manager needed to hire a new employee. He did a keyword search through the resumes on a job search website, using terms that were related to the position for which he was hiring. After wading through the results to make sure that the search engine hadn't returned erroneous results, he selected a number of candidates to come in for an interview. He described what he needed, and asked each one if they thought they could do the job. They all said that they were sure they could. He then gave each of them a set of tests to complete, which would exercise the skills he knew he would need. Many of the applicants couldn't finish the test. After reviewing the results, the manager selected one of those who completed the tests for the position, confident that anyone who could pass his interview test must possess the necessary skills to perform the daily tasks for which he was hiring.

    Which manager was promoted, and which one was fired? 

  • Why I Hate Marketing

    When I complained on Scott Hanselman's blog (on something of a tangent I admit) that the versioning of the .NET Framework was nonsensical, he asked me how I would have done it. I think he was subtly trying to point out to me that it's not as easy as it looks. But I already knew that. Obviously trying to come up with a system for describing a platform of technologies and tools in a way that makes sense to techies and non-techies alike is a challenge. But the way it has been done so far doesn't seem to make sense to anyone, not even the people who came up with it.

    I think part of the challenge is the way .NET can be versioned side by side. This makes it somewhat unusual in the world of technology platforms. Usually, new versions of software are intended to replace older versions. Upgrading from one version to the next means you no longer have the older version around, but you use the newer version instead. This is even true for the Java runtime. The typical end-user doesn't have multiple versions of the Java runtime installed at one time; he upgrades from one to the next. .NET is different in that major versions are intended to live side-by-side, so traditional versioning schemes don't necessarily make sense. However, it seems to me that, given the circumstances, the appropriate thing to do is come up with a versioning scheme that does make sense, and apply it uniformly. And for the life of me, I simply cannot come up with a scheme which accounts for what we have seen so far with the .NET Framework. Here are a few examples:

    • Relatively minor changes and additions to the .NET Framework 1.0 were called the .NET Framework 1.1. Similarly scoped changes and additions to the .NET Framework 2.0 were called the .NET Framework 2.0 SP1. Seriously folks, what is the point of using decimal version numbers if you are going to attach a separate name like SP1 instead of incrementing the decimal?
    • .NET 1.1, a minor version increment, included a new CLR, new libraries with new features, and changes to existing libraries. .NET 3.0, a major version increment, included ONLY a new set of libraries with new features. .NET 3.5, a minor version increment (but a big one), includes a new set of libraries, and changes to existing libraries.

    So how should it look? I don't really know, but I have a few ideas:

    1. Most importantly, the CLR and BCL need to be versioned separately, primarily because of the technical support implications of a new CLR that do not exist with the BCL. Supposedly the advantage to keeping them under the same umbrella is that non-technical decision makers don't have to understand the difference. The problem is, in reality they DO have to understand it. Upgrading from .NET 1.1 to 2.0 required IT departments to deal with a whole new set of trust relationships for the new runtime. When I started talking to my employer about using .NET 3.0, I was immediately shut down because of the perceived effort of doing the same configuration all over again. I was forced to explain to them that whereas 2.0 included a new runtime, 3.0 did not, so it did not require the same effort to "upgrade". The perceived benefit of maintaining a single umbrella was lost.
    2. I would prefer that orthogonal technology releases like the WinFX set not be released as new versions of the .NET Framework, but instead as "extensions" or "components" which are dependent on the .NET Framework. It would make the versioning story so much simpler. However, since I know that's never going to happen, the WinFX release should have been .NET 2.x. The minor version increment indicates that this is an optional upgrade; if the particular features available in the new version don't apply to your situation, you can stay with what you have. Contrast this with .NET 2.0, where generics enabled scenarios that simply weren't feasible before, and they applied to everyone. My company certainly had no use for any of the technologies in .NET 3.0 at the time it was released, but because it was a major version leap, there was the perception that we were "falling behind" by not upgrading. Sadly that was probably one of the motivations of the MS marketing team that made the decision.
    3. .NET 3.5 should have been .NET 2.X+1. Again, there simply is nothing here that is applicable on the order of generics that justifies pushing everyone in the world to upgrade. I'm not downing LINQ; I love LINQ. And I find LINQ to SQL to be somewhat useful in certain situations. Expression trees are fun to play with, and have some extraordinary uses in advanced scenarios. Extension methods are like playing with fire, but that can be fun too. But in terms of justifying time, expense, and risk of upgrade, there is just not enough to say that it should be applied across the board. (I am quite certain all of the LINQ-ophiles are going to charge me with heresy for this, but I stand by it.)

    So that's what I think. Is it better than what we have now? I certainly think so, but I bet not everyone does. What do you think?

  • WPF Attached Dependency Properties and Roles

    I have been spending more time looking into WPF recently. One thing that I am discovering is that, underneath all of the new and fancy terms like "Dependency Properties", many of the concepts in WPF are just a rehash of similar concepts from Windows Forms, with adjustments made for lessons learned from previous technologies. Dependency Properties are a somewhat complex concept, but one particular aspect of them, Attached Dependency Properties, is almost a direct replacement for the Extender Provider technology from WinForms. Extender providers (components which implement IExtenderProvider) allowed designers to "add" properties to other components on a design surface (such as a form). Of course, you can't actually add properties to an instance; instead, the extender provider would store the property values associated with each instance. Attached dependency properties are similar; they allow objects to "attach" properties onto other objects.

    There is at least one difference between the two technologies. Properties added by an extender provider applied to the whole design surface to which the component was added (although the specific components to which properties were added could be filtered); attached dependency properties allow more fine-grained control by only adding properties to elements which exist underneath the adding element in the UI hierarchy.

    For example, in WinForms, the Control class from which all WinForms controls are derived has the properties Left and Top. These properties specify where on the parent control these controls are placed. However, the control itself will almost never use these properties; it is the parent control which uses them to place the control within its container. So these properties don't really belong on the control to which they apply; but that was the only consistent way to implement that requirement in WinForms.

    In WPF, these properties exist as attached dependency properties on the Canvas class. So any UI element which is added to a Canvas "gains" these additional properties, which can be set on the element in XAML (Canvas.Left="30"). But it is the Canvas class which defines these properties, and the Canvas which reads them during layout; the element simply carries the values without declaring them. UI elements do not have to implement these properties themselves in order to be used on a Canvas (and in fact, in many cases these properties wouldn't make sense, such as if the element were added to a StackPanel instead).

    Another way to think of attached dependency properties is by thinking about roles. You could describe a UI element placed on a Canvas as taking on the role of an "absolutely positioned UI element". This role is not part of the element's class hierarchy; being absolutely positioned is not an inherent part of its existence. But as soon as it is placed on the Canvas, it takes on that role, and there is additional data (properties) which relate to its existence in that role that must be "attached" to it. Attached dependency properties allow the Canvas to fit the UI element into that role, without the element itself ever needing to know about it (i.e. without those properties having to exist in the type hierarchy of the element itself).
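
    The role idea can be sketched in plain C#, without any WPF types. Everything named here (Element, SimpleCanvas, SetLeft, GetLeft) is hypothetical, and the storage strategy is the extender-provider one (a table kept by the container, keyed by instance). This is not how WPF implements attached properties, which go through its dependency property system; the sketch only shows the shape of the idea: the element type never mentions positioning, yet a position can be associated with any element once it plays the "absolutely positioned" role.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type: it knows nothing about positioning.
public class Element
{
    public readonly string Name;
    public Element(string name) { Name = name; }
}

// Hypothetical container, loosely modeled on the extender-provider approach:
// it keeps the "attached" values in its own table, keyed by element instance.
public class SimpleCanvas
{
    private readonly Dictionary<Element, double> left =
        new Dictionary<Element, double>();

    public void SetLeft(Element element, double value)
    {
        left[element] = value;
    }

    // Elements that never had a value attached report a default of 0.
    public double GetLeft(Element element)
    {
        double value;
        return left.TryGetValue(element, out value) ? value : 0.0;
    }
}

public static class AttachedRoleDemo
{
    public static void Main()
    {
        SimpleCanvas canvas = new SimpleCanvas();
        Element button = new Element("button");
        Element label = new Element("label");

        canvas.SetLeft(button, 30.0);

        Console.WriteLine(canvas.GetLeft(button)); // value attached above
        Console.WriteLine(canvas.GetLeft(label));  // nothing attached: default
    }
}
```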

    Since I have been interested lately in the concept of objects taking on roles, rather than simply existing in a static type hierarchy, I am curious as to how far this concept of attached properties and roles could be taken; something I will be exploring in the near future. Does anyone have any immediate thoughts?

  • LINQ and Duck Typing

    A while back I wrote about Collection Initializers and Duck Typing, and I wasn't particularly upbeat about the feature. I am not a big fan of duck typing; I believe that class designers should have to declare what abilities their classes have, not leave it up to individual developers (who don't have access to the internals of the class) to figure out whether a class has a particular ability or not. And no, just having a method of a particular signature does not mean that a class has a particular capability which happens to require an identical method.

    What does this have to do with LINQ? A lot, actually. It turns out that LINQ uses duck typing in a very similar manner to collection initializers. Let's look at an example:

    IEnumerable<int> ints = new int[] { 1, 2, 3, 4, 5 };
    IEnumerable<int> evens = from i in ints where i % 2 == 0 select i;

    The compiler turns the query expression into the equivalent of this:

    ints.Where<int>(i => i % 2 == 0)

    Now, "ints" is an IEnumerable<int>, which doesn't define a Where method. Where is actually an extension method, defined in the new System.Linq.Enumerable class. What is interesting about this is that the compiler doesn't decide directly which Where method to call (i.e. it doesn't bind directly to System.Linq.Enumerable.Where, just because it is expanding a LINQ query): instead, it expands the LINQ query into a Where method call, and then lets normal method call resolution rules take over (it's slightly more complicated than that, but conceptually it's an accurate description of what happens). This means that the compiler is essentially duck typing the target of the query (in this case "ints"), counting on the standard query operators being defined in some way, either as members or as extension methods. If they are not defined, a compile error is reported; however, if any members matching the required signatures are defined, they are assumed to be appropriate standard query operators. This is what allows querying over an instance of IQueryable to have a completely different result than querying over an instance of IEnumerable.
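
    A small sketch shows how far this goes. The NotACollection class below is hypothetical: it is not enumerable, implements no interface, and does not reference System.Linq at all. It merely has an instance method named Where with a compatible signature, and that is enough for the compiler to accept a query expression over it.

```csharp
using System;

// Hypothetical type: not a collection, implements no LINQ-related interface.
public class NotACollection
{
    // Matches the shape the compiler looks for, but does no filtering:
    // it just reports what the supplied predicate says about the number 4.
    public string Where(Func<int, bool> predicate)
    {
        return predicate(4) ? "predicate accepted 4" : "predicate rejected 4";
    }
}

public static class DuckQueryDemo
{
    public static string RunQuery()
    {
        NotACollection source = new NotACollection();

        // Expands to source.Where(i => i % 2 == 0); normal method resolution
        // then binds to NotACollection.Where, so the "query" yields a string.
        string result = from i in source where i % 2 == 0 select i;
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(RunQuery()); // prints "predicate accepted 4"
    }
}
```

    Note that there is no "using System.Linq" directive anywhere in the file; the query syntax is resolved purely by the presence of a suitably shaped Where method.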

    Why is duck typing used, instead of using an interface (IQueryable would seem to be appropriate, except that it was co-opted for another purpose)? One could argue that it allows some data sources to implement some of the standard query operators but not all of them, without having to resort to throwing runtime NotSupportedExceptions. This is valid, but could be fixed by defining atomic interfaces instead of a catch-all interface, which is good practice anyway. The real reason why duck typing is used is essentially the same reason that it is used in collection initializers: it is the only way to make the feature work with IEnumerable and IEnumerable<T>, which was clearly a major design goal of the feature. And it is very useful to have; I have found that I can use LINQ queries to perform operations on in-memory sets of data in a much cleaner and clearer way than writing out equivalent algorithms.

    So what do I think about this approach? It suffers from the same flawed assumption as all duck typing: it assumes that, if a member (or extension method) is defined which happens to match the signature of a standard query operator, then that method must in fact be a standard query operator. The compiler is essentially assigning semantic meaning to a given method signature, without an associated contract of any kind (i.e. interface). This is a dangerous assumption, as I described in my collection initializers post, and I do not like seeing the increasing use of duck typing in the C# language.

    However, my reaction to duck typing in LINQ is not quite as severe as my reaction to collection initializers, for two reasons. First, the standard query operators have fairly unusual signatures, making it far less likely that a method with a matching signature will be defined for a completely different purpose. I would still rather a contract be used, but overall the chances of a collision are very low. Second, LINQ is an enormously useful feature, many times more useful than collection initializers. Therefore, I am willing to put up with a little bit of language corruption for the sake of clearer code and greater productivity.

    Is there a better way, a way to get this useful functionality without resorting to duck typing? I don't know. I haven't thought of one yet, and I'm sure the C# team put plenty of thought into it before deciding on the approach that they did. Regardless, it is too late now, since any change making the query rules more strict would by definition be a massive breaking change which the C# team would never allow. So I guess this is just one of those things the language purist in me will have to learn to live with.

  • Programming and Chess

    So much for the series on C# vs VB.NET. Maybe I will get back to it someday.

    In the meantime, I read a blog post today, most of which didn't make any sense to me. It starts out talking about chess, and its relationship both to war and to political maneuvering. It then goes on to discuss, in a very random fashion, whether programming, and in particular hacking, can draw the same parallels.

    However, amid the randomness, there was a certain part which really resonated with me:

    "[quoting Edsger Dijkstra, see post for reference]'The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague.'

    "So it’s all hogwash. The very thing which draws you to programming is your undoing. Unless you are it’s undoing. I hate you, Edsger Dijkstra!"

    The same is true with chess. When I learned to play chess, what drew me to it was not its historical significance or its artistic beauty, it was how exciting playing chess could be! I would conjure up traps and combinations and tactical assaults and hurl them at my opponent, all the while trying to dodge the positional darts he was throwing back at me. I would charge my pieces across the board, attacking, defending, capturing; it got my adrenaline pumping faster than anything else could at a young age.

    But the more I studied and the more I played, the more I discovered that, if I wanted to be a good chess player, I couldn't just throw my pieces across the board. I couldn't charge into the enemy lines at will, break open his position and declare victory. I had to be careful not to over-extend my position, not to let my pieces get cut off and trapped, not to leave my weak side vulnerable, not to allow counter-attack. There were so many things to consider. And suddenly, it just wasn't as exciting as it used to be. Some would say that taking on these additional considerations is what makes chess so intellectually challenging. And they are right. I still love playing chess, I do find it challenging and mentally stimulating, and I do appreciate its artistic beauty. But it just doesn't get my adrenaline pumping the way it used to.

    The same can be said for programming. What first drew me to programming was that, as the programmer, there was nothing I couldn't do. I was the master of the digital domain over which I ruled. Give me any problem and I would conquer it with for loops and conditional statements and recursive functions. And in the end, when my program displayed the output I desired, I would declare victory.

    Then I went to school to "learn to program", and then I got a job as a "programmer". And now, there are so many more things to worry about. It's not just a matter of getting the right answer any more; it's how you get to that answer that matters. I can't use conditional statements because they might be confusing to other developers. I can't use recursive functions because they suffer performance problems. What happens if the network goes down? What happens if the database goes down? What if a malicious user gets on the system? What if the power goes out? How do you make your user interface powerful and intuitive at the same time? How do you make your system efficient yet robust and highly available? How do you integrate different platforms? All these things add to the challenge of programming, which can be rewarding in itself. And I do love being a programmer. At the end of the day I still love it when I see the answer I expected to see.

    But I miss the adrenaline rush.
     

  • C# vs VB.NET

    If you are bilingual, or if you spend any time at all in the .NET blogosphere, you have undoubtedly witnessed many of the posts/discussions/arguments/wars over which language is better, C# or VB.NET. These skirmishes range from mild to violent, and are often filled with statements like "All VB.NET users are amateurs" or "All C# users are arrogant elitists." Much of the debate is filled with FUD, and at times it seems to border on a religious war.

    Let me state right here that I have used both languages in a production environment, and I do not believe either language is objectively better than the other. Many of the arguments you see on the web come down to whether "If ... Then...End If" is better than "if(...){...}". Frankly, that is a ridiculous and pointless argument. There are pros and cons to using keywords or symbols, but in the end it comes down to personal preference. However, that does not mean that the languages are identical. There are in fact real differences between the languages beyond syntax that should be considered when choosing which one to use.

    I am going to start a series of blog posts about the real differences between the languages. To start I will deal with the 2.0 versions of each language, and at the end I will explore some of the new features coming out in the 3.0 versions. Below is a partial list of the features that will be covered. If there are others that deserve to be covered, please leave a comment and let me know.

    C# 2.0

    Anonymous methods
    Yield Return

    C# 3.0

    Collection initializers

    VB.NET 2.0 (VB 8)

    With...End With
    Optional Parameters
    Declarative Event Handling
    Conditional Exception Handling

    VB.NET 3.0 (VB 9)

    ???

    Compilation differences
    Visual Studio IDE differences

  • C# Language Design Decisions Explained

    While researching language differences between C# and VB.NET for an upcoming blog series, I ran across a section of the Visual C# Developer site on MSDN called Ask a Language Designer. There are several short articles from various members of the C# language team explaining why certain things about the C# language are the way they are. Things such as case-sensitivity and the lack of checked exceptions are discussed at a high level. I would love to see more articles like this; there are undoubtedly many more choices for which the C# community would benefit from having an explanation. If you are a C# programmer, or are interested in programming language design in general, I recommend taking a look.

  • Hungarian Notation - The Good, the Bad, and the Ugly

    While blog-surfing recently I ended up at an article about the ups and downs of Hungarian notation: "Hungarian Notation - The Good, The Bad, and The Ugly." The Good makes some good points, although personally I don't believe that they are enough to overcome its shortcomings. The Bad is poorly written, in my opinion, and does not add anything substantial to the argument. The Ugly, however, is very well written and makes several good points. It expresses my primary argument against Hungarian notation quite clearly:

    "Hungarian notation encodes type information into variable names. This is very useful in languages that don't keep track of types information for you. But in C++ or Eiffel it is completely redundant. Thus, the notation simply adds to obscurity."

    It also expresses a point which, although I was aware of it implicitly, I had not been able to express previously:

    "Hungarian notation is, when all is said and done, a commenting technique. And the one great law of comments is that they lie."

    This is precisely true. Hungarian notation, at least in the form that most of us learned in school, simply says "this variable is this type." It's like a comment. But comments quickly become outdated as programs change, and often in the end they are more misleading than helpful.
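
    A tiny, contrived sketch of how such a prefix goes stale. The variable name and the scenario are hypothetical: imagine the variable was declared as an int, later widened to long when the totals outgrew 32 bits, and never renamed.

```csharp
using System;

public static class HungarianDemo
{
    public static string ActualTypeOfTotal()
    {
        // The "i" prefix once claimed "int". The declaration was later
        // widened to long, but the name stayed, so the prefix now lies.
        long iTotal = 3000000000L;
        return iTotal.GetType().Name;
    }

    public static void Main()
    {
        Console.WriteLine(ActualTypeOfTotal()); // prints "Int64", not "Int32"
    }
}
```

    The compiler keeps the declared type honest; nothing keeps the prefix honest.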

    I do still use an underscore before class-level variables, but this is mostly to distinguish them from the properties which they typically represent (In C# I could rely on casing differences to make the distinction, but going down that road is just asking for death-by-typo).

  • DataSet/DataTable serialization bug with modified row and newline string value

    Recently I ran across an unusual bug in the .NET framework. Our application uses .NET remoting with DataSets, using RemotingFormat = SerializationFormat.Xml (the default in .NET 2.0, and the only option in .NET 1.1). We were getting a DBConcurrencyException when trying to save data that was being passed from the client to the server, even though we knew for certain that the data in the database had not changed. After some investigation we discovered that the problem was related to a field in a modified row that had a value of "\r\n", i.e. the ASCII values 13 and 10. Although the row had been modified, that field had not, so the original and current values were the same. However, examining the same data after it had been passed to the server, the original value of that field in the modified row was not "\r\n", but an empty string. Capturing the XML for the serialized DataSet (by serializing it to a file) showed the following XML node being used for that field value:

    <ColumnName xml:space="preserve">
    </ColumnName>

    The exact same node was used for the original and current serialized values. However, when the DataSet was deserialized, the current value was correct, but the original value was not.

    It turns out that the scenario is easily reproducible using the information obtained above. I submitted a bug report to Microsoft at https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=247212. The code for reproducing the issue is there, if you are interested.
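
    For readers unfamiliar with row versions, the sketch below shows how the original and current values of a field in a modified row can be compared on either side of the wire. It is not the repro from the bug report and performs no serialization, so it does not exhibit the bug; the table name and column names are hypothetical. It only sets up the same starting state described above: a modified row whose "\r\n" field was itself left untouched.

```csharp
using System;
using System.Data;

public static class RowVersionDemo
{
    // Builds a row in the Modified state whose Notes field was never changed.
    public static DataRow BuildModifiedRow()
    {
        DataTable table = new DataTable("Example");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Notes", typeof(string));

        DataRow row = table.Rows.Add(1, "\r\n");
        table.AcceptChanges(); // the row is now Unchanged
        row["Id"] = 2;         // modify a different column
        return row;
    }

    public static void Main()
    {
        DataRow row = BuildModifiedRow();

        string original = (string)row["Notes", DataRowVersion.Original];
        string current = (string)row["Notes", DataRowVersion.Current];

        Console.WriteLine(row.RowState);        // Modified
        Console.WriteLine(original == current); // True before any serialization
    }
}
```

    Running the same comparison on the deserialized DataSet is how the mismatch described above would show up.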

  • "Project location is not trusted", on a local drive

    I have had a problem before where I get the following message trying to open a solution or project in Visual Studio:

    The project location is not trusted: <path to project>
    Running the application may result in security exceptions when it attempts to perform actions which require full trust.

    Googling the error message turns up lots of hits on how to give UNC paths full trust when opening projects from a network drive. But all of my projects are local. Eventually I discovered that the problem was that the project had been extracted from a downloaded zip file, which Windows XP had "quarantined" in a partial trust zone. The recommended solution is to open the file properties on the zip file and click the "Unblock" button, before unzipping it. Unfortunately, I ran into a situation today where I had already extracted and deleted the zip file. In searching for a way to unblock all of the extracted files, I didn't have much luck, until I found a forum post pointing me to ZoneStripper. ZoneStripper "unblocks" any file or set of files in a directory, which was perfect for my situation. Also included in the readme is a technical explanation of how Windows XP "blocks" downloaded files, and how ZoneStripper fixes the problem, which I found interesting.

  • catch(Exception)

    A few months ago there was a series of posts on the FxCop blog about the FxCop rule "Do not catch general exception types." These posts generated some lively debate about how strictly this rule should be enforced. I found it quite interesting to read the arguments from both sides on whether or not "catch(Exception)" should EVER be used. I will post my own opinions on this issue in the future. However, one comment in particular stuck out to me:

    Of course you're correct that deployed applications that crash frequently shouldn't be deployed. However, there are many circumstances that lead to this happening anyway.

    It's not all dev teams that have a user base and budget as large as MS's that allow it to do something like have a public beta, that is very close to final functionality, in front of thousands of users, for months, as MS have done with Visual Studio. Even after all the CTPs, betas, RCs, etc, there are still plenty of bugs found pretty quickly in VS 2005.

    In our situation, the cost of taking the user's time for testing (in terms of the value they're not adding by doing their job) is far greater than the cost of development time. This leads to a business that has a low level of interest in providing much help in terms of serious testing. The cost of the software being late one day is much greater than the cost of a day of development.

    Of course, the end result is less quality. So as much as we'd like to have lots of testing, so bugs were few and far between, so we could fail fast when, once in a blue moon, an exception occurred, it's just not going to be the reality. No amount of telling the business "This is not how you write good software" is going to change things.

    So while I agree that the Fail Fast method is the "correct" approach, because it precludes invalid state in the application, I'm not sure how realistic an approach it can be for quite a lot of common application dev scenarios where the business controls dev budget/time frame.

    So I guess if the rule is still relevant to some cases, then it should be kept, and the rest of us can just suppress it, which always gives me a nice warm fuzzy feeling inside :P

    See the post on which this comment was made for more context.

    What I found interesting about this comment was that the poster is essentially arguing that the rule does not always apply because business demands sometimes dictate that applications be deployed before they have reached the highest level of quality. I also have some opinions on that which I will post in the future. The point I want to make here is that best practices are always best practices, even when we choose not to follow them. It could be that to get an application out the door with the time and the budget that the powers-that-be have dictated, you have to cut some corners. But that doesn't mean that cutting corners is the best way to implement the application; it just means that it was necessary in a given set of circumstances.

    The documentation for the FxCop rule says "Do not exclude a warning from this rule." I suppose the commenter would have preferred it to say "unless you have to in order to finish your project on time." But that's not the point. FxCop, and other static analysis tools, can only tell us what we should do; it is up to us whether or not to do it.
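
    For readers who have not run into the rule, here is a minimal sketch of the difference it is concerned with. The method names are hypothetical, and this is only one illustration of the trade-off, not a statement of how the rule must be satisfied. The general handler hides a caller bug (passing null) behind the same fallback it uses for bad input; the specific handler deals only with the failure it expects and lets everything else surface.

```csharp
using System;

public static class CatchDemo
{
    // Catches everything: malformed input and programming errors alike
    // are silently turned into 0.
    public static int ParseOrZeroGeneral(string text)
    {
        try { return int.Parse(text); }
        catch (Exception) { return 0; }
    }

    // Catches only the failure it expects (malformed input). Anything else,
    // such as the ArgumentNullException for a null argument, propagates.
    public static int ParseOrZeroSpecific(string text)
    {
        try { return int.Parse(text); }
        catch (FormatException) { return 0; }
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrZeroGeneral("abc"));  // 0
        Console.WriteLine(ParseOrZeroSpecific("abc")); // 0
        Console.WriteLine(ParseOrZeroGeneral(null));   // 0: the bug is hidden

        try
        {
            ParseOrZeroSpecific(null);
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("null argument surfaced");
        }
    }
}
```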

  • Collection Initializers and Duck Typing

    Recently I came across a blog post by Mads Torgersen, the project manager for the C# language team at Microsoft. In it he talks about collection initializers, one of the new language features of the upcoming version 3.0 of C#. Essentially, it allows you to initialize a new collection instance using set-based syntax instead of functional syntax. Check out the post for a more complete description and examples of the feature.

    What most interested me from the post was the method by which the compiler will determine how to add new elements to the collection. It does not use the ICollection<T>.Add method, which would be the most obvious method for a collection initializer. Mads' post touches on one of the reasons why not, and I have added two more:

    1. Classes built prior to .NET 2.0 do not implement ICollection<T>, so they could not use collection initializers, even if they followed the standard collection pattern and implemented a strongly-typed Add method. A large number of the classes in the .NET Base Class Library (BCL) fall into this category (in fact, there are only 14 public instantiable classes which implement ICollection<T>).
    2. Many collection classes can have items arbitrarily added to them, but not removed. Examples are Queue, Stack, and XmlNameTable. These classes do not implement ICollection<T>, presumably because the interface requires a Remove(T) method (although the method could have been implemented explicitly, and throw a NotSupportedException; I am not a big fan of this pattern, but it has been used elsewhere in the BCL). Since they don't implement the interface, they would be unable to use collection initializers, even though they behave like collections for the purpose of adding items.
    3. Classes that implement ICollection<T> might prefer to use a method other than Add(T). Dictionaries are the prime example; Dictionary<TKey, TValue> implements ICollection<KeyValuePair<TKey,TValue>> (including an explicitly implemented Add(KeyValuePair<TKey, TValue>) method), but uses Add(TKey, TValue) as its primary Add method, which is more usable.
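
    Reasons 2 and 3 can be checked directly against the BCL. The following is a minimal sketch; the InterfaceCheckDemo class and its method names are made up for illustration, and only the BCL types come from the discussion above.

```csharp
using System;
using System.Collections.Generic;

public static class InterfaceCheckDemo
{
    // True when TCollection implements ICollection<TItem>.
    public static bool ImplementsICollection<TCollection, TItem>()
    {
        return typeof(ICollection<TItem>).IsAssignableFrom(typeof(TCollection));
    }

    // Fills a dictionary through both of its Add methods and returns the count.
    public static int AddBothWays()
    {
        Dictionary<string, int> d = new Dictionary<string, int>();

        // The primary, more usable Add(TKey, TValue).
        d.Add("one", 1);

        // The explicitly implemented ICollection<KeyValuePair<,>>.Add,
        // reachable only through the interface.
        ((ICollection<KeyValuePair<string, int>>)d).Add(new KeyValuePair<string, int>("two", 2));

        return d.Count;
    }

    public static void Main()
    {
        Console.WriteLine(ImplementsICollection<List<int>, int>());   // True
        Console.WriteLine(ImplementsICollection<Queue<int>, int>());  // False
        Console.WriteLine(ImplementsICollection<Stack<int>, int>());  // False
        Console.WriteLine(AddBothWays());                             // 2
    }
}
```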

    The solution outlined for this problem is an application of "duck typing". Basically, the idea is that a class doesn't have to implement ICollection<T> (or even ICollection for that matter) to be treated like a collection for the purposes of this feature; it just has to look like a collection (hence the term "duck typing": "if it walks like a duck, and talks like a duck..."). This results in a new implementation for collection initializers: instead of implementing ICollection<T>, a class must implement IEnumerable and have a public Add method to use the feature. The compiler essentially expands the collection initializer into a series of Add method calls, and then uses the same method resolution rules used for method calls (it's actually a little more complicated than that; it will also look for explicit implementations of ICollection.Add, and a few other special cases). Using this approach eliminates problems 1 and 3 in the list above; any public Add method can be used in collection initializer syntax. It seems that this is an improvement over the use of ICollection<T>.
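
    To make the expansion concrete, here is a minimal sketch. Playlist and InitializerDemo are hypothetical names invented for this example. The class does not implement ICollection<T>; it is enumerable and has two public Add overloads, and the initializer form and the hand-written form should produce the same contents.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: not an ICollection<T>, just enumerable with public Add methods.
public class Playlist : IEnumerable<string>
{
    private readonly List<string> tracks = new List<string>();

    public void Add(string title) { tracks.Add(title); }

    // Ordinary overload resolution selects this for two-argument elements.
    public void Add(string artist, string title) { tracks.Add(artist + " - " + title); }

    public int Count { get { return tracks.Count; } }

    public IEnumerator<string> GetEnumerator() { return tracks.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class InitializerDemo
{
    // Collection initializer syntax.
    public static Playlist WithInitializer()
    {
        return new Playlist { "Intro", { "Some Artist", "Some Song" } };
    }

    // Roughly what the compiler expands the initializer into.
    public static Playlist Expanded()
    {
        Playlist p = new Playlist();
        p.Add("Intro");
        p.Add("Some Artist", "Some Song");
        return p;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithInitializer()));
        Console.WriteLine(string.Join(", ", Expanded()));
    }
}
```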

    However, there is an underlying assumption to this implementation which is extremely troublesome, and results in significant drawbacks that, in my opinion, outweigh the positives that the feature brings to the language. That assumption is that any class which implements IEnumerable and has a public Add method is a collection.

    I will grant that the vast majority of the time, this will be true. But what about when it isn't? There are some cases where such an assumption could produce some highly non-intuitive results. For example, consider this class, a custom wrapper around Delegate called MyMulticastDelegate. It implements IEnumerable<T> (where T is Delegate), and has a public Add method. However, instances of MyMulticastDelegate, like instances of Delegate, are designed to be immutable; attempts to change them actually create new instances. So, although the compiler would allow syntax such as "MyMulticastDelegate mmd = new MyMulticastDelegate { new EventHandler(MyEventHandler) };", the resulting mmd instance would be empty. The delegate for MyEventHandler was added to a new instance which, although it was returned by the Add method, was discarded by the compiler.
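
    The class itself is not reproduced here, so what follows is only a stand-in sketch under stated assumptions: the internals (an array of targets, a Count property) and the ImmutableAddDemo wrapper are invented for illustration. It shows the behavior described above, where an immutable Add returns a new instance that the initializer silently drops.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for MyMulticastDelegate: immutable, so Add
// returns a new instance instead of modifying this one.
public sealed class MyMulticastDelegate : IEnumerable<Delegate>
{
    private readonly Delegate[] targets;

    public MyMulticastDelegate() : this(new Delegate[0]) { }
    private MyMulticastDelegate(Delegate[] targets) { this.targets = targets; }

    public MyMulticastDelegate Add(Delegate d)
    {
        Delegate[] copy = new Delegate[targets.Length + 1];
        targets.CopyTo(copy, 0);
        copy[targets.Length] = d;
        return new MyMulticastDelegate(copy);
    }

    public int Count { get { return targets.Length; } }

    public IEnumerator<Delegate> GetEnumerator()
    {
        return ((IEnumerable<Delegate>)targets).GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class ImmutableAddDemo
{
    private static void MyEventHandler(object sender, EventArgs e) { }

    public static int CountAfterInitializer()
    {
        // Compiles, but the instance returned by Add is thrown away.
        MyMulticastDelegate mmd = new MyMulticastDelegate { new EventHandler(MyEventHandler) };
        return mmd.Count;
    }

    public static int CountAfterExplicitAdd()
    {
        // Keeping the return value is the only way to observe the addition.
        MyMulticastDelegate mmd = new MyMulticastDelegate().Add(new EventHandler(MyEventHandler));
        return mmd.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterInitializer());  // 0
        Console.WriteLine(CountAfterExplicitAdd());  // 1
    }
}
```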

    The situation gets worse when you consider that the class in question may not even have been developed in C#. The developer may be completely unaware of the collection initializer syntax in C# and have no idea of the consequences of his design decision. How could he? It is an entirely language-specific feature, yet it is operating on code developed outside the language. This is one of the consequences of creating a system where languages inter-operate: the languages have to play on common ground, and when one decides to go its own way, problems arise.

    Some might say "Just don't call the method Add; call it Combine or something instead." That would indeed avoid the problem. But it also points out the root of the flaw in this assumption: it should never be the compiler's job to assign semantic meaning to identifier names. Some languages do this. However, it has always been a strength of C-style languages that the developer has complete control over his own code, and can create whatever implementation he deems appropriate to the circumstances. What if Add makes the most sense in my situation, but my class isn't actually a collection? Or what if it is a collection, but I don't want to call the method Add, because Enqueue or Push would be more appropriate?

    Mads points out that this kind of semantic assignment, or what he calls a "pattern based approach," already exists in C#: the foreach statement does not require the target to implement IEnumerable; any GetEnumerator method which returns an IEnumerator instance will do. However, he also notes that "not everybody realizes it." In fact, in my experience most developers don't realize it, and for good reason. It is a foreign concept to C-style development, and it is a slippery slope. Why require the IDisposable interface for the using keyword? Wouldn't any Dispose method do? This is not a good path to head down. Pattern analysis is what static code analysis tools (such as FxCop) are for; it has no place in a compiler.
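
    For readers who have not seen the foreach pattern in action, here is a small sketch. Countdown and ForeachPatternDemo are hypothetical names. The type implements no interfaces at all, and in this sketch its enumerator does not implement IEnumerator either; a public MoveNext method and Current property are enough for the compiler to accept it.

```csharp
using System;

// Hypothetical type that implements no interfaces.
public class Countdown
{
    private readonly int start;
    public Countdown(int start) { this.start = start; }

    // foreach only needs a public GetEnumerator method to exist.
    public Enumerator GetEnumerator() { return new Enumerator(start); }

    // The enumerator also works by shape alone: MoveNext plus Current.
    public class Enumerator
    {
        private int next;
        private int current;

        public Enumerator(int start) { next = start; }

        public int Current { get { return current; } }

        public bool MoveNext()
        {
            if (next <= 0) return false;
            current = next--;
            return true;
        }
    }
}

public static class ForeachPatternDemo
{
    public static int Sum(Countdown c)
    {
        int total = 0;
        foreach (int n in c) total += n;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new Countdown(3)));  // 6 (3 + 2 + 1)
    }
}
```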

    Is there a better way? Several comments were made on Mads' blog about potential solutions. One potential solution is to create a keyword that denotes a particular method as a collection initializer. This seems appropriate, since keywords are language-specific, and this is a language-specific feature. It would also deal with problem 2 above, which the pattern based approach does not. However, it has a fatal flaw: since there is no support for this feature in the CLR or BCL, the semantics of the keyword would be lost as soon as the class is compiled. Suggestions were also made that attributes be used, or a new interface be created which has an Add method but not a Remove method (ICollector was suggested). These also solve problem 2, although an interface would not solve problem 3. Both would be better than the pattern approach. However, they would also require changes not just to the C# language, but to the BCL as well. Since collection initialization is a language-specific feature, and the C# language team does not have control over the framework as a whole, this could be problematic. All three of these alternatives also suffer from problem number 1: existing classes would not have the necessary characteristics to participate in collection initialization, severely limiting the usefulness of the feature.
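
    As a rough illustration of the interface suggestion, the sketch below assumes a generic ICollector<T> with a single Add method. No such type exists in the BCL, and the shape shown here, along with the CollectingQueue<T> and CollectorDemo names, is my own guess at what the commenters had in mind. The class opts in explicitly, so the compiler would not have to infer anything from a method name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface; nothing like it exists in the BCL.
public interface ICollector<T>
{
    void Add(T item);
}

// Hypothetical add-only queue that opts in to initialization explicitly.
public class CollectingQueue<T> : ICollector<T>
{
    private readonly Queue<T> items = new Queue<T>();

    public void Enqueue(T item) { items.Enqueue(item); }
    public T Dequeue() { return items.Dequeue(); }
    public int Count { get { return items.Count; } }

    // Explicit implementation: the public surface keeps the name Enqueue.
    void ICollector<T>.Add(T item) { Enqueue(item); }
}

public static class CollectorDemo
{
    // Roughly what a compiler could emit for an initializer list
    // if it targeted the interface instead of a method name.
    public static CollectingQueue<int> Build()
    {
        CollectingQueue<int> q = new CollectingQueue<int>();
        ICollector<int> collector = q;
        collector.Add(1);
        collector.Add(2);
        return q;
    }

    public static void Main()
    {
        CollectingQueue<int> q = Build();
        Console.WriteLine(q.Count);      // 2
        Console.WriteLine(q.Dequeue());  // 1
    }
}
```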

    So what should be done? In my opinion, collection initialization does not add significant benefit to the language. Most of the time, I don't initialize my collections from a static list; that's what arrays are for, and they already have set-based initialization syntax. And dynamic lists can't use collection initialization anyway. Since the only architecturally "correct" ways of implementing the feature would either require a massive update of the BCL or result in a nearly useless feature, my vote is to put the whole thing on hold until it can be done right, and focus instead on more beneficial areas of language development. Interestingly, the most recent published revision of the C# 3.0 language specification (May 2006) still states: "The collection object to which a collection initializer is applied must be of a type that implements System.Collections.Generic.ICollection<T> for exactly one T." This indicates that this was a relatively recent design decision, which hopefully means there is still time to reverse it.
