Common Genius

The online technical home of David Nelson
Welcome to Common Genius Sign in | Join | Help
in Search

Technical Articles

  • OrderedDictionary<T>: A generic implementation of IOrderedDictionary

    Introduction

    This article describes the implementation of the framework's OrderedDictionary, its advantages and disadvantages, and shows how to create a generic collection which implements the IOrderedDictionary interface.

    Ordered Dictionaries

    An ordered dictionary is a collection class in which items can be manipulated by either their index or their key. System.Data.InternalDataCollectionBase, from which data collections such as DataTableCollection and DataColumnCollection are derived, is an example of an ordered dictionary. Items can be accessed either by their name or their index in the collection.

    IOrderedDictionary

    The .NET Framework 2.0 introduced a new interface for ordered dictionary implementations: System.Collections.IOrderedDictionary. Similar to the IList and IDictionary interfaces, the IOrderedDictionary interface defines a consistent contract for ordered dictionary implementations which ensures that items can be accessed, inserted, and removed by index or key.

    OrderedDictionary

    System.Collections.OrderedDictionary is the .NET framework's implementation of the IOrderedDictionary interface. It is implemented by storing elements in two separate internal structures: a Hashtable, which stores the key/values pairs like a normal dictionary, and an ArrayList, which stores DictionaryEntry structures containing the key/value pairs.

    This implementation of an ordered dictionary is very good at lookup operations: the array allows O(1) lookups by index, and the hashtable allows O(1) lookups by key. However, the necessity of keeping the array synchronized with the hashtable means that insert/delete operations have the performance disadvantage of performing those operations on an array (O(n) at worst). There is also, of course, the extra memory requirement of storing both data structures.Because of these disadvantages, OrderedDictionary should only be used when insert/delete operations will be minimal, AND there is a need to efficiently access elements by index and/or key.

    Oddly, although .NET 2.0 introduced generics, and a number of generic collections were included in the framework, there is no generic implementation of the IOrderedDictionary interface. Fortunately, however, creating a generic ordered dictionary is not too difficult.

    Implementing A Generic Ordered Dictionary

    IOrderedDictionary<TKey,TValue>

    IOrderedDictionary<TKey,TValue> is an interface which defines the contract necessary to implement a generic ordered dictionary. It implements the IOrderedDictionary and IDictionary<TKey,TValue> interfaces, and adds a few new members. There are strongly typed Add and Insert methods, and a strongly typed indexer (Item property). Note that the Add method and the indexer hide their base interface counterparts, since they only differ by return types; therefore, classes which implement this interface will have to explicitly implement at least one of each member.

    OrderedDictionary<TKey,TValue>

    OrderedDictionary<TKey,TValue> is an implementation of IOrderedDictionary<TKey,TValue> that uses the same implementation technique as OrderedDictionary. It stores its elements in a Dictionary<TKey,TValue> and a List<KeyValuePair<TKey,TValue>>. The implementation is fairly simple, and most of it is essentially equivalent to the framework's OrderedDictionary, adjusted to work in a strongly-typed environment. This means that OrderedDictionary<TKey,TValue> has the same memory and performance advantages and disadvantages as OrderedDictionary. The ConvertToKeyType and ConvertToValueType methods are the starting point for implementing the weakly-typed members of the non-generic interfaces which the class implements.

    Implementation

    Generally, manipulating the OrderedDictionary by index is faster than manipulating it by key; even if the algorithmic complexity is the same, the actual running time will be less when using an index. This is because every modification to an OrderedDictionary requires accessing both the list and the dictionary; and converting an index to its corresponding key is a O(1) operation (its just a lookup in the list), but converting a key to its corresponding index is a O(n) operation, since it requires a linear search through the list to find the key. Put another way, the list is aware of the dictionary's identifier (the key), but the dictionary is not aware of the list's identifier (the index). Therefore, you should try to manipulate the ordered dictionary by index where possible. NOTE: in the following discussion, accessing the dictionary by key is assumed to be a O(1) operation, although in reality this is dependent on the implementation of the GetHashCode method of the key type.

    Adding/Inserting

    Inserting a new item into the OrderedDictionary is a O(n) operation. The key and value are inserted into the dictionary, and a KeyValuePair object which contains the key and value is inserted into the list at the appropriate point.

    public int Add(TKey key, TValue value)
    {
    	Dictionary.Add(key, value);
    	List.Add(new KeyValuePair<TKey,TValue>(key, value));
    	return Count - 1;
    }
    public void Insert(int index, TKey key, TValue value)
    {
    	if(index > Count || index < 0)
    		throw new ArgumentOutOfRangeException("index");
    
    	Dictionary.Add(key, value);
    	List.Insert(index, new KeyValuePair<TKey, TValue>(key, value));
    }
    			

    Accessing By Index/Key

    Accessing a value in the OrderedDictionary is a O(1) operation. The value is retrieved from either the list (by index) or the dictionary (by key).

    public TValue this[int index]
    {
    	get
    	{
    		return List[index].Value;
    	}
    }
    
    public TValue this[TKey key]
    {
    	get
    	{
    		return Dictionary[key];
    	}
    }
    			

    Setting By Index

    Setting the value of the OrderedDictionary by index replaces the value associated with the key that is currently at that index. This is a O(1) operation: the key at the index is retrieved, and then a new KeyValuePair instance is created and set at the index in the list, and the new value for the key is set in the dictionary.

    public TValue this[int index]
    {
    	set
    	{
    		if(index >= Count || index < 0)
    			throw new ArgumentOutOfRangeException("index", "'index' must be non-negative and less than the size of the collection");
    
    		TKey key = List[index].Key;
    
    		List[index] = new KeyValuePair<TKey, TValue>(key, value);
    		Dictionary[key] = value;
    	}
    }
    			

    Setting By Key

    Setting the value of the OrderedDictionary by key replaces the value at the current index of the key. This is a O(n) operation: the index of the key is located, and then a new KeyValuePair instance is created and set at the index in the list, and the new value for the key is set in the dictionary.

    public TValue this[TKey key]
    {
    	set
    	{
    		if(Dictionary.ContainsKey(key))
    		{
    			Dictionary[key] = value;
    			List[IndexOfKey(key)] = new KeyValuePair<TKey, TValue>(key, value);
    		}
    		else
    		{
    			Add(key, value);
    		}
    	}
    }
    			

    Removing By Index

    Removing from the OrderedDictionary by index is a O(n) operation. The key at the index is retrieved, and then the item at the index in the list is removed, and the key is removed from the dictionary.

    public void RemoveAt(int index)
    {
    	if(index >= Count || index < 0)
    		throw new ArgumentOutOfRangeException("index", "'index' must be non-negative and less than the size of the collection");
    
    	TKey key = List[index].Key;
    
    	List.RemoveAt(index);
    	Dictionary.Remove(key);
    }
    			

    Removing By Key

    Removing from the OrderedDictionary by index is a O(n) operation. The index of the key is located, and then the item at the index in the list is removed, and the key is removed from the dictionary.

    public bool Remove(TKey key)
    {
    	if(null == key)
    		throw new ArgumentNullException("key");
    
    	int index = IndexOfKey(key);
    	if(index >= 0)
    	{
    		if(Dictionary.Remove(key))
    		{
    			List.RemoveAt(index);
    			return true;
    		}
    	}
    	return false;
    }
    			

    Conclusion

    Although it does have some drawbacks, an ordered dictionary is a very versatile and user-friendly data structure, allowing operations to be performed either by index or by key. OrderedDictionary<TKey,TValue> provides a type-safe ordered dictionary implementation.

    This class uses the same implementation technique employed by the framework's OrderedDictionary class. I have looked around for alternative implementation techniques, in .NET or in other languages, but so far I have not found any. If anyone is aware of any alternative implementation techniques, please let me know.

    History

    • 5/1/2007: published
  • Collection Initializers and Duck Typing

    Recently I came across a blog post by Mads Torgersen, the project manager for the C# language team at Microsoft. In it he talks about collection initializers, one of the new language features of the upcoming version 3.0 of C#. Essentially, it allows you to initialize a new collection instance using set-based syntax instead of functional syntax. Check out the post for a more complete description and examples of the feature.

    What most interested me from the post was the method by which the compiler will determine how to add new elements to the collection. It does not use the ICollection<T>.Add method, which would be the most obvious method for a collection initializer. Mads' post touches on one of the reaons why not, and I have added two more:

    1. Classes built prior to .NET 2.0 do not implement ICollection<T>, so they could not use collection initializers, even if they followed the standard collection pattern and implemented a strongly-typed Add method. A large number of the classes in the .NET Base Class Library (BCL) fall into this category (in fact, there are only 14 public instantiable classes which implement ICollection<T>).
    2. Many collection classes can have items arbitarily added to them, but not removed. Examples are Queue, Stack, and XmlNameTable. These classes do not implement ICollection<T>, presumably because the interface requires a Remove(T) method (although the method could have been implemented explicitly, and throw a NotSupportedException; I am not a big fan of this pattern, but it has been used elsewhere in the BCL). Since they don't implement the interface, they would be unable to use collection initializers, even though they behave like collections for the purpose of adding items.
    3. Classes that implement ICollection<T> might prefer to use a method other than Add(T). Dictionaries are the prime example; Dictionary<TKey, TValue> implements ICollection<KeyValuePair<TKey,TValue>> (including an explicitly implemented Add(KeyValuePair<TKey, TValue>) method), but uses Add(TKey, TValue) as its primary Add method, which is more usable.

    The solution outlined for this problem is an application of "duck typing". Basically, the idea is that a class doesn't have to implement ICollection<T> (or even ICollection for that matter) to be treated like a collection for the purposes of this feature; it just has to look like a collection (hence the term "duck typing": "if it walks like a duck, and talks like a duck..."). This results in a new implementation for collection initializers: instead of implementing ICollection<T>, a class must implement IEnumerable<T> and have a public Add method to use the feature. The compiler essentially expands the collection initializer into a series of Add method calls, and then uses the same method resolution rules used for method calls (its actually a little more complicated than that; it will also look for explicit implementations of ICollection.Add, and a few other special cases). Using this approach eliminates problems 1 and 3 in the list above; any public Add method can be used in collection initializer syntax. It seems that this is an improvement over the use of ICollection<T>.

    However, there is an underlying assumption to this implementation which is extremely troublesome, and results in significant drawbacks that, in my opinion, outweigh the positives that the feature brings to the language. The assumption is any class which implements IEnumerable and has a public Add method is a collection.

    I will grant that the vast majority of the time, this will be true. But what about when it isn't? There are some cases where such an assumption could produce some highly non-intuitive results. For example, consider this class, a custom wrapper around Delegate called MyMulticastDelegate. It implements IEnumerable<<T> (where T is Delegate), and has a public Add method. However, instances of MyMulticastDelegate, like instances of Delegate, are designed to be immutable; attempts to change them actually create new instances. So, although the compiler would allow syntax such as "MyMulticastDelegate mmd = new MyMulticastDelegate { new EventHandler(MyEventHandler) };", the resulting mmd instance would be empty. The delegate for MyEventHandler was added to a new instance which, although it was returned by the Add method, was discarded by the compiler.

    The situation gets worse when you consider that the class in question may not even have been developed in C#. The developer may be completely unaware of the collection initializer syntax in C# and have no idea of the consequences of his design decision. How could he? It is an entirely language-specific feature, yet it is operating on code developed outside the language. This is one of the consequences of creating a system where languages inter-operate: the languages have to play on common ground, and when one decides to go its own way, problems arise.

    Some might say "Just don't call the method Add; call it Combine or something instead." That would indeed avoid the problem. But it also points out the root of the flaw in this assumption: it should never be the compiler's job to assign semantic meaning to identifier names. Some languages do this. However, it has always been a strength of C-style languages that the developer has complete control over his own code, and can create whatever implementation he deems appropriate to the circumstances. What if Add makes the most sense in my situation, but my class isn't actually a collection? Or what if it is a collection, but I don't want to call the method Add, because Enqueue or Push would be more appropriate?

    Mads points out that this kind of semantic assignment, or what he calls a "pattern based approach," already exists in C#: the foreach statement does not require the target to implement IEnumerable, any GetEnumerator method which returns an IEnumerator instance will do. However, he also notes that "not everybody realizes it." In fact, in my experience most developers don't realize it, and for good reason. It is a foreign concept to C-style development, and it is a slippery slope. Why require the IDisposable interface for the using keyword? Wouldn't any Dispose method do? This is not a good path to head down. Pattern analysis is what static code analysis tools (such as FxCop) are for; it has no place in a compiler.

    Is there a better way? Several comments were made on Mads' blog about potential solutions. One potential solution is to create a keyword that denotes a particular method as a collection initializer. This seems appropriate, since keywords are language-specific, and this is a language-specific feature. It would also deal with problem 2 above, which the pattern based approach does not. However, it has a fatal flaw: since there is no support for this feature in the CLR or BCL, the semantics of the keyword would be lost as soon as the class is compiled. Suggestions were also made that attributes be used, or a new interface be created which has an Add method but not a remove method (ICollector was suggested). These also solve problem 2, although an interface would not solve problem 3. Both would both be better than the pattern approach. However, they would also require changes not just to the C# language, but to the BCL as well. Since collection initialization is a language-specific feature, and the C# language team does not have control over the framework as a whole, this could be problematic. All three of these alternatives also suffer from problem number 1: existing classes would not have the necessary characteristics to participate in collection initialization, severly limiting the usefulness of the feature.

    So what should be done? In my opinion, collection initialization does not add significant benefit to the language. Most of the time, I don't initialize my collections from a static list; that's what arrays are for, and they already have set-based initialization syntax. And dynamic lists can't use collection initialization anyway. Since the only architecturally "correct" ways of implementing the feature would either require a massive update of the BCL or result in a nearly useless feature, my vote is to put the whole thing on hold until it can be done right, and focus instead on more beneficial areas of language development. Interestingly, the most recent published revision of the C# 3.0 language specification (May 2006) still states: "The collection object to which a collection initializer is applied must be of a type that implements System.Collections.Generic.ICollection<T> for exactly one T." This indicates that this was a relatively recent design decision, which hopefully means there is still time to reverse it.

Powered by Community Server, by Telligent Systems