Thursday, February 23, 2012

Static upcast in C#

I was rather surprised to realize only recently, after using C# for so many years, that it doesn't have a proper static upcast operator. By "static upcast operator" I mean a built-in language operator or a function that upcasts with a static (i.e. compile-time) check.

C# actually does implicit upcasting and most people probably don't even realize it. Consider this simple example:

Stream Fun() {
    return new MemoryStream();
}

Whereas in F# we have to do this upcast explicitly, or we get a compile-time error:

let Fun () : Stream = 
    upcast new MemoryStream()

The reason is that type inference is problematic in the face of subtyping [1].

Now how does this interact with parametric polymorphism (generics)?

C# 4.0 introduced variant interfaces, so we can write:

IEnumerable<IEnumerable<Stream>> Fun() {
    return new List<List<MemoryStream>>();
}

Note that covariance is not implicit upcasting: List<List<MemoryStream>> is not a subtype of IEnumerable<IEnumerable<Stream>>.

But this doesn't compile in C# 3.0, which has no variance for generic interfaces: when the supertypes are invariant we have to start converting explicitly. Even in C# 4.0, if you target .NET 3.5 the above snippet does not compile, because there System.Collections.Generic.IEnumerable<T> isn't covariant in T. And even in C# 4.0 targeting .NET 4.0, this doesn't compile:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>();
} 

because ICollection<T> isn't covariant in T. It's not covariant for good reason: it contains mutators (i.e. methods that mutate the object implementing the interface), so making it covariant would make the type system unsound (actually, this already happens in C# and Java) [2][3].
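To make the soundness problem concrete, here is a minimal sketch using array covariance, which is presumably what the parenthetical above alludes to. The class and method names are made up for illustration. A string[] can be viewed as an object[], the compiler accepts a write of a non-string through that view, and the runtime has to reject it:

```csharp
using System;

static class ArrayCovarianceDemo {
    // Hypothetical helper: returns true if storing an int into a string[]
    // viewed as object[] is rejected at run time.
    public static bool StoreFailsAtRuntime() {
        object[] objects = new string[1]; // compiles: arrays are covariant
        try {
            objects[0] = 42; // also compiles, but the array really holds strings
            return false;
        } catch (ArrayTypeMismatchException) {
            return true; // the check the compiler couldn't do happens here
        }
    }

    static void Main() {
        Console.WriteLine(StoreFailsAtRuntime());
    }
}
```

This should print True. A covariant ICollection<T> would open the same hole through Add.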

A programmer new to C# might try the following to appease the compiler (ReSharper suggests this so it must be ok? UPDATE: I submitted this bug and ReSharper fixed it.):

ICollection<ICollection<Stream>> Fun() {
    return (ICollection<ICollection<Stream>>)new List<List<MemoryStream>>();
}

(attempt #1)

It compiles! But upon running the program, our C# learner is greeted with an InvalidCastException.

The second suggestion on ReSharper says "safely cast as...":

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>() as ICollection<ICollection<Stream>>;
}

(attempt #2)

And sure enough, it's safe since it doesn't throw, but all he gets is a null.

So our hypothetical developer googles a bit and learns about Enumerable.Cast<T>(), so he tries:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>()
        .Cast<ICollection<Stream>>().ToList();
}

(attempt #3)

Yay, no errors! Ok, let's add elements to this list:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
        .Cast<ICollection<Stream>>().ToList();
}

(attempt #4)

Oh my, InvalidCastException is back...

Determined to make this work, he learns a bit more about LINQ and gets this to compile:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
    .Select(x => (ICollection<Stream>)x).ToList();
}

(attempt #5)

But he gets another InvalidCastException. He forgot to convert the inner list! He tries again:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
        .Select(x => (ICollection<Stream>)x.Select(y => (Stream)y).ToList()).ToList();
}

(attempt #6)

This (finally!) works as expected.

Experienced C# programmers are probably laughing now at these obvious mistakes, but there are two non-trivial lessons to learn here:

  1. Avoid applying Enumerable.Cast<T>() to IEnumerable<U> (for T,U != object). Indeed, Enumerable.Cast<T>() is a source of much confusion, even in situations unrelated to subtyping [4] [5] [6] [7] [8], and yet it is often recommended carelessly [9] [10] [11] [12] [13] [14], even though it's essentially not type-safe: Cast<T>() will happily try to cast any type into any other type without any compiler check.
    Other than bringing a non-generic IEnumerable into an IEnumerable<T>, I don't think there's any reason to use Cast<T>() on an IEnumerable<U>.
    The same argument can be applied to OfType<T>().
  2. It's easy to get casting wrong (not as easy as in C, but still), particularly when working with complex types (where the definition of 'complex' depends on each programmer), when the compiler checks aren't strict enough (here's a scenario that justifies why C# allows seemingly 'wrong' casts as in attempt #5).
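As a small illustration of the first point, the sketch below (class and method names are hypothetical) applies Cast<T>() and OfType<T>() to a List<int> with a completely unrelated target type. Both calls compile. The first only fails once the sequence is enumerated, and the second silently drops every element:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CastDemo {
    // Cast<T>() accepts any target type; the check happens only at run time,
    // when the sequence is enumerated.
    public static bool CastThrows() {
        var numbers = new List<int> { 1, 2, 3 };
        try {
            numbers.Cast<string>().ToList(); // compiles without complaint
            return false;
        } catch (InvalidCastException) {
            return true;
        }
    }

    // OfType<T>() doesn't throw; it filters out the elements that don't match.
    public static int OfTypeCount() {
        var numbers = new List<int> { 1, 2, 3 };
        return numbers.OfType<string>().Count();
    }

    static void Main() {
        Console.WriteLine(CastThrows());
        Console.WriteLine(OfTypeCount());
    }
}
```

Neither mistake is caught by the compiler, which is exactly the kind of error a static upcast would rule out.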

Note how in attempt #6 the conversion involves three upcasts:

  • MemoryStream -> Stream (explicit through casting)
  • List<Stream> -> ICollection<Stream> (explicit through casting)
  • List<ICollection<Stream>> -> ICollection<ICollection<Stream>> (implicit)

What we could use here is a static upcast operator, a function that only does upcasts and no other kind of potentially unsafe casts, that doesn't let us screw things up no matter what types we feed it. It should catch any invalid upcast at compile-time. But as I said at the beginning of the post, this doesn't exist in C#. It's easily doable though:

static U Upcast<T, U>(this T o) where T : U {
    return o;
}
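Here is a rough, self-contained sketch of how the constraint does the checking. The wrapper class names (UpcastExtensions, UpcastDemo) are assumptions, since an extension method has to live in some static class. The commented-out lines are the interesting part: a downcast or a cast between unrelated types violates "where T : U" and should be rejected by the compiler instead of failing at run time.

```csharp
using System;
using System.IO;

static class UpcastExtensions {
    // Same definition as above: the constraint T : U is what makes this an upcast.
    public static U Upcast<T, U>(this T o) where T : U {
        return o;
    }
}

static class UpcastDemo {
    // A valid upcast: MemoryStream derives from Stream.
    public static Stream AsStream(MemoryStream ms) {
        return ms.Upcast<MemoryStream, Stream>();
    }

    // Neither of these should compile (uncomment to check):
    // the first is a downcast, the second converts between unrelated types.
    // static MemoryStream Down(Stream s) { return s.Upcast<Stream, MemoryStream>(); }
    // static string Unrelated(MemoryStream ms) { return ms.Upcast<MemoryStream, string>(); }

    static void Main() {
        using (var ms = new MemoryStream()) {
            Stream s = AsStream(ms);
            // The upcast only changes the static type; it is still the same object.
            Console.WriteLine(object.ReferenceEquals(s, ms));
        }
    }
}
```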

With this we can write:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
    .Select(x => x.Select(y => y.Upcast<MemoryStream, Stream>()).ToList().Upcast<List<Stream>, ICollection<Stream>>()).ToList();
}

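You may object that this is awfully verbose. Maybe so, but you can't screw this up no matter what types you change. The verbosity stems from the limits of type inference in C#: U appears only in the return type, so it can't be inferred, and C# doesn't allow specifying just some of a method's type arguments, so both T and U have to be spelled out. You may also want to lift this to operate on IEnumerables to make it a bit shorter, e.g.: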

static IEnumerable<U> SelectUpcast<T, U>(this IEnumerable<T> o) where T : U {
    return o.Select(x => x.Upcast<T, U>());
}

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> {
        new List<MemoryStream> {
            new MemoryStream(),
        }
    }
    .Select(x => x.SelectUpcast<MemoryStream, Stream>().ToList().Upcast<List<Stream>, ICollection<Stream>>()).ToList();
}

Alternatively, we could have used explicitly typed variables to avoid casts:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> {
        new List<MemoryStream> {
            new MemoryStream(),
        }
    }
    .Select(x => {
        ICollection<Stream> l = x.Select((Stream s) => s).ToList();
        return l;
    }).ToList();
}

I mentioned before that F# has a static upcast operator (actually two, one explicit/coercing and one inferencing operator). Here's what the same Fun() looks like in F#:

let Fun(): ICollection<ICollection<Stream>> = 
    List [ List [ new MemoryStream() ]]
    |> Seq.map (fun x -> List (Seq.map (fun s -> s :> Stream) x) :> ICollection<_>)
    |> Enumerable.ToList
    |> fun x -> upcast x

Now if you'll excuse me, I have to go replace a bunch of casts... ;-)

References

Friday, February 17, 2012

Watching all github forks

I like watching what other people do with my code. It often gives me ideas about how to improve the API or what features to add. I can also get a feeling of what the user thinks of my code: if he uses it differently it's an indicator that he either doesn't agree with my design choices (whether they were conscious or not!) or he's struggling to understand some of the concepts, which I may have left implicit or undocumented. In a nutshell: a different perspective.

For code published on github, this is easy to do: just watch the forks. But I always seem to miss some fork, so here's a script that watches all forks in a github network:

You can get your github API token from https://github.com/settings/admin

Going further, I like doing these reviews from gitk. So here's another script that adds all repositories in a network as remotes:

I then run a "git fetch --all" and browse the commits in gitk.

Of course, this works on networks with relatively few forks or not a lot of activity. YMMV.