No More Disorder in the House

(If you’re looking for a great replacement for OrderBy[Descending]/ThenBy[Descending], and want to skip all the exposition, just scroll down to MAGIC SAUCE.)

Y’all know that I love LINQ: Microsoft really outdid themselves with this library. It’s powerful, flexible, extensible, and readable.

However, one design decision seems to trip me up at every turn: ordering enumerations. As you probably know, there are four LINQ extensions for ordering a generic enumeration: OrderBy, OrderByDescending, ThenBy, and ThenByDescending.

For our examples, let’s assume we have a collection of Foo instances called stuff, and that class Foo has Name and Color properties.

Consider the following:

  var sortedStuff = stuff
    .OrderBy( x => x.Name )
    .OrderBy( x => x.Color );

If you expected sortedStuff to be sorted by name first and then by color, you are in for a rude surprise.  sortedStuff will be sorted by color.  Every time you call OrderBy, the enumeration is re-ordered.  Happily, since LINQ generally defers execution, this will not usually result in multiple sorting operations (instead, every sort operation but the last will be discarded), but it still doesn’t do what wanted in the first place.

Instead, LINQ expects you to do this:

  var sortedStuff = stuff
    .OrderBy( x => x.Name )
    .ThenBy( x => x.Color );

To LINQ’s credit, this is certainly readable.  It’s hard to be confused by what’s meant here.  However, in practice, this has tripped me up time and time again.

One of the beauties of LINQ is that it is deferred, and I can programatically construct queries as I go without worrying about invoking unnecessary processing (or, worse, unnecessary database access).  A great and common example of that is creating RESTful APIs that can return data pre-sorted and paginated.  For example, this is a very common pattern:


  public JsonResult GetFoos( string sortBy, int page=1, int pageSize=100 ) {
    // safety first!
    if( page < 1 ) throw new ArgumentException( "Invalid page number.", "page" );
    if( pageSize < 1 ) throw new ArgumentException( "Invalid page size.", "pageSize" );
    IEnumerable<Foo> foos = DataContext.Foos;
    // "sortBy" is a string that contains the fields I want to sort by, comma separated,
    // and prefixed by an exclamation point to sort descending.  for example, "color,!name"
    // would sort first by color, then by name (descending).

    // ... DO SORTING HERE

    // return paginated data    
    return Json( foos.Skip( (page-1)*pageSize ).Take( pageSize ) );
}

It seems relatively straightforward, right?  But look what I have to do to actually make this work:


  public JsonResult GetFoos( string sortBy, int page=1, int pageSize=100 ) {
    // safety first!
    if( page < 1 ) throw new ArgumentException( "Invalid page number.", "page" );
    if( pageSize < 1 ) throw new ArgumentException( "Invalid page size.", "pageSize" );
    IEnumerable<Foo> foos = DataContext.Foos;
    // "sortBy" is a string that contains the fields I want to sort by, comma separated,
    // and prefixed by an exclamation point to sort descending.  for example, "color,!name"
    // would sort first by color, then by name (descending).

    if( sortBy != null ) {
      var parts = sortBy.Split( new[]{','} );
      // handle first field
      var desc = parts[0].StartsWith( "!" );
      var field = parts[0].TrimStart( new[]{'!'} ).ToLowerInvariant();
      switch( field ) {
        case "name": 
          foos = desc ? foos.OrderByDescending( x => x.Name )
            : foos.OrderBy( x => x.Name );
          break;
        case "color": 
          foos = desc ? foos.OrderByDescending( x => x.Color )
            : foos.OrderBy( x => x.Color );
          break;
        default:
          throw new ArgumentException( "Invalid sort field: " + field, "orderBy" );
      }
      // handle the rest of the fields
      for( var i=1; i<parts.Count(); i++ ) {
        var desc = parts[i].StartsWith( "!" );
        var field = parts[i].TrimStart( new[]{'!'} ).ToLowerInvariant();
        switch( field ) {
          case "name": 
            foos = desc ? foos.ThenByDescending( x => x.Name )
              : foos.ThenBy( x => x.Name );
            break;
          case "color": 
            foos = desc ? foos.ThenByDescending( x => x.Color )
              : foos.ThenBy( x => x.Color );
            break;
          default:
            throw new ArgumentException( "Invalid sort field: " + field, "orderBy" );
        }
      }
    }

    // return paginated data    
    return Json( foos.Skip( (page-1)*pageSize ).Take( pageSize ) );
}

Whew! That’s thirty-five lines just to handle the sorting. Worse than that, because of the split between OrderBy and ThenBy, we have to treat the first field different from all the other fields, leading to duplicated code. Not only is that code not very DRY, the repeated code has a high probability of being changed in the future. Every time you want to allow sorting by a new field, you have to remember to modify it in two places. To me, this is horrifying.

What I really want is a less fussy OrderBy method. First, I want it to “do what I mean” when I call it more than once on the same enumeration. Secondly, I find the whole OrderBy/OrderByDescending difference be very tedious: I wish that were parameterized. If only Microsoft had built it this way to start with, my life would be complete!

After shaking my fist theatrically in the general direction of Redmond, I say to myself “whoah there, lazybones: mightn’t you do something about this problem yourself instead of impotently complaining about an otherwise great library?” As it turns out, implementing what I want is not really that hard.

The implementation relies on a very subtle fact: the LINQ sorting methods return not a IEnumerable<T> but a IOrderedEnumerable<T>. Armed with that knowledge, you can write your own fancy extension:


MAGIC SAUCE


        // you seriously need this method in your toolbox.
        // i'm not even joking.
        public static IEnumerable<TSource> ThenOrderBy<TSource, TKey>( this IEnumerable<TSource> l, Func<TSource, TKey> keySelector, bool descending = false ) {
            var ordered = l as IOrderedEnumerable<TSource>;
            if( ordered != null ) return descending ?
              ordered.ThenByDescending( keySelector ) : 
              ordered.ThenBy( keySelector );
            return descending ? 
              l.OrderByDescending( keySelector ) : 
              l.OrderBy( keySelector );
        }

AWESOME.

Now look how compact and maintainable my REST action is:


  public JsonResult GetFoos( string sortBy, int page=1, int pageSize=100 ) {
    // safety first!
    if( page &lt; 1 ) throw new ArgumentException( "Invalid page number.", "page" );
    if( pageSize &lt; 1 ) throw new ArgumentException( "Invalid page size.", "pageSize" );
    IEnumerable&lt;Foo&gt; foos = DataContext.Foos;
    // "sortBy" is a string that contains the fields I want to sort by, comma separated,
    // and prefixed by an exclamation point to sort descending.  for example, "color,!name"
    // would sort first by color, then by name (descending).

    if( sortBy != null ) {
      foreach( var expr in sortBy.Split( new[]{','} ) ) {
        var desc = parts[0].StartsWith( "!" );
        var field = parts[0].TrimStart( new[]{'!'} ).ToLowerInvariant();
      switch( field ) {
        case "name": foos.ThenOrderBy( x => x.Name, desc ); break;
        case "color": foos.ThenOrderBy( x => x.Color, desc ); break;
        default: throw new ArgumentException( "Invalid sort field: " + field, "orderBy" );
      }
    }

    // return paginated data    
    return Json( foos.Skip( (page-1)*pageSize ).Take( pageSize ) );
}

My sorting code has gone from 35 lines to 10. It’s more readable AND more maintainable AND I have a handy extension method that I can use again in similar circumstances. Time to go home and enjoy a hard-earned beer.


No Comments on No More Disorder in the House

Comments on this entry are closed.