Sunday, January 2, 2011

Parsing with applicative functors in F#

An interesting (and useful) fact is that all monads are applicative functors. To show an example of this, I'll borrow an example from the book "Real World Haskell" (RWH): parsing an URL-encoded query string. More specifically, we'll just parse hexadecimal escape sequences as they appear in URLs, e.g. %20 is a space. While RWH uses Parsec, we'll use FParsec.

Monadic parsing

We start with a little function to parse a char from two chars representing an hexadecimal ASCII value:

let readHex a b = 
    char <| Byte.Parse(sprintf "%c%c" a b, System.Globalization.NumberStyles.HexNumber)

For example, readHex '6' '5' returns the ASCII character for 0x65, which is 'e'

Now we write a monadic parser for the whole escape sequence:

let p_hex : Parser<char, unit> = 
    parse { 
        do! skipChar '%' 
        let! a = hex 
        let! b = hex 
        return readHex a b 
    }

Let's test it:

match run p_hex "%20" with 
| Success (result,_,_) -> 
    if result = ' ' 
        then printfn "Correctly parsed space" 
        else failwith "Incorrecty parsed %c" result 
| Failure (msg,_,_) -> failwithf "Parsing failed: \n%s" msg

Applicative parsing

Everything's good and dandy so far. Now we'll write the same parser, applicative-style. First we have to define pure and <*> for the parser monad:

let puree = parse.Return 
let (<*>) f a = 
    parse { 
        let! f' = f 
        let! a' = a 
        return f' a' } 

This same definition of pure and <*> applies to every monad.

Here's a first attempt at the applicative parser:

let a_hex : Parser<char, unit> = 
    puree (fun skipCharResult hex1result hex2result -> readHex hex1result hex2result) <*> skipChar '%' <*> hex <*> hex

Note how this mimics the monadic parser quite closely, the only major difference is that readHex is lifted at the beginning instead of at the end. But we can do better than this. For example, we can use <!> (a.k.a. map a.k.a. lift) as defined in the previous post:

let (<!>) f a = puree f <*> a

so we can now write:

let a_hex2 : Parser<char, unit> = 
    (fun skipCharResult hex1result hex2result -> readHex hex1result hex2result) <!> skipChar '%' <*> hex <*> hex

Having to capture the result of skipChar, only to ignore it later (it's unit anyway), is pretty annoying. In the monadic parser we used do! to ignore the unit result. Can we do something similar for our applicative parser?

What we need here is a combinator that given two parsers a and b, runs both, but returns only the result of b. Something like:

let ( *>) a b = puree (fun resultA resultB -> resultB) <*> a <*> b

( *>) : Parser<'a,'u> -> Parser<'b,'u> -> Parser<'b,'u>

Using this new combinator we can write:

let a_hex3 : Parser<char, unit> = 
    (fun hex1result hex2result -> readHex hex1result hex2result) <!> skipChar '%' *> hex <*> hex

We could similarly define (<*) (although we won't use it here):

let (<*) a b = puree (fun resultA resultB -> resultA) <*> a <*> b

Now we can drop the arguments from the readHex call (eta reduction):

let a_hex4 : Parser<char, unit> = 
    readHex <!> skipChar '%' *> hex <*> hex 

That's it for our applicative parser. It's more compact than the equivalent monadic parser, and still readable (once you learn how to read applicative-style code, that is).

Of course, FParsec includes lots of combinators, and we could also have written:

let a_hex5 : Parser<char, unit> = 
    readHex |> pipe2 (skipChar '%' >>. hex) hex 

And in fact most of the combinators we defined above are mostly aliases to built-in combinators in FParsec:

let inline (<*>) f a = f >>= fun f' -> a >>= fun a' -> preturn (f' a') 
let inline lift f a = a |>> f 
let inline (<!>) f a = lift f a 
let inline lift2 f a b = pipe2 a b f  //preturn f <*> a <*> b 
let inline ( *>) x y = x >>. y // lift2 (fun _ z -> z) x y 
let inline ( <*) x y = x .>> y // lift2 (fun z _ -> z) x y 

These definitions are more efficient than the first ones because they bind directly to FParsec primitives, but they're specific to FParsec, while the first definitions can be applied to any monad.

The power of monads and applicatives

Recall the signature of (<*>) :

(<*>) : Parser<'a -> 'b, 's> -> Parser<'a, 's> -> Parser<'b, 's>

Now let's flip the arguments (without changing semantics) and compare it to (>>=) (monadic bind):

(<*>) : Parser<'a, 's> -> Parser<'a -> 'b, 's>   -> Parser<'b, 's> (flipped) 
(>>=) : Parser<'a, 's> -> ('a -> Parser<'b, 's>) -> Parser<'b, 's>

See the difference in the second argument? This is the fundamental difference between monads and applicative functors: monads can depend on the results of previous computations to sequence its actions, while applicative functors cannot. This is what makes monads more powerful than applicative functors.

In the particular case of parsers, it means that monadic parsers are able to parse context-sensitive grammars, while applicative parsers can only parse context-free grammars. Daan Leijen and Erik Meijer explain this in detail in their paper "Parsec: Direct Style Monadic Parser Combinators for the Real World". EDIT: this has been disproven, see comments below.

So if monads are more powerful than applicative functors, does that mean that we should always go with monads? Quite the opposite. Generally, one should choose the least powerful abstraction able to model a problem. Less powerful abstractions are usually simpler to use and understand, more composable, more reusable, more maintainable. But I don't want to digress, the Principle of Least Power is a complex and controversial topic. In this particular case, we've shown that applicative style leads to more compact, functional code. Monadic code is usually written using syntax sugar ("do notation" in Haskell, computation expressions in F#) since otherwise it's somewhat hard to follow, which leads to imperative-style code. There are many articles about avoiding monadic syntax sugar in Haskell. Also, in the particular case of parsing combinators, mixing combinators and monadic syntax sugar can look somewhat weird.

Conor McBride and Ross Paterson sum it up nicely in their paper "Applicative Programming with Effects": "[...]if you've got an Applicative functor, that's good; if you've also got a Monad, that's even better! And the dual [...] is this: if you want a Monad, that's good; if you only want an Applicative functor, that's even better!"

The code shown in this post is here.

In the next post we'll see how applicative functors compose, and an application to web programming.

9 comments:

tomas.K said...

Nice post, looking forward to next!

tomas.K said...

Hi,

I stuck on different evaluation order of a_hex2 and a_hex3. In a_hex2 it first evaluates:
((fun skipCharResult hex1result hex2result -> readHex hex1result hex2result) < ! > skipChar '%')

but in a_hex3 it first evaluates:
(skipChar '%' *> hex)

So I see different associativity, so how this work in F# ?

Thank you

Mauricio Scheffer said...

If you try to force a_hex3 into the same evaluation order of a_hex2 (by using parens) you'll see that this doesn't type:
((fun hex1result hex2result -> readHex hex1result hex2result) < ! > skipChar '%') *> hex <*> hex
because the pure function expects a char, but skipChar produces unit.

Still, generally, since we can't define operator precedence in F# as you can in Haskell, sometimes you'll have to use parens to force the evaluation order, where in the equivalent Haskell code the defined operator precedence takes care of that.

tomas.K said...

Yes, this is what I tried. It seemed to me that the a_hex3 won't compile, but is ok. I thought there is a strict rule of associativity of function evaluation, which is same in a_hex2 as in a_hex3, but there is not. F# is enough clever to choose different evaluation order. So my rather study question is what rule is driving this evaluation order differences?

Mauricio Scheffer said...

@tomas.K: There is a strictly defined operator precedence in the spec, and '*' has a higher precedence than '<', so '*>' has a higher precedence than '< ! >'

tomas.K said...

Yeah, that's what I was missing. I didn't know this precedence is counted in case of custom operator functions. It seems like quite hidden feature to me...

Cheers

Unknown said...
This comment has been removed by the author.
Unknown said...

"In the particular case of parsers, it means that monadic parsers are able to parse context-sensitive grammars, while applicative parsers can only parse context-free grammars"

Here (http://byorgey.wordpress.com/2012/01/05/parsing-context-sensitive-languages-with-applicative/) is said that applicative parsers can parse context-sensitive grammars.

Mauricio Scheffer said...

@Denys : That's interesting, thanks for the pointer. So that disproves Leijen/Meijer's assertion (just re-read the paper, it doesn't really prove that proposition formally).