User:Mathepic/Parsec (parser)

Parsec is an industrial-strength library of monadic parser combinators, written in Haskell. It includes built-in support for parsers over a stream of characters, the most common kind of parser, as well as combinators for parsers over a stream of any token type.

Versions

There are two "mainline" versions of Parsec in use: 2 and 3. Version 1 is either nonexistent or completely unsupported, as it is not listed as a version on Hackage.[1]

Version 2

Version 2, despite being older, is probably used more than Parsec 3. Most tutorials, such as the one in Real World Haskell, target Parsec 2. It uses the modules under Text.ParserCombinators.Parsec, which define the main types Parser and GenParser. GenParser is flexible enough to operate on lists of tokens other than Char, and it also allows user state to be carried during parsing. However, the requirement that tokens of type a must come from a list, [a], is more limiting than it seems. For example, one cannot parse characters directly from a ByteString.
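
As an illustration of the two types, the following is a minimal sketch written against the Text.ParserCombinators.Parsec modules. The names number, countedNumber and numbers are hypothetical and chosen only for this example; the user state is assumed to be an Int that counts how many numbers have been parsed.

```haskell
import Text.ParserCombinators.Parsec

-- Parser is GenParser specialised to Char tokens and no user state.
number :: Parser Int
number = fmap read (many1 digit)

-- GenParser tok st a: here the tokens are Chars and the user state is an Int
-- (an assumption for this sketch) counting the numbers parsed so far.
countedNumber :: GenParser Char Int Int
countedNumber = do
  n <- fmap read (many1 digit)
  updateState (+ 1)
  return n

-- Parse a comma-separated list and return it together with the final count.
numbers :: GenParser Char Int ([Int], Int)
numbers = do
  ns <- countedNumber `sepBy` char ','
  seen <- getState
  return (ns, seen)
```

A stateless parser is run with parse, as in parse number "" "42", while a stateful one is run with runParser and an initial state, as in runParser numbers 0 "" "1,22,333".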

Version 3

Version 3 builds on version 2 by no longer treating Parser and GenParser as its central types. Instead, it introduces a single type, ParsecT, which is a monad transformer rather than a plain monad. Hence, a parser can be layered on top of other monads as part of a monad stack.
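
A minimal sketch of such a stack, assuming IO as the underlying monad, is shown here. The names loggedWord and runLogged are hypothetical and used only for illustration.

```haskell
import Control.Monad.IO.Class (liftIO)
import Text.Parsec

-- A parser whose underlying monad is IO, so it can perform actions while parsing.
loggedWord :: ParsecT String () IO String
loggedWord = do
  w <- many1 letter
  liftIO (putStrLn ("parsed word: " ++ w))
  return w

-- runParserT runs the parser inside the underlying monad (here, IO).
runLogged :: String -> IO (Either ParseError String)
runLogged = runParserT loggedWord () "<input>"
```

Actions performed in the underlying monad are not undone if the parser later backtracks, which is worth keeping in mind when choosing what to put in the stack.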

The other important change is the creation of the Stream typeclass. This can clutter type signatures a bit, but it allows for more polymorphic functions. An instance of Stream declares which token type a given input type yields; for instance, a ByteString can serve as a stream of Chars.
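
The following sketch shows one parser being run over two different input types. It assumes the Stream instances for lists and for strict ByteString (via Data.ByteString.Char8) that ship with Parsec 3; the names word, fromString and fromBytes are hypothetical.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.ByteString.Char8 as B
import Text.Parsec

-- Works for any input type s that can yield Chars in the monad m.
word :: Stream s m Char => ParsecT s u m String
word = many1 letter

-- The same parser applied to a String...
fromString :: Either ParseError String
fromString = parse word "" "hello world"

-- ...and to a strict ByteString.
fromBytes :: Either ParseError String
fromBytes = parse word "" (B.pack "hello world")
```

The FlexibleContexts extension is needed here because the constraint Stream s m Char mentions the concrete type Char.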

Because a large volume of existing code targets Parsec 2, Parsec 3 also exports compatibility modules under Text.ParserCombinators.Parsec, implemented on top of its own hierarchy, Text.Parsec.

Features

Backtracking

For efficiency, Parsec does not backtrack by default: if a parser consumes any input at all and then fails, the alternatives offered through <|> are not tried. This can cause errors in code like:

eol :: Stream s m Char => ParsecT s u m String
eol = string "\r\n" <|> string "\r" <|> string "\n" <?> "end of line"

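If this parser receives "\rhello" as input, the first alternative, string "\r\n", consumes the "\r" and then fails on the "h". Because input has already been consumed, the remaining alternatives are never tried and the whole parser fails. Wrapping the first alternative in try makes it backtrack on failure, so the code should be: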

eol :: Stream s m Char => ParsecT s u m String
eol = try (string "\r\n") <|> string "\r" <|> string "\n" <?> "end of line"

Requiring backtracking to be requested explicitly with try, rather than allowing it by default, increases the speed of the parser.
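
The difference between the two definitions can be checked with a small sketch. The names eolNoTry and eolWithTry are hypothetical renamings of the two versions of eol, introduced only so that both can live in one module.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

-- Without try: a partial match of "\r\n" consumes the "\r" and then fails,
-- so the other alternatives are never attempted.
eolNoTry :: Stream s m Char => ParsecT s u m String
eolNoTry = string "\r\n" <|> string "\r" <|> string "\n" <?> "end of line"

-- With try: a failed match of "\r\n" rewinds the input before the next alternative.
eolWithTry :: Stream s m Char => ParsecT s u m String
eolWithTry = try (string "\r\n") <|> string "\r" <|> string "\n" <?> "end of line"
```

Under these definitions, parse eolNoTry "" "\rhello" yields a Left containing a parse error, whereas parse eolWithTry "" "\rhello" yields Right "\r".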

State Tracking

Parsec can keep track of user-defined state during parsing. For example, one might want to record which variables have been defined in a programming language in order to catch a compilation error during the parse phase, reporting the error to the programmer sooner rather than later. This state backtracks: if a parser fails inside a try, the state is restored to the value it had before the try. If this behavior is unwanted, `MonadState s m => ParsecT s' u m` is an instance of MonadState, so the programmer can keep non-backtracking state in an underlying StateT or State monad. Typeclass instances for other monads from the standard mtl library are provided as well.
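
The contrast between the two kinds of state can be sketched as follows, assuming an Int counter as the state and the mtl package for the State monad. All names defined here are hypothetical. Control.Monad.State is imported qualified because Text.Parsec exports its own, unrelated State type.

```haskell
import qualified Control.Monad.State as S
import Text.Parsec

-- Parsec's own user state: rolled back when a try branch fails.
type P = Parsec String Int

bumpThenAB :: P ()
bumpThenAB = modifyState (+ 1) >> char 'a' >> char 'b' >> return ()

userStateAfter :: String -> Either ParseError Int
userStateAfter = runParser (branches >> getState) 0 "<input>"
  where branches = try bumpThenAB <|> (char 'a' >> char 'c' >> return ())

-- State kept in an underlying State monad: not rolled back.
type Q = ParsecT String () (S.State Int)

bumpThenAB' :: Q ()
bumpThenAB' = S.modify (+ 1) >> char 'a' >> char 'b' >> return ()

monadStateAfter :: String -> (Either ParseError (), Int)
monadStateAfter input = S.runState (runParserT branches () "<input>" input) 0
  where branches = try bumpThenAB' <|> (char 'a' >> char 'c' >> return ())
```

On the input "ac" the try branch fails in both cases. userStateAfter "ac" then reports a counter of 0, because the increment was discarded with the failed branch, while monadStateAfter "ac" reports a counter of 1, because the underlying State monad keeps the increment.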

Error Messages

Parsec's API is designed so that it is easy to create parsing error messages that are descriptive and indicate the location of the error. For example, one may use the operator <?> to set the description that a parser reports when it fails.[2]
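
A minimal sketch of labelling a parser with <?> is shown here. The names number and describe, and the source name "example.txt", are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Label the parser so that failures mention "a number" rather than "digit".
number :: Parser Int
number = (read <$> many1 digit) <?> "a number"

-- Render the result as text; a ParseError shows the source name, line and
-- column, followed by what was unexpected and what was expected.
describe :: String -> String
describe input = either show show (parse number "example.txt" input)
```

Calling describe "abc" should produce a message that names example.txt, gives line 1 and column 1, and includes the line "expecting a number".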

References

  1. ^ Leijen, Daan. "HackageDB: parsec-3.1.1".
  2. ^ Leijen, Daan. "HackageDB: parsec-3.1.1".