ocaml-markup

Markup.ml is a pair of parsers implementing the HTML5 and XML specifications, including error recovery. Usage is simple, because each parser is a function from byte streams to parsing signal streams.
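The byte-stream-to-signal-stream pipeline can be sketched as follows. This is a minimal sketch that assumes the `markup` package is installed (for example through opam). `text_of_html` is a hypothetical helper, not part of the library. It builds a byte stream from a string with `Markup.string`, runs the HTML parser, and keeps only the `` `Text `` signals.

```ocaml
(* Hypothetical helper: parse an HTML string and concatenate all of its
   text content. Each `Text signal carries a list of UTF-8 string chunks. *)
let text_of_html (source : string) : string =
  Markup.string source          (* byte stream from an in-memory string *)
  |> Markup.parse_html          (* byte stream -> HTML parser *)
  |> Markup.signals             (* parser -> stream of parsing signals *)
  |> Markup.to_list             (* force the (synchronous) stream *)
  |> List.concat_map (function `Text chunks -> chunks | _ -> [])
  |> String.concat ""

let () = print_endline (text_of_html "<p>Hello <em>world</em></p>")
```

The same pipeline shape works for XML by swapping `Markup.parse_html` for `Markup.parse_xml`.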

In addition to being error-correcting, the parsers are:

- streaming: parsing partial input and emitting signals while more input is still being received;
- lazy: not parsing input unless you have requested the next parsing signal, so you can easily stop parsing part-way through a document;
- non-blocking: they can be used with Lwt, but still provide a straightforward synchronous interface for simple usage; and
- one-pass: memory consumption is limited since the parsers don't build up a document representation, nor buffer input beyond a small amount of lookahead.
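The lazy, one-pass behaviour means a consumer can pull signals one at a time and simply stop. The sketch below assumes the `markup` package and uses `Markup.next` on the synchronous signal stream. `first_title` is a hypothetical helper that reads up to the end of the first `<title>` element and then stops requesting signals, so the rest of the document is not consumed.

```ocaml
(* Hypothetical helper: return the text of the first <title> element, or
   None if the input has no title. Signals are pulled one at a time with
   Markup.next, and pulling stops at the title's end tag. *)
let first_title (source : string) : string option =
  let stream = Markup.string source |> Markup.parse_html |> Markup.signals in
  (* [acc] holds the text chunks seen so far inside <title>, in reverse. *)
  let rec scan in_title acc =
    match Markup.next stream with
    | None -> None
    | Some (`Start_element ((_, "title"), _)) -> scan true []
    | Some (`Text chunks) when in_title ->
      scan true (List.rev_append chunks acc)
    | Some (`End_element) when in_title ->
      (* <title> holds only text, so this end tag closes it. *)
      Some (String.concat "" (List.rev acc))
    | Some _ -> scan in_title acc
  in
  scan false []

let () =
  match first_title "<html><head><title>Hi</title></head><body>...</body></html>" with
  | Some title -> print_endline title
  | None -> print_endline "(no title)"
```

Text is accumulated across `` `Text `` signals rather than assuming the title arrives as a single signal.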

The parsers detect character encodings automatically, and emit everything in UTF-8. The HTML parser understands SVG and MathML, in addition to HTML5.
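To see the SVG support in practice, one can inspect the namespace attached to each element name. This is a sketch under the assumption that the `markup` package is installed; `element_namespaces` is a hypothetical helper. Element names in `` `Start_element `` signals are `(namespace, local name)` pairs, and per the HTML5 specification elements inside `<svg>` are placed in the `http://www.w3.org/2000/svg` namespace, while ordinary HTML elements use `http://www.w3.org/1999/xhtml`.

```ocaml
(* Hypothetical helper: pair each element's local name with the namespace
   URI the HTML parser assigned to it, in document order. *)
let element_namespaces (source : string) : (string * string) list =
  Markup.string source
  |> Markup.parse_html
  |> Markup.signals
  |> Markup.to_list
  |> List.filter_map (function
       | `Start_element ((ns, local), _attrs) -> Some (local, ns)
       | _ -> None)

let () =
  element_namespaces "<p>A circle: <svg><circle r='1'/></svg></p>"
  |> List.iter (fun (local, ns) -> Printf.printf "%s -> %s\n" local ns)
```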