= lexy

ifdef::env-github[]
image:https://img.shields.io/endpoint?url=https%3A%2F%2Fwww.jonathanmueller.dev%2Fproject%2Flexy%2Findex.json[Project Status,link=https://www.jonathanmueller.dev/project/]
image:https://github.com/foonathan/lexy/workflows/Main%20CI/badge.svg[Build Status]
image:https://img.shields.io/badge/try_it_online-blue[Playground,link=https://lexy.foonathan.net/playground]
endif::[]

lexy is a parser combinator library for {cpp}17 and onwards.
It allows you to write a parser by specifying it in a convenient {cpp} DSL,
which gives you all the flexibility and control of a handwritten parser without any of the manual work.

ifdef::env-github[]
*Documentation*: https://lexy.foonathan.net/[lexy.foonathan.net]
endif::[]

.IPv4 address parser
--
ifndef::env-github[]
[.godbolt-example]
.+++<a href="https://godbolt.org/z/scvajjE17" title="Try it online">{{< svg "icons/play.svg" >}}</a>+++
endif::[]
[source,cpp]
----
namespace dsl = lexy::dsl;

// Parse an IPv4 address into a `std::uint32_t`.
struct ipv4_address
{
    // What is being matched.
    static constexpr auto rule = []{
        // Match a sequence of (decimal) digits and convert it into a std::uint8_t.
        auto octet = dsl::integer<std::uint8_t>;

        // Match four of them separated by periods.
        return dsl::times<4>(octet, dsl::sep(dsl::period)) + dsl::eof;
    }();

    // How the matched output is being stored.
    static constexpr auto value
        = lexy::callback<std::uint32_t>([](std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
            // Cast before shifting: shifting a promoted int by 24 is UB for a >= 128.
            return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16)
                 | (std::uint32_t(c) << 8) | std::uint32_t(d);
        });
};
----
--

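As a sanity check, the bit packing done by the `value` callback can be exercised on its own. The following is plain standalone C++ with no lexy involved; the helper name `pack_ipv4` is made up for illustration:

```cpp
#include <cstdint>

// Pack four octets into a big-endian std::uint32_t, mirroring the
// `value` callback above. Casting to std::uint32_t before shifting
// avoids signed-overflow UB when the first octet is >= 128.
constexpr std::uint32_t pack_ipv4(std::uint8_t a, std::uint8_t b,
                                  std::uint8_t c, std::uint8_t d)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16)
         | (std::uint32_t(c) << 8) | std::uint32_t(d);
}

static_assert(pack_ipv4(127, 0, 0, 1) == 0x7F000001);
static_assert(pack_ipv4(10, 0, 0, 255) == 0x0A0000FF);
```
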
== Features

Full control::
  * *Describe the parser, not some abstract grammar*:
    Unlike parser generators that use some table-driven magic for parsing, lexy's grammar is just syntax sugar for a hand-written recursive descent parser.
    The parsing algorithm does exactly what you've instructed it to do -- no more ambiguities or weird shift/reduce errors!
  * *No implicit backtracking or lookahead*:
    It will only backtrack when you say it should, and only lookahead when and how far you want it to.
    Don't worry about rules that have side effects: they won't be executed unnecessarily, thanks to the user-specified lookahead conditions.
    https://lexy.foonathan.net/playground?example=peek[Try it online].
  * *Escape hatch for manual parsing*:
    Sometimes you want to parse something that can't be expressed easily with lexy's facilities.
    Don't worry: you can integrate a hand-written parser into the grammar at any point.
    https://lexy.foonathan.net/playground/?example=scan[Try it online].
  * *Tracing*:
    Figure out why the grammar isn't working the way you want it to.
    https://lexy.foonathan.net/playground/?example=trace&mode=trace[Try it online].

Easily integrated::
  * *A pure {cpp} DSL*:
    No need to use an external grammar file; embed the grammar directly in your {cpp} project using operator overloading and functions.
  * *Bring your own data structures*:
    You can directly store results into your own types and have full control over all heap allocations.
  * *Fully `constexpr` parsing*:
    Want to parse a string literal at compile time? You can.
  * *Minimal standard library dependencies*:
    The core parsing library only depends on fundamental headers such as `<type_traits>` or `<cstddef>`; no big includes like `<vector>` or `<algorithm>`.
  * *Header-only core library* (by necessity, not by choice -- it's `constexpr`, after all).

ifdef::env-github[Designed for text::]
ifndef::env-github[Designed for text (e.g. {{< github-example json >}}, {{< github-example xml >}}, {{< github-example email >}})::]
  * *Unicode support*: parse UTF-8, UTF-16, or UTF-32, and access the Unicode character database to query char classes or perform case folding.
    https://lexy.foonathan.net/playground?example=identifier-unicode[Try it online].
  * *Convenience*:
    Built-in rules for parsing nested structures, quotes, and escape sequences.
    https://lexy.foonathan.net/playground?example=parenthesized[Try it online].
  * *Automatic whitespace skipping*:
    No need to manually handle whitespace or comments.
    https://lexy.foonathan.net/playground/?example=whitespace_comment[Try it online].

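For contrast, here is roughly what handling just one of these conveniences -- a quoted string with escape sequences -- looks like when written by hand. This is plain C++ with no lexy involved; `parse_quoted` is a hypothetical sketch, not lexy API:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-rolled version of what declarative quote/escape rules automate:
// parse a double-quoted string, resolving \" and \\ escapes.
std::string parse_quoted(const std::string& input)
{
    if (input.empty() || input.front() != '"')
        throw std::invalid_argument("expected opening quote");

    std::string result;
    for (std::size_t i = 1; i < input.size(); ++i)
    {
        char c = input[i];
        if (c == '"')
            return result; // closing quote found
        if (c == '\\')
        {
            if (++i == input.size())
                break; // dangling backslash
            c = input[i]; // take the escaped character literally
        }
        result += c;
    }
    throw std::invalid_argument("unterminated string");
}
```

Note how much of the code is error handling and index bookkeeping -- exactly the part a declarative rule spares you from writing.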
ifdef::env-github[Designed for programming languages::]
ifndef::env-github[Designed for programming languages (e.g. {{< github-example calculator >}}, {{< github-example shell >}})::]
  * *Keyword and identifier parsing*:
    Reserve a set of keywords that won't be matched as regular identifiers.
    https://lexy.foonathan.net/playground/?example=reserved_identifier[Try it online].
  * *Operator parsing*:
    Parse unary/binary operators with different precedences and associativities, including chained comparisons `a < b < c`.
    https://lexy.foonathan.net/playground/?example=expr[Try it online].
  * *Automatic error recovery*:
    Log an error, recover, and continue parsing!
    https://lexy.foonathan.net/playground/?example=recover[Try it online].

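To see what the operator rules save you from writing, here is a minimal hand-written precedence-climbing parser for single-digit arithmetic. It is plain C++ only; the class and method names are illustrative, not lexy API:

```cpp
#include <cstddef>
#include <string>

// Precedence climbing: each binary operator has a precedence level,
// and the recursive helper only consumes operators at or above the
// level it was called with. This is the technique that declarative
// operator-precedence rules generate for you.
class expr_parser
{
public:
    explicit expr_parser(std::string input) : input_(std::move(input)) {}

    int parse() { return parse_expr(1); }

private:
    static int precedence(char op)
    {
        return (op == '+' || op == '-') ? 1
             : (op == '*' || op == '/') ? 2
                                        : 0; // not an operator
    }

    int parse_expr(int min_prec)
    {
        int lhs = parse_atom();
        while (pos_ < input_.size() && precedence(input_[pos_]) >= min_prec)
        {
            char op = input_[pos_++];
            // Left associativity: the right operand binds one level tighter.
            int rhs = parse_expr(precedence(op) + 1);
            lhs = op == '+' ? lhs + rhs
                : op == '-' ? lhs - rhs
                : op == '*' ? lhs * rhs
                            : lhs / rhs;
        }
        return lhs;
    }

    // An atom is a single decimal digit in this toy grammar.
    int parse_atom() { return input_[pos_++] - '0'; }

    std::string input_;
    std::size_t pos_ = 0;
};
```
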
ifdef::env-github[Designed for binary input::]
ifndef::env-github[Designed for binary input (e.g. {{< github-example protobuf >}})::]
  * *Bytes*: Rules for parsing `N` bytes or an `N`-bit big/little-endian integer.
  * *Bits*: Rules for parsing individual bit patterns.
  * *Blobs*: Rules for parsing TLV formats.

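Decoding an `N`-byte big-endian integer by hand looks like this in plain C++; it is the loop such a rule encapsulates (the function `read_be` is illustrative, not lexy API):

```cpp
#include <cstddef>
#include <cstdint>

// Accumulate N bytes, most significant first, into a 64-bit value --
// the classic big-endian decode loop.
template <std::size_t N>
constexpr std::uint64_t read_be(const unsigned char* bytes)
{
    static_assert(N <= 8, "result must fit into 64 bits");
    std::uint64_t value = 0;
    for (std::size_t i = 0; i != N; ++i)
        value = (value << 8) | bytes[i];
    return value;
}
```
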
== FAQ

Why should I use lexy over XYZ?::
  lexy is closest to other PEG parsers.
  However, those usually do more implicit backtracking, which can hurt performance, and you need to be very careful with rules that have side effects.
  This is not the case for lexy, where backtracking is controlled using branch conditions.
  lexy also gives you a lot of control over error reporting, supports error recovery, has special support for operator-precedence parsing, and provides other advanced features.

  http://boost-spirit.com/home/[Boost.Spirit]:::
    The main difference: lexy is not a Boost library.
    Otherwise, it is just a different implementation with a different flavor.
    Use lexy if you like lexy more.
  https://github.com/taocpp/PEGTL[PEGTL]:::
    PEGTL is very similar and was a big inspiration.
    The biggest difference is that lexy uses an operator-based DSL instead of inheriting from templated classes as PEGTL does;
    depending on your preference, this can be an advantage or a disadvantage.
  Hand-written parsers:::
    Writing a parser by hand is more manual work and more error-prone.
    lexy automates that away without sacrificing control.
    You can use it to quickly prototype a parser and then replace more and more of it with handwritten code over time;
    mixing a hand-written parser and a lexy grammar works seamlessly.

How bad are the compilation times?::
They're not as bad as you might expect (in debug mode, that is).
+
The example JSON parser compiles in about 2s on my machine.
If we remove all the lexy-specific parts and just benchmark the time it takes the compiler to process the data structures (and stdlib includes),
that takes about 700ms.
If we only validate the JSON instead of parsing it, i.e. remove the data structures and keep only the lexy-specific parts, we're looking at about 840ms.
+
Keep in mind that you can fully isolate lexy in a single translation unit that only needs to be touched when you change the parser.
You can also split a lexy grammar into multiple translation units using the `dsl::subgrammar` rule.

How bad are the {cpp} error messages if you mess something up?::
  They're certainly worse than the error messages lexy itself gives you.
  The big problem is that the first line gives you the error, followed by dozens of template instantiations, which end at your `lexy::parse` call.
  Besides providing an external tool to filter those error messages, there is nothing I can do about that.

How fast is it?::
  Benchmarks are available in the `benchmarks/` directory.
  A sample result of the JSON validator benchmark, which compares the example JSON parser with various other implementations, is available https://lexy.foonathan.net/benchmark_json/[here].

Why is it called lexy?::
  I previously had a tokenizer library called foonathan/lex.
  I tried adding a parser to it, but found that the line between pure tokenization and parsing became increasingly blurred.
  lexy is a re-imagination of the parser I added to foonathan/lex, and I've simply kept a similar name.

ifdef::env-github[]
== Documentation

The documentation, including tutorials, reference documentation, and an interactive playground, can be found at https://lexy.foonathan.net/[lexy.foonathan.net].

A minimal `CMakeLists.txt` that uses lexy can look like this:

.`CMakeLists.txt`
[source,cmake]
----
cmake_minimum_required(VERSION 3.18)
project(lexy-example)

include(FetchContent)
FetchContent_Declare(lexy URL https://lexy.foonathan.net/download/lexy-src.zip)
FetchContent_MakeAvailable(lexy)

add_executable(lexy_example)
target_sources(lexy_example PRIVATE main.cpp)
target_link_libraries(lexy_example PRIVATE foonathan::lexy)
----
endif::[]