Skip to content

API feedback #42

@parasyte

Description

@parasyte

I started using pest-ast for a programming language and hit many critical design issues:

1. Parsers should never panic.

The design of the derive macro encourages unavoidable panics, demonstrated by the example CSV parser.

Given invalid input:

diff --git a/examples/csv.csv b/examples/csv.csv
index 20a98c9..f4d4042 100644
--- a/examples/csv.csv
+++ b/examples/csv.csv
@@ -1,5 +1,5 @@
 65279,1179403647,1463895090
 3.1415927,2.7182817,1.618034
 -40,-273.15
-13,42
+13,-.42.-
 65537

The example panics within the FromPest impl:

thread 'main' (1902657) panicked at examples/csv.rs:26:21:
called `Result::unwrap()` on an `Err` value: ParseFloatError { kind: Invalid }

This is an acceptable design when panics can be statically proven impossible. Not for a parser that is routinely expected to accept "untrusted" input.

According to what I see in the issue tracker, panicking is absolutely by design. And it appears that more panics may be introduced later (assuming this project will continue evolving).

2. Error handling leaves a lot to be desired.

When the parser/converter fails, I want to present a human readable error message that helpfully highlights what went wrong.

In the case of the CSV example, a rustc-like error message would be superb:

error: Expected `f64`
   --> csv.csv:4:4
  |
4 | 13,-.42.-
       ^^^^^^ help: invalid float literal

It does not appear to be possible to get this level of detail out of ConversionError. The error type does not carry a span that could be used to highlight the erroneous tokens. As far as I can tell, it is not possible to get this level of detail from the FromPest trait. #26 is related.

The derive macro adds another dimension of trouble for error handling: It only uses ConversionError<Void>, eliminating any possibility of returning the FatalError variant. Even worse, the inner(with(...)) and outer(with(...)) attributes cannot call a function that returns Result<T, ConversionError<Void>> for a field with type T.

The only possibility is unwrapping, as the CSV example does. I hopelessly tried with(try!) desperately and naively thinking I could sneak in a propagating unwrap by injecting the function-like macro into the token stream. There is no option to inject the ? operator, though this would be ideal.

My actual use case for error handling was parsing decimal integers. As a concrete example, the following grammar cannot be parsed into u8:

integer = @{ ASCII_DIGIT{1, 3} }

The grammar can constrain the input to a maximum of 3 digits, but that allows numbers out of range such as 999. The only thing for the parser to do is panic!

That leaves me with only one possible solution: Change the type on the AST field and process parsing errors in a second phase. (Maybe a Span<'pest> field can be added in addition to Result<T, ConversionError<Void>> for contextual errors? I didn't try it.) That means that I would end up with both a "half-parsed" AST type, and "parsed and validated" AST type. Which clearly goes against the "parse, don't validate" mantra. But worse than that, it duplicates a lot of code (mostly structural).

3. A few non-critical issues I ran into.

The derive impl for enums does not do what I expect (unit variants are not really handled, especially for terminals; I think #37 hit this case as well). I had to hand-write FromPest impls for all of my enums.

More surprisingly, I could not get the derived impl to match unary expressions at all with the following grammar:

unary = { unary_op* ~ term }

Parsing into type:

#[derive(Debug)]
pub struct Unary<'pest> {
    pub ops: Vec<UnaryOp>,
    pub term: Term<'pest>,
}

The FromPest impl for Vec<T> is greedy. It consumes until it hits the term token and returns Err(NoMatch). I had to write a non-greedy FromPest impl to handle this grammar. #34 looks related.


I know this project is sort of in light maintenance mode, now. Given my experience so far, I cannot see it being realistically usable in serious parsers.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions