-
Notifications
You must be signed in to change notification settings - Fork 17
Description
I started using pest-ast for a programming language and hit many critical design issues:
1. Parsers should never panic.
The design of the derive macro encourages unavoidable panics, demonstrated by the example CSV parser.
Given invalid input:
diff --git a/examples/csv.csv b/examples/csv.csv
index 20a98c9..f4d4042 100644
--- a/examples/csv.csv
+++ b/examples/csv.csv
@@ -1,5 +1,5 @@
65279,1179403647,1463895090
3.1415927,2.7182817,1.618034
-40,-273.15
-13,42
+13,-.42.-
65537The example panics within the FromPest impl:
thread 'main' (1902657) panicked at examples/csv.rs:26:21:
called `Result::unwrap()` on an `Err` value: ParseFloatError { kind: Invalid }
This is an acceptable design when panics can be statically proven impossible. Not for a parser that is routinely expected to accept "untrusted" input.
According to what I see in the issue tracker, panicking is absolutely by design. And it appears that more panics may be introduced later (assuming this project will continue evolving).
2. Error handling leaves a lot to be desired.
When the parser/converter fails, I want to present a human readable error message that helpfully highlights what went wrong.
In the case of the CSV example, a rustc-like error message would be superb:
error: Expected `f64`
--> csv.csv:4:4
|
4 | 13,-.42.-
^^^^^^ help: invalid float literal
It does not appear to be possible to get this level of detail out of ConversionError. The error type does not carry a span that could be used to highlight the erroneous tokens. As far as I can tell, it is not possible to get this level of detail from the FromPest trait. #26 is related.
The derive macro adds another dimension of trouble for error handling: It only uses ConversionError<Void>, eliminating any possibility of returning the FatalError variant. Even worse, the inner(with(...)) and outer(with(...)) attributes cannot call a function that returns Result<T, ConversionError<Void>> for a field with type T.
The only possibility is unwrapping, as the CSV example does. I hopelessly tried with(try!) desperately and naively thinking I could sneak in a propagating unwrap by injecting the function-like macro into the token stream. There is no option to inject the ? operator, though this would be ideal.
My actual use case for error handling was parsing decimal integers. As a concrete example, the following grammar cannot be parsed into u8:
integer = @{ ASCII_DIGIT{1, 3} }
The grammar can constrain the input to a maximum of 3 digits, but that allows numbers out of range such as 999. The only thing for the parser to do is panic!
That leaves me with only one possible solution: Change the type on the AST field and process parsing errors in a second phase. (Maybe a Span<'pest> field can be added in addition to Result<T, ConversionError<Void>> for contextual errors? I didn't try it.) That means that I would end up with both a "half-parsed" AST type, and "parsed and validated" AST type. Which clearly goes against the "parse, don't validate" mantra. But worse than that, it duplicates a lot of code (mostly structural).
3. A few non-critical issues I ran into.
The derive impl for enums does not do what I expect (unit variants are not really handled, especially for terminals; I think #37 hit this case as well). I had to hand-write FromPest impls for all of my enums.
More surprisingly, I could not get the derived impl to match unary expressions at all with the following grammar:
unary = { unary_op* ~ term }
Parsing into type:
#[derive(Debug)]
pub struct Unary<'pest> {
pub ops: Vec<UnaryOp>,
pub term: Term<'pest>,
}The FromPest impl for Vec<T> is greedy. It consumes until it hits the term token and returns Err(NoMatch). I had to write a non-greedy FromPest impl to handle this grammar. #34 looks related.
I know this project is sort of in light maintenance mode, now. Given my experience so far, I cannot see it being realistically usable in serious parsers.