Skip to content

XML parser for !DOCTYPE does not handle spaces in PubidLiteral #12109

@halfaya

Description

@halfaya

Prerequisites

Description

The functionLean.Data.Xml.Parser.PubidChar does not consume the space character despite it being clearly stated as an allowed character in the referenced https://www.w3.org/TR/xml/#NT-PubidChar :
[13] | PubidChar | ::= | #x20 \| #xD \| #xA \| [a-zA-Z0-9] \| [-'()+,./:=?;!*#@$_%]

Context

This makes it impossible to parse MusicXml (https://github.com/w3c/musicxml), since a typical !DOCTYPE looks like this:
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 4.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">

Steps to Reproduce

import Lean.Data.Xml
#eval Lean.Xml.parse "<!DOCTYPE a PUBLIC \"b c\" \"d\"><a></a>"

Expected behavior: ok: <a></a>

Actual behavior: error: offset 21: expected: '"'

Versions

4.27.0-rc1
MacOS 15.7.3

Additional Information

Impact

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions