Skip to content

Formal grammar

This is the canonical EBNF specification of Carve. It defines block-level constructs first (frontmatter, headings, lists, tables, fenced code, captions), then inline constructs (emphasis, links, attributes, extensions, mentions, tags, CriticMarkup).

Implementations should match this grammar. The case study explains the design rationale, the reference page covers parsing edge cases, and the examples show the expected HTML output for each construct.

The full grammar lives at resources/grammar.ebnf in the repository.

ebnf
(* ============================================================================
   Carve Markup Language - EBNF Grammar
   Version: 0.1-draft

   This grammar defines the syntax of Carve, a post-Markdown lightweight markup
   language with visual mnemonics and human-centered design.

   Notation:
   - (*...*) = comment
   - 'x'     = terminal character
   - "x"     = terminal string
   - x | y   = alternation
   - x y     = concatenation
   - [x]     = optional (0 or 1)
   - {x}     = repetition (0 or more)
   - x+      = repetition (1 or more)
   - {x}+    = repetition (1 or more) -- same as x+, used where the repeated
               unit is a group
   - x - y   = exception (x but not y)
   - (x)     = grouping

   NORMATIVITY
   - This file (resources/grammar.ebnf) is the NORMATIVE specification of
     Carve. Where any document disagrees with it, this file wins. Within
     it, PART 9 (semantic constraints not expressible in pure EBNF) is the
     authority for behavior the grammar productions cannot encode.
   - docs/case-study/syntax.md and docs/edge-cases.md are EXPLANATORY and
     DERIVED — prose for humans, non-normative.
   - docs/examples.md and the generated tests/corpus/* pairs are the
     CONFORMANCE CONTRACT: every example is checked against the reference
     implementation in CI. A spec change is not real until a corpus pair
     pins it.
   ============================================================================ *)

(* ============================================================================
   PART 1: DOCUMENT STRUCTURE
   ============================================================================ *)

document = [frontmatter], {block}, EOF ;

frontmatter = frontmatter_open, frontmatter_content, frontmatter_close, newline ;
(* The opening delimiter may carry a format token (yaml | json | toml | neon |
   ...); a bare `---` defaults to `yaml`. The space before the token is OPTIONAL
   (lenient: both `---yaml` and `--- yaml` are accepted; `---yaml` is canonical).
   The closing delimiter is always a bare `---`. The token distinguishes a typed
   opener from a thematic break. A BARE `---` at document start is frontmatter
   only when a closing `---` line exists ahead (closer lookahead, same model as
   PART 9 §10); with no closer the line is an ordinary thematic break and the
   following lines are ordinary blocks. *)
frontmatter_open = "---", [space], [frontmatter_format], newline ;
frontmatter_close = "---", newline ;
frontmatter_format = (letter | digit)+ ;
frontmatter_content = (* metadata in the named format, parsed by that parser *) ;

block = heading
      | thematic_break
      | code_block
      | blockquote
      | list
      | table
      | line_block
      | admonition
      | div
      | comment_block
      | comment_line
      | raw_block
      | reference_definition
      | footnote_definition
      | abbreviation_definition
      | paragraph
      | block_attributes
      | blank_line ;
(* `block_attributes` is a standalone block-level line that renders
   nothing on its own; it attaches to a following block. Its precise
   reach (floating across blank lines), accumulation, and drop-if-dangling
   semantics are NOT context-free -- they are pinned in PART 9 §15.
   `comment_line`, `reference_definition`, `footnote_definition` and
   `abbreviation_definition` are the INVISIBLE blocks: they are parsed and
   consumed (collected into the definition table / dropped) and emit no
   output; like `block_attributes` they also interrupt an open paragraph
   (PART 9 §10, INVISIBLE CONSTRUCTS). *)

blank_line = {whitespace}, newline ;

(* ============================================================================
   PART 2: BLOCK ELEMENTS
   ============================================================================ *)

(* --- Headings --- *)
(* ATX (`#` prefix) only. Setext (underline) headings are intentionally
   NOT supported, matching djot: a `---` underline collides with the
   thematic break and frontmatter delimiter, reintroducing the ambiguity
   djot removed and breaking Design Principle 1 ("one syntax, one
   meaning"). *)
heading = atx_heading ;

atx_heading = heading_first_line, {heading_continuation_line} ;
heading_first_line = heading_marker, space, inline_content, newline ;
heading_marker = '#' | "##" | "###" | "####" | "#####" | "######" ;

(* MULTI-LINE HEADINGS -- NORMATIVE (like Djot; consistent with §10 paragraph
   interruption). A heading's text spills onto following lines until a blank
   line. A continuation line:
     - may carry SAME-or-LOWER number of `#` markers (stripped) or none,
       and its text is appended to the heading;
     - a heading marker with MORE `#` than the open heading starts a NEW
       heading (ends this one);
     - a caption (`^ ` ...) or a fenced comment (`%%%`) ends the heading;
     - a blank line ends it;
     - a block-opener (list, quote, table, fenced code, `:::` div, thematic
       break) ends the heading and starts that block, exactly as it interrupts
       a paragraph (§10).
   Only PLAIN text folds into the heading; an ordered marker still folds (it
   never interrupts a paragraph either, §10/§11). The heading id is derived from
   the full folded text.
   NO TRAILING ATTRIBUTES (djot-strict). A heading line carries NO trailing
   `{...}` attribute block: a `{...}` at the end of a heading line is ordinary
   inline content (literal text unless it forms an inline construct), and the
   id derives from the full literal text. Attributes attach via a PRECEDING
   block-attribute line (PART 9 §15), the uniform block rule; an explicit
   `#id` from that line lives on the `<section>` wrapper (PART 9 §13).
   Pinned by corpus 02-headings (the preceding-line and literal-trailing
   pairs). *)
heading_continuation_line = [heading_marker, space], inline_content, newline ;
   (* the optional marker must be same-or-lower than the open heading's --
      see the NORMATIVE note above *)

(* HEADING IDENTIFIERS -- NORMATIVE. A heading without an explicit {#id} gets an
   automatic identifier from its folded plain text (symbols `:name:` and footnote
   references excluded), GitHub/SSG-style:
     - replace each maximal run of non-alphanumeric ASCII characters with a single
       `-` (covers spaces, punctuation, `_`, and runs of `-`) -- the jgm/djot#393 rule;
     - trim leading/trailing `-`;
     - LOWERCASE (Unicode-aware): NON-ASCII characters are PRESERVED, only their case
       is folded (`Café` -> `café`, `日本語` unchanged). Lowercasing makes ids and the
       common cross-reference case-insensitive, matching author expectation;
     - a leading-digit result is prefixed `s-` (a bare leading digit is a valid HTML
       id but an invalid CSS selector); an empty result becomes a generated `s-N`;
     - duplicates within a document get a numeric `-2`, `-3`, ... suffix.
   An explicit {#id} is used verbatim (case preserved). Implementations MAY offer an
   OPT-IN mode that ASCII-folds the identifier for URL/CSS-fragment portability
   (carve-js `asciiHeadingIds` parse option; carve-php `AsciiHeadingIdsExtension`);
   ASCII folding is never the default. NOTE: carve lowercases by design, deliberately
   diverging from djot.js / djot-php which preserve case per #393. *)

(* --- Thematic Breaks --- *)
thematic_break = (('-', '-', '-', {'-'}) | ('*', '*', '*', {'*'}) | ('_', '_', '_', {'_'})), newline ;

(* --- Code Blocks --- *)
code_block = fenced_code_block, [caption] ;
(* A trailing caption (PART 9 §4) wraps the code block in a figure: a
   numbered LISTING. The caption may carry a `#` number placeholder, and a
   `</#id>` to the block resolves to "Listing N" (PART 9 §19). *)

fenced_code_block = code_fence_open, [space], [code_fence_info], newline,
                    code_content,
                    code_fence_close, newline ;
(* The space between the fence and the info string is OPTIONAL (lenient: both
   ```php and ``` php are accepted; Markdown writes no space, Djot writes the
   space). The no-space form (```php) is canonical and is what the X->Carve
   converters emit. *)
(* CODE-FENCE INFO STRING -- NORMATIVE. The info string is a single
   language_info token, optionally followed by a bracketed [label], or a bare
   [label] (no language). The label is structured metadata: it is NOT part of
   the language/class; the core renderer ignores it, an extension (e.g.
   code-group) may use it. A code fence carries NO inline attributes -- use the
   PRECEDING block-attribute line for those (PART 9 §15):
       {.fancy #x}
       ```php
       ...
       ```
   which renders on the <pre> (language stays `language-…` on the <code>).
   INVALID-FENCE FALLBACK: when the text after the language token is anything
   other than a [label] -- a bare second word, a quoted value, an inline `{...}`
   block (```js title="x", ``` php {.x}) -- the line is NOT a fenced code block;
   it falls back to ordinary inline parsing (the backtick run typically opens an
   inline code span). The bracket is the only delimiter that admits structured
   metadata. The raw passthrough opener (```=FORMAT, PART 9 §20) is a SEPARATE
   production matched before this one (a leading `=` is never a language). *)

code_fence_open = backtick_fence | tilde_fence ;
code_fence_close = backtick_fence | tilde_fence ;  (* same fence char; length >= opener (PART 9 §2) *)

backtick_fence = '`', '`', '`', {'`'} ;
tilde_fence = '~', '~', '~', {'~'} ;

code_fence_info = (language_info, [space+, label]) | label ;
language_info = (letter | digit | '-' | '_' | '+' | '#' | '.' | '/')+ ;  (* a single token; the punctuation covers real language tags like c++, c#, f#, asp.net, text/html. May start with a digit. The `/` is normative across all impls (so `text/html`, `image/svg` are language tokens, not split). A token MUST NOT start with `=`: a leading `=` is the raw_block opener (above), not a language. *)
label = '[', { any_char - ']' }, ']' ;  (* structured metadata, e.g. [Installation]; not part of the class *)
code_content = (* any text until matching fence, preserved literally *) ;
(* TABS IN CODE -- NORMATIVE. Literal tab characters in code content (fenced
   code blocks and inline code spans) are PRESERVED verbatim; a tab and N spaces
   are not interchangeable. Tab display width is a presentation concern (CSS
   `tab-size`). Tab-to-space expansion is NOT a default behavior -- it is opt-in
   via a tab-normalize extension (flat replacement, default 2 spaces, content
   only). Matches djot / CommonMark. Pinned by corpus tabs-in-code-preserved. *)

(* --- Blockquotes --- *)
blockquote = blockquote_line, {blockquote_line | lazy_continuation_line}, [caption] ;
(* marked and lazy lines may interleave: `> a` / lazy / `> b` is ONE quote *)
blockquote_line = '>', [' '], inline_content, newline ;

(* LAZY CONTINUATION -- NORMATIVE (CommonMark-compatible). After one or more
   blockquote_lines, a line that does NOT carry the '>' marker still continues
   the blockquote -- exactly as if it carried '>' -- provided it is:
     - not blank (a blank line ends the blockquote), and
     - not a block-opener that would interrupt a paragraph (§10): a heading,
       list, table, fenced code, `:::` div, thematic break, OR an "invisible"
       reference / footnote / abbreviation definition or comment -- each ends
       the blockquote and starts that block OUTSIDE it, and
     - not a caption ('^ ' ...), which attaches to the blockquote instead.
   Only PLAIN text (continuing an open paragraph) folds in; an ordered marker
   folds too (it never interrupts, §10/§11). The lazy line's text is appended to
   the blockquote's inner content before that content is block-parsed, so a
   hard-wrapped quoted paragraph need not repeat '>' on every line. *)
lazy_continuation_line = inline_content, newline ;  (* see the NORMATIVE note above for which lines qualify *)

(* --- Lists --- *)
list = unordered_list | ordered_list | definition_list ;

unordered_list = unordered_item+ ;
unordered_item = bullet_marker, [item_attributes], space, [task_marker], list_item_content ;
(* the optional task_marker makes the item a TASK item (checkbox); the
   plain-vs-task axis is a §11 same-list criterion *)
bullet_marker = '-' | '*' ;  (* `+` is NOT a bullet in Carve -- it is the list
                                continuation marker (PART 9 §17); a `+ ` line is
                                ordinary paragraph text. Deviates from djot. *)
(* MARKER REQUIRES CONTENT -- NORMATIVE. A bullet or ordered marker is a list
   item only when followed by a space AND non-empty content. A content-less
   marker line -- bare (`-`) or with trailing whitespace only (`- `, `-   `) --
   is NOT a list: it is paragraph text. The rule ignores trailing whitespace, so
   `-` and `- ` behave identically (an editor stripping the trailing space cannot
   change the meaning). Carve is stricter than CommonMark, which treats a bare
   `-` as an empty item; requiring content keeps a lone dash (a prose dash /
   placeholder) from silently becoming a list. Pinned by corpus
   content-less-marker-is-not-a-list. *)

ordered_list = ordered_item+ ;
ordered_item = ordered_marker, [item_attributes], space, list_item_content ;
(* The first item fixes the dialect (decimal / alpha / roman) and the
   delimiter (`.` or `)`); classification and the ambiguous-letter
   tie-break are in PART 9 §11. An ordered list does NOT interrupt a paragraph
   in any dialect/value (`1.`, `2.`, `1985.`, `a.`, `i.`); it needs a blank
   line before it (matching Djot, avoiding the CommonMark `1.`-only heuristic)
   -- the paragraph-interruption rule is PART 9 §10.
   The two delimiters are the trailing `.` and `)` only. djot's
   parenthesized `(1)` / `(a)` form is INTENTIONALLY NOT a marker -- it is
   too easily confused with a prose parenthetical -- so `(1) text` stays
   literal paragraph text (pinned by corpus parenthesized-ordered-marker). *)
ordered_marker = (digit+ | letter | roman_numeral), ('.' | ')') ;
roman_numeral = ('i' | 'v' | 'x' | 'l' | 'c' | 'd' | 'm')+
              | ('I' | 'V' | 'X' | 'L' | 'C' | 'D' | 'M')+ ;

(* LIST-ITEM ATTRIBUTES (Carve addition; NORMATIVE -- extends PART 9 §15). An
   attribute block ABUTTING the marker (no space between marker and `{`)
   attaches its attributes to the `<li>` itself; the marker's required space
   follows the block:
     `-{.c} text`     -> `<li class="c">text</li>`
     `3.{#x k=v} text`-> `<li id="x" k="v">text</li>`
   For task items the block abuts the marker, before the task marker:
     `-{.c} [ ] text` -> `<li class="c"><input ...> text</li>`.
   WHITESPACE IS THE DISCRIMINATOR -- NORMATIVE:
   - `-{.c} text` (attr ABUTS marker) -> the `{.c}` is part of the MARKER and
     attributes the `<li>`. This is the ONLY way to attribute the `<li>`.
   - `- {.c} text` (a space BEFORE `{`) -> the `{.c}` is ordinary item CONTENT,
     NOT a li-attribute. It then follows the normal inline/block rules: a
     same-line `{.c}` is leading inline content; a `{.c}` alone on its own line
     (inside the item) is a block-attribute line that floats to the next block
     within the item -- e.g. a `{.blue}` line preceding an indented `> quote`
     puts `.blue` on the `<blockquote>` (§15), NOT on the `<li>`. This keeps
     li-attributes and block-attributes cleanly separated.
   - DISAMBIGUATION of the abutting block mirrors the inline-span rule (§14):
     it is consumed as li-attributes only if it yields >= 1 attribute (`#id`,
     `.class`, `key=value`) OR is the blessed empty block (`-{} text` -> bare
     `<li>`). Otherwise (`-{+a+}`, `-{not attrs}`) the `-{` is not a marker and
     the line stays ordinary paragraph/inline text.
   - Free slot: `-{` and `1.{` / `1){` are NOT list markers today (bullet and
     ordered both require the space), so no existing input changes meaning.
   - Diverges from djot (which has no li-attribute form); deliberate Carve
     extension. The lazy-continuation accident -- a trailing `{…}` line folded
     onto a tight item, which carve-php attached to the `<li>` and carve-js
     dropped -- is REJECTED as the mechanism: it was position-dependent (broke
     once a sub-list intervened) and is not normative. *)
item_attributes = attributes ;

list_item_content = first_block_content
                  | (inline_content, newline, {list_continuation | nested_list | continuation_marker_block}) ;
list_continuation = indent, inline_content, newline ;
nested_list = indent, list ;
indent = whitespace, {whitespace} ;
(* Indentation is measured in VISUAL COLUMNS (a tab advances to the next
   multiple of 4 -- CommonMark tab stops); how much indent nests what is NOT
   a fixed character count -- the content-column rules are PART 9 §24. *)
(* List continuation marker (Carve addition; see PART 9 §17): a lone `+` at
   the item's marker column attaches the following flush-left block to the
   item with no blank line, keeping the list tight. *)
continuation_marker = '+', newline ;
continuation_marker_block = continuation_marker, block ;
(* First-block item (Carve addition; see PART 9 §17): a lone `+` as the sole
   content right after the marker (`- +`) opens an item whose body is the
   flush-left block(s) that follow, with no inline lead and no indentation.
   `- + text` keeps `+ text` as literal inline content -- only a bare `+`
   triggers this. *)
first_block_content = continuation_marker, {block} ;

(* Task lists are unordered lists with special markers *)
task_marker = '[', task_state, ']', space ;
(* `x`/`X` render a CHECKED checkbox; every other state (` `, `-`, `_`,
   `>`, `?`) renders an UNCHECKED checkbox. Matches djot-php. *)
task_state = ' ' | 'x' | 'X' | '-' | '_' | '>' | '?' ;

(* --- Definition Lists --- *)
definition_list = definition_entry+ ;
definition_entry = definition_term+, definition_body+ ;
definition_term = "::", space, inline_content, newline ;
definition_body = ':', space, space, inline_content, newline, {definition_continuation} ;
definition_continuation = (space, space, space, inline_content, newline)
                        | (backslash, newline, space, space, space, inline_content, newline) ;
(* The term marker is TWO colons. A single-colon `: term` line is NOT a
   definition list -- it is ordinary paragraph text. (Deliberate: djot uses a
   single colon, but `::` keeps the term marker distinct from the `:::` div
   fence and from a stray `:` in prose; all three reference impls agree.) *)

(* --- Tables --- *)
table = standard_row, {table_row}, [caption] ;
(* a table BEGINS with a standard row; a continuation row only ever follows
   an existing row (its cells append to the row above -- PART 9 §5) *)

table_row = standard_row | continuation_row ;

standard_row = '|', table_cell, {'|', table_cell}, '|', newline ;

continuation_row = '+', table_cell, {'|', table_cell}, '|', newline ;

table_cell = header_cell | data_cell | span_cell ;

header_cell = '=', [alignment_marker], {whitespace}, cell_content, {whitespace} ;
data_cell = [cell_attributes], [alignment_marker], {whitespace}, cell_content, {whitespace} ;
(* alignment_marker is glued to the opening '|' (no preceding whitespace),
   mirroring header_cell; a whitespace-delimited lone '^'/'<' is span_cell *)
(* CELL ATTRIBUTES (NORMATIVE): an attribute block GLUED to the opening '|'
   (no preceding whitespace) sets the cell's attributes; the rest of the cell,
   after optional whitespace, is the content. A space before the brace
   (`| {.x}`) is ordinary content, NOT attributes. The whole brace payload must
   be valid attribute syntax (§15); otherwise the '{' is literal content. A
   cell carrying attributes is never a bare span_cell -- its content is literal
   even if it is just '^' or '<'. A computed rowspan/colspan/alignment is
   authoritative, so an author copy of those keys is dropped. (corpus
   97-table-cell-attributes.) *)
cell_attributes = attributes ;
span_cell = rowspan_marker | colspan_marker ;

rowspan_marker = {whitespace}, '^', {whitespace} ;
colspan_marker = {whitespace}, '<', {whitespace} ;

alignment_marker = '<' | '>' | '~' ;  (* left, right, center *)

cell_content = inline_content ;
(* SEMANTIC CONSTRAINT (PART 9 §5): cell content may not contain an
   unescaped `|` outside a code span -- the pipe is the cell separator;
   the exception is not expressible context-free. *)

(* --- Captions --- *)
caption = '^', space, inline_content, newline ;

(* CAPTION NUMBER PLACEHOLDER -- NORMATIVE.
   The FIRST bare `#` in a caption's TOP-LEVEL inline_content is a number
   placeholder. "Bare" = a `#` that does NOT begin a tag (a `#`
   followed by whitespace, `:`, `.`, end-of-caption, or any
   non-tag-name character). `#word` stays a tag (PART 9 §19). `\#` is
   a literal `#`, never a placeholder. Only the first bare `#` is a
   placeholder; later bare `#` render literally. A `#` INSIDE inline
   markup (emphasis, link, span) is NOT a placeholder -- put it in the
   caption's top-level text (`^ *Figure* #:`, not `^ *Figure #*:`).
   The LABEL is the inline_content before the placeholder, trailing
   whitespace trimmed; the counter BUCKET KEY is that label's plain
   text. Numbering is per-bucket, 1-based, in document order, assigned
   in the resolution pass. A caption with no bare `#` is unchanged. *)

(* --- Admonitions --- *)
(* A fence is a run of 3+ colons. A longer opener nests shorter blocks; a
   block is closed only by a bare fence of equal-or-greater length, so a
   `:::` inside a `::::` block is content (PART 9 §12). *)
colon_fence = ":::", {":"} ;

admonition = admonition_open, newline,
             {block - admonition_close},
             admonition_close, newline ;
(* the body may be EMPTY, like the generic div's (PART 9 §12) *)

(* STRICT (djot): the opener line carries NO inline attributes -- it is
   colon_fence, type, and an optional quoted title, and NOTHING else. A
   trailing `{...}` (or any other non-title text) makes the line an
   ordinary paragraph, not a fence. Attributes attach via a PRECEDING
   block-attribute line, which floats onto the admonition (§15). *)
admonition_open = colon_fence, space, admonition_type, [space, quoted_title] ;
admonition_close = colon_fence ;

admonition_type = "note" | "tip" | "warning" | "danger" | "info"
                | "success" | "example" | "quote"  (* Tier 1 canonical *)
                | identifier ;  (* Tier 2 custom types (incl. `details`)
                                   -- see PART 9 §12 for the two-tier
                                   rendering rule *)

quoted_title = '"', {character - '"'}, '"' ;
(* there is NO escape mechanism inside a quoted title -- a title cannot
   contain a `"`; a line whose title is malformed is an ordinary paragraph
   (deliberate strictness, unlike `quoted_value` which accepts escapes) *)

(* --- Generic Divs (djot generic container; PART 9 §12) --- *)
(* A BARE `:::` opener with NO type word is a generic div, NOT an
   admonition (no class added). STRICT (djot): the opener carries NO inline
   attributes -- an inline `::: {…}` is a paragraph, not a div. Attributes
   attach via a PRECEDING block-attribute line, which floats onto the div
   (§15). Shares the colon-fence closer and the fence-length nesting rule
   (PART 9 §12). *)
div = div_open, newline, {block - admonition_close}, admonition_close, newline ;
div_open = colon_fence ;

(* --- Line block (verse) -- a RECOGNIZED `:::` type that preserves the
   author's per-line layout. `::: |` renders as a generic
   `<div class="line-block">` (NOT an `<aside>`), but unlike an ordinary div
   its body keeps each line's LEADING WHITESPACE and turns each soft line
   break into a hard break. The type token is a bare pipe `|` on the opener --
   NOT a per-line prefix -- so it is free of the pipe/table ambiguity of the
   Pandoc/djot per-line `|` form, and uses no English keyword (jgm/djot#29).
   The behavior keys off the `|` token. STRICT (djot): the opener carries no
   inline attributes, so the inline `::: {.line-block}` form is a PARAGRAPH, not
   a line block; extra attributes attach via a PRECEDING block-attribute line.
   Normative rendering: PART 9 §23. *)
line_block = line_block_open, newline, line_block_body, admonition_close, newline ;
line_block_open = colon_fence, space, "|" ;
(* The body is a sequence of stanzas separated by blank lines; each stanza is
   a run of consecutive non-blank lines. Inline content parses normally; the
   per-line leading whitespace is retained and the intra-stanza newlines are
   hard breaks (PART 9 §23). *)
line_block_body = stanza, {blank_line, stanza} ;
stanza = line_block_line, {line_block_line} ;
line_block_line = {whitespace}, inline_content, newline ;
(* the leading whitespace is PRESERVED in output -- PART 9 §23 *)

(* --- Comments --- *)
comment_line = "%%", {character}, newline ;

comment_block = comment_block_open, newline,
                {character | newline},
                comment_block_close, newline ;

comment_block_open = "%%%", {'%'} ;  (* 3+ percent signs *)
comment_block_close = "%%%", {'%'} ;  (* must match opener length *)

(* trailing (inline) line comment: from a %% marker to end of line, content
   not rendered. The marker requires whitespace or start-of-run before it and
   is not formed inside code spans / inline-raw, nor when the first % is
   escaped -- a semantic constraint stated in PART 9, not context-free. *)
inline_comment = "%%", {character - newline} ;

(* --- Raw Blocks --- *)
raw_block = code_fence_open, [space], "=", format_name, newline,
            raw_content,
            code_fence_close, newline ;
(* The opener is a code fence (backticks or tildes) whose info string is `=`
   immediately followed by a format name (```=html). The leading `=` is the
   block parallel of the inline raw `{=format}` attribute; it never starts a
   language token (a code fence's language charset excludes `=`), so this is
   unambiguous against an ordinary code block. The `=` and format name must be
   adjacent -- ```= html (space after `=`) is NOT a raw block. Leading
   whitespace before the `=` is permitted (```=html and ``` =html both open
   raw). The closer follows the PART 9 §2 rule: same fence character, length
   >= opener. Content is verbatim; it is emitted UNESCAPED when the format
   matches the output format (html) and DROPPED otherwise -- the block parallel
   of raw inline (PART 9 §20). This adopts djot's raw-block syntax; the earlier
   ```raw FORMAT keyword form was removed (no English keyword; symbol-based and
   symmetric with inline `{=format}`). *)

format_name = "html" | "latex" | identifier ;
raw_content = (* any content until closing fence *) ;

(* --- Paragraphs --- *)
(* A VISIBLE block (heading, list, quote, table, fence, thematic break,
   admonition/div) interrupts a paragraph with no blank line before it
   (Markdown-like); invisible constructs (reference definitions, comments,
   block-attribute lines) interrupt too. This is the semantic constraint
   PART 9 §10, not a context-free one. A paragraph is terminated by a blank
   line, an interrupting block (§10), or end of file -- the optional trailing
   blank_line here is the paragraph's own terminator, not a precondition for
   the NEXT block. *)
paragraph = inline_content+, [blank_line] ;


(* ============================================================================
   PART 3: INLINE ELEMENTS
   ============================================================================ *)

inline_content = {inline_element | text_run | literal_special}+ ;

(* A special_char that does not begin (or close) a construct under PART 8
   precedence + PART 9 conditions is literal text. Without this production
   the grammar cannot represent "a/b/c", "x = a*b", or an inner delimiter
   such as the middle '/' of /usr/local/ -- yet these are canonical outputs
   (edge-cases.md §1, corpus 01-emphasis). The "is this a construct?"
   decision is the semantic constraint in PART 9, not a context-free one. *)
literal_special = special_char ;

inline_element = escaped_char
               | raw_inline
               | code_span
               | autolink
               | auto_text_link
               | link
               | inline_span
               | image
               | math
               | emphasis
               | strong
               | bold_italic
               | underline
               | strikethrough
               | superscript
               | subscript
               | highlight
               | forced_emphasis
               | forced_strong
               | forced_underline
               | forced_strike
               | forced_super
               | forced_sub
               | forced_highlight
               | footnote
               | extension_inline
               | editorial_markup
               | mention
               | tag
               | smart_typography
               | inline_comment
               | hard_break
               | soft_break ;

text_run = (character - special_char)+ ;

special_char = '/' | '*' | '_' | '~' | '^' | '`' | '[' | ']'
             | '!' | '$' | '{' | '}' | '<' | '>' | '@' | '#'
             | ',' | '=' | ':' | '%' | '\'
             | '-' | '.' | '"' | "'" | '(' | '+' ;
(* the second group are the smart-typography trigger characters (dashes,
   ellipsis, quotes, `(c)`-family symbols, `+-`): they must be excluded
   from text_run so the smart_typography productions are reachable; when
   no pattern matches they are literal_special like any other special *)

(* --- Escaped Characters --- *)
escaped_char = '\', ascii_punctuation ;
ascii_punctuation = '!' | '"' | '#' | '$' | '%' | '&' | "'" | '(' | ')'
                  | '*' | '+' | ',' | '-' | '.' | '/' | ':' | ';'
                  | '<' | '=' | '>' | '?' | '@' | '[' | '\' | ']'
                  | '^' | '_' | '`' | '{' | '|' | '}' | '~' ;

(* --- Code Spans --- *)
code_span = backtick_run, code_span_content, [backtick_run], [attributes] ;
   (* the closing backtick_run is OPTIONAL: an unclosed run runs to end of
      block -- see UNCLOSED RUN below *)
backtick_run = '`'+ ;  (* the OPENER is a MAXIMAL run; the CLOSER is a run of
   the SAME count, also maximal (not part of a longer run) *)
code_span_content = (* any chars except a closing backtick_run of equal count *) ;
(* UNCLOSED RUN. An opener with no equal-length closer ahead is NOT literal
   text: it opens a verbatim span that runs to END_OF_BLOCK (the block's
   trailing whitespace is stripped; no surrounding single-space strip, which
   applies only to a closed span). Such an unclosed run is OPAQUE -- an
   emphasis delimiter or link tail after it is verbatim content, so the
   surrounding construct never closes. Matches djot upstream and carve-php.
   (Pinned as the "Inline verbatim ..." corpus pair; was a carve-js
   divergence, resolved.) *)
(* A trailing `{…}` on a code span is the generic inline-attribute
   block, NOT a language tag -- EXCEPT the exact form `{=format}`, which is
   raw inline passthrough (raw_inline below, PART 9 §20). For any OTHER
   `{…}`: both impls consume AND apply it -- `\`c\`{.x}` ->
   `<code class="x">c</code>`, and `#id`/`key=value` likewise. (Was an impl
   divergence where carve-js dropped the attributes; RESOLVED -- both now
   apply them identically.) *)

(* --- Raw Inline (inline parallel of raw_block; PART 9 §20) --- *)
(* A code span whose trailing attribute block is EXACTLY `{=format}` is raw
   inline passthrough: the verbatim span content is emitted unescaped when
   `format` matches the output, else dropped. Any OTHER trailing `{…}` is a
   generic attributed code span (above), NOT raw inline. *)
raw_inline = backtick_run, code_span_content, backtick_run, '{=', format_name, '}' ;

(* --- Links --- *)
link = inline_link | reference_link | collapsed_reference_link ;
(* autolink and auto_text_link (crossref) are listed directly in
   inline_element, not here. *)

inline_link = '[', link_text, ']', '(', link_destination, [link_title], ')', [attributes] ;
reference_link = '[', link_text, ']', '[', reference_label, ']', [attributes] ;
collapsed_reference_link = '[', link_text, ']', '[', ']', [attributes] ;

link_text = inline_content ;
(* SEMANTIC CONSTRAINT: the link text ends at the matching `]` -- the close
   is balanced-bracket, escape- and code-span-aware (PART 9 §16's rule for
   inline notes is the same scan); an unescaped bare `]` inside plain text
   closes it. Not expressible context-free. *)
link_destination = {(url_char - ')') | unicode_url_char}+ ;
(* The destination is ANY run of URL characters: absolute (`https://…`),
   relative (`/path`, `./file`), or a bare fragment (`#section`) -- there is
   no scheme requirement. It ends at the first whitespace (a following
   quoted run is the title) or at the first `)`; there is NO balanced-paren
   rule and NO escape inside the destination (`[x](http://a/b(c))` links to
   `http://a/b(c` and leaves the second `)` literal) -- percent-encode a
   literal `)` as `%29`. There is NO angle-bracket-wrapped destination
   form. *)
unicode_url_char = (* any non-whitespace, non-ASCII Unicode character *) ;
(* Titles accept double OR single quotes -- a deliberate enhancement
   over djot (which has no single-quote titles; it would fold `'...'`
   into the URL). The non-delimiting quote may appear inside the title. *)
link_title = space, ('"', {character - '"'}, '"')
           | space, ("'", {character - "'"}, "'") ;

reference_label = {character - ']'}+ ;
reference_definition = '[', reference_label, ']', ':', space, link_destination, [link_title], newline ;

(* --- Inline Spans --- *)
(* A bracketed inline run immediately followed by an attribute block
   attaches those attributes to a <span>. Distinguished from links by the
   character after ']': '(' -> inline_link, '[' -> reference link, '{' ->
   inline_span. A bare '[text]' with none of those is literal text (carve
   has no shortcut reference links). See PART 9 §14. *)
inline_span = '[', inline_content, ']', attributes ;

autolink = '<', (url_autolink | email_autolink), '>', [attributes] ;
(* A trailing `[attributes]` block attaches to the autolink (generic
   inline-attribute carrier, examples.md): `<https://example.com>{.ext}`
   -> `<a href="https://example.com" class="ext">https://example.com</a>`.
   Same slot as every other inline carrier; both reference impls agree. *)
url_autolink = scheme, ':', {url_char}+ ;
email_autolink = {email_char}+, '@', {email_char}+, '.', {letter}+ ;

auto_text_link = "</#", crossref_id, '>' ;  (* cross-reference with auto text *)
crossref_id = {character - ('>' | whitespace | newline)}+ ;
(* The crossref id token accepts NON-ASCII characters: automatic heading ids
   PRESERVE Unicode (`# Café Notes` -> `café-notes`), so `</#café-notes>`
   must be writable. Resolution is exact (case-sensitive against the id
   table); an ASCII-folded spelling of a Unicode id does NOT resolve
   (corpus 19-heading-ids). EXPLICIT ids (`{#id}`) remain ASCII
   `identifier`s -- the asymmetry is deliberate: authors control explicit
   ids, auto ids follow the heading text. *)

url = scheme, ':', {url_char}+ ;
scheme = letter, {letter | digit | '+' | '-' | '.'} ;  (* 1+ chars; single-letter schemes ok *)
url_char = letter | digit | '-' | '.' | '_' | '~' | ':' | '/' | '?'
         | '#' | '[' | ']' | '@' | '!' | '$' | '&' | "'" | '(' | ')'
         | '*' | '+' | ',' | ';' | '=' | '%' ;

(* --- Images --- *)
image = '!', '[', alt_text, ']', '(', image_source, [image_title], ')', [attributes] ;
alt_text = {character - ']'} ;
image_source = link_destination ;  (* absolute or relative, like links *)
image_title = link_title ;

(* A caption (`^ ` line) may follow a standalone image paragraph on the
   next line, wrapping it in a <figure> -- caption placement is PART 9 §4;
   there is no separate grammar production for the pair. *)

(* --- Math (djot form; PART 9 §18) --- *)
(* Inline `$` + a verbatim (backtick) span; display `$$` + a verbatim
   span. The backtick span removes ambiguity with a literal `$`, so
   currency such as `$5` (no following backtick run) stays literal text.
   Both forms are inline; display math renders inside its own paragraph
   when alone on a line. There is NO bare `$…$` form and NO `\(…\)` input
   form -- those were dropped (`\(` is just an escaped paren). *)
math = math_inline | math_display ;
math_inline = '$', code_span ;
math_display = "$$", code_span ;
(* TRAILING ATTRIBUTES on math -- NORMATIVE: math reuses `code_span`, which
   carries the generic `[attributes]` slot, so `$\`x\`{.c}` / `$$\`x\`{.c}`
   parse a trailing attribute block and APPLY it to the math span, merging
   classes into the existing `math inline` / `math display` class
   (`<span class="math inline c">`); `#id` / `key=value` are applied too.
   Canonical = carve-js / djot.js (pinned, corpus Math section). The
   `{=format}` raw form is code-span-ONLY and is NOT inherited by math:
   `$\`x\`{=html}` leaves the `{=html}` literal (both impls already agree).
   carve-php currently DROPS valid math attributes -- to be fixed.
   The math content is the verbatim text of the reused code_span. *)

(* --- Emphasis (left/right word-boundary rule; PART 9 §9 is normative).
   The word-boundary restriction applies to EVERY bare emphasis delimiter
   (`/`, `*`, `_`, `~`, `^`, `=`, `,`). No bare delimiter opens or closes
   intraword: `foo*bar*baz`, `foo~bar~baz`, `snake_case`, `a/b/c` all stay
   literal. One rule, no per-character carve-outs. Every bare delimiter is
   single-char -- there is no two-char delimiter. This is STRICTER than Djot,
   whose rule is whitespace-only.

   FORCED INTRAWORD -- the brace-pair `{X … X}` family (PART 9 §22) is the
   explicit escape hatch: it opens a span regardless of word boundary, so
   `x{*bold*}y`, `x{/italic/}y`, `x{_under_}y` emphasize intraword. It is
   ADDITIVE -- bare delimiters at a word boundary are unchanged; the braces are
   only needed to force a span where a bare delimiter would otherwise stay
   literal.

   HIGHLIGHT and SUBSCRIPT are the single-char `=` and `,` delimiters. Uniform
   word boundary makes single `=` and `,` safe: `x = 5`, `key=value`, `a=b`,
   `=>`, `1,2,3`, `a,b,c`, `x, y, z`, `$1,000` all stay literal -- the opener is
   followed by whitespace, or preceded/followed by an alphanumeric, so the
   word-boundary rule rejects it. A DOUBLED bare delimiter (`==x==`, `,,x,,`)
   is literal by the same-delimiter-adjacency rule (PART 9 §9), like `**x**` or
   `//x//`. The delimiter set is fully uniform: all single-char. --- *)

(* For EVERY bare delimiter (`/`, `*`, `_`, `~`, `^`, `=`, `,` -- all single-char):
   Opener: NOT followed by whitespace, AND preceded by start, whitespace, or
           punctuation -- but NOT by an alphanumeric, by `_`, or by the same
           delimiter character.
   Closer: NOT preceded by whitespace, AND NOT followed by an alphanumeric.
   Exact disambiguation of delimiter runs is pinned by the conformance corpus
   (tests/corpus/01-emphasis*); see PART 9 §9. Forced `{X … X}` spans bypass
   these conditions entirely (PART 9 §22). *)

emphasis = '/', emphasis_content, '/', [attributes] ;  (* italic *)
strong = '*', strong_content, '*', [attributes] ;       (* bold *)
bold_italic = "/*", bi_content, "*/", [attributes] ;    (* combined *)
underline = '_', underline_content, '_', [attributes] ;
strikethrough = '~', strike_content, '~', [attributes] ;
superscript = '^', super_content, '^', [attributes] ;
subscript = ',', sub_content, ',', [attributes] ;        (* single-char; was ",," *)
highlight = '=', highlight_content, '=', [attributes] ;  (* single-char; was "==" *)

(* --- Forced intraword emphasis (PART 9 §22). A brace-pair wrapping a bare
   emphasis delimiter forces a span with NO word-boundary condition, so it
   emphasizes intraword. The closing brace bounds the span; the inner delimiter
   characters are literal unless they open a nested forced span. Every mark has
   a forced form: *)
forced_emphasis   = "{/", forced_content, "/}", [attributes] ;   (* italic    *)
forced_strong     = "{*", forced_content, "*}", [attributes] ;   (* bold      *)
forced_underline  = "{_", forced_content, "_}", [attributes] ;   (* underline *)
forced_super      = "{^", forced_content, "^}", [attributes] ;   (* sup       *)
forced_sub        = "{,", forced_content, ",}", [attributes] ;   (* sub       *)
(* `{~ … ~}` is forced strikethrough UNLESS it contains a top-level `~>`, in
   which case it is editorial substitution (PART 9 §22 disambiguation). *)
forced_strike     = "{~", forced_content, "~}", [attributes] ;
(* `{= … =}` is forced highlight (see Editorial Markup below for the
   distinction from the raw-inline `{=format}` attribute). *)
forced_highlight  = "{=", forced_content, "=}", [attributes] ;
forced_content    = (inline_element | text_run | literal_special)+ ;

(* TRAILING ATTRIBUTES -- each emphasis-family span MAY carry a trailing
   `[attributes]` block: the generic inline-attribute carrier that attaches
   to the immediately-preceding inline node (the general rule, examples.md
   "Trailing attribute block edge cases"). `*x*{.real}` ->
   `<strong class="real">x</strong>`. It is the SAME `[attributes]` slot
   every other inline carrier uses (code_span, link, image, inline_span,
   extension_inline, autolink). Pinned by corpus
   75-trailing-attribute-block-edge-cases; both reference impls agree. *)

(* Content MAY contain the delimiter character: a delimiter that does not
   satisfy the close condition (PART 9 §1) is literal content, not the span
   boundary. The span ends at the matched closer only. Subtracting the
   delimiter here would make e.g. "/usr/local/" -> <em>usr/local</em>
   ungrammatical, contradicting edge-cases.md §1. The boundary is a semantic
   constraint (PART 9 §1, §9), not a context-free one. *)
emphasis_content = (inline_element | text_run | literal_special)+ ;
strong_content = (inline_element | text_run | literal_special)+ ;
bi_content = (inline_element | text_run | literal_special)+ ;
underline_content = (inline_element | text_run | literal_special)+ ;
strike_content = (inline_element | text_run | literal_special)+ ;
super_content = (inline_element | text_run | literal_special)+ ;
sub_content = (inline_element | text_run | literal_special)+ ;
highlight_content = (inline_element | text_run | literal_special)+ ;

(* Word boundary rule for emphasis openers/closers (normative: PART 9 §9).
   Applies to EVERY bare delimiter. Forced `{X … X}` spans (PART 9 §22) are
   exempt. PSEUDO-PRODUCTIONS -- prose conditions, not reachable grammar
   symbols. *)
emphasis_open_condition = (* preceding char is whitespace, start, or punctuation but NOT alphanumeric or '_'; following char is non-whitespace *) ;
emphasis_close_condition = (* preceding char is non-whitespace; following char is NOT alphanumeric *) ;

(* --- Footnotes (PART 9 §16) --- *)
(* Both implemented forms: the REFERENCE form `[^label]` resolving against a
   `[^label]: body` definition, and the INLINE form `^[content]` (pandoc
   form; a deliberate carve extension over djot). Numbering (one shared
   document-order sequence), endnotes rendering, and the inline form's
   balanced-bracket close are PART 9 §16. PART 8 ranks `^[` ABOVE
   superscript. Only the sidenote form remains deferred. *)
footnote = reference_footnote | inline_footnote ;
reference_footnote = "[^", footnote_label, ']' ;
footnote_label = {character - ']'}+ ;
inline_footnote = "^[", inline_content, ']' ;
(* the closing `]` is the balanced, escape- and code-span-aware close
   (PART 9 §16); footnote recognition is DISABLED inside the content *)

footnote_definition = "[^", footnote_label, "]:", space, inline_content, newline ;
(* the body extends to following lines indented by >= 2 spaces -- PART 9 §16 *)

(* DEFERRED (reserved, not implemented):
   sidenote = "[>", inline_content, ']' ; *)

(* --- Extensions --- *)
extension_inline = ':', extension_name, '[', extension_content, ']', [attributes] ;
extension_name = identifier ;
extension_content = {character - ']'} ;

(* --- Editorial Markup (CriticMarkup-inspired) --- *)
editorial_markup = addition | deletion | substitution | editorial_comment ;

addition = "{+", inline_content, "+}" ;
deletion = "{-", inline_content, "-}" ;
substitution = "{~", inline_content, "~>", inline_content, "~}" ;
editorial_comment = "{#", inline_content, "#}" ;
(* DISAMBIGUATION (PART 9 §22). `{~ … ~}` is editorial SUBSTITUTION when it
   contains a top-level `~>`, and forced STRIKETHROUGH otherwise. `{# … #}`
   stays editorial comment (`#` is not an emphasis delimiter, no collision). *)

(* HIGHLIGHT is the single-char `=` delimiter, and `{=text=}` is its FORCED
   intraword form (forced_highlight, PART 9 §22), rendering <mark>. It is
   distinct from the raw-inline `{=format}` attribute on a code span (e.g.
   `` `x`{=html} ``), which has no trailing `=` before the `}`. *)

(* --- Mentions and Tags --- *)
mention = '@', mention_name ;
mention_name = name_word, {'.', name_word} ;

tag = '#', tag_name ;
tag_name = name_word, {'.', name_word} ;

name_word = (letter | digit | '_' | '-')+ ;
(* Dots are INTERIOR-only by construction: a dot followed by another name
   character continues the name (`@john.doe`, `#release-1.0`); a dot at the
   end of the run is sentence punctuation, never part of the name
   (`ping @markus.` -> mention `markus` + literal `.`). Pinned by corpus
   89-mention-and-tag-name-boundaries / 31-mention-ignores-email-addresses. *)

(* Boundary rules (PSEUDO-PRODUCTIONS -- prose conditions on `mention` /
   `tag`, normative in PART 9 §7, not reachable grammar symbols):
   must be preceded by whitespace or start of content *)
mention_boundary = (* @name must have word boundary before @ *) ;
tag_boundary = (* #name must have word boundary before # *) ;

(* --- Smart Typography --- *)
smart_typography = em_dash | en_dash | ellipsis | smart_quote
                 | arrow | comparison | symbol ;
(* NOTE: fractions are deliberately NOT converted (PART 9 §8 / dismissed-
   syntax.md) -- `1/2` collides with dates (`1/2/2024`) and paths. *)

em_dash = "---" ;     (* → — *)
en_dash = "--" ;      (* → – *)
ellipsis = "..." ;    (* → … *)
(* HYPHEN RUNS -- a run of 2+ hyphens between non-hyphen characters
   decomposes djot-style: length divisible by 3 -> all em dashes;
   else divisible by 2 -> all en dashes; else one em dash (3) and the
   remainder by the same rule. So `--`=–, `---`=—, `----`=––,
   `-----`=—–, `------`=——, `-------`=—–– (PART 9 §8). *)

smart_quote = '"' | "'" ;
(* Smart quotes are PER-CHARACTER contextual substitution, NOT a paired
   span: each `"` / `'` is independently mapped to a left or right
   typographic quote by its FLANKING characters -- preceded by whitespace
   or start-of-content -> left (opening) quote; otherwise -> right
   (closing) quote. An apostrophe (`don't`) is therefore a right single
   quote; an unpaired quote still converts; the substitution never spans
   or pairs across other inline constructs. `\"` / `\'` escape to the
   straight character (PART 9 §8). *)

arrow = "->" | "<-" | "<->" | "=>" ;
comparison = "!=" | "<=" | ">=" ;
symbol = "(c)" | "(r)" | "(tm)" | "+-" ;

(* --- Breaks --- *)
hard_break = '\', newline ;  (* backslash at end of line *)
soft_break = newline ;        (* regular line ending within paragraph *)

(* ============================================================================
   PART 4: ATTRIBUTES
   ============================================================================ *)

attributes = '{', opt_ws, attribute_list, opt_ws, '}' ;
attribute_list = attribute, {whitespace+, attribute} ;
(* interior padding is accepted: `{ .a }` and `{.a  .b}` are valid inline
   attribute blocks, same whitespace handling as block_attributes (minus
   the line-continuation, which is block-level only) *)
(* RENDER ORDER -- NORMATIVE: attributes are emitted in the order they
   appear in the source, with all classes merged into a single `class`
   attribute placed at the position of the FIRST class. So `{.a #b k=c}`
   -> `class="a" id="b" k="c"` and `{k=c .a #b}` -> `k="c" class="a"
   id="b"`. Matches djot and carve-php. (Block-attribute merge keeps
   each slot at its first-appearance position; values are last-wins,
   PART 9 §15.) *)

attribute = id_attribute | class_attribute | key_value_attribute
          | boolean_attribute ;

id_attribute = '#', identifier ;
class_attribute = '.', identifier ;
key_value_attribute = identifier, '=', attribute_value ;
(* A bare identifier with no value is a BOOLEAN (value-less) attribute,
   rendered `name=""` (a carve extension beyond canonical djot, matching
   djot-php; matched after key_value_attribute so `k=v` is not read as a bare
   `k`). It mixes freely with the other forms. See PART 9 §14. *)
boolean_attribute = identifier ;

attribute_value = unquoted_value | quoted_value ;
unquoted_value = (letter | digit | '-' | '_')+ ;
(* A backslash escapes ASCII punctuation inside a quoted value (same
   `escaped_char` rule as inline text), so a value can contain a literal
   quote: `"a\"b"` -> the value `a"b`. *)
quoted_value = '"', { escaped_char | (character - '"' - '\') }, '"'
             | "'", { escaped_char | (character - "'" - '\') }, "'" ;

(* Block-level attributes appear on their own line(s) preceding a block.
   A single attribute block may span multiple physical lines (the `}`
   need not be on the opening line): a continuation is a single line
   break optionally followed by indentation. A BLANK line (two line
   breaks) ends the block -- it is not interior padding; an attribute
   block interrupted by a blank line is not a block_attributes (both
   reference impls treat it as literal text). Merge + reach semantics
   (including float-across-blank-lines BETWEEN separate blocks): §15. *)
block_attributes = '{', opt_ws, attribute, {attr_separator, attribute}, opt_ws, '}', newline ;
opt_ws = {whitespace} ;                  (* spaces/tabs only, no line breaks *)
attr_separator = (whitespace | continuation), opt_ws ;  (* one ws OR one line break *)
continuation = newline, opt_ws ;         (* a single line break + indent; NOT a blank line *)

(* ============================================================================
   PART 5: ABBREVIATIONS
   ============================================================================ *)

abbreviation_definition = "*[", abbreviation_term, "]:", space, abbreviation_expansion, newline ;
abbreviation_term = (letter | digit)+ ;
(* a term is a SINGLE alphanumeric word -- a bracketed term containing
   punctuation or spaces (`*[e.g.]:`, `*[HTTP API]:`) is NOT a definition;
   the line stays ordinary paragraph text *)
abbreviation_expansion = {character - newline}+ ;

(* ============================================================================
   PART 6: INCLUDES
   ============================================================================ *)

(* PROCESSOR-LEVEL (PART 9 §19): includes are NOT part of the core parser
   and the directive is deliberately NOT reachable from `block`/`inline` --
   a conformant core may leave `{{ … }}` literal. *)
include_directive = "{{", space, include_path, [include_section], [include_options], space, "}}" ;
include_path = {character - ('#' | '@' | '}' | ' ')}+ ;
include_section = '#', identifier ;
include_options = {space, '@', identifier, ':', attribute_value}+ ;

(* ============================================================================
   PART 7: LEXICAL ELEMENTS
   ============================================================================ *)

identifier = (letter | '_'), {letter | digit | '_' | '-'} ;

letter = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j'
       | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't'
       | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
       | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J'
       | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T'
       | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;

digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;

whitespace = ' ' | '\t' ;
space = ' ' ;
newline = '\n' | '\r\n' | '\r' ;
backslash = '\' ;

character = (* any Unicode character *) ;
email_char = letter | digit | '.' | '-' | '_' | '+' ;

EOF = (* end of file *) ;

(* ============================================================================
   PART 8: PARSING PRECEDENCE
   ============================================================================ *)

(*
   Block parsing precedence (first pass):
   1. Frontmatter (--- delimited at document start)
   2. Comments (%% line, %%% block)
   3. Raw blocks (```=FORMAT -- matched before ordinary fences),
      then code blocks (``` or ~~~ fenced)
   4. Headings (# prefix)
   5. Thematic breaks (---, ***, ___)
   6. Block quotes (> prefix)
   7. Definitions (link/footnote/abbreviation definition lines) and
      block-attribute lines ({...} alone on a line, PART 9 §15)
   8. Lists (-, * bullets at any indent; ordered markers; :: definition
      lists)
   9. Tables (| prefix)
   10. Colon fences (::: -- admonition with type word, line block with |,
       generic div when bare; longest-token wins: a `::: x` line is never
       a `:: ` definition term)
   11. Paragraphs (everything else)
   Recognition order is NOT interruption: which of these may interrupt an
   open paragraph is PART 9 §10 (ordered markers, for one, never do).

   Inline parsing precedence (second pass):
   1. Escaped characters (\x)
   2. Code spans (`...`)
   3. Autolinks (<url>) and crossrefs (</#id>)
   4. Inline footnotes (^[content]) -- above emphasis, so `^[` is a note,
      not superscript; see PART 9 §16
   5. Links, images, reference footnotes, inline spans
      ([text](url), ![alt](src), [^label], [text]{attrs} -- the
      single-char lookahead after `]` selects among them, PART 9 §14;
      a `[^` run is a footnote reference before bracket scanning)
   6. Math ($`…`, $$`…`)
   7. Emphasis markers (/, *, _, ~, ^, =, ,) and forced spans ({/.../} etc.)
      -- EXCEPT a delimiter that begins a multi-char smart-typography
      pattern: `=>` is the arrow, never a highlight opener (the pattern
      is consumed first; PART 9 §8)
   8. Editorial markup ({+...+}, etc.; {~...~} with `~>` = substitution)
   9. Extensions (:type[content])
   10. Mentions and tags (@user, #tag)
   11. Smart typography (--, ---, etc.)
   12. Trailing line comments (%%, PART 9 §21)
   13. Plain text

   Disambiguation rule:
   - Prefer literal text over markup: a delimiter that has no valid match
     (per the word-boundary conditions, PART 9 §1) is literal.
   - An opener matches the NEAREST following valid closer of the same type.
     Delimiters of the same type between them are literal content (same-type
     spans do not nest, PART 9 §3); e.g. /usr/local/ -> <em>usr/local</em>.
   - Different-type spans nest and are resolved with a delimiter stack in a
     single left-to-right pass (NO backtracking, linear time); e.g.
     *bold /italic/* and /italic *bold*/ nests fully.

   Note: "shorter spans / earlier opening wins" does NOT apply -- it would
   truncate /usr/local/ to <em>usr</em> and break nested emphasis. The
   delimiter-stack model is what makes Design Principle 1 (no backtracking)
   hold while still producing the canonical outputs in the corpus.
*)

(* ============================================================================
   PART 9: SEMANTIC CONSTRAINTS (NOT EXPRESSIBLE IN PURE EBNF)
   ============================================================================ *)

(*
   1. WORD BOUNDARIES FOR EMPHASIS (summary -- §9 below is NORMATIVE).
      The word-boundary conditions apply to EVERY bare delimiter
      (`/ * _ ~ ^ = ,` -- all single-char). No bare delimiter emphasizes
      intraword. The forced `{X … X}` family (§22) is the escape hatch for
      deliberate intraword emphasis.
      - SAME-DELIMITER ADJACENCY applies to ALL SEVEN single-char delimiters
        `/ * _ ~ ^ = ,`: a delimiter adjacent to another of the same delimiter
        (before or after) never opens, so a doubled delimiter is literal
        (`**x**`, `~~x~~`, `^^x^^`, `==x==`, `,,x,,` literal, like `//x//`,
        `__x__`).
      - Opener (all bare): NOT followed by whitespace AND preceded by
        whitespace, start, or punctuation but NOT by an alphanumeric or `_`
        (left word boundary)
      - Closer (all bare): NOT preceded by whitespace AND NOT followed by an
        alphanumeric (right word boundary).
      - Example: "/ not italic /" is literal (whitespace after opener);
        "a/b/c", "foo*bar*baz", "snake_case", "x = 5", "key=value" are literal
        (opener has no left boundary, or is followed by whitespace);
        "/usr/local/" -> <em>usr/local</em> (01-emphasis-6); "x{*y*}z" ->
        x<strong>y</strong>z (forced, §22).
      - This is STRICTER than Djot, whose rule is whitespace-only.

   2. MATCHING FENCES
      - Code fence closer must have same character and >= same length as opener
      - Comment block closer must have same length as opener

   3. NO SAME-TYPE NESTING
      - /nested /italic/ here/ is invalid
      - /nested *bold* here/ is valid
      - A consequence (§9 SAME-DELIMITER ADJACENCY): a doubled bare delimiter
        never opens nested same-type emphasis -- it is literal text. So `**x**`
        -> `**x**`, `~~x~~` -> `~~x~~`, `^^x^^` -> `^^x^^`, uniform with the
        already-literal `//x//` and `__x__` (corpus 71-doubled-emphasis-
        delimiters).

   4. CAPTION PLACEMENT
      - ^ caption must immediately follow image, blockquote, table, a fenced
        code block (a captioned code block is a numbered LISTING), or a
        standalone display-math block (a captioned equation is a numbered
        EQUATION; its block must be solely the `$$`…`` span)
      - One blank line allowed between block and caption

   5. TABLE CELL CONTENT, ROW VALIDITY, CONTINUATION ROWS
      - Pipes must be escaped (\|) or inside code spans. Escape processing
        is cell-splitting-aware: an unescaped `|` outside a code span
        separates cells; `\|` and a `|` inside a code span are content.
      - Span markers (^ <) must be sole cell content. A leading `=` on a
        cell (glued to the `|`) marks a HEADER cell; escape it (`\=`) for a
        data cell whose text starts with `=`. Because a span marker must be the
        WHOLE cell, any other content disqualifies it: a cell like `{.x} <` is
        NOT a colspan marker -- it is an ordinary cell whose literal text is
        `{.x} <`. (So a leading attribute block does not attach to a span
        marker; there is no such thing as an attributed span marker.)
      - SPAN MARKER WITH NOTHING TO MERGE (NORMATIVE): a rowspan `^` in the
        FIRST row, or a colspan `<` in the FIRST column, has no cell above /
        to its left to extend. Such a marker renders as an EMPTY cell (a
        `<td></td>` / `<th></th>` carrying no content and no span), rather than
        being dropped. (corpus 96-table-span-marker-in-first-column.)
      - VALID TABLE ROW (the §10 interruption test): a line whose first
        non-indent character is `|` AND whose last non-whitespace character
        is `|`, with at least one cell between them. A line-initial `|`
        without the trailing `|` does NOT interrupt a paragraph (it stays
        prose); at BLOCK START (after a blank line) the trailing `|` is
        lenient -- a `| a | b` line still opens a table.
      - CONTINUATION ROWS (`+` first column instead of `|`): a continuation
        row appends to the row ABOVE it -- each of its non-empty cells is
        joined onto the corresponding cell of that row (separated by a
        single space, as a soft wrap); empty cells append nothing. It adds
        no `<tr>`. Works under rowspan: the joined content belongs to the
        spanning cell (corpus 40, 41). A table cannot BEGIN with a
        continuation row (PART 2 `table`).

   6. REFERENCE RESOLUTION
      - Reference links [text][ref] resolved to [ref]: url definitions
      - LABEL MATCHING is EXACT: case-sensitive, no whitespace folding
        (corpus 73-reference-labels-are-case-sensitive). When the same
        label is defined twice, the LAST definition wins for links --
        note the deliberate asymmetry with footnote definitions, where
        the FIRST wins (§16).
      - Collapsed references [text][] use text as reference label
      - Abbreviations matched at word boundaries only
      - All definitions (link/footnote/abbreviation references, heading IDs)
        are collected in the first pass (PART 8, "first pass") BEFORE inline
        resolution runs. A document-order definition appearing AFTER its use
        is still resolved -- this is the defined two-pass model, not
        backtracking, and is O(n). Design Principle 2 ("no dependency on
        later references") scopes to inline *tokenization/highlighting*,
        which never needs the definition table; semantic *expansion*
        (abbr <abbr>, </#id> auto-text, [ref] targets) uses it.

   7. MENTION/TAG BOUNDARIES
      - @ and # only start mentions/tags at word boundaries
      - email@domain.com is NOT a mention
      - The name runs over letters, digits, `_`, `-`, and INTERIOR dots
        (a dot followed by another name character: `@john.doe`,
        `#release-1.0`); a trailing dot is sentence punctuation, not part
        of the name

   8. SMART TYPOGRAPHY CONTEXT
      - Only applies outside code spans/blocks
      - Patterns must match exactly (not partial)
      - FRACTIONS ARE NOT CONVERTED. `1/2`, `3/4`, etc. stay literal --
        they collide with dates (`1/2/2024`) and paths, and djot has no
        fractions (recorded in dismissed-syntax.md). Converted set:
        dashes, ellipsis, smart quotes, arrows, comparisons, `+-`, and
        symbols (`(c)`,`(r)`,`(tm)`).

   9. EMPHASIS SPAN BOUNDARY -- NORMATIVE (governs *_content productions in
      PART 3; §1 above is a summary of this section).
      - SAME-DELIMITER ADJACENCY (ALL SEVEN single-char delimiters `/ * _ ~ ^ = ,`).
        A bare delimiter immediately adjacent -- before OR after -- to another
        of the SAME delimiter never opens a span. A doubled delimiter is
        therefore literal text, never nested same-type emphasis (§3): `**x**`,
        `~~x~~`, `^^x^^`, `==x==`, `,,x,,` stay literal exactly like `//x//` and
        `__x__` (corpus 71-doubled-emphasis-delimiters, 01-emphasis-11). A
        longer run (`,,,y,,,`, `===y===`) is likewise all-literal (corpus
        74-two-char-delimiter-runs).
      - WORD BOUNDARY (ALL bare delimiters `/ * _ ~ ^ = ,`; no
        bare intraword emphasis). A delimiter opens only if NOT followed by
        whitespace AND preceded by whitespace, start, or punctuation -- but NOT
        by an alphanumeric or `_` (left word boundary). So (/x/) -> (<em>x</em>)
        and a./b/ -> a.<em>b</em> (corpus 01-emphasis-8), while foo_bar_baz,
        foo*bar*baz, snake_case, x = 5, key=value stay literal (01-emphasis-10,
        01-emphasis-11). For deliberate intraword emphasis use the forced
        `{X … X}` family (§22): x{*y*}z -> x<strong>y</strong>z.
      - A delimiter closes only if NOT preceded by whitespace AND NOT followed
        by an alphanumeric (right word boundary), for every bare delimiter.
        "/ not italic /" stays literal (ws after opener); "x /a/b y" stays
        literal because the candidate closer is followed by `b` (corpus
        01-emphasis-9).
      - The span runs from the opener to its NEAREST valid closer; any
        same-type delimiter in between is LITERAL content (no same-type
        nesting, §3). Hence /usr/local/ -> <em>usr/local</em> (01-emphasis-6)
        and the/path/here is literal (first / has no left boundary). The
        trailing delimiter of an unbalanced run stays literal, so /b// ->
        <em>b</em>/ and /x// -> <em>x</em>/. See edge-cases.md §1.
      - Resolution uses a delimiter stack, single left-to-right pass, no
        backtracking (Design Principle 1). Forced `{X … X}` spans (§22) are
        pushed on the same stack; only their word-boundary test is skipped.

   10. PARAGRAPH INTERRUPTION -- NORMATIVE (governs `paragraph` in PART 2).
      A VISIBLE block INTERRUPTS an open paragraph with NO blank line before
      it, at the document top level AND inside nested content (list item,
      block quote, admonition/div body). A continuation line that begins a
      block is parsed as that block; the paragraph ends on the preceding line.
      This is the Markdown-like rule (CommonMark "a paragraph can be
      interrupted") and the key block-level divergence from Djot, where an open
      paragraph runs until a blank line so a marker line stays paragraph text.
      The interrupting blocks:
        - heading            `#`..`######` + space;
        - thematic break     a lone `---` / `***` / `___` line;
        - block quote        `>` (NOTE: a line-initial `>` is ALWAYS a quote
                             marker, so a prose line starting `>= 5` becomes a
                             quote -- escape it, `\>= 5`, to keep it prose);
        - unordered / task   `- ` / `* ` (bullet + space; `- [x] ` task; `+` is
                             NOT a bullet -- it is the continuation marker, §17).
                             A bullet interrupts at ANY INDENTATION (Rule B,
                             §24): an indented `  - item` line interrupts
                             exactly like a column-0 one;
        - table row          a `|`...`|` line that is a valid table row
                             (a stray `|` in prose is NOT -- see VALID TABLE
                             ROW in §5);
        - fenced code        a ``` / ~~~ opener that HAS a matching closer
                             ahead (see CLOSER LOOKAHEAD);
        - admonition / div   a `:::` opener that HAS a matching closer ahead.
      So "Liste: \n - eins \n - zwei" is <p>Liste:</p> + a <ul>; "intro
      \n # H" is <p>intro</p> + <h1>; "x = 5 \n * 3 + 17" is <p>x = 5</p> +
      <ul><li>3 + 17</li></ul> (corpus 05-lists-12 and the
      76-paragraph-interruption family).
      That last case is the ACCEPTED COST of paragraph interruption: a
      hard-wrapped prose line that happens to begin with a bullet-plus-space
      (or `>` / `#` / a valid table row) starts a block. Insert a blank line, or
      backslash-escape the marker ("\* 3 + 17"), to keep such a line as prose.

      ORDERED LISTS DO NOT INTERRUPT. An ordered-list marker -- decimal,
      letter, or roman, in any value -- never interrupts a paragraph; it needs a
      blank line before it (matching Djot). This is deliberate: unlike a bullet
      (`- `/`* ` is unambiguous), an ordered marker is too common in prose
      ("see step 2.", "version 1985.", "upgrade to 1. today"), and the only way
      to allow it would be the CommonMark `1.`-only heuristic -- the exact wart
      Djot removed. Carve drops the heuristic by not interrupting on ordered
      markers at all (corpus 54-ordered-marker-vs-prose, Paragraph interruption).

      IMAGE EXCLUDED. A line that is only a bare image ("![alt](url)") does NOT
      interrupt: an image is not a block of its own in Carve, so it stays an
      inline image inside the paragraph.

      CLOSER LOOKAHEAD (fenced code, admonition, div). A fence / `:::` opener
      interrupts ONLY when a matching closer exists ahead in the same context.
      An UNTERMINATED opener does NOT interrupt -- the line stays paragraph
      text -- so a stray ``` or `:::` in prose never swallows the rest of the
      block. An unterminated ``` is then an unclosed inline verbatim run that
      renders as a `<code>` span to the end of the block (the `code_span`
      maximal-run rule). Pinned by corpus 76-paragraph-interruption (the
      unterminated-fence and unterminated-`:::` pairs).

      INVISIBLE CONSTRUCTS. A reference definition (link `[r]: url`, footnote
      `[^r]: …`, abbreviation `*[A]: …`), a comment (`%%` line, `%%%` block),
      or a block-attribute line (`{…}` alone on a line, §15) also interrupts
      a paragraph with no blank line: it is parsed and consumed/collected,
      rather than left as literal text. So "See[^m].\n[^m]:
      note" resolves the footnote, "para\n%% x" drops the comment, and
      "Para\n{.x}" ends the paragraph with the attribute line floating
      forward (§15; corpus 84-block-attribute-lines-7).
      (Unchanged from the prior rule; a small deviation from djot, which keeps
      such a line as paragraph text.)

      NESTING. Interruption applies inside nested content too: a `# H` after a
      prose line in a block quote ("> text \n > # H") is a heading, and an
      indented sublist still nests with no blank line ("- a \n   - b" -> a list
      whose first item holds a sublist). A col-0 marker with no `>` inside a
      block quote terminates the quote (the marker is a sibling block), per the
      block-quote lazy-continuation rule (see `blockquote`). HEADING
      CONTINUATION follows the same principle (see `heading`): a block-opener
      interrupts an open heading and starts that block, exactly as it interrupts
      a paragraph; a bare-prose or same/lower `#` continuation line folds into
      the heading text instead.

   11. LIST MARKER CHANGE STARTS A NEW LIST -- NORMATIVE (governs `list`
      in PART 2). §10 decides FIRST whether a sequence of marker lines
      establishes a list at all (paragraph-interruption gate); §11 then
      decides whether established list items belong to one list or to
      adjacent sibling lists. The rules layer; §11 never overrides §10.
      Given two adjacent list items at the same indent that both passed
      §10, they belong to the SAME list only when their marker matches:
        - same character within `-`, `*` for unordered (`+` is NOT a
          bullet in Carve -- it is the continuation marker, §17);
        - same `ordered_marker` alternative for ordered;
        - same plain-vs-task classification.
      A line whose marker differs from the immediately-preceding item
      on any of those axes starts a new sibling list at the same indent.
      Matches djot (minus djot's `+` bullet). Example:
        - a              produces:    <ul><li>a</li><li>b</li></ul>
        - b                            <ul><li>c</li><li>d</li></ul>
        * c
        * d
      Same axis for plain-vs-task: `- a` followed by `- [x] b` is two
      lists (matches djot). The marker-match rule itself looks only one
      item back; the ordered DIALECT below additionally carries the
      first item's classification (and its one-item-forward tie-break)
      for the rest of the list.
      ORDERED DIALECTS. An ordered list's dialect (decimal / lower- or
      upper-alpha / lower- or upper-roman) and delimiter (`.` or `)`) are
      fixed by its FIRST item; a later item whose marker is outside that
      dialect, or uses the other delimiter, starts a new sibling list. The
      first item also sets the `<ol>` `type` (a/A/i/I; decimal omits it)
      and `start` (the marker's value; omitted when 1).
      AMBIGUOUS-LETTER TIE-BREAK: a single roman-letter first marker
      (i/v/x/l/c/d/m) is ROMAN when the next sibling is the consecutive
      roman numeral (`iv.`+`v.`, `i.`+`ii.`) and ALPHA when the next is the
      consecutive letter (`c.`+`d.`, `v.`+`w.`); a lone `i`/`I` defaults to
      roman, any other lone letter to alpha.

   12. FENCED ":::" BLOCK RENDERING -- NORMATIVE (governs `admonition`
      and `div` in PART 2; cross-references syntax.md §4.20 "Block
      Extensions"). A TYPED `::: word` block renders by a TWO-TIER rule
      on the type identifier (below). A BARE `:::` opener with NO type
      word is a GENERIC DIV: a plain `<div>` (no class added; an empty div
      is `<div></div>`), i.e. djot's generic container.

      INLINE OPENER ATTRIBUTES -- STRICT (djot). The opener line carries NO
      inline attributes: it is the colon fence, an optional type word, and
      an optional quoted title, and NOTHING else. Any trailing `{...}` (or
      other non-title text after the type) makes the line an ORDINARY
      PARAGRAPH, not a fence -- so `::: note {.x}`, `::: {.x}`, and
      `:::{k=v}` are all paragraphs. To attribute a div or admonition, use
      a PRECEDING block-attribute line, which floats onto the block (§15):
      `{.x #id}` then `::: note` yields `<aside class="admonition note x"
      id="id">`. This matches canonical djot, which rejects any non-class
      text on the fence line.

      FENCE LENGTH & NESTING: a fence is a run of 3+ colons; the same rule
      governs both admonitions and generic divs. A block is closed only by
      a bare fence of EQUAL-OR-GREATER colon length, so a longer opener
      nests shorter blocks (a `:::` inside a `::::` block is content, not a
      closer). Equal-length fences do not nest. A bare opener with no
      matching closer ahead is literal text.

      The two tiers for a typed block:

      Tier 1 -- CANONICAL ADMONITION TYPES render as a semantic
        `<aside>` with the `admonition` marker class:
            <aside class="admonition {type}">
              [<p class="admonition-title">{quoted_title}</p>]
              {body}
            </aside>
        The canonical set is the eight call-out types
        `note`, `tip`, `warning`, `danger`, `info`, `success`,
        `example`, `quote`. Consuming CSS frameworks target these exact
        class names.

      Tier 2 -- ANY OTHER (custom) type renders as a generic block-level
        `<div>` carrying the verbatim type as its class:
            <div class="{type}">
              [<p class="admonition-title">{quoted_title}</p>]
              {body}
            </div>
        This is the carve fenced-div primitive that the block-extension
        mechanism (syntax.md §4.20) builds on -- e.g. `::: tabs`,
        `::: mermaid`, `::: codepen` produce `<div class="tabs">`,
        `<div class="mermaid">`, `<div class="codepen">` respectively,
        which a registered extension may post-process. An unregistered
        custom type still renders as its generic `<div class="{type}">`
        so the document stays readable. `details` is an ordinary Tier-2
        type (`<div class="details">`); an extension that wants the
        HTML5 `<details>/<summary>` disclosure element opts in via §4.20.

      Pinned rules common to both tiers:
        a. The class is the literal type identifier (Tier 1 prefixes it
           with `admonition ` separated by a single space:
           `admonition note`, NOT `admonition-note`; Tier 2 uses the
           bare `{type}`). The type is NEVER folded together with the
           quoted title -- the title is a child element, never a class.
        b. A `<p class="admonition-title">…</p>` line is emitted ONLY
           when the author supplied a `quoted_title` after the type.
           Carve does NOT invent a default title from the type name.
           An explicitly empty `""` still counts as a supplied (empty)
           title. Applies to both tiers.

   13. HEADING SECTION WRAPPING -- NORMATIVE (governs `atx_heading` in
      PART 2 and the renderer; matches djot, see `jgm/djot` official
      `headings.test`). Every heading emits a `<section id="{id}">`
      wrapper around itself and the following content up to the next
      same-or-shallower heading. The slug-derivation rule (ASCII
      transliteration etc., see syntax.md §4.1) computes
      the id; the id lives on the `<section>` element, NOT on the
      `<h*>` element. This shifts the existing carve behavior
      (id-on-h*, no section wrapper) to match djot's structural model.
      Algorithm (one stateful pass over top-level blocks):
        let openSections : Stack of (level, sectionElement) = []
        for each top-level child node:
          if node is Heading at level N:
            while openSections is non-empty AND top.level >= N:
              emit </section>; pop openSections
            emit <section id="{slug}">; push (N, ...)
            emit <h{N}>{inline-rendered children}</h{N}>
          else:
            emit node normally
        while openSections is non-empty:
          emit </section>; pop openSections
      Properties:
      - Adjacent same-level headings: produce sibling sections at the
        same level (`# A / # B` -> two <section>s).
      - Skipped levels: a `# H1 / ### H3` sequence nests H3's section
        inside H1's, because the §11-style stack-close test (top.level
        >= N) only closes sections at level >= 3 (none open at level 3
        or 5/6). The structure mirrors djot's behavior; carve does NOT
        synthesize intermediate <h2>/<section> nodes.
      - Explicit `{#id}` on a heading: the id still lives on the
        `<section>`, not on the `<h*>`. The dedup namespace (syntax.md
        §4.1 / heading-id-tracker rules) is unchanged -- explicit ids are
        reserved before auto-ids in document order.
      - Empty document or doc with no headings: zero <section> elements
        are emitted.
      - The two-pass id resolution (collect explicit ids first, then
        auto-slug headings with collision-dedup) is unaffected by this
        rule -- only the EMISSION site of the id moves from <h*> to
        <section>. Existing crossref (`</#id>`) and implicit-heading
        ref (`[Heading][]`) resolution continue to target the same
        slug; the fragment URL `#{slug}` resolves to <section id> the
        same way browsers resolve to <h* id>.
      - Carve does NOT add `<section>` wrappers around non-heading
        top-level blocks. Headings are the only trigger.

   14. INLINE SPAN VS LINK DISAMBIGUATION -- NORMATIVE (governs `link`
      and `inline_span` in PART 3). After scanning a bracketed run
      `[ inline_content ]`, the IMMEDIATELY FOLLOWING character selects
      the construct (single-character lookahead, no backtracking):
        - `(` -> inline_link  ([text](url))
        - `[` -> reference_link or collapsed_reference_link
                 ([text][ref] / [text][])
        - `{` -> inline_span  ([text]{attrs})
        - anything else (or end of input) -> the `[`...`]` is LITERAL
          text. Carve has no shortcut reference link: a bare `[label]`
          never resolves against a `[label]: url` definition.
      An inline_span renders as
            <span {attrs}>{inline-rendered content}</span>
      where {attrs} is the attribute block applied verbatim (id, classes,
      key=value), in the same form as any other attribute carrier
      (`#id` -> id, `.x` -> class, `k=v` -> key). The bracketed content
      is full inline content and is parsed recursively, so
      `[a /b/ c]{.x}` -> `<span class="x">a <em>b</em> c</span>`.
      Because the `{` test fires only when the attribute block directly
      abuts `]`, `[text] {.x}` (with a space) is literal `[text]`
      followed by a literal `{.x}` -- the span requires adjacency.
      EMPTY OR INVALID ATTRIBUTE BLOCK -- NORMATIVE. The directly-abutting
      `{...}` is classified by its content:
        - yields at least one attribute (`[text]{.c}`) -> span with those
          attributes.
        - empty or whitespace-only (`[text]{}`, `[text]{ }`) -> a valid EMPTY
          block: a bare `<span>text</span>`. Both reference impls agree, even
          though the bare `attribute_list` production nominally needs >= 1
          attribute; the empty block is a blessed exception so a
          default-attribute processor can target the span.
        - unrecognized non-empty content (`[text]{???}`, `[text]{=y=}`) -> NOT
          an attribute block: the `]` and the `{...}` render literally. The
          inner bracket content is still inline-parsed, so `[*x*]{???}` ->
          `[<strong>x</strong>]{???}`.
        - a name (id, class, or key) that is not a grammar `identifier` (line
          905) is also unrecognized: a DIGIT-FIRST name (`{.123}`, `{#1}`,
          `{2=v}`) or a name with a non-identifier character (`{.a!b}`).
          ONE invalid name makes the WHOLE block not an attribute block, even
          when mixed with valid attributes (`{.ok .1}`), so the run stays
          literal. (Deliberate strictness beyond djot, which accepts
          digit-first identifiers / `class="123"`; see jgm/djot issue 399.)
          A digit/`-`/`_` AFTER the first character is valid (`{.a1}`).
      BOOLEAN (value-less) ATTRIBUTES -- NORMATIVE. A bare identifier with no
      value (`[text]{kbd}`, `{.note open}`, `{disabled}` on a block line) is a
      boolean attribute, rendered `name=""`. It is matched after the
      `key_value_attribute` form (so `k=v` stays a key/value) and mixes freely
      with id/class/key=value in source order; multiple are allowed; a
      digit-first bare word is not a valid name (the block stays literal, as for
      any digit-first identifier). All impls support it (a carve extension
      beyond canonical djot, matching djot-php; corpus 95-boolean-attributes).
      The boundary of "yields an attribute" still differs at the margins:
      carve-php additionally accepts colon-bearing keys/classes
      (`{xml:lang="en"}`, `{.sm:hover}`) and comment-only blocks (`{% c %}`) as
      spans, where carve-js treats those as literal. Use a plain
      `.class` / `#id` / `key=value` / bare word for a portable span.

   15. BLOCK ATTRIBUTE LINES -- NORMATIVE (governs `block_attributes`
      in PART 2; matches djot). A `{...}` attribute block on its own
      line is a leading block attribute: it does NOT render -- it
      attaches to the next block element. Semantics:
      - REACH: leading attribute blocks float forward to the next block
        element, across intervening blank lines. `{#id}\n\nText` yields
        `<p id="id">Text</p>`. A run with no following block element
        (e.g. at end of document) is dropped, producing no output.
      - TRAILING A PARAGRAPH -- NORMATIVE. A `{...}` line directly after
        paragraph content (no blank line) is still a leading block-attribute
        line: it INTERRUPTS the paragraph (§10) and floats FORWARD like any
        other. It does NOT attach backward to the paragraph it follows, and
        does NOT fold into the paragraph as literal text. `Para\n{.class}`
        yields `<p>Para</p>` (dropped -- no following block); and
        `Para\n{.class}\n\nNext` yields `<p>Para</p>` then
        `<p class="class">Next</p>`. (A trailing SAME-LINE `{...}` with no
        abutting host is not a block-attribute line at all -- it stays literal
        inline content, §14: `Para {.x}` -> `<p>Para {.x}</p>`.)
      - ACCUMULATION across consecutive blocks: when several attribute
        blocks precede the same target (adjacent or blank-separated),
        they merge in source order with these rules --
          * id: last one wins.
          * key=value: last value for a given key wins.
          * class: ALL classes accumulate in source order, with NO
            de-duplication (`{.a .b}` then `{.b .c}` -> `class="a b b c"`,
            matching djot and carve-php).
        Worked example (the djot canonical case):
          {#id}
          {key=val}
          {.foo .bar}
          {key=val2}
          {.baz}
          {#id2}
          Okay
        -> <p id="id2" key="val2" class="foo bar baz">Okay</p>
      - MULTI-LINE BLOCK: a single attribute block may wrap across lines
        -- the `}` need not sit on the opening line. `{#id\n .foo}` is
        one block contributing id `id` and class `foo`. A BLANK line
        inside the braces ends the block (it is then literal text, not a
        block_attributes). A quoted value containing a literal `}` that
        spans lines is not supported by the reference impls
        (pathological).
      - NOT AN ATTRIBUTE LIST -> NOT AN ATTRIBUTE LINE: a `{…}` line whose
        braces do not parse as an attribute list is ordinary paragraph
        content and follows the normal inline rules (e.g. a `{#a #}` line
        is an editorial comment, §22 family, not a block-attribute line).
      - NO block construct takes a trailing `{…}` on its own line
        (djot-strict; headings, fences and `:::` openers included) --
        the preceding block-attribute line is the ONLY block-level
        attribute channel. A trailing `{…}` on a heading line is literal
        inline content (PART 2, headings).
      Implementation status: carve-php, carve-js, and carve-rs implement §15
      in full -- leading block-attribute lines, float-forward across blank
      lines (including a line that trails a paragraph), accumulation, and
      multi-line blocks. Pinned by corpus 84-block-attribute-lines.

   16. FOOTNOTES -- NORMATIVE (governs `footnote` / `footnote_definition`
      in PART 3). The reference form `[^label]` + `[^label]: body` and the
      INLINE form `^[content]` are implemented; the sidenote (`[>content]`)
      remains deferred.
      - A `[^label]` reference with a matching definition is NUMBERED by
        document reference order (first referenced = 1); the same label
        referenced again reuses its number.
      - Definitions may appear anywhere (order-independent); the FIRST
        definition for a label wins. A body is the def line plus any
        following lines indented by >= 2 spaces (single blank lines
        allowed between chunks), parsed as blocks.
      - A reference with NO matching definition renders as literal source
        text `[^label]` (a trailing `{attrs}` on an unresolved ref is not
        round-tripped -- a marginal mistyped-label edge). An UNREFERENCED
        definition is dropped.
      - Rendering (djot-compatible roles):
          reference -> <a id="{refId}" href="#fn{n}" role="doc-noteref"><sup>{n}</sup></a>
          endnotes  -> ONE <section role="doc-endnotes"> appended AFTER
                       all body content (outside heading <section>s),
                       containing <hr> then an <ol> of <li id="fn{n}">;
                       each note's last paragraph gets a trailing backlink
                       <a href="#{refId}" role="doc-backlink">↩</a>.
        First reference to note n has refId `fnref{n}`; a k-th (k>1)
        repeat uses `fnref{n}-{k}` and adds another backlink. The
        backlink glyph is the plain return arrow `↩` (Carve's choice;
        djot appends a variation selector).
      - INLINE FOOTNOTE `^[content]` (pandoc form; a deliberate carve
        extension -- canonical djot has no inline footnotes). A `^`
        IMMEDIATELY before `[` opens an anonymous note whose content is the
        balanced bracket span (escape- and code-span-aware close, same as link
        text). Properties:
          * Content is INLINE-only, parsed recursively with footnote
            recognition DISABLED inside it (no `^[…]` or `[^ref]` nested in a
            note, either direction).
          * Numbered in the SAME document-order sequence as reference notes
            (an inline note always takes a fresh anonymous number; it cannot
            be re-referenced). Renders the same noteref + an `<li id="fn{n}">`
            in the one endnotes section, content as one `<p>` + backlink.
          * A trailing `{attrs}` attaches to the noteref `<a>` (like a
            reference note).
          * Empty or whitespace-only (`^[]`, `^[ ]`) is literal; an unclosed
            `^[…` is literal.
        PRECEDENCE (PART 8): `^[` is ranked ABOVE superscript. Consequences:
        `^[x]^` is a note then a literal `^` (no longer superscript-of-`[x]`;
        escape `\^[x]^` to keep superscript); `^^[x]` is suppressed by
        same-delimiter `^` adjacency (literal); `\^[x]` is literal; a `^` not
        immediately followed by `[` is superscript as before.
      - LIMITATION: a footnote (reference `[^1]` or inline `^[…]`) inside link
        text (`[t[^1]](u)`) or inside a heading later cloned by a `</#id>`
        crossref nests an <a> in an <a>; avoid footnotes in those positions.

   17. TIGHT vs LOOSE LISTS -- NORMATIVE (governs `list` rendering). A list is
      LOOSE if any item is followed by a blank line before the next sibling
      marker, OR any item holds a blank-line-separated second PARAGRAPH;
      otherwise TIGHT. A tight item's paragraph renders WITHOUT a `<p>`
      (`<li>text</li>`); a loose item's paragraphs ARE wrapped
      (`<li><p>text</p></li>`).

      COMPACT LIST BLOCKS -- Carve deviation from djot/CommonMark. A blank line
      before an item's sub-BLOCK (sub-list, block quote, fenced code, fenced
      div, heading, table) does NOT loosen the list: the item stays tight, lead
      text inline, block attached. Only a genuine second PARAGRAPH (blank +
      indented plain prose), or a blank line between items, loosens. The blank
      line is still required to START the block, so block recognition and the
      uniformity principle are unchanged -- only the tight/loose RENDERING
      differs. djot renders these loose; Carve renders them tight. Pinned by
      corpus compact-list-blocks.

      LIST CONTINUATION MARKER -- Carve addition. A line whose only content is
      `+`, at the item's marker column, attaches the FOLLOWING flush-left block
      to the current item with no blank line, keeping the list tight (useful for
      code/tables you would rather not indent). `+` is not a Carve bullet (PART 9
      §11), so a lone `+` is unambiguous; outside a list a lone `+` is literal
      text. It attaches one block, up to the next blank line, sibling item, or a
      further `+`. The same marker may sit on the marker line itself (`- +`) to
      open an item whose body is the flush-left block(s) that follow, with no
      inline lead -- the FIRST-BLOCK form (a bare `+`; `- + text` keeps `+ text`
      as literal text). Pinned by corpus list-continuation-marker.

   18. MATH -- NORMATIVE (governs `math` in PART 3; djot form). Inline
      `$` + verbatim backtick span; display `$$` + verbatim backtick
      span. A `$` NOT immediately followed by a backtick run is literal
      (`$5` is currency); `\$` escapes a literal `$`. Rendering:
        inline  -> <span class="math inline">\(content\)</span>
        display -> <span class="math display">\[content\]</span>
      `content` is the verbatim span text, HTML-escaped (`&`,`<`,`>`).
      Display math alone on a line renders inside its own paragraph. A
      standalone display-math block carrying a trailing caption (PART 9 §4)
      is a numbered EQUATION: the math paragraph is wrapped in a figure and a
      `</#id>` to it resolves to "Equation N" (§19).
      Author `{attrs}` after the span merge onto the <span> (the base
      `math …` class is kept; see PART 10 §1).

   19. CROSSREF AUTO-TEXT + DEFAULT-ON / PROCESSOR FEATURES -- NORMATIVE.
      - `</#id>` clones the target heading's full inline children, so a
        heading with markup (`# *Setup*`) yields a link that KEEPS the
        markup (`<a href="#…"><strong>Setup</strong> …</a>`).
      - `</#id>` to a NUMBERED CAPTION (a figure / table / listing / equation
        whose caption carries a number placeholder and an `{#id}`) yields the
        caption's LABEL + NUMBER ("Figure 1"), markup preserved, NOT the
        caption prose. A `</#id>` to an id that is neither a heading nor a
        numbered caption stays unresolved (renders literal), as today.
      - Mentions (`@user`), tags (`#tag`) and smart typography are ON by
        default in the conformant core; a processor MAY disable them. The
        corpus pins the default-on behavior.
      - Includes (`{{ path }}`) are a PROCESSOR-LEVEL directive, NOT part
        of the core parser; a conformant core MAY leave `{{ … }}` literal.
        An implementing processor MUST forbid path traversal outside the
        project root, bound recursion depth, and treat includes as opt-in.

   20. RAW PASSTHROUGH -- NORMATIVE (governs `raw_inline` in PART 3 and the
      semantics of `raw_block` in PART 2). The BLOCK form (```=FORMAT
      fence, PART 2) emits its verbatim content UNESCAPED when FORMAT
      matches the output format and drops it otherwise; the inline form
      follows. Block and inline raw share the `=FORMAT` spelling (```=html
      / `…`{=html}); the former ```raw FORMAT keyword form was removed. A code span whose trailing attribute block
      is EXACTLY `{=format}` (a single `=`-prefixed format name, with no
      other classes/ids/keys) is raw passthrough: the verbatim span content
      is emitted UNESCAPED when `format` matches the output format (`html`),
      and DROPPED otherwise. The `{=format}` tag is consumed, not rendered.
      A code span with any OTHER trailing `{…}` is a generic attributed code
      span (PART 3 `code_span` note), not raw inline. The corpus pins this
      (tests/corpus/50-raw-inline):
        `<br>`{=html}   -> <br>            (emitted verbatim)
        `\foo`{=latex}  ->                (dropped in HTML output)

   21. TRAILING LINE COMMENTS -- NORMATIVE (governs `inline_comment` in PART 3).
      - A `%%` token is a trailing comment marker only when the character
        immediately before it is whitespace (space or tab) or it starts the
        inline run (beginning of paragraph content after leading whitespace).
      - When recognized, the `%%` and ALL remaining characters up to (but not
        including) the line break are consumed and produce no output.
      - `%%` is NEVER recognized inside a code span or raw inline (PART 9 §20);
        those contexts pass both `%` characters through verbatim.
      - `\%%` (escaped first percent) is literal -- `%%` is literal text, not a
        comment marker.
      - Without preceding whitespace (e.g. `50%%` or `a%%b`) the `%%` is
        literal -- percentages and doubled-percent tokens in prose are safe.
      - The comment does NOT cross a line break; a soft-wrapped continuation on
        the next line of the same paragraph is unaffected (corpus 46-comments-6).

   22. FORCED INTRAWORD EMPHASIS -- NORMATIVE (governs the forced_* productions
      in PART 3). The brace-pair family is the escape hatch for emphasis where a
      bare delimiter would otherwise stay literal (§9 word boundary).
      - FORMS. Each emphasis mark has a forced form that renders the SAME
        element as its bare delimiter:
          {/x/}   -> <em>x</em>        {*x*}   -> <strong>x</strong>
          {_x_}   -> <u>x</u>          {~x~}   -> <s>x</s>
          {^x^}   -> <sup>x</sup>      {,x,}   -> <sub>x</sub>
          {=x=}   -> <mark>x</mark>
      - NO WORD BOUNDARY. A forced span opens and closes regardless of the
        surrounding characters, so it emphasizes intraword: foo{*bar*}baz ->
        foo<strong>bar</strong>baz; my{_path_}name -> my<u>path</u>name. This
        is the ONLY way to emphasize intraword.
      - BOUNDS = THE BRACES. The closing `X}` ends the span; a bare delimiter
        of the same kind INSIDE is literal content, like the bare-span rule
        (§9): {/a/b/} -> <em>a/b</em>. Inner content otherwise parses normally,
        so cross-type nesting works: {/italic *bold*/} -> <em>italic
        <strong>bold</strong></em>.
      - `{~ … ~}` DISAMBIGUATION. The brace pair is editorial SUBSTITUTION when
        it contains a top-level `~>` ({~old~>new~} -> <del>old</del><ins>new
        </ins>), and forced STRIKETHROUGH otherwise ({~old~} ->
        <s>old</s>). The `~>` test is the discriminator.
      - `{= … =}` is forced highlight (<mark>), distinct from the raw-inline
        `{=format}` attribute on a code span (`` `x`{=html} ``), which has no
        trailing `=` before the `}`.
      - ESCAPING. A literal `{/` (etc.) is written `\{/` -- one backslash on the
        brace is enough; the inner delimiter need not be escaped.
      - ATTRIBUTES. A forced span MAY carry a trailing `[attributes]` block like
        any bare span: {*x*}{.c} -> <strong class="c">x</strong>.
      (corpus 01-emphasis-12 forced-intraword, 01-emphasis-13 forced-nesting.)

   23. LINE BLOCK (VERSE) -- NORMATIVE (governs `line_block` in PART 2). A
      `::: |` fenced block preserves the author's per-line layout.
      - TRIGGER. A bare pipe `|` TYPE TOKEN on the opener (`::: |`) selects this
        behavior. The token is a pipe, not a word. STRICT (djot): the opener
        carries no inline attributes, so the `::: {.line-block}` class form is an
        ordinary PARAGRAPH, not a line block -- the class alone does not trigger it.
      - WRAPPER. Renders as `<div class="line-block">` (a generic div, never an
        `<aside>`); extra classes / id attach via a PRECEDING block-attribute
        line, which floats onto the div (§15).
      - LINE BREAKS. Each soft line break inside a stanza becomes a HARD break
        (`<br>`). So consecutive non-blank body lines render as one paragraph
        with `<br>` between them.
      - STANZAS. A blank line ends a stanza; each stanza is its own `<p>` inside
        the single `<div class="line-block">`.
      - LEADING WHITESPACE. Each body line's leading whitespace is PRESERVED
        (unlike ordinary paragraph content, which is trimmed). In HTML each
        leading space serializes as a NBSP (`&nbsp;`, U+00A0), so indentation is
        visible with no external CSS (matching Pandoc). Plain-text / ANSI
        renderers emit the literal spaces. A tab in the leading run follows the
        tab-stop rule (PART 9 §24).
      - REFERENCE COLUMN. Leading whitespace is measured RELATIVE TO THE FENCE
        indent, not absolute column 0: a `line-block` nested in a list has the
        container's structural indent stripped first (the list dedents to its
        content column), so only the author's intra-verse indent survives. The
        `+` first-block form (fence at column 0) is the canonical way to nest a
        `line-block` without the container consuming indentation.
      - INLINE CONTENT parses normally (emphasis, links, code, ...); only the
        whitespace and soft-break handling differ from an ordinary div.
      (corpus 88-line-blocks*.)

   24. TABS AND INDENTATION -- NORMATIVE (governs `indent` in PART 2 and all
      indentation-sensitive comparisons). Adjudicated with the tab-stop work
      (carve #99 / Rule B carve #103); shipped in all three implementations.
      - COLUMNS, NOT CHARACTERS. Indentation is measured in VISUAL COLUMNS:
        a tab advances to the next multiple of 4 (CommonMark tab stops), a
        space advances by 1. A leading tab therefore indents to column 4.
        Outside indentation (code content, inline text) tabs are PRESERVED
        verbatim and never expanded (PART 2, TABS IN CODE).
      - MARKER SEPARATOR IS A SPACE. The required separator after a list
        marker is a single SPACE character -- a syntax delimiter, not
        indentation. `-<TAB>a` is NOT a list item (paragraph text).
      - CONTENT COLUMN. An item's content column is the visual column where
        its content starts (marker width + separator: `- ` -> 2, `1. ` -> 3,
        `10. ` -> 4). An ORDERED child marker must reach the parent item's
        content column to nest; below it, the line folds as lazy item text
        (ordered markers never interrupt, §10). UNORDERED/TASK bullets and
        every other block opener (quote, heading, fence, table, ...) nest at
        ANY indent past the parent item's base column.
      - RULE B (BULLETS AT ANY INDENT). A bullet (`-`/`*` + space + content)
        opens a list at ANY indentation, uniformly in every context: at the
        top level, interrupting an open paragraph (§10), and nested in a
        list item. There is no column-0 restriction.
      - DEDENT. When nested content is re-parsed, the container's structural
        indent is stripped by columns. A tab straddling the dedent boundary
        is consumed WHOLE on non-marker lines (Carve has no indent-sensitive
        block below list nesting, so the residual columns never matter);
        on a LIST-MARKER line the straddling tab's unconsumed columns are
        re-inserted as spaces, so sibling markers aligned with mixed
        tab/space indentation keep the same visual column and stay siblings.
*)


(* ============================================================================
   PART 10: HTML SERIALIZATION CONVENTIONS (NORMATIVE)
   ============================================================================

   The corpus pins exact bytes against the reference (carve-js); these
   conventions state the rules a SECOND implementation (e.g. carve-php)
   must follow to match WITHOUT copying carve-js output. On any
   disagreement the corpus wins.

   1. ATTRIBUTE ORDER. Attributes serialize in the AUTHOR's source order
      (the recorded `order`; see the RENDER ORDER rule in PART 4 and
      PART 9 §12-style note). An Attrs built programmatically with no
      recorded order falls back to a fixed `class`, then `id`, then
      key=value order. All classes accumulate (space-joined) into one
      `class` slot at its first-appearance position. An element with no
      effective attributes emits none. (Heading ids move to <section>,
      §13.) A mandatory base class (math `math inline|display`) is
      prepended to that class slot.

   2. ATTRIBUTE QUOTING + ESCAPING. Attribute values are double-quoted.
      In an attribute value escape `&`->&amp;, `<`->&lt;, `>`->&gt;,
      `"`->&quot;, `'`->&apos;. In text/content escape `&`,`<`,`>` (NOT
      quotes). No other entities are produced.

   3. VOID ELEMENTS are written WITHOUT a trailing slash: `<hr>`,
      `<img …>`, `<br>`. A hard break serializes as `<br>` + newline.

   4. BLOCK WHITESPACE. Block elements are newline-separated, one per
      line. Nested block structures (list, blockquote, table, admonition,
      div, figure, section, endnotes) indent their children by TWO spaces
      per nesting level. Inline content stays flat on its block's line.

   5. PARAGRAPH WRAPPING follows the tight/loose rule (§17): a tight list
      item's text has no <p>; a loose item's paragraphs are wrapped.

   6. CODE. A fenced code block is `<pre><code[ class="language-X"]>` +
      escaped content + a trailing newline + `</code></pre>`. An inline
      code span is `<code>` + escaped content + `</code>`.

   7. TABLES. Alignment renders as `style="text-align: VALUE;"` (one
      space after the colon, trailing semicolon); a cell with no effective
      alignment emits NO style attribute. `rowspan`/`colspan` are emitted
      only when their count is > 1.

   8. STABLE STRINGS. Mentions and tags render as inert `<span>`s (no link)
      when no URL template is configured; a configured template links to them. The footnote
      backlink glyph is `↩` (§16). Math wraps content in `\(…\)` /
      `\[…\]` inside `<span class="math inline|display">` (§18).
*)

Released under the MIT License.