Parsing & AST

Part 5: Parsing Rules

5.1 Block Identification (First Pass)

Frontmatter (--- delimited at document start)
Headings (# prefix)
Thematic breaks (---, ***, ___)
Code blocks (``` or ~~~ fenced)
Block quotes (> prefix)
Lists (-, *, or 1. prefix; + is the list-continuation marker, not a bullet)
Tables (| prefix)
Special blocks (::: delimited)
Paragraphs (everything else)
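The first pass can be pictured as a per-line classifier that tries the markers above in order. The following is a minimal sketch, not the reference implementation: the function name classify_line, the returned kind strings, the regular expressions, and the check order are all assumptions made for illustration. It also assumes that headings need whitespace after the # run and that any digit run followed by ". " starts an ordered item.

```python
import re

# Hypothetical patterns; the real grammar may be stricter or looser.
HEADING = re.compile(r"^#+\s")
THEMATIC_BREAK = re.compile(r"^(-{3,}|\*{3,}|_{3,})$")
ORDERED_ITEM = re.compile(r"^\d+\. ")


def classify_line(line: str, line_index: int = 0) -> str:
    """Guess the block kind from the start of a single line."""
    stripped = line.strip()
    if stripped == "":
        return "blank"
    # '---' opens frontmatter only on the very first line of the document.
    if line_index == 0 and stripped == "---":
        return "frontmatter"
    if HEADING.match(stripped):
        return "heading"
    # Checked before lists so that '---' and '***' are not read as bullets.
    if THEMATIC_BREAK.match(stripped):
        return "thematic_break"
    if stripped.startswith("```") or stripped.startswith("~~~"):
        return "code_fence"
    if stripped.startswith(">"):
        return "block_quote"
    # '+' is deliberately absent: it is a continuation marker, not a bullet.
    if stripped.startswith(("- ", "* ")) or ORDERED_ITEM.match(stripped):
        return "list_item"
    if stripped.startswith(":::"):
        return "special_block"
    if stripped.startswith("|"):
        return "table"
    return "paragraph"
```

The ordering matters mainly for the overlapping markers: --- is frontmatter only at the document start and a thematic break elsewhere, and both --- and *** have to be tested before the - and * bullets.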

5.2 Inline Parsing (Second Pass)

Parse in this precedence order:

Escaped characters (\*)
Code spans (`)
Autolinks (<url>)
Links, images, spans ([text](url), ![alt](src), [text]{attrs})
Math ($`…`, $$`…`)
Emphasis markers (bare /, *, _, ~, =; superscript and subscript are braced-only: {^…^} and {,…,})
Smart typography

5.3 The Disambiguation Rule

Literal over markup. A delimiter with no valid match (per the word-boundary conditions) is literal text.
An opener pairs with the nearest valid closer of the same type. Same-type delimiters between them are literal content (same-type spans do not nest), so /usr/local/ is emphasized usr/local with the inner slash kept as text, not usrlocal/.
Different-type spans nest, resolved with a delimiter stack in a single left-to-right pass — linear time, no backtracking (Design Principle 1).

This is not "shortest span / earliest opening wins": that rule would truncate /usr/local/ to usr and break nested emphasis. See resources/grammar.ebnf PART 8 and PART 9 §9, and docs/edge-cases.md §1, §8.
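The three rules above can be sketched as a single scan with a delimiter stack. This is an illustration under stated assumptions, not the grammar in resources/grammar.ebnf. The word-boundary conditions in can_open and can_close are guesses (an opener may not follow an alphanumeric character or precede whitespace; a closer may not follow whitespace or precede an alphanumeric character). The names match_delimiters, DELIMS, and open_at are hypothetical. Escapes, code spans, and the braced superscript and subscript forms are assumed to have been handled earlier.

```python
DELIMS = {
    "/": "emphasis",
    "*": "strong",
    "_": "underline",
    "~": "strikethrough",
    "=": "highlight",
}


def can_open(text: str, i: int) -> bool:
    # Assumed rule: not preceded by an alphanumeric, followed by non-whitespace.
    prev_ok = i == 0 or not text[i - 1].isalnum()
    next_ok = i + 1 < len(text) and not text[i + 1].isspace()
    return prev_ok and next_ok


def can_close(text: str, i: int) -> bool:
    # Assumed rule: preceded by non-whitespace, not followed by an alphanumeric.
    prev_ok = i > 0 and not text[i - 1].isspace()
    next_ok = i + 1 == len(text) or not text[i + 1].isalnum()
    return prev_ok and next_ok


def match_delimiters(text: str) -> list[tuple[str, int, int]]:
    """Return (kind, open_index, close_index) spans in closing order."""
    stack = []    # currently open delimiters as (char, index)
    open_at = {}  # char -> its position in the stack (at most one per type)
    spans = []
    for i, ch in enumerate(text):
        if ch not in DELIMS:
            continue
        if ch in open_at and can_close(text, i):
            depth = open_at[ch]
            # Openers of other types above the match never closed: literal.
            for other, _ in stack[depth:]:
                del open_at[other]
            _, start = stack[depth]
            del stack[depth:]
            spans.append((DELIMS[ch], start, i))
        elif ch not in open_at and can_open(text, i):
            open_at[ch] = len(stack)
            stack.append((ch, i))
        # Anything else is literal, including a same-type delimiter
        # that appears while its type is already open.
    # Openers still on the stack had no valid closer and stay literal.
    return spans
```

With these assumed conditions, match_delimiters("/usr/local/") yields one emphasis span from index 0 to index 10, because the middle slash sits between two letters and can neither open nor close. Each delimiter is pushed and popped at most once, so the scan stays linear in the input length.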

5.4 Whitespace Rules

Line ending = soft break (default, configurable)
Blank line = paragraph break
Two or more blank lines = paragraph break with extra space (optional)
Indentation: 2 or more spaces for list continuation
Tabs: Normalized to spaces (default: 4)
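A minimal sketch of the default whitespace handling follows. It assumes that tabs expand to 4-column tab stops (the source only says "default: 4", so fixed-width replacement is an equally plausible reading), and it ignores the optional extra space for runs of two or more blank lines. The function name split_paragraphs is hypothetical.

```python
def split_paragraphs(source: str, tab_width: int = 4) -> list[list[str]]:
    """Split text into paragraphs, each a list of soft-broken lines."""
    paragraphs = []
    current = []
    for raw in source.splitlines():
        # Assumption: tabs advance to the next multiple of tab_width.
        line = raw.expandtabs(tab_width)
        if line.strip() == "":
            # Any run of blank lines ends the current paragraph once.
            if current:
                paragraphs.append(current)
                current = []
        else:
            # A plain line ending inside a paragraph is a soft break.
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs
```

Keeping each paragraph as a list of lines leaves the soft-break decision to a later stage, which fits the note that soft-break behaviour is configurable.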

Part 6: AST Design

6.1 Node Types

Document
├── Frontmatter (optional)
└── Block+
    ├── Heading { level, content, id? }
    ├── Paragraph { content }
    ├── CodeBlock { language?, content, attributes }
    ├── BlockQuote { blocks, attribution? }
    ├── List { type, tight, items }
    │   └── ListItem { blocks, checked? }
    ├── Table { headers, rows, alignment[] }   // alignment[] = column defaults
    │   └── Cell { content, align?, colspan?, rowspan? }   // align? overrides the column default
    ├── ThematicBreak
    ├── Admonition { type, title?, blocks }
    └── RawBlock { format, content }

Inline
├── Text { content }
├── Emphasis { content }           // /text/
├── Strong { content }             // *text*
├── Underline { content }          // _text_
├── Strikethrough { content }      // ~text~
├── Superscript { content }        // {^text^}
├── Subscript { content }          // {,text,}
├── Highlight { content }          // =text=
├── Code { content, language? }
├── Math { content, display }
├── Link { content, url, title? }
├── Image { src, alt, caption? }
├── Footnote { content }
├── SoftBreak
├── HardBreak
└── RawInline { format, content }

(The Heading id field is optional in the parsed AST but is always populated after the identifier-resolution pass, whether the value was generated automatically or given explicitly, so consumers downstream of resolution can treat it as required.)
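To make the optional-then-required status of the heading id concrete, here is a small sketch of a resolution pass. The slug rule (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen) is an assumption chosen for illustration, as are the names slugify and resolve_ids. The source does not specify how automatic identifiers are derived, and this sketch does not deduplicate them.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heading:
    level: int
    content: str              # plain text here; the real field holds inline nodes
    id: Optional[str] = None  # None until the resolution pass has run


def slugify(text: str) -> str:
    # Hypothetical rule: lowercase, non-alphanumeric runs become one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def resolve_ids(headings: list[Heading]) -> None:
    """Fill in missing ids in place; explicit ids are left untouched."""
    for heading in headings:
        if heading.id is None:
            heading.id = slugify(heading.content)
```

After resolve_ids has run, every Heading in the list carries a string id, which is the property downstream consumers rely on.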

6.2 Source Mapping

Every node includes:

position: {
  start: { line, column, offset }
  end: { line, column, offset }
}
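One way to produce these positions is to index the line starts once and look up each offset with a binary search. The sketch below assumes 1-based lines and columns with a 0-based character offset; the source does not state the numbering base. The names Point and LineIndex are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    line: int    # assumed 1-based
    column: int  # assumed 1-based
    offset: int  # assumed 0-based character offset


class LineIndex:
    """Maps character offsets to line and column numbers."""

    def __init__(self, source: str) -> None:
        # Offset at which each line begins; line 1 starts at offset 0.
        self.starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.starts.append(i + 1)

    def point(self, offset: int) -> Point:
        # Count of line starts at or before the offset = 1-based line number.
        line = bisect_right(self.starts, offset)
        column = offset - self.starts[line - 1] + 1
        return Point(line, column, offset)
```

A node's position is then a pair of such points, for example idx.point(start_offset) and idx.point(end_offset) for a LineIndex named idx built once per document.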
