Generating documentation for grammars

You can automatically generate documentation for a grammar. We support ANTLR 4 (.g4) and YACC/Bison (.y) formats.

Autodoc directive

.. syntax:autogrammar:: <path>

This directive parses the given grammar file and automatically generates documentation for it. The path is relative to syntax_base_path.

autogrammar supports all options from syntax:grammar, as well as some additional ones:

:root-rule: <rule> | <grammar>.<rule> | <path> <rule>

If given, autogrammar will only render rules that are reachable from this root. This is useful to exclude rules from imported grammars that are not used by the primary grammar.

The value should be one of the following: the name of a rule from the grammar that’s being documented, a grammar name and a rule name separated by a dot, or a grammar file and a rule name separated by a space.
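
As a sketch of the three forms, assume a hypothetical Parser.g4 that declares a grammar named Parser with a rule named root. Under that assumption, each of the following option values should select the same root:

.. syntax:autogrammar:: Parser.g4
    :root-rule: root

.. syntax:autogrammar:: Parser.g4
    :root-rule: Parser.root

.. syntax:autogrammar:: Parser.g4
    :root-rule: Parser.g4 root

The second and third forms matter mainly when the root rule lives in a different grammar than the one being documented.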

For example, suppose we’re documenting the ANTLR grammars Lexer.g4 and Parser.g4. To filter out lexer rules that the parser grammar does not use, write:

.. syntax:autogrammar:: Parser.g4
    :root-rule: Parser.root

.. syntax:autogrammar:: Lexer.g4
    :root-rule: Parser.root

:mark-root-rule:
:no-mark-root-rule:

If enabled, the automatically generated diagram for the root rule will use complex line endings, while other diagrams will use simple ones (see end-class).

:diagrams:
:no-diagrams:

Enable automatic generation of syntax diagrams for all documented rules.
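
The following sketch combines these flags, assuming the Parser.g4 grammar and the Parser.root rule from the earlier example. It asks for a diagram for every documented rule and, through mark-root-rule, for distinct line endings on the root rule's diagram:

.. syntax:autogrammar:: Parser.g4
    :root-rule: Parser.root
    :mark-root-rule:
    :diagrams: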

:cc-to-dash:
:no-cc-to-dash:

For rules without explicit human-readable names, generate names by converting the rule name from CamelCase or snake_case to dash-case.
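
A minimal usage sketch, assuming a hypothetical Parser.g4 whose rules have no doc:name control comments. With the option enabled, a rule such as moduleItem or module_item would presumably be titled module-item; the exact output is not shown in this document, so treat that as an expectation rather than a guarantee:

.. syntax:autogrammar:: Parser.g4
    :cc-to-dash: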

:lexer-rules:
:no-lexer-rules:

Controls whether lexer rules should appear in documentation. Enabled by default.

:parser-rules:
:no-parser-rules:

Controls whether parser rules should appear in documentation. Enabled by default.

:fragments:
:no-fragments:

Controls whether fragments should appear in documentation (for formats that support them). Disabled by default.

:undocumented:
:no-undocumented:

Controls whether undocumented rules should appear in documentation. Disabled by default.
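
As an illustration of how these switches combine, the sketch below assumes hypothetical Parser.g4 and Lexer.g4 files. The first directive hides lexer rules and includes parser rules that have no documentation comment; the second adds fragments to the lexer documentation:

.. syntax:autogrammar:: Parser.g4
    :no-lexer-rules:
    :undocumented:

.. syntax:autogrammar:: Lexer.g4
    :fragments: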

:honor-sections:
:no-honor-sections:

If true, render section comments, treating them as paragraphs placed between rules.

This setting has no effect unless ordering is by-source.
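
A short sketch of a configuration in which section comments take effect, assuming a hypothetical Parser.g4 that contains /// comments between its rules. The ordering is set explicitly here because this document does not state which ordering is the default:

.. syntax:autogrammar:: Parser.g4
    :honor-sections:
    :ordering: by-source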

:bison-c-char-literals:
:no-bison-c-char-literals:

If true, the Bison parser expects C-like char literals or Rust-like lifetimes when parsing inline code in grammar files. Otherwise, it expects single-quoted strings.
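
A usage sketch for a Bison grammar, assuming a hypothetical Parser.y whose inline code contains C-like char literals such as 'a':

.. syntax:autogrammar:: Parser.y
    :bison-c-char-literals:

For a grammar whose inline code only uses single-quoted strings, the :no-bison-c-char-literals: form would be the one to use.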

:grouping: mixed | lexer-first | parser-first

Controls how autogrammar groups rules that are extracted from sources.

  • mixed – there’s one group that contains all rules.

  • lexer-first – there are two groups: one for parser rules and one for lexer rules and fragments. The lexer group goes first.

  • parser-first – like lexer-first, but the parser group precedes the lexer group.

:ordering: by-source | by-name

Controls how autogrammar orders rules within each group (see grouping).

  • by-source – rules are ordered as they appear in the grammar file.

  • by-name – rules are ordered lexicographically.
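
To illustrate the two options together, the sketch below assumes a hypothetical combined grammar file Json.g4 that contains both lexer and parser rules. It requests two groups, with the lexer group first, and sorts the rules in each group by name:

.. syntax:autogrammar:: Json.g4
    :grouping: lexer-first
    :ordering: by-name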

:literal-rendering: name | contents | contents-unquoted

Controls how literal rules (i.e. lexer rules that only consist of one string) are rendered. Available options are:

  • name – only the name of the literal rule is displayed.

  • contents – the quoted literal string is displayed.

    'hello\nworld'

  • contents-unquoted – the literal string is displayed with its quotes stripped away.

    hello\nworld
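
A usage sketch, assuming a hypothetical Lexer.g4 that defines literal rules such as keyword tokens. With this setting, those rules should be displayed as their literal text without quotes rather than by name:

.. syntax:autogrammar:: Lexer.g4
    :literal-rendering: contents-unquoted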

Grammar comments and annotations

The syntax:autogrammar directive does not parse every comment found in a grammar file. Instead, it looks only for specially formatted ‘documentation’ comments. There are three types of such comments:

  • documentation comments are multiline comments that start with /** (that is, a slash followed by two asterisks). These comments should contain valid reStructuredText.

    Documentation comments can appear at the top of a file, before production rules, or within them.

    Example (an ANTLR grammar, followed by the same example for Bison):

    /**
     * Documentation for a file.
     */
    
    tokens {
        /**
         * Documentation for an externally-defined token.
         */
        NAME
    }
    
    /**
     * Documentation for a rule.
     */
    argument
        : /** inline comment */ expr
        | /** inline comment */ NAME '=' expr
        ;
    
    /**
     * Documentation for a file.
     */
    
    /**
     * Documentation for an externally-defined token.
     *
     * Also works with `%left`, `%right`, `%nonassoc`, `%precedence`, `%epp`.
     */
    %token NAME
    
    /**
     * You can also provide documentation for a token
     * without telling Bison about it.
     *
     * As far as Bison is concerned, this is just a comment.
     */
    //@ %token '+'
    
    %%
    
    /**
     * Documentation for a rule.
     */
    argument
        : /** inline comment */ expr
        | /** inline comment */ NAME "=" expr
        ;
    
  • control comments are inline comments that start with //@. Control comments contain special commands that affect the rendering process.

    They can appear right before a documented object.

    Example (an ANTLR grammar, followed by the same example for Bison):

    tokens {
        //@ doc:content [a-zA-Z_][a-zA-Z0-9_]*
        NAME
    }
    
    //@ doc:inline
    moduleItem
        : declaration
        | statement
        ;
    
    //@ doc:content [a-zA-Z_][a-zA-Z0-9_]*
    %token NAME
    
    %%
    
    //@ doc:inline
    moduleItem
        : import
        | symbol
        ;
    
  • section comments are comments that start with ///. They’re used to render text between production rules and to split the grammar definition into sections.

    Example (an ANTLR grammar, followed by the same example for Bison):

    /// **Module definition**
    ///
    /// This paragraph describes the ``Module definition``
    /// section of the grammar.
    
    module
        : moduleItem* EOF
        ;
    
    moduleItem
        : import
        | symbol
        ;
    
    /// **Imports**
    ///
    /// This paragraph describes the ``Imports``
    /// section of the grammar.
    
    import
        : 'import' NAME
        ;
    
    %%
    
    /// **Module definition**
    ///
    /// This paragraph describes the ``Module definition``
    /// section of the grammar.
    
    module
        : module moduleItem
        | %empty
        ;
    
    moduleItem
        : import
        | symbol
        ;
    
    /// **Imports**
    ///
    /// This paragraph describes the ``Imports``
    /// section of the grammar.
    
    import
        : "import" NAME
        ;
    

Control comments

The list of control comments includes:

  • //@ doc:no-doc – exclude this rule from autogrammar output.

  • //@ doc:name <str> – set a human-readable name for this rule.

  • //@ doc:inline – exclude this rule from autogrammar output; any automatically generated railroad diagram that uses this rule will include its contents instead of a single node.

    Useful for fragments and simple lexer rules.

  • //@ doc:content <content> – turn this token into a literal with the given content. The content must be valid ANTLR lexer rule syntax.

    This is useful for tokens that don’t have known contents, such as ones defined in ANTLR’s tokens section or with Bison’s %token declaration.

    Example (an ANTLR grammar, followed by the same example for Bison):

    tokens {
        //@ doc:content [a-zA-Z_][a-zA-Z0-9_]*
        //@ doc:inline
        NAME
    }
    
    import
        : 'import' NAME
        ;
    
    //@ doc:content [a-zA-Z_][a-zA-Z0-9_]*
    //@ doc:inline
    %token NAME
    
    %%
    
    import
        : "import" NAME
        ;
    
    Example output (railroad diagrams, listed here by their node labels)

    With content option:

    NAME: [a-zA-Z_][a-zA-Z0-9_]

    import: 'import', [a-zA-Z_][a-zA-Z0-9_]

    Without content option:

    NAME

    import: 'import', NAME

  • //@ doc:no-diagram – do not generate railroad diagram.

  • //@ doc:importance <int> – controls the ‘importance’ of a rule.

    By default, all rules have importance of 1.

    Rules with an importance of 0 will be rendered off the main line in optional groups. In alternative groups, the rule with the highest importance will be centered.

    Example (two railroad diagrams, listed here by their node labels):

    Rule with importance 1, Rule with importance 0

    Rule with importance 0, Rule with importance 1, Rule with importance 2, Rule with importance 1

  • //@ doc:unimportant – set importance to 0.

  • //@ doc:keep-diagram-recursive – disable optimizations for recursion when rendering a diagram.

    By default, Sphinx Syntax will try to convert recursive rules into cyclic ones. This works well for normal left recursion, but might generate bad results when using Bison’s precedence declarations.

    Example (an ANTLR grammar, followed by the same example for Bison):

    //@ doc:keep-diagram-recursive
    expr
        : NUMBER
        | expr '*' expr
        | expr '/' expr
        | '-' expr
        | '(' expr ')'
        ;
    
    %left '*' '/'
    %precedence NEG
    
    %%
    
    //@ doc:keep-diagram-recursive
    expr
        : NUMBER
        | expr '*' expr
        | expr '/' expr
        | '-' expr %prec NEG
        | '(' expr ')'
        ;
    
    Example output (railroad diagrams, listed here by their node labels)

    With keep-diagram-recursive option:

    expr: NUMBER, expr, '*', expr, expr, '/', expr, '-', expr, '(', expr, ')'

    Without keep-diagram-recursive option:

    expr: NUMBER, '-', expr, '(', expr, ')', '*', '/', expr

  • //@ doc:css-class – add a custom CSS class to all diagram nodes referencing this rule.

  • //@ %token <name> – special syntax for declaring a token in a Bison grammar without affecting Bison itself.

    This is useful when you need to declare two tokens with the same precedence and document both of them separately.

    Example:

    /** Documentation for ``*``. */
    //@ %token '*'
    
    /** Documentation for ``/``. */
    //@ %token '/'
    
    %left '*' '/'