Sphinx plugin for Antlr4

A4Doc is a sphinx plugin for documenting Antlr4 grammars.

Its primary target is to provide an overview for DSL users (generated documentation may omit some nuances that are essential for compiler developers).

See an example output: Json.

Features

  • A new domain called a4, with grammar and rule directives.

  • Directives for rendering railroad diagrams, such as this one:

    [Railroad diagram: an optional choice of 'parser' or 'lexer', then 'grammar', identifier, ';']

  • Directive for extracting documentation comments and rendering docs and diagrams from .g4 source files.

Requirements

A4Doc uses dataclasses to represent parsed ANTLR files, thus Python >= 3.7 is required. Also, this extension requires Sphinx >= 1.8.0 because it relies on features added in that release.

Installation

Install sphinx-a4doc with pip:

pip3 install sphinx-a4doc

Add sphinx_a4doc to the list of extensions in your conf.py. If you intend to generate documentation from sources, also specify the location of your grammar files:

extensions = [
    'sphinx_a4doc',
]

# Assuming conf.py is in project/docs/source/conf.py
# and grammars are in project/project/grammars
from os.path import dirname
a4_base_path = dirname(__file__) + '/../../project/grammars'

Quickstart

Use the a4:grammar directive to declare a new grammar. Within the grammar block, use the a4:rule directive to declare a new rule:

.. a4:grammar:: MyGrammar

   A grammar for my DSL.

   .. a4:rule:: root

      The root grammar rule.

The above code produces this output:

grammar MyGrammar

A grammar for my DSL.

rule root

The root grammar rule.

Use the a4:grammar (or a4:g as a shortcut) or a4:rule (or a4:r) roles to refer to the declared grammars and rules:

Grammar :a4:g:`MyGrammar` has a root rule :a4:r:`MyGrammar.root`.

The above code produces this output:

Grammar MyGrammar has a root rule MyGrammar.root.

Use railroad-diagram, lexer-rule-diagram and parser-rule-diagram directives to render diagrams:

.. parser-rule-diagram:: 'def' ID '(' (arg (',' arg)*)? ')' ':'

The above code produces this output:

[Railroad diagram: 'def', ID, '(', an optional comma-separated list of arg, ')', ':']

Use the a4:autogrammar directive to generate documentation from a grammar file.

RST reference

Declaring objects

.. a4:grammar:: name

Declare a new grammar with the given name.

Grammar names should be unique within the project.

Options:

:name: <str>

Specifies a human-readable name for the grammar.

If given, the human-readable name will be rendered instead of the primary grammar name. It will also replace the primary name in all cross references.

For example this code:

.. a4:grammar:: PrimaryName
   :name: Human-readable name

will render the following grammar description:

grammar Human-readable name

:type: mixed|lexer|parser

Specifies a grammar type. The type will be displayed in the grammar signature.

For example these three grammars:

.. a4:grammar:: Grammar1

.. a4:grammar:: Grammar2
   :type: lexer

.. a4:grammar:: Grammar3
   :type: parser

will be rendered differently:

grammar Grammar1
lexer grammar Grammar2
parser grammar Grammar3

:imports: <str>[, <str>[, ...]]

Specifies a list of imported grammars.

This option affects the name resolution process for rule cross-references. That is, if there is a reference to grammar.rule and no such rule is found in grammar, the imported grammars will be searched as well.

Note that this setting is not passed through intersphinx.

:noindex:

A standard Sphinx option to disable indexing for this grammar.

:diagram-*:

One can override any option for all railroad diagrams within this grammar. Prefix the desired option with diagram- and add it to the grammar's options.

For example:

.. a4:grammar:: Test
   :diagram-end-class: complex

   All diagrams rendered inside this grammar
   will have 'end-class' set to 'complex'.

.. a4:rule:: name

Declare a new production rule with the given name.

If placed within an a4:grammar body, the rule will be added to that grammar. It can then be referenced by a full path which will include the grammar name and the rule name concatenated with a dot symbol.

If placed outside any grammar directive, the rule will be added to an implicitly declared “default” grammar. In this case, the rule’s full path will only include its name.

In either case, the rule name should be unique within its grammar.

Options:

:name: <str>

Specifies a human-readable name for this rule. Refer to the corresponding a4:grammar’s option for more info.

:noindex:

A standard sphinx option to disable indexing for this rule.

:diagram-*:

One can override any option for all railroad diagrams within this rule. Refer to the corresponding a4:grammar’s option for more info.

Cross-referencing objects

:any:

All a4 objects can be cross-referenced via the any role.

If given a full path, e.g. :any:`grammar_name.rule_name`, any will search for a rule called rule_name in the grammar called grammar_name and then, should this search fail, in all grammars that are imported from grammar_name, recursively.

If given a relative path, e.g. :any:`name`, any will perform a global search for a rule or a grammar with the corresponding name.

:a4:grammar:
:a4:g:

Cross-reference a grammar by its name.

There’s nothing special about this role, just specify the grammar name.

:a4:rule:
:a4:r:

Cross-reference a rule by its name or full path.

If given a full path, e.g. :a4:r:`grammar_name.rule_name`, the rule will be first searched in the corresponding grammar, then in all imported grammars, recursively.

If given a rule name only, e.g. :a4:r:`rule_name`, the behavior depends on context:

  • when used in a grammar declaration body, the rule will be first searched in that grammar, then in any imported grammar, and at last, in the default grammar.

  • when used without context, the rule will only be searched in the default grammar.

Prepending a full path with a tilde works as in other Sphinx domains: :a4:r:`~grammar_name.rule_name` is displayed as just rule_name.
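The lookup order for :a4:r: can be modelled with a short Python sketch. It is a simplified illustration, not the extension's implementation: the Grammar dataclass, the DEFAULT key standing for the implicit default grammar, and the find_in and resolve helpers are hypothetical names. Imports are followed recursively, and already visited grammars are skipped so that import cycles terminate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

DEFAULT = ""  # hypothetical key for the implicit "default" grammar


@dataclass
class Grammar:
    rules: Set[str] = field(default_factory=set)
    imports: List[str] = field(default_factory=list)


def find_in(grammars: Dict[str, Grammar], name: str, rule: str,
            seen: Optional[Set[str]] = None) -> Optional[str]:
    """Look for `rule` in grammar `name`, then in its imports, recursively."""
    seen = set() if seen is None else seen
    if name in seen or name not in grammars:
        return None  # unknown grammar, or already visited (import cycle)
    seen.add(name)
    if rule in grammars[name].rules:
        return name
    for imported in grammars[name].imports:
        found = find_in(grammars, imported, rule, seen)
        if found is not None:
            return found
    return None


def resolve(grammars: Dict[str, Grammar], target: str,
            context: Optional[str] = None) -> Optional[str]:
    """Return the name of the grammar that defines the referenced rule, or None."""
    if "." in target:  # full path: grammar_name.rule_name
        grammar, rule = target.split(".", 1)
        return find_in(grammars, grammar, rule)
    if context is not None:  # used inside an a4:grammar body
        found = find_in(grammars, context, target)
        if found is not None:
            return found
    # Without context (or as the last resort): the default grammar only.
    default = grammars.get(DEFAULT)
    if default is not None and target in default.rules:
        return DEFAULT
    return None


grammars = {
    "Parser": Grammar({"root"}, ["Lexer"]),
    "Lexer": Grammar({"ID"}),
    DEFAULT: Grammar({"misc"}),
}
print(resolve(grammars, "Parser.ID"))             # Lexer
print(resolve(grammars, "ID", context="Parser"))  # Lexer
print(resolve(grammars, "ID"))                    # None
```

The last call returns None because, without a grammar context, only the default grammar is searched.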

Rendering diagrams

.. railroad-diagram::

This is the most flexible directive for rendering railroad diagrams. Its content should be a valid YAML document containing the diagram item description.

The diagram item description itself has a recursive definition. It can be one of the following:

  • None (denoted as tilde in YAML) will produce a line without objects:

    .. railroad-diagram:: ~
    

  • a string will produce a terminal node:

    .. railroad-diagram:: just some string
    

    [Railroad diagram: a single terminal node, just some string]

  • a list of diagram item descriptions will produce these items rendered one next to another:

    .. railroad-diagram::
    
       - terminal 1
       - terminal 2
    

    [Railroad diagram: terminal 1 followed by terminal 2]

  • a dict with stack key produces a vertically stacked sequence.

    The main value (i.e. the one that corresponds to the stack key) should contain a list of diagram item descriptions. These items will be rendered vertically:

    .. railroad-diagram::
    
       stack:
       - terminal 1
       -
         - terminal 2
         - terminal 3
    

    [Railroad diagram: terminal 1 stacked above the sequence terminal 2, terminal 3]

  • a dict with choice key produces an alternative.

    The main value should contain a list of diagram item descriptions:

    .. railroad-diagram::
    
       choice:
       - terminal 1
       -
         - terminal 2
         - terminal 3
    

    [Railroad diagram: a choice between terminal 1 and the sequence terminal 2, terminal 3]

  • a dict with optional key will produce an optional item.

    The main value should contain a single diagram item description.

    Additionally, the skip key with a boolean value may be added. If equal to true, the element will be rendered off the main line:

    .. railroad-diagram::
    
       optional:
       - terminal 1
       - optional:
         - terminal 2
         skip: true
    

    [Railroad diagram: an optional group with terminal 1 followed by an optional terminal 2 drawn off the main line]

  • a dict with one_or_more key will produce a loop.

    The one_or_more element of the dict should contain a single diagram item description.

    Additionally, the repeat key with another diagram item description may be added to insert nodes into the return path of the loop.

    .. railroad-diagram::
    
       one_or_more:
       - terminal 1
       - terminal 2
       repeat:
       - terminal 3
       - terminal 4
    

    [Railroad diagram: a loop over terminal 1, terminal 2 whose return path goes through terminal 3 and terminal 4]

  • a dict with zero_or_more key works like one_or_more except that the produced item is optional:

    .. railroad-diagram::
    
       zero_or_more:
       - terminal 1
       - terminal 2
       repeat:
       - terminal 3
       - terminal 4
    

    [Railroad diagram: an optional loop over terminal 1, terminal 2 whose return path goes through terminal 3 and terminal 4]

  • a dict with node key produces a textual node of configurable shape.

    The main value should contain text which will be rendered in the node.

    Optional keys include href, css_class, radius and padding.

    .. railroad-diagram::
    
       node: go to google
       href: https://www.google.com/
       css_class: terminal
       radius: 3
       padding: 50
    
  • a dict with terminal key produces a terminal node.

    It works exactly like node. The only optional key is href.

  • a dict with non_terminal key produces a non-terminal node.

    It works exactly like node. The only optional key is href.

  • a dict with comment key produces a comment node.

    It works exactly like node. The only optional key is href.
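To make the recursive definition concrete, here is a small Python sketch that walks a description after it has been loaded from YAML (so it consists of None, strings, lists and dicts) and prints a rough EBNF-like equivalent. It is illustrative only: describe() is a hypothetical helper, not part of sphinx-a4doc, the ε symbol for an empty branch is a notation choice, and writing a loop with a repeat part as body (repeat body)* is one reading of the return path, not something the extension outputs.

```python
from typing import Any


def describe(item: Any) -> str:
    """Render a diagram item description as an EBNF-like string (illustration only)."""
    if item is None:  # ~ in YAML: an empty line
        return ""
    if isinstance(item, str):  # a plain string: terminal node
        return f"'{item}'"
    if isinstance(item, list):  # items rendered one after another
        return " ".join(part for part in map(describe, item) if part)
    if not isinstance(item, dict):
        raise TypeError(f"unsupported diagram item: {item!r}")
    if "stack" in item:  # same sequence, only laid out vertically
        return describe(item["stack"])
    if "choice" in item:  # alternatives; an empty branch is shown as ε
        return "(" + " | ".join(describe(x) or "ε" for x in item["choice"]) + ")"
    if "optional" in item:  # the 'skip' key only affects layout, so it is ignored
        return f"({describe(item['optional'])})?"
    for key in ("one_or_more", "zero_or_more"):
        if key in item:
            body = describe(item[key])
            if "repeat" in item:  # nodes on the return path act as a separator
                loop = f"{body} ({describe(item['repeat'])} {body})*"
                return loop if key == "one_or_more" else f"({loop})?"
            return f"({body})" + ("+" if key == "one_or_more" else "*")
    if "terminal" in item:
        return f"'{item['terminal']}'"
    if "non_terminal" in item:
        return str(item["non_terminal"])
    if "comment" in item:
        return f"/* {item['comment']} */"
    if "node" in item:
        return str(item["node"])
    raise ValueError(f"unknown diagram item keys: {sorted(item)}")


features = [
    {"choice": [{"terminal": "parser"}, None, {"terminal": "lexer"}], "default": 1},
    {"terminal": "grammar"},
    {"non_terminal": "identifier"},
    {"terminal": ";"},
]
print(describe(features))  # ('parser' | ε | 'lexer') 'grammar' identifier ';'
```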

Example:

This example renders a diagram from the features section:

.. railroad-diagram::
   - choice:
     - terminal: 'parser'
     -
     - terminal: 'lexer '
     default: 1
   - terminal: 'grammar'
   - non_terminal: 'identifier'
   - terminal: ';'

which translates to:

[Railroad diagram: an optional choice of 'parser' or 'lexer', then 'grammar', identifier, ';']

Customization:

See more on how to customize diagram style in the ‘Customizing diagram style’ section.

Options:

:padding: <int>, <int>, <int>, <int>

Array of four positive integers denoting top, right, bottom and left padding between the diagram and its container. By default, there is 1px of padding on each side.

:vertical-separation: <int>

Vertical space between diagram lines.

:horizontal-separation: <int>

Horizontal space between items within a sequence.

:arc-radius: <int>

Arc radius of railroads. 10px by default.

:translate-half-pixel:
:no-translate-half-pixel:

If enabled, the diagram will be translated half-pixel in both directions. May be used to deal with anti-aliasing issues when using odd stroke widths.

:internal-alignment: center|left|right|auto-left|auto-right

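Determines how nodes are aligned within a single diagram line. Available options are:

  • center – nodes are centered.

    [Railroad diagram: example nodes A to E, center alignment]

  • left – nodes are aligned flush left in all cases.

    [Railroad diagram: example nodes A to E, left alignment]

  • right – nodes are aligned flush right in all cases.

    [Railroad diagram: example nodes A to E, right alignment]

  • auto-left – nodes in choice groups are aligned flush left, all other nodes are centered.

    [Railroad diagram: example nodes A to E, auto-left alignment]

  • auto-right – nodes in choice groups are aligned flush right, all other nodes are centered.

    [Railroad diagram: example nodes A to E, auto-right alignment]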
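The alignment rules above can be read as a choice of horizontal offset for a node that is narrower than its line. The following Python sketch models only that choice; node_offset() and its in_choice flag are hypothetical, and the real renderer also has to account for arcs and connecting lines, which are ignored here.

```python
def node_offset(line_width: float, node_width: float, alignment: str,
                in_choice: bool = False) -> float:
    """Horizontal offset of a node inside a line that is wider than the node."""
    free = line_width - node_width
    # The auto-* modes only flush nodes that belong to a choice group.
    if alignment == "auto-left":
        alignment = "left" if in_choice else "center"
    elif alignment == "auto-right":
        alignment = "right" if in_choice else "center"
    offsets = {"left": 0.0, "center": free / 2, "right": float(free)}
    if alignment not in offsets:
        raise ValueError(f"unknown alignment: {alignment}")
    return offsets[alignment]


print(node_offset(100, 40, "center"))                      # 30.0
print(node_offset(100, 40, "auto-right", in_choice=True))  # 60.0
print(node_offset(100, 40, "auto-right"))                  # 30.0
```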

:character-advance: <float>

Average width of one character in the font used. Since SVG elements cannot expand and shrink dynamically, the length of a text node is calculated as the number of characters multiplied by this constant.
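The estimate is plain arithmetic, shown here as a tiny Python sketch. The function name and the 8.5 value are illustrative assumptions; node padding and the rest of the diagram geometry are not modelled.

```python
def text_node_width(text: str, character_advance: float) -> float:
    """Estimated width of a text node: number of characters times the advance."""
    return len(text) * character_advance


# With an assumed advance of 8.5, 'identifier' (10 characters) is given 85.0,
# no matter how narrow or wide its glyphs actually are in the chosen font.
print(text_node_width("identifier", 8.5))  # 85.0
```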

:end-class: simple|complex

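Controls what the diagram start and end look like. Available options are:

  • simple – a simple T-shaped ending.

    [Railroad diagram: simple T-shaped ends]

  • complex – a T-shaped ending with the vertical line doubled.

    [Railroad diagram: complex ends with doubled vertical lines]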

:max-width: <int>

Maximum width after which a sequence will be wrapped. This option is used to automatically convert sequences to stacks. Note that this option is only a hint: there is no guarantee that the diagram will fit within max-width.
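One way to picture the conversion is a greedy line-wrapping pass over item widths. The Python sketch below is an assumption about the general idea, not the extension's algorithm: wrap_sequence() is hypothetical, and treating gap as the horizontal-separation value is an illustrative choice. It also shows why max-width cannot be guaranteed: an item wider than the limit still ends up on a row of its own.

```python
from typing import List


def wrap_sequence(widths: List[float], max_width: float, gap: float) -> List[List[int]]:
    """Greedily split a sequence into stack rows; each row lists item indices."""
    rows: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for index, width in enumerate(widths):
        extra = width if not current else gap + width
        if current and used + extra > max_width:
            rows.append(current)  # start a new stack row
            current, used = [index], width
        else:
            current.append(index)
            used += extra
    if current:
        rows.append(current)
    return rows


print(wrap_sequence([50, 50, 50], max_width=120, gap=10))  # [[0, 1], [2]]
print(wrap_sequence([200, 10], max_width=100, gap=10))     # [[0], [1]]
```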

:literal-rendering: name|contents|contents-unquoted

Controls how literal rules (i.e. lexer rules that only consist of one string) are rendered. Available options are:

  • name – only the name of the literal rule is displayed.

  • contents – the quoted literal string is displayed.

    [Railroad diagram: 'def' Id]

  • contents-unquoted – the literal string is displayed with quotes stripped away.

    [Railroad diagram: def Id]

:cc-to-dash:
:no-cc-to-dash:

If a rule has no human-readable name set, convert its name from CamelCase to dash-case.
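A possible CamelCase to dash-case conversion is sketched below in Python. This is an illustration, not the extension's code: cc_to_dash() is a hypothetical name, and the handling of acronyms and digits is an assumption, so the plugin's exact output for such names may differ.

```python
import re


def cc_to_dash(name: str) -> str:
    """Convert CamelCase or camelCase to dash-case, e.g. 'ModuleItem' -> 'module-item'."""
    # A dash before an uppercase letter that follows a lowercase letter or a digit...
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # ...and before the last capital of an acronym that starts a new word ('XMLFile').
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "-", name)
    return name.lower()


print(cc_to_dash("ModuleItem"))  # module-item
print(cc_to_dash("XMLFile"))     # xml-file
print(cc_to_dash("NUMBER"))      # number
```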

:alt: <str>

If the rendering engine cannot output the diagram contents, the specified string is used instead.

.. lexer-rule-diagram::

The body of this directive should contain a valid Antlr4 lexer rule description.

For example

.. lexer-rule-diagram:: ('+' | '-')? [1-9] [0-9]*

translates to:

[Railroad diagram: an optional '+' or '-', then [1-9], then zero or more [0-9]]

Options:

Options are inherited from the railroad-diagram directive.

.. parser-rule-diagram::

The body of this directive should contain a valid Antlr4 parser rule description.

For example

.. parser-rule-diagram::

   SELECT DISTINCT?
   ('*' | expression (AS row_name)?
          (',' expression (AS row_name)?)*)

translates to:

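[Railroad diagram: SELECT, an optional DISTINCT, then either '*' or a comma-separated list of expression, each optionally followed by AS row_name]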

Options:

Options are inherited from the railroad-diagram directive.

Autodoc directive

.. a4:autogrammar:: filename

The autogrammar directive generates a grammar description from a .g4 file.

Its only argument, filename, should contain the path of the grammar file relative to a4_base_path. The file extension may be omitted.

Autogrammar will read a .g4 file and extract the grammar name (which will be used for cross-referencing), grammar-level documentation comments, the set of production rules, and their documentation and contents. It will then generate railroad diagrams and render the extracted information.

See more on how to write documentation comments and control the look of automatically generated railroad diagrams in the ‘Grammar comments and annotations’ section.

Like autoclass and other default autodoc directives, autogrammar can have contents of its own. These contents will be merged with the automatically generated description.

Use the docstring-marker and members-marker directives to control the merging process.

Options:

:name:
:type:
:imports:
:noindex:
:diagram-*:

Inherited from a4:grammar directive.

If not given, :type: and :imports: will be extracted from grammar file.

:only-reachable-from: <str>

If given, autodoc will only render rules that are reachable from this root. This is useful to exclude rules from imported grammars that are not used by the primary grammar.

The value should be either name of a rule from the grammar that’s being documented or a full path which includes grammar name and rule name.

For example, suppose there are Lexer.g4 and Parser.g4. To filter out lexer rules that are not used by the parser grammar, use:

.. a4:autogrammar:: Parser
   :only-reachable-from: Parser.root

.. a4:autogrammar:: Lexer
   :only-reachable-from: Parser.root
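Reachability here is ordinary graph traversal over rule references. A minimal Python sketch, assuming a hypothetical refs mapping from each full rule path to the rules its body mentions (the extension builds its own model from the parsed .g4 files):

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable_rules(refs: Dict[str, Iterable[str]], root: str) -> Set[str]:
    """Breadth-first search over rule references, starting from `root`."""
    seen = {root}
    queue = deque([root])
    while queue:
        rule = queue.popleft()
        for ref in refs.get(rule, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


# Hypothetical reference graph: each full rule path maps to the rules it mentions.
refs = {
    "Parser.root": ["Parser.stmt"],
    "Parser.stmt": ["Lexer.ID", "Lexer.SEMI", "Parser.stmt"],
    "Lexer.ID": [],
    "Lexer.SEMI": [],
    "Lexer.UNUSED": [],
}
keep = reachable_rules(refs, "Parser.root")
print(sorted(r for r in refs if r.startswith("Lexer.") and r in keep))
# ['Lexer.ID', 'Lexer.SEMI']
```

Lexer.UNUSED is dropped because no path leads to it from Parser.root, which is what the Lexer autogrammar call in the example above relies on.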

:mark-root-rule:
:no-mark-root-rule:

If enabled, the automatically generated diagram for the rule listed in only-reachable-from will use complex line endings (see the end-class option of the railroad-diagram directive).

:lexer-rules:
:no-lexer-rules:

Controls whether lexer rules should appear in documentation. Enabled by default.

:parser-rules:
:no-parser-rules:

Controls whether parser rules should appear in documentation. Enabled by default.

:fragments:
:no-fragments:

Controls whether fragments should appear in documentation. Disabled by default.

:undocumented:
:no-undocumented:

Controls whether undocumented rules should appear in documentation. Disabled by default.

:grouping: mixed|lexer-first|parser-first

Controls how autodoc groups rules that are extracted from sources.

  • mixed – there is one group that contains all rules.

  • lexer-first – there are two groups: one for parser rules and one for lexer rules and fragments. The lexer group goes first.

  • parser-first – like lexer-first, but the parser group precedes the lexer group.

:ordering: by-source|by-name

Controls how autodoc orders rules within each group (see grouping option).

  • by-source – rules are ordered as they appear in the grammar file.

  • by-name – rules are ordered lexicographically.
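The interplay of grouping and ordering can be sketched in Python. The arrange() and is_lexer_rule() helpers are hypothetical; the split relies on the ANTLR naming convention that lexer rules and fragments start with an uppercase letter and parser rules with a lowercase one. Note that Python's sorted() compares by code point, which may differ from the extension's lexicographic order for mixed-case names.

```python
from typing import List


def is_lexer_rule(name: str) -> bool:
    # ANTLR convention: lexer rules and fragments start with an uppercase letter.
    return name[:1].isupper()


def arrange(rules: List[str], grouping: str = "mixed",
            ordering: str = "by-source") -> List[List[str]]:
    """Split rules (given in source order) into groups and order each group."""
    if ordering not in ("by-source", "by-name"):
        raise ValueError(f"unknown ordering: {ordering}")

    def order(group: List[str]) -> List[str]:
        return sorted(group) if ordering == "by-name" else list(group)

    if grouping == "mixed":
        return [order(rules)]
    lexer = [r for r in rules if is_lexer_rule(r)]
    parser = [r for r in rules if not is_lexer_rule(r)]
    if grouping == "lexer-first":
        return [order(lexer), order(parser)]
    if grouping == "parser-first":
        return [order(parser), order(lexer)]
    raise ValueError(f"unknown grouping: {grouping}")


rules = ["value", "STRING", "object", "NUMBER"]  # source order
print(arrange(rules, "parser-first", "by-name"))
# [['object', 'value'], ['NUMBER', 'STRING']]
```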

:honor-sections:
:no-honor-sections:

If true, render comments that start with a triple slash, treating them as paragraphs placed between rules.

This setting has no effect unless ordering is by-source.

New in version 1.2.0.

:cc-to-dash:
:no-cc-to-dash:

For rules without explicit human-readable names, generate ones by converting rule name from CamelCase to dash-case.

Setting this option will also set the diagram-cc-to-dash option, unless the latter is specified explicitly.

.. a4:autorule:: filename rulename

The autorule directive renders documentation for a single rule. It accepts two arguments: the first is the path to the grammar file relative to a4_base_path, the second is the name of the rule to document.

Note that autorule can only be used within a grammar definition. The name of the current grammar definition must match the name of the grammar from which the documented rule is imported.

Options:

:name:
:noindex:
:diagram-*:

Inherited from a4:rule directive.

.. docstring-marker::

This marker allows customizing where grammar docstring will be rendered.

By default, the grammar docstring (i.e., the comment at the very top of a grammar file) is added to the end of the autogrammar directive. However, if a docstring marker is present, the grammar docstring is rendered in its place.

Example:

.. a4:autogrammar:: Json

   (1) This is the description of the grammar.

   .. docstring-marker::

   (2) This is the continuation of the description.

In this case, the grammar docstring will be rendered between (1) and (2).
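The placement rule can be illustrated on plain lines of text. This Python sketch is a simplification: merge_docstring() is a hypothetical helper that works on strings, whereas the extension merges parsed document content.

```python
from typing import List

MARKER = ".. docstring-marker::"


def merge_docstring(content: List[str], docstring: List[str]) -> List[str]:
    """Put the grammar docstring at the marker, or append it if there is no marker."""
    for index, line in enumerate(content):
        if line.strip() == MARKER:
            return content[:index] + docstring + content[index + 1:]
    return content + docstring


body = ["(1) This is the description of the grammar.", "", ".. docstring-marker::", "",
        "(2) This is the continuation of the description."]
print(merge_docstring(body, ["<grammar docstring>"]))
```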

.. members-marker::

This marker allows customizing where rule descriptions will be rendered.

See docstring-marker.

Grammar comments and annotations

The a4:autogrammar directive does not parse every comment found in a grammar file. Instead, it looks for specially formatted ‘documentation’ comments. There are three types of such comments:

  • documentation comments are multiline comments that start with /** (that is, a slash followed by double asterisk). These comments should contain valid rst-formatted text.

    It is common to outline documentation comments by adding an asterisk on each row. Though this is completely optional, a4doc can recognize and handle this pattern.

    Example:

    /**
     * This is the grammar root.
     */
    module: moduleItem* EOF
    
  • control comments are inline comments that start with //@. Control comments contain special commands that affect rendering process.

    Example:

    //@ doc:no-diagram
    module: moduleItem* EOF
    
  • section comments are comments that start with ///. They are used to render text between production rules and to split the grammar definition into sections.

    Example:

    /// **Module definition**
    ///
    /// This paragraph describes the ``Module definition``
    /// section of the grammar.
    
    module: moduleItem* EOF
    
    moduleItem: import | symbol
    

    New in version 1.2.0.
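Extracting rst text from documentation and section comments can be sketched in Python as follows. The helpers and regular expressions are hypothetical, not sphinx-a4doc's parser. One simplification to be aware of: if every non-blank line of a /** comment starts with an asterisk, the asterisks are treated as decoration, so an rst bullet list written with * and no decorative asterisks would be misread.

```python
import re
import textwrap

_STAR_PREFIX = re.compile(r"^[ \t]*\* ?")      # decorative ' * ' at line start
_SECTION_PREFIX = re.compile(r"^[ \t]*/// ?")  # '/// ' at line start


def strip_doc_comment(comment: str) -> str:
    """Extract rst text from a /** ... */ comment, with or without leading asterisks."""
    text = comment.strip()
    if not (text.startswith("/**") and text.endswith("*/")):
        raise ValueError("not a /** ... */ documentation comment")
    lines = text[3:-2].splitlines()
    content = [line for line in lines if line.strip()]
    if content and all(_STAR_PREFIX.match(line) for line in content):
        lines = [_STAR_PREFIX.sub("", line, count=1) for line in lines]
    else:
        lines = textwrap.dedent("\n".join(lines)).splitlines()
    return "\n".join(line.rstrip() for line in lines).strip("\n")


def strip_section_comment(block: str) -> str:
    """Extract rst text from consecutive /// lines."""
    lines = [_SECTION_PREFIX.sub("", line, count=1) for line in block.splitlines()]
    return "\n".join(line.rstrip() for line in lines).strip("\n")


print(strip_doc_comment("/**\n * This is the grammar root.\n */"))
# This is the grammar root.
```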

There are also restrictions on where documentation, control and section comments may appear:

  • documentation comments can be placed either at the beginning of the file, before the grammar keyword (in which case they document the whole grammar), or they can be found right before a production rule or a fragment declaration (in which case they are rendered as a rule description). Also, they can be embedded into the rule description, in which case they are rendered as part of the railroad diagram;

  • control comments can only be placed before a production rule declaration. They only affect rendering of that specific production rule;

  • multiple documentation and control comments can appear before a rule. In this case, the first documentation comment will be rendered before the automatically generated railroad diagram, all subsequent documentation comments will be rendered after it, and all control comments will be applied before rendering documentation comments;

  • section comments can only be placed between rules in the main section of a file.

Control comments

The list of control comments includes:

  • //@ doc:nodoc – exclude this rule from autogrammar output.

  • //@ doc:inline – exclude this rule from autogrammar output; any automatically generated railroad diagram that refers to this rule will include its contents instead of a single node.

    Useful for fragments and simple lexer rules.

    For example

    NUMBER
        : '-'? ('0' | [1-9] [0-9]*) ('.' [0-9]+)? EXPONENT?
        ;
    
    //@ doc:inline
    fragment EXPONENT
        : ('e' | 'E') ('+' | '-')? [0-9]+
        ;
    

    will produce the number rule (note how the exponent is rendered inside the number diagram).

  • //@ doc:no-diagram – do not generate railroad diagram.

  • //@ doc:importance <int> – controls the ‘importance’ of a rule.

    By default, all rules have importance of 1.

    Rules with importance of 0 will be rendered off the main line in optional groups:

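    [Railroad diagram: a rule with importance 1 on the main line and an optional rule with importance 0 drawn off the main line]

    In alternative groups, the rule with the highest importance will be centered:

    [Railroad diagram: a choice between rules with importance 0, 1, 2 and 1, with the importance 2 rule centered]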

  • //@ doc:unimportant – set importance to 0.

  • //@ doc:name <str> – set a human-readable name for this rule. See the :name: option of a4:rule.

  • //@ doc:css-class – add a custom CSS class to all diagrams referencing this rule.

    New in version 1.5.0.
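A control comment is a command plus an optional argument, so turning a rule's control comments into settings can be sketched in Python as below. The settings dictionary and parse_control_comment() are hypothetical, and the sketch assumes that doc:css-class takes the class name as its argument, which the list above does not spell out.

```python
from typing import Dict, Union

Setting = Union[bool, int, str]
_FLAGS = {"doc:nodoc": "nodoc", "doc:inline": "inline", "doc:no-diagram": "no-diagram"}


def parse_control_comment(line: str) -> Dict[str, Setting]:
    """Parse one '//@ doc:...' line into a settings dict."""
    text = line.strip()
    if not text.startswith("//@"):
        raise ValueError(f"not a control comment: {line!r}")
    command, _, arg = text[3:].strip().partition(" ")
    arg = arg.strip()
    if command in _FLAGS:
        return {_FLAGS[command]: True}
    if command == "doc:importance":
        return {"importance": int(arg)}
    if command == "doc:unimportant":
        return {"importance": 0}
    if command == "doc:name":
        return {"name": arg}
    if command == "doc:css-class":  # assumed to take the class name as its argument
        return {"css-class": arg}
    raise ValueError(f"unknown control comment: {command}")


settings: Dict[str, Setting] = {"importance": 1}  # default importance is 1
for comment in ["//@ doc:importance 2", "//@ doc:name Module item"]:
    settings.update(parse_control_comment(comment))
print(settings)  # {'importance': 2, 'name': 'Module item'}
```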

Configuration

Customizing diagram style

To customize diagram style, replace the default CSS file by placing an a4_railroad_diagram.css file in the _static directory.

New in version 1.6.0: to customize how diagrams look in the LaTeX build, place an a4_railroad_diagram_latex.css file in the _static directory.

Example output

This example was generated from Json.g4.

grammar Json

JSON (JavaScript Object Notation) is a lightweight data-interchange format.

rule value

On top level, JSON consists of a single value. That value can be either a complex structure (such as an object or an array) or a primitive type (a string in double quotes, a number, or true or false or null).

[Railroad diagram: a choice of object, array, string, number, true, false, null]

rule object

Object is a collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.

[Railroad diagram: '{', an optional comma-separated list of string ':' value pairs, '}']

rule array

Array is an ordered list of values. In most languages, this is realized as vector, list, array or sequence.

[Railroad diagram: '[', an optional comma-separated list of value, ']']

rule number

A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.

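[Railroad diagram: an optional '-', then either '0' or [1-9] followed by [0-9] digits, then an optional '.' with [0-9] digits, then an optional exponent built from 'e' or 'E', an optional '+' or '-', and [0-9] digits]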

rule string

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.

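[Railroad diagram: '"', then zero or more of either any Unicode character except " and \, or a backslash followed by one of: " (quotation mark), \ (reverse solidus), / (solidus), b (backspace), f (formfeed), n (newline), r (carriage return), t (horizontal tab), u plus 4 hexadecimal digits; then '"']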
