Sphinx plugin for Antlr4¶
Warning
Sphinx-A4Doc is deprecated, please use Sphinx-Syntax instead.
See migration guide for more info.
A4Doc is a sphinx plugin for documenting Antlr4 grammars.
Its primary goal is to provide an overview for DSL users (generated documentation may omit nuances that are essential for compiler developers).
See an example output: Json.
Features¶
- A new domain called a4, with grammar and rule directives.
- Directives for rendering railroad diagrams.
- Directive for extracting documentation comments and rendering docs and diagrams from .g4 source files.
Requirements¶
A4Doc uses dataclasses to represent parsed Antlr files, so Python >= 3.7
is required. This extension also requires Sphinx >= 1.8.0 because it
relies on features added in that release.
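These requirements can also be stated in conf.py. The following is a minimal sketch, not something A4Doc itself asks for: it relies on the standard Sphinx ``needs_sphinx`` setting, and the error message text is only an illustration.

```python
import sys

# Standard Sphinx setting: the build is refused on releases older than 1.8.
needs_sphinx = '1.8'

# Dataclasses are part of the standard library from Python 3.7 onwards.
if sys.version_info < (3, 7):
    raise RuntimeError('sphinx-a4doc requires Python 3.7 or newer')
```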
Installation¶
Install sphinx-a4doc with pip:
pip3 install sphinx-a4doc
Add sphinx_a4doc to the list of extensions in your conf.py.
If you intend to generate documentation from sources, also specify the
location of your grammar files:
extensions = [
'sphinx_a4doc',
]
# Assuming conf.py is in project/docs/source/conf.py
# and grammars are in project/project/grammars
from os.path import dirname
a4_base_path = dirname(__file__) + '/../../project/grammars'
Quickstart¶
Use the a4:grammar directive to declare a new grammar.
Within the grammar block, use the a4:rule directive to declare a new rule:
.. a4:grammar:: MyGrammar

   A grammar for my DSL.

   .. a4:rule:: root

      The root grammar rule.
The above code produces this output:
Use the a4:grammar (or a4:g as a shortcut) and
a4:rule (or a4:r) roles to refer to the declared
grammars and rules:
Grammar :a4:g:`MyGrammar` has a root rule :a4:r:`MyGrammar.root`.
The above code produces this output:
Grammar MyGrammar has a root rule MyGrammar.root.
Use the railroad-diagram, lexer-rule-diagram and
parser-rule-diagram directives to render diagrams:
.. parser-rule-diagram:: 'def' ID '(' (arg (',' arg)*)? ')' ':'
The above code produces this output:
Use the a4:autogrammar directive to generate documentation
from a grammar file.
RST reference¶
Declaring objects¶
- .. a4:grammar:: name¶
Declare a new grammar with the given name.
Grammar names should be unique within the project.
Options:
- :name: <str>¶
Specifies a human-readable name for the grammar.
If given, the human-readable name will be rendered instead of the primary grammar name. It will also replace the primary name in all cross references.
For example this code:
.. a4:grammar:: PrimaryName
   :name: Human-readable name

will render the following grammar description:
- grammar Human-readable name
- :type: mixed|lexer|parser¶
Specifies a grammar type. The type will be displayed in the grammar signature.
For example these three grammars:
.. a4:grammar:: Grammar1

.. a4:grammar:: Grammar2
   :type: lexer

.. a4:grammar:: Grammar3
   :type: parser
will be rendered differently:
- grammar Grammar1
- lexer grammar Grammar2
- parser grammar Grammar3
- :imports: <str>[, <str>[, ...]]¶
Specifies a list of imported grammars.
This option affects the name resolution process for rule cross-references. That is, if there is a reference to grammar.rule and there is no rule found in the grammar, the imported grammars will be searched as well.
Note that this setting is not passed through intersphinx.
- :noindex:¶
A standard sphinx option to disable indexing for this grammar.
- :diagram-*:¶
One can override any option for all railroad diagrams within this grammar. Prefix the desired option with diagram- and add it to the grammar description.
For example:
.. a4:grammar:: Test
   :diagram-end-class: complex

   All diagrams rendered inside this grammar
   will have 'end-class' set to 'complex'.
- .. a4:rule:: name¶
Declare a new production rule with the given name.
If placed within an a4:grammar body, the rule will be added to that grammar. It can then be referenced by a full path which will include the grammar name and the rule name concatenated with a dot symbol.
If placed outside any grammar directive, the rule will be added to an implicitly declared “default” grammar. In this case, the rule’s full path will only include its name.
In either case, the rule name should be unique within its grammar.
Options:
- :name: <str>¶
Specifies a human-readable name for this rule. Refer to the corresponding
a4:grammar’s option for more info.
- :noindex:¶
A standard sphinx option to disable indexing for this rule.
- :diagram-*:¶
One can override any option for all railroad diagrams within this rule. Refer to the corresponding a4:grammar’s option for more info.
Cross-referencing objects¶
- :any:
All a4 objects can be cross-referenced via the any role.
If given a full path, e.g. :any:`grammar_name.rule_name`, any will search for a rule called rule_name in the grammar called grammar_name and then, should this search fail, in all grammars that are imported by grammar_name, recursively.
If given a relative path, e.g. :any:`name`, any will perform a global search for a rule or a grammar with the corresponding name.
- :a4:grammar:¶
- :a4:g:¶
Cross-reference a grammar by its name.
There’s nothing special about this role, just specify the grammar name.
- :a4:rule:¶
- :a4:r:¶
Cross-reference a rule by its name or full path.
If given a full path, e.g. :a4:r:`grammar_name.rule_name`, the rule will be first searched in the corresponding grammar, then in all imported grammars, recursively.
If given a rule name only, e.g. :a4:r:`rule_name`, the behavior depends on context:
- when used in a grammar declaration body, the rule will be first searched in that grammar, then in any imported grammar, and finally in the default grammar.
- when used without context, the rule will only be searched in the default grammar.
Prepending the full path with a tilde works as expected.
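The lookup order described for the a4:rule role can be sketched in a few lines of Python. This is an illustration of the documented behavior only, not the plugin's code: the ``Grammar`` class, the ``DEFAULT_GRAMMAR`` key and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

DEFAULT_GRAMMAR = ''  # hypothetical key standing for the implicit "default" grammar


@dataclass
class Grammar:
    name: str
    rules: Set[str] = field(default_factory=set)
    imports: List[str] = field(default_factory=list)


def find_in_grammar(grammars: Dict[str, Grammar], grammar_name: str,
                    rule_name: str, seen: Optional[Set[str]] = None) -> Optional[str]:
    """Search a grammar, then its imports recursively; return the owning grammar."""
    seen = set() if seen is None else seen
    if grammar_name in seen or grammar_name not in grammars:
        return None
    seen.add(grammar_name)  # guards against circular imports
    grammar = grammars[grammar_name]
    if rule_name in grammar.rules:
        return grammar_name
    for imported in grammar.imports:
        found = find_in_grammar(grammars, imported, rule_name, seen)
        if found is not None:
            return found
    return None


def resolve_rule(grammars: Dict[str, Grammar], target: str,
                 context: Optional[str] = None) -> Optional[str]:
    """Resolve a full path or a bare rule name, optionally inside a grammar body."""
    if '.' in target:
        grammar_name, rule_name = target.rsplit('.', 1)
        return find_in_grammar(grammars, grammar_name, rule_name)
    if context is not None:
        found = find_in_grammar(grammars, context, target)
        if found is not None:
            return found
    return find_in_grammar(grammars, DEFAULT_GRAMMAR, target)


grammars = {
    'Parser': Grammar('Parser', {'root'}, ['Lexer']),
    'Lexer': Grammar('Lexer', {'ID'}),
    DEFAULT_GRAMMAR: Grammar(DEFAULT_GRAMMAR, {'helper'}),
}
```

With this data, ``Parser.ID`` resolves through the import to ``Lexer``, while a bare ``ID`` used without context is not found because only the default grammar is searched.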
Rendering diagrams¶
- .. railroad-diagram::¶
This is the most flexible directive for rendering railroad diagrams. Its content should be a valid YAML document containing the diagram item description.
The diagram item description itself has a recursive definition. It can be one of the following:

- None (denoted as tilde in YAML) will produce a line without objects:

  .. railroad-diagram:: ~

- a string will produce a terminal node:

  .. railroad-diagram:: just some string

- a list of diagram item descriptions will produce these items rendered one next to another:

  .. railroad-diagram::

     - terminal 1
     - terminal 2

- a dict with stack key produces a vertically stacked sequence.

  The main value (i.e. the one that corresponds to the stack key) should contain a list of diagram item descriptions. These items will be rendered vertically:

  .. railroad-diagram::

     stack:
     - terminal 1
     - - terminal 2
       - terminal 3

- a dict with choice key produces an alternative.

  The main value should contain a list of diagram item descriptions:

  .. railroad-diagram::

     choice:
     - terminal 1
     - - terminal 2
       - terminal 3

- a dict with optional key will produce an optional item.

  The main value should contain a single diagram item description.

  Additionally, the skip key with a boolean value may be added. If equal to true, the element will be rendered off the main line:

  .. railroad-diagram::

     optional:
     - terminal 1
     - optional:
       - terminal 2
       skip: true

- a dict with one_or_more key will produce a loop.

  The one_or_more element of the dict should contain a single diagram item description.

  Additionally, the repeat key with another diagram item description may be added to insert nodes to the inverse connection of the loop.

  .. railroad-diagram::

     one_or_more:
     - terminal 1
     - terminal 2
     repeat:
     - terminal 3
     - terminal 4

- a dict with zero_or_more key works like one_or_more except that the produced item is optional:

  .. railroad-diagram::

     zero_or_more:
     - terminal 1
     - terminal 2
     repeat:
     - terminal 3
     - terminal 4

- a dict with node key produces a textual node of configurable shape.

  The main value should contain text which will be rendered in the node.

  Optional keys include href, css_class, radius and padding.

  .. railroad-diagram::

     node: go to google
     href: https://www.google.com/
     css_class: terminal
     radius: 3
     padding: 50

- a dict with terminal key produces a terminal node.

  It works exactly like node. The only optional key is href.

- a dict with non_terminal key produces a non-terminal node.

  It works exactly like node. The only optional key is href.

- a dict with comment key produces a comment node.

  It works exactly like node. The only optional key is href.
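The recursive shape of a diagram item description can be made concrete with a small Python walker. This is a sketch under stated assumptions: it takes the Python values a YAML loader would produce, the function name ``describe`` and the ``'empty'`` and ``'sequence'`` labels are invented for illustration, and extra keys such as ``skip``, ``repeat`` or ``href`` are simply ignored. It is not the plugin's own parser.

```python
# Keys that select the kind of a dict-based diagram item, as listed above.
KEYED_KINDS = ('stack', 'choice', 'optional', 'one_or_more', 'zero_or_more',
               'node', 'terminal', 'non_terminal', 'comment')
TEXT_KINDS = ('node', 'terminal', 'non_terminal', 'comment')
LIST_KINDS = ('stack', 'choice')


def describe(item):
    """Return a nested (kind, children) tuple for a diagram item description."""
    if item is None:
        return ('empty', [])          # a line without objects
    if isinstance(item, str):
        return ('terminal', [item])   # a bare string is a terminal node
    if isinstance(item, list):
        return ('sequence', [describe(child) for child in item])
    if isinstance(item, dict):
        kinds = [key for key in KEYED_KINDS if key in item]
        if len(kinds) != 1:
            raise ValueError('expected exactly one main key, got %r' % kinds)
        kind = kinds[0]
        value = item[kind]
        if kind in TEXT_KINDS:
            return (kind, [value])    # the main value is the node text
        if kind in LIST_KINDS:
            return (kind, [describe(child) for child in value])
        # optional, one_or_more and zero_or_more wrap a single item description
        return (kind, [describe(value)])
    raise TypeError('unsupported diagram item: %r' % (item,))
```

For instance, ``describe({'optional': ['terminal 1'], 'skip': True})`` yields an ``optional`` node that wraps a one-element sequence.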
Example:
This example renders a diagram from the features section:
.. railroad-diagram::

   - choice:
     - terminal: 'parser'
     -
     - terminal: 'lexer '
     default: 1
   - terminal: 'grammar'
   - non_terminal: 'identifier'
   - terminal: ';'
which translates to:
Customization:
See more on how to customize diagram style in the ‘Customizing diagram style’ section.
Options:
- :padding: <int>, <int>, <int>, <int>¶
Array of four positive integers denoting top, right, bottom and left padding between the diagram and its container. By default, there is 1px of padding on each side.
- :vertical-separation: <int>¶
Vertical space between diagram lines.
- :horizontal-separation: <int>¶
Horizontal space between items within a sequence.
- :arc-radius: <int>¶
Arc radius of railroads. 10px by default.
- :translate-half-pixel:¶
- :no-translate-half-pixel:¶
If enabled, the diagram will be translated half-pixel in both directions. May be used to deal with anti-aliasing issues when using odd stroke widths.
- :internal-alignment: center|left|right|auto-left|auto-right¶
Determines how nodes are aligned within a single diagram line. Available options are:
- center – nodes are centered.
- left – nodes are flushed to left in all cases.
- right – nodes are flushed to right in all cases.
- auto_left – nodes in choice groups are flushed left, all other nodes are centered.
- auto_right – nodes in choice groups are flushed right, all other nodes are centered.
- :character-advance: <float>¶
Average length of one character in the used font. Since SVG elements cannot expand and shrink dynamically, length of text nodes is calculated as number of symbols multiplied by this constant.
- :end-class: simple|complex¶
Controls how the diagram start and end look. Available options are:
- simple – a simple T-shaped ending.
- complex – a T-shaped ending with vertical line doubled.
- :max-width: <int>¶
Max width after which a sequence will be wrapped. This option is used to automatically convert sequences to stacks. Note that this option is only a hint; there is no guarantee that the diagram will fit within its max_width.
- :literal-rendering: name|contents|contents-unquoted¶
Controls how literal rules (i.e. lexer rules that only consist of one string) are rendered. Available options are:
- name – only name of the literal rule is displayed.
- contents – quoted literal string is displayed.
- contents-unquoted – literal string is displayed, quotes stripped away.
- :cc-to-dash:¶
- :no-cc-to-dash:¶
If a rule has no human-readable name set, convert its name from CamelCase to dash-case.
- :alt: <str>¶
If the rendering engine does not support output of contents, the specified string is used instead.
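Two of the options above describe small computations: :character-advance: (text length as symbol count times a constant) and :cc-to-dash: (CamelCase to dash-case). The following Python sketch shows both. The function names are hypothetical, the advance value of 8.0 in the usage note is an arbitrary example rather than the plugin's default, and the handling of acronyms in ``camel_to_dash`` is an assumption.

```python
import re


def estimate_text_width(text, character_advance):
    """Approximate rendered width: number of symbols times the average advance."""
    return len(text) * character_advance


def camel_to_dash(name):
    """Insert a dash before each capital that follows a lowercase letter or digit."""
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '-', name).lower()
```

For example, ``estimate_text_width('grammar', 8.0)`` gives ``56.0`` and ``camel_to_dash('moduleItem')`` gives ``'module-item'``.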
- .. lexer-rule-diagram::¶
The body of this directive should contain a valid Antlr4 lexer rule description.
For example
.. lexer-rule-diagram:: ('+' | '-')? [1-9] [0-9]*
translates to:
Options:
Options are inherited from the railroad-diagram directive.
- .. parser-rule-diagram::¶
The body of this directive should contain a valid Antlr4 parser rule description.
For example
.. parser-rule-diagram:: SELECT DISTINCT? ('*' | expression (AS row_name)? (',' expression (AS row_name)?)*)
translates to:
Options:
Options are inherited from the railroad-diagram directive.
Autodoc directive¶
- .. a4:autogrammar:: filename¶
Autogrammar directive generates a grammar description from a .g4 file.
Its only argument, filename, should contain the path of the grammar file relative to the a4_base_path. File extension may be omitted.
Autogrammar will read a .g4 file and extract grammar name (which will be used for cross-referencing), grammar-level documentation comments, set of production rules, their documentation and contents. It will then generate railroad diagrams and render extracted information.
See more on how to write documentation comments and control look of the automatically generated railroad diagrams in the ‘Grammar comments and annotations’ section.
Like autoclass and other default autodoc directives, autogrammar can have contents of its own. These contents will be merged with the automatically generated description.
Use docstring-marker and members-marker to control the merging process.
Options:
- :name:¶
- :type:¶
- :imports:¶
- :noindex:¶
- :diagram-*:¶
Inherited from the a4:grammar directive.
If not given, :type: and :imports: will be extracted from the grammar file.
- :only-reachable-from: <str>¶
If given, autodoc will only render rules that are reachable from this root. This is useful to exclude rules from imported grammars that are not used by the primary grammar.
The value should be either name of a rule from the grammar that’s being documented or a full path which includes grammar name and rule name.
For example, suppose there’s Lexer.g4 and Parser.g4. To filter lexer rules that are not used by parser grammar, use:
.. a4:autogrammar:: Parser
   :only-reachable-from: Parser.root

.. a4:autogrammar:: Lexer
   :only-reachable-from: Parser.root
- :mark-root-rule:¶
- :no-mark-root-rule:¶
If enabled, automatic diagram for the rule that’s listed in only-reachable-from will use complex line endings (see the end-class option of the railroad-diagram directive).
- :lexer-rules:¶
- :no-lexer-rules:¶
Controls whether lexer rules should appear in documentation. Enabled by default.
- :parser-rules:¶
- :no-parser-rules:¶
Controls whether parser rules should appear in documentation. Enabled by default.
- :fragments:¶
- :no-fragments:¶
Controls whether fragments should appear in documentation. Disabled by default.
- :undocumented:¶
- :no-undocumented:¶
Controls whether undocumented rules should appear in documentation. Disabled by default.
- :grouping: mixed|lexer-first|parser-first¶
Controls how autodoc groups rules that are extracted from sources.
- mixed – there’s one group that contains all rules.
- lexer-first – there are two groups: one for parser rules and one for lexer rules and fragments. Lexer group goes first.
- parser-first – like lexer-first, but parser group precedes lexer group.
- :ordering: by-source|by-name¶
Controls how autodoc orders rules within each group (see grouping option).
- by-source – rules are ordered as they appear in the grammar file.
- by-name – rules are ordered lexicographically.
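The filtering performed by :only-reachable-from: amounts to a reachability search over rule references. A minimal sketch follows, assuming the references have already been collected into a dictionary that maps each rule's full path to the rules it mentions. The dictionary, the function name and the sample rule names are all hypothetical, and this is not the plugin's implementation.

```python
from collections import deque


def reachable_rules(references, root):
    """Return the set of rules reachable from root, root included."""
    seen = {root}
    queue = deque([root])
    while queue:
        rule = queue.popleft()
        for target in references.get(rule, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical reference graph for a Parser grammar that imports a Lexer grammar.
references = {
    'Parser.root': ['Parser.item', 'Lexer.ID'],
    'Parser.item': ['Lexer.NUMBER'],
    'Lexer.ID': [],
    'Lexer.NUMBER': ['Lexer.DIGIT'],
    'Lexer.DIGIT': [],
    'Lexer.UNUSED': [],
}
kept = reachable_rules(references, 'Parser.root')
```

Here ``Lexer.UNUSED`` is left out of ``kept`` because nothing reachable from ``Parser.root`` refers to it.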
- .. a4:autorule:: filename rulename¶
Autorule directive renders documentation for a single rule. It accepts two arguments: the first is a path to the grammar file relative to the a4_base_path, the second is the name of the rule that should be documented.
Note that autorule can only be used within a grammar definition. Name of the current grammar definition must match name of the grammar from which the documented rule is imported.
Options:
- .. docstring-marker::¶
This marker allows customizing where grammar docstring will be rendered.
By default, grammar docstring (i.e., the comment at the very top of a grammar file) will be added to the end of the autogrammar directive. However, if there is a docstring marker present, grammar docstring will be rendered in its place.
Example:
.. a4:autogrammar:: Json

   (1) This is the description of the grammar.

   .. docstring-marker::

   (2) This is the continuation of the description.

In this case, the grammar docstring will be rendered between (1) and (2).
- .. members-marker::¶
This marker allows customizing where rule descriptions will be rendered.
See docstring-marker.
Grammar comments and annotations¶
The a4:autogrammar directive does not parse every comment found
in a grammar file. Instead, it searches for ‘documentation’ comments, i.e.
specially formatted ones. There are three types of such comments:
- documentation comments are multiline comments that start with /** (that is, a slash followed by double asterisk). These comments should contain valid rst-formatted text.

  It is common to outline documentation comments by adding an asterisk on each row. Though this is completely optional, a4doc can recognize and handle this pattern.

  Example:

  /**
   * This is the grammar root.
   */
  module: moduleItem* EOF

- control comments are inline comments that start with //@. Control comments contain special commands that affect rendering process.

  Example:

  //@ doc:no-diagram
  module: moduleItem* EOF

- section comments are comments that start with ///. They’re used to render text between production rules and split grammar definition in sections.

  Example:

  /// **Module definition**
  ///
  /// This paragraph describes the ``Module definition``
  /// section of the grammar.

  module: moduleItem* EOF

  moduleItem: import | symbol
Added in version 1.2.0.
There are also restrictions on where documentation and control comments may appear:
- documentation comments can be placed either at the beginning of the file, before the grammar keyword (in which case they document the whole grammar), or they can be found right before a production rule or a fragment declaration (in which case they are rendered as a rule description). Also, they can be embedded into the rule description, in which case they are rendered as part of the railroad diagram;
- control comments can only be placed before a production rule declaration. They only affect rendering of that specific production rule;
- multiple documentation and control comments can appear before a rule. In this case, the first documentation comment will be rendered before the automatically generated railroad diagram, all subsequent documentation comments will be rendered after it, and all control comments will be applied before rendering documentation comments;
- section comments can only be placed between rules in the main section of a file.
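The three comment types are told apart by their opening characters, and outlined documentation comments have their leading asterisks removed. A rough Python sketch of both steps is given below. The function names are hypothetical and this is not how a4doc is implemented: in particular, the asterisk stripping here is naive and would also eat a leading ``*`` that belongs to the text itself.

```python
def classify_comment(comment):
    """Return 'documentation', 'control', 'section', or None for other comments."""
    text = comment.lstrip()
    if text.startswith('/**'):
        return 'documentation'
    if text.startswith('//@'):
        return 'control'
    if text.startswith('///'):
        return 'section'
    return None


def strip_doc_comment(comment):
    """Drop the /** and */ delimiters and one leading asterisk per row."""
    body = comment.strip()[3:-2]
    lines = []
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith('*'):
            stripped = stripped[1:]
            if stripped.startswith(' '):
                stripped = stripped[1:]  # remove only the single separating space
            line = stripped
        lines.append(line.rstrip())
    return '\n'.join(lines).strip()
```

Applied to the documentation comment from the example above, ``strip_doc_comment`` returns the plain text ``This is the grammar root.``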
Control comments¶
The list of control comments includes:
- //@ doc:nodoc – exclude this rule from autogrammar output.

- //@ doc:inline – exclude this rule from autogrammar output; any automatically generated railroad diagram that refers to this rule will include its contents instead of a single node.

  Useful for fragments and simple lexer rules.

  For example

  NUMBER
      : '-'? ('0' | [1-9] [0-9]*) ('.' [0-9]+)? EXPONENT?
      ;

  //@ doc:inline
  fragment EXPONENT
      : ('e' | 'E')? ('+' | '-')? [0-9]+
      ;

  will produce the number rule (note how exponent is rendered inside of the number diagram).

- //@ doc:no-diagram – do not generate railroad diagram.

- //@ doc:importance <int> – controls the ‘importance’ of a rule.

  By default, all rules have importance of 1.

  Rules with importance of 0 will be rendered off the main line in optional groups:

  In alternative groups, rule with the highest importance will be centered:

- //@ doc:unimportant – set importance to 0.

- //@ doc:name <str> – set a human-readable name for this rule. See the :name: option of a4:rule.

- //@ doc:css-class – add a custom CSS class to all diagrams referencing this rule.

  Added in version 1.5.0.
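To show how the commands listed above combine, here is a hedged Python sketch that turns the control comments found before one rule into a settings dictionary. The function names and dictionary keys are invented for illustration, and treating the text after ``doc:css-class`` as the class name is an assumption. The plugin's real data model may differ.

```python
def parse_control_comment(comment):
    """Split '//@ doc:importance 2' into ('importance', '2')."""
    text = comment.strip()
    if not text.startswith('//@'):
        raise ValueError('not a control comment: %r' % comment)
    body = text[3:].strip()
    if not body.startswith('doc:'):
        raise ValueError('unsupported control comment: %r' % comment)
    command, _, argument = body[len('doc:'):].partition(' ')
    return command, argument.strip()


def collect_settings(comments):
    """Apply control comments in order, starting from the documented defaults."""
    settings = {'nodoc': False, 'inline': False, 'diagram': True,
                'importance': 1, 'name': None, 'css_class': None}
    for comment in comments:
        command, argument = parse_control_comment(comment)
        if command == 'nodoc':
            settings['nodoc'] = True
        elif command == 'inline':
            settings['inline'] = True
        elif command == 'no-diagram':
            settings['diagram'] = False
        elif command == 'importance':
            settings['importance'] = int(argument)
        elif command == 'unimportant':
            settings['importance'] = 0
        elif command == 'name':
            settings['name'] = argument
        elif command == 'css-class':
            settings['css_class'] = argument  # assumed to carry the class name
        else:
            raise ValueError('unknown command: %r' % command)
    return settings
```

For example, ``collect_settings(['//@ doc:inline', '//@ doc:importance 2'])`` marks the rule as inlined and raises its importance from the default of 1 to 2.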
Configuration¶
Customizing diagram style¶
To customize diagram style, one can replace
the default css file
by placing an a4_railroad_diagram.css file in the _static directory.
Added in version 1.6.0: to customize how diagrams look in the latex build,
place an a4_railroad_diagram_latex.css file in the _static directory.
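A small conf.py-style sketch can confirm that the override files are where they are expected. It assumes the _static directory is the one registered through the standard Sphinx ``html_static_path`` setting. The helper name ``present_overrides`` is hypothetical and not part of A4Doc.

```python
from pathlib import Path

# Standard Sphinx setting listing directories with custom static files.
html_static_path = ['_static']

# File names taken from the paragraph above.
OVERRIDES = ('a4_railroad_diagram.css', 'a4_railroad_diagram_latex.css')


def present_overrides(docs_dir):
    """List the override stylesheets that exist under docs_dir/_static."""
    static_dir = Path(docs_dir) / '_static'
    return [name for name in OVERRIDES if (static_dir / name).is_file()]
```

Calling ``present_overrides`` with the directory that contains conf.py returns the names of the override files that were found, in the order listed in ``OVERRIDES``.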
Example output¶
This example was generated from Json.g4.
- grammar Json¶
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
- rule value¶
On top level, JSON consists of a single value. That value can be either a complex structure (such as an object or an array) or a primitive type (a string in double quotes, a number, or true or false or null).
- rule object¶
Object is a collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- rule array¶
Array is an ordered list of values. In most languages, this is realized as vector, list, array or sequence.
- rule number¶
A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
- rule string¶
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.