Sphinx plugin for Antlr4
A4Doc is a sphinx plugin for documenting Antlr4 grammars.
Its primary goal is to give DSL users an overview of a grammar; the generated documentation may omit nuances that are essential for compiler developers.
See an example output: Json.
Features
A new domain, called a4, with grammar and rule directives.
Directives for rendering railroad diagrams, such as this one:
[railroad diagram]
A directive for extracting documentation comments and rendering docs and diagrams from .g4 source files.
Requirements
A4Doc uses dataclasses to represent parsed antlr files, thus python >= 3.7 is required. Also, this extension requires sphinx >= 1.8.0 because it relies on some features added in that release.
Installation
Install sphinx-a4doc with pip:
pip3 install sphinx-a4doc
Add sphinx_a4doc to the list of extensions in your conf.py.
If you intend to generate documentation from sources, also specify the
location of your grammar files:
extensions = [
    'sphinx_a4doc',
]
# Assuming conf.py is in project/docs/source/conf.py
# and grammars are in project/project/grammars
from os.path import dirname
a4_base_path = dirname(__file__) + '/../../project/grammars'
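The same location can also be computed with pathlib, which avoids string concatenation. This is only a sketch under the directory layout assumed in the comments above; the helper name grammar_dir is hypothetical and not part of sphinx-a4doc.

```python
from pathlib import Path


def grammar_dir(conf_file: str) -> Path:
    """Return <project>/project/grammars for a conf.py in <project>/docs/source."""
    conf_dir = Path(conf_file).resolve().parent  # <project>/docs/source
    project_root = conf_dir.parents[1]           # <project>
    return project_root / "project" / "grammars"


# In conf.py this could be used as:
# a4_base_path = str(grammar_dir(__file__))
```

The result is converted to a string because the original snippet assigns a plain string to a4_base_path.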
Quickstart
Use the a4:grammar directive to declare a new grammar.
Within the grammar block, use the a4:rule directive to declare a new rule:
.. a4:grammar:: MyGrammar

   A grammar for my DSL.

   .. a4:rule:: root

      The root grammar rule.
The above code produces this output:
Use the a4:grammar (or a4:g as a shortcut) and a4:rule (or a4:r) roles to refer to the declared grammars and rules:
Grammar :a4:g:`MyGrammar` has a root rule :a4:r:`MyGrammar.root`.
The above code produces this output:
Grammar MyGrammar has a root rule MyGrammar.root.
Use the railroad-diagram, lexer-rule-diagram and parser-rule-diagram directives to render diagrams:
.. parser-rule-diagram:: 'def' ID '(' (arg (',' arg)*)? ')' ':'
The above code produces this output:
Use the a4:autogrammar directive to generate documentation from a grammar file.
RST reference
Declaring objects
- .. a4:grammar:: name
Declare a new grammar with the given name.
Grammar names should be unique within the project.
Options:
- :name: <str>
Specifies a human-readable name for the grammar.
If given, the human-readable name is rendered instead of the primary grammar name. It also replaces the primary name in all cross-references.
For example, this code:
.. a4:grammar:: PrimaryName
   :name: Human-readable name
renders the following grammar description:
- grammar Human-readable name
- :type: mixed|lexer|parser
Specifies the grammar type. The type is displayed in the grammar signature.
For example, these three grammars:
.. a4:grammar:: Grammar1

.. a4:grammar:: Grammar2
   :type: lexer

.. a4:grammar:: Grammar3
   :type: parser
are rendered differently:
- grammar Grammar1
- lexer grammar Grammar2
- parser grammar Grammar3
- :imports: <str>[, <str>[, ...]]
Specifies a list of imported grammars.
This option affects name resolution for rule cross-references: if a reference points to grammar.rule and no such rule is found in grammar, the imported grammars are searched as well.
Note that this setting is not passed through intersphinx.
- :noindex:
A standard sphinx option to disable indexing for this grammar.
- :diagram-*:
Any option of the railroad diagrams within this grammar can be overridden. Prefix the desired option with diagram- and add it to the directive.
For example:
.. a4:grammar:: Test
   :diagram-end-class: complex

   All diagrams rendered inside this grammar
   will have 'end-class' set to 'complex'.
- .. a4:rule:: name
Declare a new production rule with the given name.
If placed within an a4:grammar body, the rule is added to that grammar. It can then be referenced by its full path: the grammar name and the rule name joined with a dot.
If placed outside any grammar directive, the rule is added to an implicitly declared “default” grammar. In this case, the rule’s full path consists of its name only.
In either case, the rule name should be unique within its grammar.
Options:
- :name: <str>
Specifies a human-readable name for this rule. Refer to the corresponding option of a4:grammar for more info.
- :noindex:
A standard sphinx option to disable indexing for this rule.
- :diagram-*:
Any option of the railroad diagrams within this rule can be overridden. Refer to the corresponding option of a4:grammar for more info.
Cross-referencing objects
- :any:
All a4 objects can be cross-referenced via the any role.
If given a full path, e.g. :any:`grammar_name.rule_name`, any searches for a rule called rule_name in the grammar called grammar_name and then, should this search fail, in all grammars imported from grammar_name, recursively.
If given a relative path, e.g. :any:`name`, any performs a global search for a rule or a grammar with the corresponding name.
- :a4:grammar:
- :a4:g:
Cross-reference a grammar by its name.
There’s nothing special about this role, just specify the grammar name.
- :a4:rule:
- :a4:r:
Cross-reference a rule by its name or full path.
If given a full path, e.g. :a4:r:`grammar_name.rule_name`, the rule is first searched for in the corresponding grammar, then in all imported grammars, recursively.
If given a rule name only, e.g. :a4:r:`rule_name`, the behavior depends on context:
when used in a grammar declaration body, the rule is first searched for in that grammar, then in any imported grammar, and finally in the default grammar.
when used without context, the rule is only searched for in the default grammar.
Prepending a tilde to the full path works as expected.
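The lookup order described for the a4:rule role can be sketched in Python. This is an illustration of the documented search order only, not the plugin's actual code: the dictionary model, the empty string standing in for the default grammar, and the names find_rule and resolve are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: grammar name -> (rule names, imported grammar names).
# The empty string stands in for the implicit "default" grammar.
Grammars = Dict[str, Tuple[Set[str], List[str]]]
DEFAULT_GRAMMAR = ""


def find_rule(grammars: Grammars, grammar: str, rule: str,
              seen: Optional[Set[str]] = None) -> Optional[str]:
    """Search `grammar` first, then its imports, recursively."""
    seen = set() if seen is None else seen
    if grammar in seen or grammar not in grammars:
        return None
    seen.add(grammar)  # guards against import cycles
    rules, imports = grammars[grammar]
    if rule in rules:
        return f"{grammar}.{rule}" if grammar else rule
    for imported in imports:
        found = find_rule(grammars, imported, rule, seen)
        if found is not None:
            return found
    return None


def resolve(grammars: Grammars, target: str,
            context: Optional[str] = None) -> Optional[str]:
    """Resolve the target of a rule reference to the rule's full path."""
    if "." in target:  # full path: grammar_name.rule_name
        grammar, rule = target.split(".", 1)
        return find_rule(grammars, grammar, rule)
    if context is not None:  # bare name used inside a grammar body
        found = find_rule(grammars, context, target)
        if found is not None:
            return found
    return find_rule(grammars, DEFAULT_GRAMMAR, target)


grammars: Grammars = {
    "Parser": ({"root"}, ["Lexer"]),
    "Lexer": ({"ID"}, []),
    DEFAULT_GRAMMAR: ({"helper"}, []),
}
print(resolve(grammars, "Parser.ID"))         # Lexer.ID (found through the import)
print(resolve(grammars, "helper", "Parser"))  # helper (falls back to the default grammar)
print(resolve(grammars, "root"))              # None (no context: default grammar only)
```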
Rendering diagrams
- .. railroad-diagram::
This is the most flexible directive for rendering railroad diagrams. Its content should be a valid YAML document containing the diagram item description.
The diagram item description is defined recursively. It can be one of the following:
None (denoted as a tilde in YAML) produces a line without objects:
.. railroad-diagram:: ~
a string produces a terminal node:
.. railroad-diagram:: just some string
a list of diagram item descriptions produces these items rendered one after another:
.. railroad-diagram::

   - terminal 1
   - terminal 2
a dict with the stack key produces a vertically stacked sequence.
The main value (i.e. the one that corresponds to the stack key) should contain a list of diagram item descriptions. These items are rendered vertically:
.. railroad-diagram::

   stack:
   - terminal 1
   - - terminal 2
     - terminal 3
a dict with the choice key produces an alternative.
The main value should contain a list of diagram item descriptions:
.. railroad-diagram::

   choice:
   - terminal 1
   - - terminal 2
     - terminal 3
a dict with the optional key produces an optional item.
The main value should contain a single diagram item description.
Additionally, a skip key with a boolean value may be added. If it is true, the element is rendered off the main line:
.. railroad-diagram::

   optional:
   - terminal 1
   - optional:
     - terminal 2
     skip: true
a dict with the one_or_more key produces a loop.
The one_or_more element of the dict should contain a single diagram item description.
Additionally, a repeat key with another diagram item description may be added to insert nodes into the backward connection of the loop.
.. railroad-diagram::

   one_or_more:
   - terminal 1
   - terminal 2
   repeat:
   - terminal 3
   - terminal 4
a dict with the zero_or_more key works like one_or_more, except that the produced item is optional:
.. railroad-diagram::

   zero_or_more:
   - terminal 1
   - terminal 2
   repeat:
   - terminal 3
   - terminal 4
a dict with the node key produces a textual node of configurable shape.
The main value should contain the text to render in the node.
Optional keys include href, css_class, radius and padding.
.. railroad-diagram::

   node: go to google
   href: https://www.google.com/
   css_class: terminal
   radius: 3
   padding: 50
a dict with the terminal key produces a terminal node.
It works exactly like node. The only optional key is href.
a dict with the non_terminal key produces a non-terminal node.
It works exactly like node. The only optional key is href.
a dict with the comment key produces a comment node.
It works exactly like node. The only optional key is href.
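Because the description is recursive, it can help to see how a parsed YAML value maps to an item kind. The sketch below works on plain Python values (None, str, list, dict) such as a YAML parser would return. The function name item_kind and the rule that a dict carries exactly one main key are assumptions made for illustration, not the plugin's validation logic.

```python
from typing import Any

# Main keys listed above; each selects the kind of item a dict describes.
MAIN_KEYS = ("stack", "choice", "optional", "one_or_more", "zero_or_more",
             "node", "terminal", "non_terminal", "comment")


def item_kind(item: Any) -> str:
    """Name the kind of diagram item that a parsed YAML value describes."""
    if item is None:
        return "empty line"
    if isinstance(item, str):
        return "terminal"
    if isinstance(item, list):
        return "sequence"
    if isinstance(item, dict):
        present = [key for key in MAIN_KEYS if key in item]
        if len(present) != 1:  # assumption: exactly one main key per dict
            raise ValueError(f"expected exactly one main key, got {present}")
        return present[0]
    raise TypeError(f"unsupported description: {item!r}")


# The node example above, written as the dict a YAML parser would produce.
print(item_kind({"node": "go to google", "href": "https://www.google.com/"}))
print(item_kind(["terminal 1", "terminal 2"]))
print(item_kind(None))
```

Extra keys such as href, skip or repeat do not change the result, since only the main key decides the kind.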
Example:
This example renders a diagram from the features section:
.. railroad-diagram::

   - choice:
     - terminal: 'parser'
     -
     - terminal: 'lexer '
     default: 1
   - terminal: 'grammar'
   - non_terminal: 'identifier'
   - terminal: ';'
which translates to:
Customization:
See more on how to customize diagram style in the ‘Customizing diagram style’ section.
Options:
- :padding: <int>, <int>, <int>, <int>
Array of four positive integers denoting top, right, bottom and left padding between the diagram and its container. By default, there is 1px of padding on each side.
- :vertical-separation: <int>
Vertical space between diagram lines.
- :horizontal-separation: <int>
Horizontal space between items within a sequence.
- :arc-radius: <int>
Arc radius of railroads. 10px by default.
- :translate-half-pixel:
- :no-translate-half-pixel:
If enabled, the diagram is translated by half a pixel in both directions. May be used to deal with anti-aliasing issues when using odd stroke widths.
- :internal-alignment: center|left|right|auto-left|auto-right
Determines how nodes are aligned within a single diagram line. Available options are:
center – nodes are centered.
left – nodes are flushed left in all cases.
right – nodes are flushed right in all cases.
auto-left – nodes in choice groups are flushed left, all other nodes are centered.
auto-right – nodes in choice groups are flushed right, all other nodes are centered.
- :character-advance: <float>
Average width of one character in the font used. Since SVG elements cannot expand and shrink dynamically, the length of a text node is calculated as the number of symbols multiplied by this constant.
- :end-class: simple|complex
Controls how the start and end of a diagram look. Available options are:
simple – a simple T-shaped ending.
complex – a T-shaped ending with the vertical line doubled.
- :max-width: <int>
Max width after which a sequence is wrapped. This option is used to automatically convert sequences to stacks. Note that this option is only a hint: there is no guarantee that the diagram will fit within max-width.
- :literal-rendering: name|contents|contents-unquoted
Controls how literal rules (i.e. lexer rules that consist of a single string) are rendered. Available options are:
name – only the name of the literal rule is displayed.
contents – the quoted literal string is displayed.
contents-unquoted – the literal string is displayed with its quotes stripped.
- :cc-to-dash:
- :no-cc-to-dash:
If a rule has no human-readable name set, convert its name from CamelCase to dash-case.
- :alt: <str>
If the rendering engine does not support output of the diagram contents, the specified string is used instead.
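Two of the options above describe small computations: character-advance (text length as symbol count times a constant) and cc-to-dash (CamelCase to dash-case). The following Python sketch shows one plausible reading of each. The function names and the regular expression are assumptions made for illustration; the plugin's own conversion may treat edge cases such as runs of capital letters differently.

```python
import re


def text_width(text: str, character_advance: float) -> float:
    """Estimated width: number of symbols times the per-character advance."""
    return len(text) * character_advance


def camel_to_dash(name: str) -> str:
    """Insert a dash before every inner capital letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


print(text_width("identifier", 8.0))  # 80.0 (10 symbols, 8.0 is an arbitrary advance)
print(camel_to_dash("moduleItem"))    # module-item
```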
- .. lexer-rule-diagram::
The body of this directive should contain a valid Antlr4 lexer rule description.
For example
.. lexer-rule-diagram:: ('+' | '-')? [1-9] [0-9]*
translates to:
Options:
Options are inherited from the railroad-diagram directive.
- .. parser-rule-diagram::
The body of this directive should contain a valid Antlr4 parser rule description.
For example
.. parser-rule-diagram:: SELECT DISTINCT? ('*' | expression (AS row_name)? (',' expression (AS row_name)?)*)
translates to:
Options:
Options are inherited from the railroad-diagram directive.
Autodoc directive
- .. a4:autogrammar:: filename
The autogrammar directive generates a grammar description from a .g4 file.
Its only argument, filename, should contain the path of the grammar file relative to a4_base_path. The file extension may be omitted.
Autogrammar reads a .g4 file and extracts the grammar name (which is used for cross-referencing), grammar-level documentation comments, the set of production rules, their documentation and contents. It then generates railroad diagrams and renders the extracted information.
See more on how to write documentation comments and control the look of the automatically generated railroad diagrams in the ‘Grammar comments and annotations’ section.
Like autoclass and other default autodoc directives, autogrammar can have contents of its own. These contents are merged with the automatically generated description.
Use docstring-marker and members-marker to control the merging process.
Options:
- :name:
- :type:
- :imports:
- :noindex:
- :diagram-*:
Inherited from the a4:grammar directive.
If not given, :type: and :imports: are extracted from the grammar file.
- :only-reachable-from: <str>
If given, autodoc only renders rules that are reachable from this root. This is useful to exclude rules from imported grammars that are not used by the primary grammar.
The value should be either the name of a rule from the grammar being documented or a full path that includes the grammar name and the rule name.
For example, suppose there are Lexer.g4 and Parser.g4. To filter out lexer rules that are not used by the parser grammar, use:
.. a4:autogrammar:: Parser
   :only-reachable-from: Parser.root

.. a4:autogrammar:: Lexer
   :only-reachable-from: Parser.root
- :mark-root-rule:
- :no-mark-root-rule:
If enabled, the automatic diagram for the rule listed in only-reachable-from uses complex line endings (see the end-class option of the railroad-diagram directive).
- :lexer-rules:
- :no-lexer-rules:
Controls whether lexer rules should appear in documentation. Enabled by default.
- :parser-rules:
- :no-parser-rules:
Controls whether parser rules should appear in documentation. Enabled by default.
- :fragments:
- :no-fragments:
Controls whether fragments should appear in documentation. Disabled by default.
- :undocumented:
- :no-undocumented:
Controls whether undocumented rules should appear in documentation. Disabled by default.
- :grouping: mixed|lexer-first|parser-first
Controls how autodoc groups rules that are extracted from sources.
mixed – there is one group that contains all rules.
lexer-first – there are two groups: one for parser rules and one for lexer rules and fragments. The lexer group goes first.
parser-first – like lexer-first, but the parser group precedes the lexer group.
- :ordering: by-source|by-name
Controls how autodoc orders rules within each group (see the grouping option).
by-source – rules are ordered as they appear in the grammar file.
by-name – rules are ordered lexicographically.
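The interplay of the grouping and ordering options can be sketched as follows. This is a minimal illustration of the documented behavior, assuming a hypothetical Rule record with an is_lexer flag; it does not reproduce the plugin's data model or code.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str
    is_lexer: bool  # hypothetical flag: True for lexer rules and fragments


def arrange(rules: List[Rule], grouping: str = "mixed",
            ordering: str = "by-source") -> List[List[str]]:
    """Return groups of rule names; `rules` must be given in source order."""
    if grouping == "mixed":
        groups = [list(rules)]
    elif grouping in ("lexer-first", "parser-first"):
        lexer = [rule for rule in rules if rule.is_lexer]
        parser = [rule for rule in rules if not rule.is_lexer]
        groups = [lexer, parser] if grouping == "lexer-first" else [parser, lexer]
    else:
        raise ValueError(f"unknown grouping: {grouping}")
    if ordering == "by-name":
        groups = [sorted(group, key=lambda rule: rule.name) for group in groups]
    elif ordering != "by-source":
        raise ValueError(f"unknown ordering: {ordering}")
    return [[rule.name for rule in group] for group in groups]


rules = [Rule("value", False), Rule("STRING", True),
         Rule("array", False), Rule("NUMBER", True)]
print(arrange(rules, "lexer-first", "by-name"))
# [['NUMBER', 'STRING'], ['array', 'value']]
```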
- .. a4:autorule:: filename rulename
The autorule directive renders documentation for a single rule. It accepts two arguments: the first is the path to the grammar file relative to a4_base_path, the second is the name of the rule to document.
Note that autorule can only be used within a grammar definition. The name of the current grammar definition must match the name of the grammar from which the documented rule is imported.
Options:
- .. docstring-marker::
This marker controls where the grammar docstring is rendered.
By default, the grammar docstring (i.e., the comment at the very top of a grammar file) is added to the end of the autogrammar directive. However, if a docstring marker is present, the grammar docstring is rendered in its place.
Example:
.. a4:autogrammar:: Json

   (1) This is the description of the grammar.

   .. docstring-marker::

   (2) This is the continuation of the description.
In this case, the grammar docstring is rendered between (1) and (2).
- .. members-marker::
This marker controls where rule descriptions are rendered.
See docstring-marker.
Grammar comments and annotations
The a4:autogrammar directive does not process every comment found in a grammar file. Instead, it looks for specially formatted ‘documentation’ comments. There are three types of such comments:
documentation comments are multiline comments that start with /** (that is, a slash followed by a double asterisk). These comments should contain valid rst-formatted text.
It is common to outline documentation comments by adding an asterisk at the start of each line. This is completely optional, but a4doc recognizes and handles this pattern.
Example:
/**
 * This is the grammar root.
 */
module: moduleItem* EOF
control comments are inline comments that start with //@. Control comments contain special commands that affect the rendering process.
Example:
//@ doc:no-diagram
module: moduleItem* EOF
section comments are comments that start with ///. They’re used to render text between production rules and to split the grammar definition into sections.
Example:
/// **Module definition**
///
/// This paragraph describes the ``Module definition``
/// section of the grammar.
module: moduleItem* EOF
moduleItem: import | symbol
New in version 1.2.0.
There are also restrictions on where documentation and control comments may appear:
documentation comments can be placed either at the beginning of the file, before the grammar keyword (in which case they document the whole grammar), or right before a production rule or a fragment declaration (in which case they are rendered as the rule description). They can also be embedded into the rule definition, in which case they are rendered as part of the railroad diagram;
control comments can only be placed before a production rule declaration. They only affect rendering of that specific production rule;
multiple documentation and control comments can appear before a rule. In this case, the first documentation comment is rendered before the automatically generated railroad diagram, all subsequent documentation comments are rendered after it, and all control comments are applied before the documentation comments are rendered;
section comments can only be placed between rules in the main section of a file.
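The three comment types are told apart by their opening characters, which the following Python sketch makes explicit. It is a simplified illustration, not a4doc's parser: the function names are hypothetical, and the asterisk handling simply drops one leading asterisk and one following space per line.

```python
def comment_kind(comment: str) -> str:
    """Classify a grammar comment by its opening characters."""
    text = comment.lstrip()
    if text.startswith("/**"):
        return "documentation"
    if text.startswith("//@"):
        return "control"
    if text.startswith("///"):
        return "section"
    return "ignored"


def doc_comment_text(comment: str) -> str:
    """Strip the /** and */ delimiters and the optional asterisk on each line."""
    body = comment.strip()[3:-2]  # drop the leading '/**' and the trailing '*/'
    lines = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("*"):
            line = line[1:]
            if line.startswith(" "):
                line = line[1:]
        lines.append(line)
    return "\n".join(lines).strip()


doc = "/**\n * This is the grammar root.\n */"
print(comment_kind(doc))                   # documentation
print(comment_kind("//@ doc:no-diagram"))  # control
print(doc_comment_text(doc))               # This is the grammar root.
```

Ordinary comments such as `// note` or `/* note */` fall into the "ignored" branch, matching the statement that autogrammar only considers specially formatted comments.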
Control comments
The list of control comments includes:
//@ doc:nodoc – exclude this rule from autogrammar output.
//@ doc:inline – exclude this rule from autogrammar output; any automatically generated railroad diagram that refers to this rule includes its contents instead of a single node.
Useful for fragments and simple lexer rules.
For example,
NUMBER
  : '-'? ('0' | [1-9] [0-9]*) ('.' [0-9]+)? EXPONENT?
  ;

//@ doc:inline
fragment EXPONENT
  : ('e' | 'E')? ('+' | '-')? [0-9]+
  ;
produces the number rule (note how the exponent is rendered inside the number diagram).
//@ doc:no-diagram – do not generate a railroad diagram.
//@ doc:importance <int> – controls the ‘importance’ of a rule.
By default, all rules have an importance of 1.
Rules with an importance of 0 are rendered off the main line in optional groups.
In alternative groups, the rule with the highest importance is centered.
//@ doc:unimportant – set importance to 0.
//@ doc:name <str> – set a human-readable name for this rule. See the :name: option of a4:rule.
//@ doc:css-class – add a custom CSS class to all diagrams referencing this rule.
New in version 1.5.0.
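As an illustration of the command syntax listed above, the Python sketch below splits a control comment into its command and optional argument, then derives a rule's importance from the comments that precede it. The function names are hypothetical and the parsing rules are an assumption based only on the forms shown in this section.

```python
from typing import Iterable, Optional, Tuple


def parse_control_comment(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (command, argument) for '//@ doc:<command> [argument]', else None."""
    text = line.strip()
    if not text.startswith("//@"):
        return None
    parts = text[3:].split(None, 1)
    if not parts or not parts[0].startswith("doc:"):
        return None
    command = parts[0][len("doc:"):]
    argument = parts[1].strip() if len(parts) > 1 else None
    return command, argument


def rule_importance(comments: Iterable[str]) -> int:
    """Importance defaults to 1; doc:unimportant is shorthand for importance 0."""
    value = 1
    for comment in comments:
        parsed = parse_control_comment(comment)
        if parsed is None:
            continue
        command, argument = parsed
        if command == "importance" and argument is not None:
            value = int(argument)
        elif command == "unimportant":
            value = 0
    return value


print(parse_control_comment("//@ doc:importance 2"))  # ('importance', '2')
print(parse_control_comment("//@ doc:inline"))        # ('inline', None)
print(rule_importance(["//@ doc:unimportant"]))       # 0
```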
Configuration
Customizing diagram style
To customize the diagram style, replace the default css file by placing an a4_railroad_diagram.css file in the _static directory.
New in version 1.6.0: to customize how diagrams look in the latex build, place an a4_railroad_diagram_latex.css file in the _static directory.
Example output
This example was generated from Json.g4.
- grammar Json
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
- rule value
On top level, JSON consists of a single value. That value can be either a complex structure (such as an object or an array) or a primitive type (a string in double quotes, a number, or true or false or null).
- rule object
Object is a collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- rule array
Array is an ordered list of values. In most languages, this is realized as vector, list, array or sequence.
- rule number
A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
- rule string
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.