Plugins API

If you wish to support a new parser generator, you’ll need to implement ModelProvider and call register_provider in your extension’s setup function.

Extension template
import pathlib

import sphinx.application
import sphinx.util.logging

import sphinx_syntax

_logger = sphinx.util.logging.getLogger(__name__)


class MyProvider(sphinx_syntax.ModelProvider):
    # List all file extensions your provider can parse.
    supported_extensions = {".yy", ".ll"}

    def __init__(self):
        self._cache: dict[pathlib.Path, sphinx_syntax.Model] = {}

    def from_file(
        self, path: pathlib.Path, options: sphinx_syntax.LoadingOptions
    ) -> sphinx_syntax.Model:
        # Normalize path.
        path = path.expanduser().resolve()

        if cached := self._cache.get(path):
            return cached

        # Grammar name is equal to file name without suffix.
        name = path.stem

        # Check if file exists, report and return empty model if it doesn't.
        if not (path.exists() and path.is_file()):
            _logger.error("unable to load %s: file not found", path)
            model = self._cache[path] = sphinx_syntax.ModelImpl.empty(self, path, name)
            return model

        # Process file and assemble a real model.
        model = self._cache[path] = sphinx_syntax.ModelImpl(self, path, name, ...)
        return model


def setup(app: sphinx.application.Sphinx):
    app.setup_extension("sphinx_syntax")

    sphinx_syntax.register_provider(MyProvider())

    return {
        "version": "1.0.0",
        "parallel_read_safe": True,
        "parallel_write_safe": True,
    }

Basic interfaces

sphinx_syntax.register_provider(provider: ModelProvider)

Register a new model provider.

Extensions should call this function during their setup.

class sphinx_syntax.ModelProvider

Base interface for extracting data from grammar source files.

supported_extensions: set[str]

Set of file extensions that correspond to this provider’s syntax, including leading periods.

can_handle(path: Path) bool

Check whether this provider can parse the given grammar file.

By default, checks file extension against ModelProvider.supported_extensions.
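
The default check can be pictured roughly as follows. This is a minimal sketch, not the library's source: SketchProvider is a hypothetical stand-in so the snippet runs without sphinx_syntax installed, and comparing only the final, case-sensitive suffix is an assumption.

```python
import pathlib


class SketchProvider:
    """Hypothetical stand-in for a ModelProvider subclass."""

    # Extensions include the leading period, as documented above.
    supported_extensions = {".yy", ".ll"}

    def can_handle(self, path: pathlib.Path) -> bool:
        # Assumption: only the final suffix of the path is compared.
        return path.suffix in self.supported_extensions


provider = SketchProvider()
handles_bison = provider.can_handle(pathlib.Path("grammar/calc.yy"))
handles_text = provider.can_handle(pathlib.Path("README.txt"))
```

A provider that needs a different rule (for example, inspecting file contents) would override can_handle instead of relying on supported_extensions.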

abstractmethod from_file(path: Path, options: LoadingOptions) Model

Load model from file.

If the file wasn’t found, or there were errors while loading the model, this method should log the errors and return an empty (or partially loaded) model instead of raising.

from_name(base_path: Path, name: str, options: LoadingOptions) Model | None

Load model by name.

Used to load the root rule if it’s located in a separate grammar, for example when documenting a lexer but limiting lexemes to only those used in a parser.

By default, just adds extensions from supported_extensions to name and loads the file if it exists.
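
The default name resolution described above might look like the sketch below. find_grammar_file is a hypothetical helper, not part of sphinx_syntax; it assumes base_path is a directory and that the first matching candidate wins.

```python
import pathlib
import tempfile


def find_grammar_file(base_path, name, supported_extensions):
    """Hypothetical helper: return the first existing candidate, or None."""
    # Sorted only to make this sketch deterministic.
    for ext in sorted(supported_extensions):
        candidate = base_path / (name + ext)
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "Lexer.ll").write_text("%%\n")
    found = find_grammar_file(base, "Lexer", {".yy", ".ll"})
    found_name = found.name if found else None
    missing = find_grammar_file(base, "Parser", {".yy", ".ll"})
```

When no candidate exists, the sketch returns None, mirroring the Model | None return type of from_name.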

class sphinx_syntax.Model

Represents a single parsed grammar.

abstractmethod get_provider() ModelProvider

Return provider that loaded this model.

abstractmethod get_name() str

Get grammar name.

abstractmethod get_path() Path

Get path for the file that this model was loaded from.

abstractmethod get_model_docs() list[tuple[int, str]] | None

Get documentation that appears on top of the model.

The returned list contains one item per documentation comment.

The first element of this item is a line number at which the comment started, the second element is the comment itself.
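
A consumer of this return value has to handle both the None case and the (line, comment) pairs. The sketch below shows one way to do so; render_docs and the sample data are hypothetical and only illustrate the documented shape.

```python
def render_docs(docs):
    """Hypothetical helper: join documentation comments into one string."""
    if docs is None:
        return ""
    # Each item is (line number where the comment started, comment text).
    ordered = sorted(docs, key=lambda item: item[0])
    return "\n\n".join(text for _line, text in ordered)


sample_docs = [(4, "Supports + and *."), (1, "Grammar for a calculator.")]
rendered_docs = render_docs(sample_docs)
```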

abstractmethod lookup_local(name: str) RuleBase | None

Look up the symbol with the given name in this model only.

Imported models are not checked.

lookup(name: str) RuleBase | None

Look up the symbol with the given name.

Check symbols in the model first, then check imported models. To look up literal tokens, pass the contents of the literal, e.g. model.lookup("'literal'").

Return None if symbol cannot be found.

If there are duplicate symbols, it is unspecified which one is returned.

abstractmethod get_imports() Iterable[Model]

Get all imported models.

No order of iteration is specified.

Note: cyclic imports are allowed in the model.

iter_import_tree() Iterable[Model]

Iterate over this model and all imported models.

No order of iteration is specified.
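
Because cyclic imports are allowed, any walk over the import tree needs to remember which models it has already seen. The sketch below shows one way to do that; SketchModel is a hypothetical stand-in, and the depth-first order is an arbitrary choice since no order is specified.

```python
class SketchModel:
    """Hypothetical stand-in for a Model; only imports are modelled."""

    def __init__(self, name):
        self.name = name
        self._imports = []

    def add_import(self, model):
        self._imports.append(model)

    def get_imports(self):
        return list(self._imports)

    def iter_import_tree(self):
        # Depth-first walk; the `seen` set makes cyclic imports terminate.
        seen = set()
        stack = [self]
        while stack:
            model = stack.pop()
            if id(model) in seen:
                continue
            seen.add(id(model))
            yield model
            stack.extend(model.get_imports())


parser = SketchModel("Parser")
lexer = SketchModel("Lexer")
common = SketchModel("Common")
parser.add_import(lexer)
lexer.add_import(common)
common.add_import(parser)  # cycle back to the root

visited = sorted(m.name for m in parser.iter_import_tree())
```

Each model is yielded exactly once, including the model the walk started from.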

abstractmethod get_terminals() Iterable[LexerRule]

Get all terminals (including fragments) declared in this model.

Terminals declared in imported models are not included.

No order of iteration is specified.

abstractmethod get_non_terminals() Iterable[ParserRule]

Get all non-terminals (parser rules) declared in this model.

Non-terminals declared in imported models are not included.

No order of iteration is specified.

get_all_rules() Iterable[RuleBase]

Get all rules, both terminals and non-terminals.

No order of iteration is specified.

class sphinx_syntax.ModelImpl(provider: ModelProvider, path: Path, name: str, *, docs: list[tuple[int, str]] | None, imports: Iterable[Model], terminals: Iterable[LexerRule], non_terminals: Iterable[ParserRule])

Default model implementation, simply stores model data.

class sphinx_syntax.LoadingOptions(use_c_char_literals: bool = True)

Additional options for loading a grammar file.

use_c_char_literals: bool = True

Bison-specific setting that indicates whether the target language uses C-like char literals or single-quoted strings.

This option affects parsing of inline code blocks within a Bison file.

Production rule descriptions

class sphinx_syntax.RuleBase(name: str, display_name: str | None, model: Model, position: Position, content: RuleContent | None, is_nodoc: bool, is_no_diagram: bool, css_class: str | None, is_inline: bool, keep_diagram_recursive: bool, importance: int, documentation: list[tuple[int, str]] | None, section: Section | None)

Base class for parser and lexer rules.

Note that actual model implementations use LexerRule and ParserRule instead of this base.

name: str

Name of this rule.

display_name: str | None

Display name from doc:name command.

model: Model

Reference to the model in which this rule was declared.

position: Position

A position at which this rule is declared.

content: RuleContent | None

Body of the token or rule definition.

May be omitted for implicitly declared tokens or tokens that were declared in the tokens section of a lexer.

is_nodoc: bool

Indicates that the doc:nodoc flag is set for this rule.

If true, generators should not output any content for this rule.

is_no_diagram: bool

Indicates that the doc:no_diagram flag is set.

If true, generators should not produce syntax diagram for this rule.

css_class: str | None

Custom CSS class set via the doc:css_class command.

All diagram nodes referencing this rule will have this CSS class added to them.

is_inline: bool

Indicates that the doc:inline flag is set for this rule.

If true, generators should not output any content for this rule.

They should also inline the diagram of this rule when rendering diagrams for any other rule that refers to it.
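
A generator that honours these flags might select rules as in the sketch below. SketchRule and rules_to_document are hypothetical; the stand-in carries only the two flags discussed here so the snippet runs without sphinx_syntax installed.

```python
from dataclasses import dataclass


@dataclass
class SketchRule:
    """Hypothetical stand-in carrying only the flags used here."""

    name: str
    is_nodoc: bool = False
    is_inline: bool = False


def rules_to_document(rules):
    # Rules flagged doc:nodoc or doc:inline get no output of their own.
    return [r.name for r in rules if not (r.is_nodoc or r.is_inline)]


rules = [
    SketchRule("expr"),
    SketchRule("helper", is_inline=True),
    SketchRule("internal", is_nodoc=True),
]
documented = rules_to_document(rules)
```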

keep_diagram_recursive: bool

Indicates that the doc:keep-diagram-recursive flag is set for this rule.

If true, the diagram renderer will not attempt to convert recursive alternatives to cycles.

importance: int

Importance of the rule; determines its placement in auto-generated diagrams.

documentation: list[tuple[int, str]] | None

Documentation for this rule.

section: Section | None

The section this rule belongs to, if any.

class sphinx_syntax.ParserRule(name: str, display_name: str | None, model: Model, position: Position, content: RuleContent | None, is_nodoc: bool, is_no_diagram: bool, css_class: str | None, is_inline: bool, keep_diagram_recursive: bool, importance: int, documentation: list[tuple[int, str]] | None, section: Section | None)

class sphinx_syntax.LexerRule(name: str, display_name: str | None, model: Model, position: Position, content: RuleContent | None, is_nodoc: bool, is_no_diagram: bool, css_class: str | None, is_inline: bool, keep_diagram_recursive: bool, importance: int, documentation: list[tuple[int, str]] | None, section: Section | None, is_literal: bool, is_fragment: bool)

is_literal: bool

Indicates that this token is a literal token.

Literal tokens are tokens with a single fixed-string literal element.

is_fragment: bool

Indicates that this rule is a fragment.

class sphinx_syntax.Position(file: Path, line: int)

file: Path

Absolute path to the file in which this rule is declared.

line: int

Line at which this rule is declared.

class sphinx_syntax.Section(docs: list[tuple[int, str]], position: Position)

Represents a single section header, i.e. a group of comments that start with a triple slash.

docs: list[tuple[int, str]]

List of documentation lines in the section description.

position: Position

A position at which this section is declared.

Rule AST

class sphinx_syntax.RuleContent

Base class for AST nodes that form lexer and parser rules.

Note that all AST nodes are interned, so they can be compared with the is operator instead of ==.
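
Interning means that constructing a node twice with equal arguments yields the same object. The sketch below illustrates the idea with a hypothetical SketchLiteral class; it is not how sphinx_syntax implements interning, only one common way to get the same behaviour.

```python
class SketchLiteral:
    """Hypothetical interned node; not the library's implementation."""

    _instances = {}

    def __new__(cls, content):
        # Return the existing instance for equal arguments, if any.
        key = (cls, content)
        try:
            return cls._instances[key]
        except KeyError:
            node = super().__new__(cls)
            node.content = content
            cls._instances[key] = node
            return node


a = SketchLiteral("'kwd'")
b = SketchLiteral("'kwd'")
c = SketchLiteral("'other'")
```

Here a is b holds while a is c does not, so identity checks are enough to tell equal nodes apart from different ones.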

class sphinx_syntax.Reference(model: Model, name: str)

Refers to another parser or lexer rule.

model: Model

Reference to the model in which the rule is used.

name: str

Referenced rule name.

get_reference() RuleBase | None

Look up and return the referenced rule.

Returns None if reference is invalid.

class sphinx_syntax.Doc(value: str)

Inline documentation.

value: str

Documentation content.

class sphinx_syntax.Wildcard

Matches any token.

sphinx_syntax.WILDCARD = Wildcard()

Matches any token.

class sphinx_syntax.Negation(child: RuleContent)

Matches anything but the child rules.

child: RuleContent

Rules that will be negated.

class sphinx_syntax.ZeroPlus(child: RuleContent)

Matches the child zero or more times.

child: RuleContent

Rule which will be parsed zero or more times.

class sphinx_syntax.OnePlus(child: RuleContent)

Matches the child one or more times.

child: RuleContent

Rule which will be parsed one or more times.

class sphinx_syntax.Sequence(children: tuple[RuleContent, ...], linebreaks: tuple[LineBreak, ...] | None = None)

Matches a sequence of elements.

children: tuple[RuleContent, ...]

Children rules that will be parsed in order.

linebreaks: tuple[LineBreak, ...]

Describes where it is preferable to wrap the sequence.

sphinx_syntax.EMPTY = Sequence(children=())

The empty sequence, i.e. a Sequence with no children.

class sphinx_syntax.Alternative(children: tuple[RuleContent, ...])

Matches any one of the children.

children: tuple[RuleContent, ...]

Children rules.

class sphinx_syntax.Literal(content: str)

A sequence of symbols (e.g. 'kwd').

content: str

Formatted content of the literal, with special symbols escaped.

class sphinx_syntax.Range(start: str, end: str)

A range of symbols (e.g. a..b).

start: str

Range first symbol.

end: str

Range last symbol.

class sphinx_syntax.CharSet(content: str)

A character set (e.g. [a-zA-Z]).

content: str

Character set description, square brackets included.

AST visitors

class sphinx_syntax.RuleContentVisitor

Generic visitor for rule contents.

visit(r: RuleContent) T
visit_default(r: RuleContent) T
visit_literal(r: Literal) T
visit_range(r: Range) T
visit_charset(r: CharSet) T
visit_reference(r: Reference) T
visit_doc(r: Doc) T
visit_wildcard(r: Wildcard) T
visit_negation(r: Negation) T
visit_zero_plus(r: ZeroPlus) T
visit_one_plus(r: OnePlus) T
visit_sequence(r: Sequence) T
visit_alternative(r: Alternative) T
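
The visitor methods above suggest the usual pattern: visit dispatches on the node type to the matching visit_* method. The sketch below shows that pattern with hypothetical stand-in node classes (the Sketch* names are not part of sphinx_syntax); falling back to visit_default for unhandled nodes is an assumption about how the real visitor behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchLiteral:
    content: str


@dataclass(frozen=True)
class SketchReference:
    name: str


@dataclass(frozen=True)
class SketchZeroPlus:
    child: object


@dataclass(frozen=True)
class SketchSequence:
    children: tuple


class SketchVisitor:
    """Hypothetical visitor base; dispatch table keyed by node type."""

    _methods = {
        SketchLiteral: "visit_literal",
        SketchReference: "visit_reference",
        SketchZeroPlus: "visit_zero_plus",
        SketchSequence: "visit_sequence",
    }

    def visit(self, r):
        # Use the specific method if the subclass defines it.
        method = getattr(self, self._methods[type(r)], self.visit_default)
        return method(r)

    def visit_default(self, r):
        raise NotImplementedError(type(r).__name__)


class Renderer(SketchVisitor):
    """Renders rule content as a compact string."""

    def visit_literal(self, r):
        return r.content

    def visit_reference(self, r):
        return r.name

    def visit_zero_plus(self, r):
        return "(" + self.visit(r.child) + ")*"

    def visit_sequence(self, r):
        return " ".join(self.visit(c) for c in r.children)


tail = SketchZeroPlus(SketchSequence((SketchLiteral("'+'"), SketchReference("term"))))
rule = SketchSequence((SketchReference("term"), tail))
rendered = Renderer().visit(rule)
```

Subclasses only override the visit_* methods they care about; everything else goes through visit_default.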

class sphinx_syntax.CachedRuleContentVisitor

visit(r: RuleContent) T