package require Tcl 8.4
package require grammar::mengine ?0.1?
package require grammar::peg::interp ?0.1.1?
::grammar::peg::interp::parse nextcmd errorvar astvar
This package provides commands for the controlled matching of a character stream via a parsing expression grammar and the creation of an abstract syntax tree for the stream and partials.
It is built on top of the virtual machine provided by the package grammar::me::tcl and directly interprets the parsing expression grammar given to it. In other words, the grammar is not pre-compiled but used as is.
The grammar to be interpreted is taken from a container object following the interface specified by the package grammar::peg::container. Only the relevant parts are copied into the state of this package.
It should be noted that the package provides exactly one instance of the interpreter, and interpreting a second grammar requires the user to either abort or complete a running interpretation, or to put them into different Tcl interpreters.
Also of note is that the implementation assumes a pull-type handling of the input. In other words, the interpreter pulls characters from the input stream as it needs them. For usage in a push environment, i.e. where the environment pushes new characters as they come we have to put the engine into its own thread.
The Interpreter API
The package exports the following API
- ::grammar::peg::interp::setup peg
This command (re)initializes the interpreter. It returns the empty string. This command has to be invoked first, before any matching run.
Its argument peg is the handle of an object containing the parsing expression grammar to interpret. This grammar has to be valid, or an error will be thrown.
- ::grammar::peg::interp::parse nextcmd errorvar astvar
This command interprets the loaded grammar and tries to match it against the stream of characters represented by the command prefix nextcmd.
The command prefix nextcmd represents the input stream of characters and is invoked by the interpreter whenever the a new character from the stream is required. The callback has to return either the empty list, or a list of 4 elements containing the token, its lexeme attribute, and its location as line number and column index, in this order. The empty list is the signal that the end of the input stream has been reached. The lexeme attribute is stored in the terminal cache, but otherwise not used by the machine.
The result of the command is a boolean value indicating whether the matching process was successful (true), or not (false). In the case of a match failure error information will be stored into the variable referenced by errorvar. The variable referenced by astvar will always contain the generated abstract syntax tree, however in the case of an error it will be only partial and possibly malformed.
The abstract syntax tree is represented by a nested list, as described in section AST VALUES of document grammar::me_ast.
Bugs, Ideas, Feedback
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category grammar_peg of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also report any ideas for enhancements you may have for either package and/or documentation.
When proposing code changes, please provide unified diffs, i.e the output of diff -u.
Note further that attachments are strongly preferred over inlined patches. Attachments can be made by going to the Edit form of the ticket immediately after its creation, and then using the left-most button in the secondary navigation bar.
LL(k), TDPL, context-free languages, expression, grammar, matching, parsing, parsing expression, parsing expression grammar, push down automaton, recursive descent, state, top-down parsing languages, transducer, virtual machine
Grammars and finite automata
Copyright (c) 2005-2011 Andreas Kupries <firstname.lastname@example.org>