headerphoto

APG - An ABNF Parser Generator

Version 6.0 Description
(Released June 17, 2009)

Summary

  • APG 6.0 introduces the option of overriding the normal recursive-descent procedure for any given ABNF rule with a handwritten call back function. This feature can be used to:
    • greatly speed the parsing procedure without breaking the Context-Free Grammar (CFG) formalism
    • handle a few tricky non-CFG requirements in the specification
    • provide a structured, recursive-descent backbone for a fully handwritten parser.

    Examples demonstrate that this feature is capable of resulting in large speed improvement factors with a small number of simple handwritten functions, even in large grammars.


  • An array of new debugging tools is introduced for greater control in finding and removing bugs in
    • the ABNF grammars used to generate the parsers
    • the input strings that are parsed and
    • the call back functions that you write to translate them.

  • A new "Tree Node Model" (TNM) is introduced as an alternative means of traversing the Abstract Syntax Tree (AST). The results of the parse – the phrases matched by the rule names – can be directly accessed by rule name. Iterators provide convenient access to lists of phrases and their children without having to make a full, depth-first pass through the AST.

  • New and more user-friendly rules for defining the syntax and semantic call back functions and the AST nodes are given. The new rules are:
    • more fine-grained, providing better control
    • defined at run time - they have been removed from the grammar specification
    • less prone to call back functions code/data loss in the iterative procedure of debugging

  • Two very simple examples have been added which demonstrate the use of all APG 6.0 features.
    • A "Minimal" sample shows how an input string (or language sentence) recognizer can be built from a simple driver program and the unaltered output of the parser generator.
    • The "Demo" sample illustrates the use of all APG 6.0 features using a simple and well-known expressions grammar.

[top]

1.0 Introduction

APG – an ABNF Parser Generator – generates recursive-descent parsers with backtracking from ABNF [1] grammars. It is assumed the reader is familiar with these concepts - refs. [2], [3], [4] and [5] or others. A detailed description of the fundamentals will not be attempted here. However, to understand the operation of APG and the reasons and methods in the sections below it is important to understand the ABNF versions of syntax trees, their nodes, their differences and their uses.

[top]

1.1 The Syntax Tree

An ABNF named rule is an alternation of concatenations of repetitions of elements. (See section 4. of ref. [1] or the Parser grammar in Appendix C.) An element can be another alternation, another named rule or a terminal. Each of these entities is represented as a node in the syntax tree. Their descriptions follow, related here to the analogous concepts defined in ref. [2].

ALT – an alternation (interior or-node, ref. [2], pg. 174)
CAT – a concatenation (interior cat-node, Ibid.)
REP – a repetition (interior star-node, Ibid.)
RNM – a rule name (non-terminal node, ref. [2], pg.197)
TRG – a range (a special* terminal node, Ibid.)
TLS – a literal string(a special* terminal node)
TBS – a binary string (a special* terminal node)
PRD – syntactic predicate**

The list of concatenations forming an alternation and the list of repetitions forming a concatenation are often represented graphically as branches from a tree node. Figure 1 illustrates the syntax tree for the simple grammar:

currency = dollars / cents
dollars = "$" amount
cents = amount "¢"
amount = 1*digit
digit = "0" / "1" / "2" / …/ "9"

The syntax tree is a representation of all possible alternatives for constructing a language sentence from the grammar alphabet symbols. The "descent" of recursive-descent refers to the process of descending into the syntax tree in a depth-first manner. "Recursive" refers to the process of recursively descending into the syntax tree of the named rule at a rule name node. Recursive-descent parsing is the process of matching language sentence alphabet symbols to the terminal nodes in a left-to-right fashion until the entire sentence is matched, or until the match fails.

(*) Theoretically, the only terminal nodes are the individual language alphabet symbols. ABNF defines ranges of symbols, strings of symbols and strings of case-insensitive ASCII character symbols. APG uses specially written functions for these ABNF-defined entities and treats them as terminal nodes, even though they are actually small grammars over the theoretical terminal symbols. See Figure 2 for the syntax trees of the grammars defining the APG terminals TRG, TLS and TBS.

(**) See APG 5.0, Section 3.2.3.5, for a discussion of Syntactic Predicates. They will not be further discussed here.

[top]

1.2 The Parse Tree

The recursive-descent parsing process described above has the effect of selecting those branches of the syntax tree which extend from the root to the matched terminal nodes and discarding all others. The surviving branches form the parse tree. Figure 3 illustrates the parse tree selection process for the example grammar given above. (Note that multiple phrase repetitions appear in the parse tree as children of the REP operator.)

Ambiguity - the possibility of discovering more than one parse tree matching a given language sentence - is removed from APG through application of the "prioritized-choice" rule (see ref. [5]). At each ALT node, the alternative branches are tried in a left-to-right fashion until a match is found. The remaining alternative branches are discarded whether they form a match or not.

One feature of the parse tree is that the ALT nodes obviously become superfluous. Only one branch of each survives. Another is that the phrases matching the terminal nodes all concatenate in a left-to-right manner to reconstruct the language sentence. But most importantly, a phrase can now also be attributed to each of the interior nodes. In particular, it is phrases matching the rule name nodes (see Figure 3) that most applications are interested in.

[top]

1.3 The Abstract Syntax Tree (AST)

In fact, it is usually only the phrases matching a few specific rule name nodes that are of semantic interest to the application. The complete construction of a grammar for a specific application often, probably almost always, requires the use of many phrases which are of no semantic interest. For example, in the currency example above, the amount is the phrase of primary interest. Digit is a grammatical convenience for the full description of currency, but is of little or no semantic interest.

For the application's purposes, then, many of the nodes of the parse tree can be ignored. The AST is the remaining tree after the uninteresting nodes of the parse tree are removed. i.e. all interior and terminal nodes and all rule name nodes for phrases of no semantic interest. Figure 4. shows an AST that might be constructed for the running example.

In practice, APG (optionally) constructs the AST on the fly during the parsing process and the complete parse tree itself is never constructed. (However, see the discussion of tracing in section 4 below.)

For example, using the running example, an application would probably be interested in knowing only the amount and whether it was dollars or cents. Therefore, only those two AST nodes would be of interest. Specifying only "dollars", "cents" and "amount" would result in the AST shown in Figure 4

[top]

1.4 Call back functions - how and when they are used.

The most important feature of APG is the call back functions which allow the user to interact with the parsing process. There are two types of call back functions. The syntax call back functions are called during the traversal of the syntax tree during the parsing process. The semantic call back functions are called during the traversal of the AST during the translation process. In both cases, for each rule name node you can optionally define a call back function which the parser will call twice – once pre-branch, before it descends into the tree branch below and once post-branch after the tree branch traversal is complete. In previous versions of APG, these two functions were the same, an input state variable letting you know which type of call back was being made. A new features of APG 6.0 is the separation of these two into separate functions.

[top]

1.4.1 Syntax call back functions.

The syntax call back functions can be used in three ways.

  • During the pre-branch call, the input string (or language sentence) can be scanned for clues to a match. If no match is possible the descent into the branch below can be aborted, saving the parser the time and energy of a futile trip. The saving doing this has been previously illustrated. (See, for example, test 3, the look-ahead test in the IPv6 example of APG 5.0.)

  • The post-branch call can be used to examine the context of a matched phrase and a successful match can be rejected on semantic grounds.

  • A major new feature of APG 6.0 has been to expand the syntax call back usage to allow the user to write a pre-branch function here which completely replaces the recursive-descent process for the branch below. This is illustrated in the SIP example of this release, APG 6.0. The possible advantages and disadvantages of doing this are expanded on below.

[top]

1.4.2 Semantic call back functions.

Semantic call back functions are called pre-branch and post-branch during a depth-first traversal of the AST for the syntax-directed translation of the language sentence. These are equivalent to preorder and postorder semantic actions of ref. [2].

[top]

2.0 A clean separation of parsing options

In previous versions, APG generated empty call back functions as a part of the generated parser. In order to accomplish this it was necessary to indicate, prior to generation, which rule names were of interest. This scheme is unsatisfactory in a number of ways.

  • It clutters the statement of the grammar with rule name selections.
  • Changes in the rule selection require re-generation.
  • Re-generations, normal to debugging and maintenance of the grammar, are hindered by the attempt, not always successful, to save the user's handwritten code.

APG 6.0 makes an attempt to relieve some of this clumsiness.

[top]

2.1 Rule Name Selection.

The rule name selection for syntax call back functions, semantic call back functions and AST node generation are separated into three independent processes, all of which are done at run time, not parser generation time. Three new, run-time functions have been added to accomplish this.

  • ulParserSyntaxInit()
  • ulParserSemanticInit()
  • ulParserAstInit()
Any, all or none of these functions may be called prior to parsing.

[top]

2.2 Call Back Functions.

Neither syntax nor semantic call back functions are generated automatically. The files generated by the parser generator should never be modified. Any modifications to them will be overwritten upon re-generation. Unlike previous versions, no user-written code is preserved. The user is required to write the call back functions, preferably in a separate file. However, the generator does relieve the user of much of the work by creating a skeletal outline of both the rule name selection and the call back functions in comment form that the user can cut and paste as a time-saving start for the call back function files.

[top]

3.0 A new "handwritten parser" option.

One of the most often cited criticisms of handwritten parsers is that when handwriting one very quickly loses track of exactly what language it is that the parser accepts. One of the most often cited reasons in favor of handwriting, although this is by no means universal, is that handwritten parsers are often faster. The syntax call back functions have been modified in APG 6.0 in such a way that specific rule names can be targeted for handwriting while maintaining the recursive-descent, parser generated code for the remainder of the grammar. This can be used in a continuum of ways.

  • Handwritten improvements can easily be written for simple rules without deviating from the original language defined by the context-free grammar (CFG).
  • Handwriting just a few non-CFG rules can be done to match a specification which has an almost-but-not-quite CFG statement. This is particularly convenient for those cases where the "prioritized-choice" rule prevents parsing an otherwise valid CFG phrase.
  • You can throw caution to the wind and develop a completely handwritten parser with all of its pros and cons.

[top]

3.1 Speed

The prime motivation for introducing the handwritten option was to speed the parser. Consider the following:

  • A handwritten parser for the grammar anbn is 50-60 times faster than the generated, recursive-descent parser for the same grammar.
  • Many large grammars, text-based internet standards for example, have large numbers of small rules that, like anbn, are simple to write by hand without deviating from the CFG that defines them.

The SIP example in the Samples folder shows the success of this approach.

[top]

3.2 Difficult rules.

Another advantage of targeting specific rule names for handwritten parsers is that it allows the user to handwrite a rule that performs a function that is difficult or even not possible with a CFG, generated parser.

[top]

3.2.1 Avoiding grammar bloat.

Consider the grammar for an IPv4 address.

IPv4Address = INT255 3("." INT255)
INT255 = 1*3digit
digit = %d48-57


The address is simply four integers separated by periods. However, we must also impose restrictions on the integers – the value must be less than 256 and leading zeros, not to exceed three digits total, must be allowed. This can be done, but the grammar becomes lengthy and the parser is burdened.

INT255 = "2" "5" ("0"/"1"/"2"/"3"/"4"/"5") /
         "2" ("0"/"1"/"2"/"3"/"4") DIGIT /
         "1" 2DIGIT /
         ["0"] POS-DIGIT DIGIT /
         ["0" ["0"]] DIGIT /
         "0" ["0" ["0"]]


A handwritten parser can be done with only a few lines of code without deviating from the CFG specification.

[top]

3.2.2 Handling the "prioritized-choice" disambiguation rule.

Consider the "hostname" rule in the SIP-URI definition in RFC 3261. The discussion in the SIP example below demonstrates a how "hostname", even though it is a valid CFG rule, cannot be parsed with the "prioritized-choice" rule. A simple handwritten rule for "hostname" solves the problem easily and quickly, without syntactic predicates or other bulky grammar additions.

[top]

3.2.3 Handling non-context-free grammar requirements.

One of the most difficult parsing problems posed by the C and C++ languages is distinguishing type names from variables, both of which are identically defined as "identifiers". This is typically resolved with what is often called the lexer hack. For scannerless parsers, like APG, this problem is not difficult but the handwritten call back functions provide a particularly simple solution. A handwritten call back function for the "typedef" statement will store the name in a table of types. A handwritten function for "identifier" will look at the table. If the identifier is in the table it is a type, otherwise it is a variable.

[top]

3.3 Recursive-descent backbone for completely handwritten parser.

If, on the other hand, you want to throw caution to the wind and write a completely handwritten parser, APG 6.0 can serve as an organizing tool, providing a generated back bone to work with.

[top]

4.0 New, finer-grained details in parser tracing.

In my experience, debugging the ABNF grammar syntax is not too difficult. APG provides sufficiently detailed error messages that it usually only takes a few iterations to get the grammar syntactically correct. Unfortunately, this does not mean that the grammar really does what you intend it to. There are much more difficult problems ahead. At this point you have an untested parser and you usually have untested input strings (language sentences). The ones you write yourself are almost always buggy and the ones you get from third parties are often buggy too.

  • Problem number one: you feed a new input string into a new parser and the parse fails. Was it the parser or the string or both that failed? Where did it/they fail?
  • Problem number two: the parse succeeds but when you examine the matched rule name phrases some of those expected fail to show up and/or unexpected rule names appear. What went wrong and where?

A trace of the parsing process through the syntax tree is obviously what is needed.

  • Problem number three: often hundreds of thousands, even millions of syntax tree nodes are visited during the parsing process.

Clearly, fine-grained control over which and how many of the syntax tree nodes are displayed is called for. APG 5.x and lower did provide a tracing facility, but only a complete print out of the entire trace was available. APG 6.0 gives the user much better tracing tools. But like any other application, the more features you have the steeper the learning curve. Controlling the trace output is not complicated but does have a few features that need to be understood.

[top]

4.1 Parser Tracing.

Tracing is a run-time option, however the option is only available if the APG library has been compiled with the tracing flag set. This is always the case in the debug version. If, for some reason, you need to trace the release version you will need to do a special compilation with the _APG_CFG_TRACE flag set. There are no hard and fast rules for debugging a parse. It is an art of instinct, but these are the tools that I have found to be most helpful. There are two Parser component calls that control tracing.

[top]

4.1.1 ulParserTraceInit(ctx, flags, begin, delta, end)

  • ctx – a handle to a valid parser component.
  • flags – the flags and their meanings are defined in Apg.h.
  • begin – only lines begin <= line# are traced.
  • delta – if delta > 0 then only lines, begin <= line# <= (begin + delta) are traced.
  • end – if delta = 0 and end > begin, then only lines begin <= line# <= end are traced.

A call to this function is required for any tracing to occur. When a parse fails, the most interesting part of the trace is usually the last few lines where the failure probably occurred. Therefore, often only the last 1000 or so lines are of interest. But because the trace is printed in real time as the nodes are visited during the parsing process, it is impossible to know in advance what the line numbers of the last lines are. To help with this is a flag, APG_TRACE_COUNT_ONLY. With this flag ORed to the others, no lines are printed but the final count is given. Therefore, you can run a two step procedure. Once with counting only, and once specifying only the last lines of interest.

[top]

4.1.2 ulParserTraceSelectRule(ctx, ruleID)

  • ctx – a handle to a valid parser component.
  • ruleID – the identifier of the rule to trace.

If you know the specific rule that is failing, or if you are trying to track down why the parsed phrases are showing up in the wrong rules, this call can be used to limit the trace to only the specified rules. Multiple calls to this function can be made, one for each rule to trace. The rule identifiers can be found in the *.h file generated by APG.

[top]

4.1.3 Tracing statistics.

In addition to printing selected tracing lines, special statistics of the trace are available. These statistics include the total number of lines printed. Using the APG_TRACE_COUNT_ONLY flag, no trace lines are actually printed but the tracing statistics will report the number of lines printed as if they had been. This is very useful when you want to print only the last few lines of a long print out but don't know in advance what the last few line numbers are.

Otherwise, the tracing statistics are much the same as the regular statistics except that they are organized in a somewhat more convenient format for tracing. I've found them convenient for optimizing a parser with the use of handwritten syntax call back functions. The statistics show which rule name nodes have been most often visited, indicating which rules to handwrite. Plus, comparing the total number of node visits before and after the introduction of a specific handwritten function gives you a very good indication of how much speed improvement you have achieved.

[top]

5.0 The new "Tree Node Model" (TNM).

APG 5.0 introduced a new "catalog" component (section 6.5.3) which provided direct access to specific nodes of the AST and iterators over lists of nodes. The name "catalog" was perhaps something of a misnomer and the component's usage was awkward. That component has been completely re-written and renamed. It works somewhat like the Document Object Model of HTML and hence the name "Tree Node Model". The API and iterators have been implemented in a way to make its use, hopefully, more convenient. I have found it to be, at least.

An example of TNM usage can be found in the Demo example in the Samples folder.

[top]

6.0 Examples.

In the Samples folder of the APG 6.0 distribution are six usage examples. The first three - "Minimal", "Demo" and "CPlusPlus" are meant to give relatively complete coverage of all of the features of APG 6.0. The remaining three demonstrate APG 6.0 usage for Internet protocols, grammars that are both of significant current interest and of significant grammar size.

[top]

6.1 Minimal - a minimal usage example.

The parser generator generates two files - *.h and *.c or *.cpp if a C++ parser is requested. These two files, without modification, define a complete language recognizer. That is, they will take an input string and give a yes/no answer as to whether it is a valid language sentence or not.

This example demonstrates language recognition with a simple main function to drive the recognizer for a the mimimal expression grammar given on pg. 193 of the Dragon Book (ref. [2]).

[top]

6.2 Demo - a demonstration of all parser features.

APG 6.0 has an enhanced user interface for more fine-grained control over the call back functions, the AST, the tracing facility and the new TNM access to the AST. This example, using the same expression grammar as the previous example, gives complete coverage of all of the new and old features.

[top]

6.3 CPlusPlus - a demonstration of C++ a language parser.

This example, also given in APG 5.0, is an update of the trivial parser demonstrating the usage of the C++ version of the generated parser.

[top]

6.4 URI - the Universal Resource Indicator.

The Universal Resource Indicator (URI, RFC 3986) is of fundamental importance in almost all Internet protocols dealing with the World Wide Web. This example parses a variety of URIs displaying many of its structures using the TNM component to access the AST nodes. The parsed phrases are displayed in an informative format which shows the general structure of a URI.

[top]

6.5 SDP - the Session Description Protocol.

The Session Description Protocol (SDP, RFC 4566) is used by many different Internet protocols dealing with media sessions. This example does a simple parse of several valid and several invalid SDP announcements validating the grammar.

[top]

6.6 SIP - the Session Initiation Protocol

The Session Initiation Protocol (SIP, RFC 3261) is a protocol with a significant amount of commercial application. It's grammar has some challenging rule definitions and, with over 300 rule name definitions, it is a grammar of significant size. This grammar has been chosen to test the idea that handwriting a few simple rules can result in a significant increase in the parsing speed. Indeed, hand writing only three dozen of the simplest phrases results in a speed increase of 7-8 times the normal parsing speed.

One rule in particular, "hostname", is worth examining in detail. It presents two challenges to the prioritized-choice rule favored here. There are a couple of parsing problems without this rule. The first is that ambiguities present the problem of algorithmically trying to decide which of the several possible parse trees (often referred to as the "forest" of parse trees) to choose from. Since the choice usually requires some semantic information, removing the ambiguity is viewed, here at least, as desirable. Second, and possibly more importantly, if parsing speed is important to your application, handling all of the possible alternatives can slow things down.

The original grammar for "hostname" is given as:

hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum / alphanum *( alphanum / "-" ) alphanum
toplabel = ALPHA / ALPHA *( alphanum / "-" ) alphanum

"domainlabel" must begin and end with alphanum, but may contain hyphens in the body of the label. "toplabel", in addition, must begin alphabetically. The problem here is typical of the prioritized choice rule. *(alphanum /"-") will consume the entire string and the final concatenation of "alphanum" will fail. This is a perfectly valid CFG rule. Without prioritized choice the parse will succeed, however with a lot more parsing effort. But some fix up is needed to parse it with the prioritized-choice rule. With a little examination we can see that the following grammar re-write solves the problem:

domainlabel = 1*alphanum *(1*"-" 1*alphanum)
toplabel = ALPHA *alphanum *(1*"-" 1*alphanum)

The "hostname" rule, on the other hand, presents a tougher challenge. Because of the optional trailing period, it becomes, in general, impossible to distinguish between "toplabel" and "domainlabel". For example, consider the string:

domain.com.

The phrase "com." matches both (domainlabel ".") and toplabel ["."]. To the best of my knowledge, there is no prioritized choice grammar which will recognize "com" as a toplabel rule phrase. We can, however, solve the problem using syntactic predicates. Consider:

hostname = *(domainlabel "." &(alphanum/"-")) toplabel ["."]

The syntactic predicate &(alphanum/"-") forces domanlabel "." to be followed by at least one more "domainlabel" or the "toplabel". Otherwise it must be a "toplabel".

However, the syntactic predicate adds additional parsing overhead to a parser that we are trying to optimize for speed. A handwritten rule for "hostname", as demonstrated in this example, solves the problem simply and efficiently.

[top]

Appendix A: References

1.  RFC 4234, The Internet Engineering Task Force (IETF) 
    Requests for Comment:4234, 
    See the IETF RFC page, http://www.ietf.org/rfc.html
    
2.  Alfred V. Aho, Ravi Sethi, Monica S. Lam and Jeffrey D. Ullman, 
    "Compilers: Principles, Techniques, and Tools",
    2nd Edition, Addison-Wesley, 2007
    
3.  John E. Hopcroft and Jeffrey D. Ullman, 
    "Introduction to Automata Theory, Languages and Computation", 
    Addison-Wesley, 1979
    
4.  R Gregory Taylor, 
    "Models of Computation and Formal Languages", 
    Oxford University Press, 1998
    
5.  Bryan Ford, 
    "Parsing Expression Grammars: 
    A Recognition-Based Syntactic Foundation", 
    Proceedings 31st ACM Symposium on 
    Principles of Programming Languages, 
    pp. 111-122, 2004

[top]

Appendix B: APG 6.0 command line options.

help options
? - print this help screen
/? - print this help screen
/h - print this help screen

file name options
/in:filename - input grammar file name (option required)
/out:filebase - generate parser files filebase.h and filebase.c
/log:filename - log output to filename (default is console)

integer value options
/sestmax:int - int is maximum number of SEST terminal nodes to allow (default = 1,000,000)
/startrule:int - int is start rule index (default = 0)

true/false options
/nr - no recursion analysis - default is to do recursion analysis
/cpp - output a C++ parser component, filebase.cpp
/dv - verbose, implies display of all (v) options
/dgrammar - (v) display input grammar file with one-based line numbers
/dwarnings - (v) display grammar syntax warnings (implies /dgrammar)
/derrors - (v) display grammar syntax errors (implies /dwarnings)
/dmetrics - (v) display grammar metrics
/dstate - (v) display parser both lexical and token parser states
/dr - display recursion analysis
/drv - verbose recursion analysis
/dtokens - (*) display grammar tokens
/dopcodes - (*) display grammar opcodes

options must be lower case only
option file names, if any, are case sensitive
file name lengths must be < 128 (including null terminators)
default value of all true/false options are false
(*) caveat: may generate extensive output

[top]

Appendix C: ABNF grammar.

The bootstrap for APG 6.0 was originally written as an APG 5.1 application. Since then many iterations of the bootstrap have evolved through previous and incomplete versions of APG 6.0. Currently, APG 6.0 serves as its own bootstrap. The ABNF grammar is split into a lexer and a parser, simply because the translation logic was much easier using tokens. The two grammars used are given here.

Note that the "incremental alternatives" operator "=/" is not implemented. The grammar will recognize it, but the translator will issue an error message and the translation will fail.

Lexer Grammar

 
; Pre-processing grammar - tokenize the RFC 4234 grammar file
; - symbol alphabet is ASCII characters 9, 10, 13 & 32-127
;
; Deviations from RFC 4234
; 1) grammar file must be null terminated (last character must be 0 (zero))
; 2) Lines may end in CR, LF, CRLF or null if last line
; 3) Rule definitions must be flush left
;    - start in first character of new line
; 4) Horizontal tabs are assigned a tab value of one
;    - they are converted to a single space
; 5) A line continuation is treated as a single space
;    - it may form a concatenation operator
;
; Prior to parsing the ASCII grammar file must have the following preparation
; 1) All characters verified as valid grammar characters.
;    - only CR(%d13), LF(%d10), tab(%d9) and printing characters (%d32-127)
; 2) All horizontal tabs (%d9) converted to spaces (%d32).
; 4) The line ends CR, LF or CRLF are converted to LF.
;
; NOTE: using the prioritized-choice disambiguation rule
;       - the ordering of the alternatives is very important

File          = *Line
Line          = Blank /
                Rule  /
                LineError
Blank         = owsp BlankLineEnd
LineError     = *(%d32-127) LineEnd

Rule          = (NameDef/NameDefError) owsp (DefAlt/Def/DefError) owsp *Token owsp LineEnd
NameDef       = alphanum ; -> TOK-NAMEDEF
DefAlt        = %d61.47                       ; "="           ; -> TOK-DEF
Def           = %d61                          ; "=/"          ; -> TOK-DEFALT
NameDefError  = 1*%d33-127                    ; any but space ; -> TOK-ERROR
DefError      = 1*%d33-127                    ; any but space ; -> TOK-ERROR
OpAlt         = owsp %d47 owsp                ; "/"           ; -> TOK-ALT
OpNot         = %d33                          ; "!"           ; -> TOK-NOT
OpAnd         = %d38                          ; "&"           ; -> TOK-AND
OpRparen      = owsp %d41                           ; ")"     ; -> TOK-RPAREN
OpRoption     = owsp %d93                           ; "]"     ; -> TOK-ROPTION
OpCat         = wsp owsp                         ; concatenation ; -> TOK-CAT
OpLparen      = %d40 owsp                           ; "("     ; -> TOK-LPAAREN
OpLoption     = %d91 owsp                           ; "["     ; -> TOK-LOPTION
OpRep         = %d42                          ; "*"           ; -> TOK-REP
OpNumber      = dnum                                     ; -> TOK-NUM
dRange        = OpDec dmin %d45 dmax          ; %d0-9         ; -> TOK-TRG
dString       = OpDec dnum 1*(%d46 dnum)      ; %d0.9         ; -> TOK-TBS
dChar         = OpDec dmin                    ; %d0-9         ; -> TOK-TRG
xRange        = OpHex xmin %d45 xmax          ; %x0-F         ; -> TOK-TRG
xString       = OpHex xnum 1*(%d46 xnum)      ; %x0.F         ; -> TOK-TBS
xChar         = OpHex xmin                    ; %x0-F         ; -> TOK-TRG
bRange        = OpBin bmin %d45 bmax          ; %b0-1         ; -> TOK-TRG
bString       = OpBin bnum 1*(%d46 bnum)      ; %b0.1         ; -> TOK-TBS
bChar         = OpBin bmin                    ; %b0-1         ; -> TOK-TRG
Name          = alphanum                   ; -> TOK-NAME
OpTls         = %d34 *(%d32-33/%d35-127) %d34       ; "text"  ; -> TOK-TLS
ProsVal       = %d60 *(%d32-61/%d63-127) %d62       ;   ; -> TOK-ERROR
TokenError    = 1*%d33-127                    ; any but space ; -> TOK-ERROR

Token         = OpAlt     /
                OpAnd     /
                OpNot     /
                OpRparen  /
                OpRoption /
                OpCat     /
                OpLparen  /
                OpLoption /
                OpRep     /
                OpNumber  /
                dRange    /
                dString   /
                dChar     /
                xRange    /
                xString   /
                xChar     /
                bRange    /
                bString   /
                bChar     /
                OpTls     /
                Name      /
                ProsVal   /
                TokenError

OpDec         = %d37 (%d68/%d100)             ; %d
OpHex         = %d37 (%d88/%d120)             ; %h
OpBin         = %d37 (%d66/%d98)              ; %b

dmin          = dnum
dmax          = dnum
dnum          = 1*(%d48-57)

bmin          = bnum
bmax          = bnum
bnum          = 1*%d48-49

xmin          = xnum
xmax          = xnum
xnum          = 1*(%d48-57 / %d65-70 / %d97-102)

;alpha         = %d97-122/%d65-90 ; A-Z or a-z
alphanum = (%d97-122/%d65-90) *(%d97-122/%d65-90/%d48-57/%d45)
owsp          = *wsp
wsp           = %d32    /
                comment /
                LineContinue
comment       = %d59 *(%d32-127)
LineEnd       = %d10                  ; -> TOK-LINEEND
BlankLineEnd  = %d10                  ; separately named so as not to push a TOK-LINEEND
LineContinue  = %d10.32

Parser Grammar

 
; 
; ABNF for SABNF - generate v6.0 APG opcodes
; symbol alphabet is SABNF tokens generated by lexer (pre-parser)

File          = *(Rule/RuleError)
Rule          = TOK-NAMEDEF (TOK-DEF/TOK-DEFALT) Alternation TOK-LINEEND
RuleError     = *(%d1-20) TOK-LINEEND
Alternation   = Concatenation *(TOK-ALT Concatenation)
Concatenation = Repetition *(TOK-CAT Repetition)
Repetition    = Rep       /
                Repeat    /
                Predicate	/
                NoRep
Rep           = dmin [max-reps] Element /
                TOK-REP dmax Element
Repeat        = TOK-REP Element
NoRep         = Element
Predicate     = (TOK-AND / TOK-NOT) Element
dmin          = TOK-NUM
dmax          = TOK-NUM
max-reps      = TOK-REP [dmax]
Element       = TOK-TRG /
                TOK-TLS /
                TOK-TBS /
                TOK-RNM /
                Group   /
                Option
Group         = TOK-LPAREN  Alternation TOK-RPAREN
Option        = TOK-LOPTION Alternation TOK-ROPTION

TOK-LINEEND = %d0
TOK-NUM     = %d1
TOK-NAMEDEF = %d2
TOK-DEF     = %d3
TOK-DEFALT  = %d4
TOK-LPAREN  = %d5
TOK-RPAREN  = %d6
TOK-LOPTION = %d7
TOK-ROPTION = %d8
;TOK-MIN     = %d9
;TOK-MAX     = %d10
TOK-RNM     = %d11
TOK-ALT     = %d12
TOK-CAT     = %d13
TOK-REP     = %d14
TOK-AND     = %d15
TOK-NOT     = %d16
TOK-TRG     = %d17
TOK-TBS     = %d18
TOK-TLS     = %d19

[top]