
ABNF Parser Generator

UTF-8

Unicode, the universal character set encompassing most of the world's writing systems, requires up to 21 bits per code point, in practice a 32-bit character, for its full description. Many current applications and protocols, however, assume 8- or even 7-bit characters. UTF-8 [RFC 3629] is a Unicode encoding form that represents each code point as a sequence of one to four 8-bit bytes, and as such it is ubiquitous in the many higher-level Internet protocols. This example presents a grammar that can be used to parse the UTF-8 encoding form.
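To make the structure of the encoding concrete, here is a minimal Python sketch of the checks a UTF-8 grammar has to express. The byte ranges follow the UTF-8 syntax given in RFC 3629. The function and table names are hypothetical and are not part of the parser generator. The sketch only illustrates the rules and does not stand in for the grammar itself.

```python
# Illustrative sketch only. Byte ranges follow the UTF-8 syntax in RFC 3629.
# _RULES and utf8_char_lengths are hypothetical names chosen for this example.

# Each rule: (lead-byte range, allowed range of the SECOND byte, total length).
# Third and fourth bytes, when present, are always in 0x80-0xBF.
_RULES = [
    ((0x00, 0x7F), None, 1),          # 1-byte (ASCII)
    ((0xC2, 0xDF), (0x80, 0xBF), 2),  # 2-byte; 0xC0/0xC1 would be overlong
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),  # 3-byte; excludes overlong forms
    ((0xE1, 0xEC), (0x80, 0xBF), 3),
    ((0xED, 0xED), (0x80, 0x9F), 3),  # 3-byte; excludes surrogates
    ((0xEE, 0xEF), (0x80, 0xBF), 3),
    ((0xF0, 0xF0), (0x90, 0xBF), 4),  # 4-byte; excludes overlong forms
    ((0xF1, 0xF3), (0x80, 0xBF), 4),
    ((0xF4, 0xF4), (0x80, 0x8F), 4),  # 4-byte; caps at U+10FFFF
]


def utf8_char_lengths(data):
    """Return the byte length of each UTF-8 character in data.

    Raises ValueError if data is not well-formed UTF-8.
    """
    lengths = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Find the rule whose lead-byte range contains this byte.
        for (lo, hi), second, length in _RULES:
            if lo <= lead <= hi:
                break
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        chunk = data[i:i + length]
        if len(chunk) < length:
            raise ValueError(f"truncated sequence at offset {i}")
        # Check each continuation byte against its allowed range.
        for k in range(1, length):
            lo2, hi2 = second if k == 1 else (0x80, 0xBF)
            if not lo2 <= chunk[k] <= hi2:
                raise ValueError(
                    f"invalid byte 0x{chunk[k]:02X} at offset {i + k}"
                )
        lengths.append(length)
        i += length
    return lengths


if __name__ == "__main__":
    # One character each of 1, 2, 3 and 4 bytes.
    print(utf8_char_lengths("A\u00A2\u20AC\U0001F600".encode("utf-8")))
    # [1, 2, 3, 4]
```

The table-driven layout is a design choice for readability: each row corresponds to one alternative a grammar would need, so the restrictions on the second byte (no overlong forms, no surrogates, nothing above U+10FFFF) are visible at a glance.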

ABNF Grammar:   X

UTF-8 Input

A string of characters covering 1-, 2-, 3- and 4-byte character codes. This is also the first example to use the "hex" Input Mode for the input string.
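As a rough illustration of what a hex-encoded input might look like, the following Python sketch converts a short sample string into space-separated hex byte pairs and back. The sample characters and the space-separated layout are assumptions made for this example. They are not the page's actual input string, and the exact syntax accepted by the "hex" Input Mode is not shown here.

```python
# Illustrative sketch only. The sample text and the space-separated hex
# layout are assumptions, not the parser generator's documented format.

def bytes_to_hex(data):
    """Render bytes as space-separated, upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def hex_to_bytes(hex_text):
    """Convert hex pairs back to bytes (bytes.fromhex skips the spaces)."""
    return bytes.fromhex(hex_text)


# One character each of 1, 2, 3 and 4 bytes: U+0041, U+00A2, U+20AC, U+1F600.
sample = "A\u00A2\u20AC\U0001F600"
hex_input = bytes_to_hex(sample.encode("utf-8"))

if __name__ == "__main__":
    print(hex_input)  # 41 C2 A2 E2 82 AC F0 9F 98 80
```

Writing the input as hex avoids any dependence on how a browser or editor encodes non-ASCII text, so the bytes the parser sees are exactly the ones listed.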

Input String:   X

Parser Output: