eXtensible Syntax Object Notation (XSON)

Goals:

The XSON header

An XSON stream starts with xson[imports], where imports is a comma-separated list of syntax imports. These can be simple identifiers for extensions defined in this document, or may (how?) refer to external syntax extensions by URI/CRI.
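As a rough illustration, the header could be split from the rest of the stream as in the Python sketch below. The function name parse_header is hypothetical, and the sketch assumes that import names contain no square brackets and that whitespace around commas is insignificant; neither assumption is stated in this document.

```python
import re

# Matches a leading xson[...] header. Assumes imports contain no brackets.
_HEADER = re.compile(r"xson\[([^\[\]]*)\]")

def parse_header(stream: str) -> tuple[list[str], str]:
    """Return (imports, remaining text) for an XSON stream (hypothetical helper)."""
    match = _HEADER.match(stream)
    if match is None:
        raise ValueError("stream does not start with an xson[...] header")
    # Split the comma-separated import list, ignoring surrounding whitespace.
    imports = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return imports, stream[match.end():]
```

For example, parse_header("xson[bool, number]true") yields the import list ["bool", "number"] and the remaining text "true".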

Decoding XSON

If multiple syntax extensions “claim” the same shorthand notation and neither of them builds on the other to provide a disambiguation rule, then the text is considered a syntax error as if it had not matched any extension.
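The rule above can be sketched in Python as follows. This is only a model under stated assumptions: each extension is represented by a hypothetical decoder function that raises ValueError when it does not claim the text, and "builds on" is simplified to mean that the building extension wins over the one it builds on. The names decode_shorthand, natural, and integer are illustrative only.

```python
class XSONSyntaxError(ValueError):
    """Raised when shorthand text has no unique claimant (hypothetical name)."""

def decode_shorthand(text, extensions, builds_on=None):
    """Decode shorthand text, returning (extension name, value).

    extensions: dict mapping name -> decoder that raises ValueError on no match.
    builds_on:  dict mapping name -> set of extension names it overrides.
    """
    builds_on = builds_on or {}
    claims = {}
    for name, decode in extensions.items():
        try:
            claims[name] = decode(text)
        except ValueError:
            continue  # this extension does not claim the text
    # A claimant loses if another claimant builds on it (assumed rule).
    overridden = {base for name in claims for base in builds_on.get(name, ())}
    winners = [name for name in claims if name not in overridden]
    if len(winners) != 1:
        raise XSONSyntaxError(f"no unique extension claims {text!r}")
    return winners[0], claims[winners[0]]

def natural(text):
    if not text.isdigit():
        raise ValueError("not a natural number")
    return int(text)

def integer(text):
    return int(text)  # int() raises ValueError on non-numeric text

EXTENSIONS = {"natural": natural, "integer": integer}
```

With builds_on={"integer": {"natural"}}, the text 42 decodes via integer; without that rule, both extensions claim it and the call raises XSONSyntaxError, just as it does for text that no extension claims.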

Encoding XSON

Syntax extensions may define shorthand forms, which are convenient for concise expression, and may attempt to encode with them. However, an extension cannot know whether its shorthand will be ambiguous with the shorthand syntax of other extensions that may be in use.

Therefore, whenever an extension produces an untagged shorthand form, the XSON encoder framework immediately re-decodes the encoded text in order to check for syntactic collisions. If no syntactic collision occurs, the framework uses the shorthand. If a collision does occur, the element is re-encoded in syntax-tagged form, which is guaranteed to be unambiguously decodable.
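A minimal Python sketch of this check follows. It assumes a simplified model in which each extension in use exposes a predicate saying whether it claims a given shorthand text; the function encode_element and the example extension "word" are hypothetical and not part of XSON.

```python
def encode_element(tag, tagged_content, shorthand, in_use):
    """Return the shorthand if only `tag` claims it, else the tagged form.

    in_use: dict mapping extension name -> predicate(text) -> bool.
    shorthand: the untagged form proposed by the extension, or None.
    """
    if shorthand is not None:
        # "Re-decode": ask every extension in use whether it claims the text.
        claimants = [name for name, claims in in_use.items() if claims(shorthand)]
        if claimants == [tag]:
            return shorthand  # no collision, so the shorthand is safe
    # Collision (or no shorthand): fall back to the syntax-tagged form.
    return f"{tag}[{tagged_content}]"

def is_bool(text):
    return text in ("true", "false")

def is_word(text):
    return text.isalpha()  # hypothetical extension claiming any bare word

print(encode_element("bool", "true", "true", {"bool": is_bool}))
# prints: true
print(encode_element("bool", "true", "true", {"bool": is_bool, "word": is_word}))
# prints: bool[true]
```

The design point is that the extension itself never needs to know which other extensions are in use; only the framework does.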

minXSON

Only two primitives providing minimal structure: strings and arrays.

JSON actually corresponds to a particular XSON format with a certain set of well-defined extensions on top of minXSON: namely null, bool, number, and object.

By design, any data conforming to any XSON format can be converted to or from JSON or minXSON.

Perhaps minXSON is so minimal that nothing is representable?

An XSON stream is a CTS text string containing a hierarchical tree of elements. Each element has a general syntax-tagged form of tag[content]. The tag names the syntactic extension invoked by this element, while the content has a syntax entirely defined by that extension, aside from obeying the global CTS rule that “matchers must match.”
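To illustrate, the Python sketch below splits one syntax-tagged element into its tag and content while checking that matchers are balanced. It assumes that the CTS matchers are the pairs (), [], and {} and ignores any quoting rules; both are assumptions, as is the function name split_tagged.

```python
# Assumed set of CTS matcher pairs; the real CTS rules may differ.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def split_tagged(text):
    """Split 'tag[content]' into (tag, content), enforcing balanced matchers."""
    tag, sep, rest = text.partition("[")
    if not sep or not tag:
        raise ValueError("expected tag[content]")
    stack = ["]"]  # the closer that will end this element's content
    for i, ch in enumerate(rest):
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            if ch != stack.pop():
                raise ValueError(f"mismatched matcher {ch!r}")
            if not stack:  # the element's own closing bracket
                if i != len(rest) - 1:
                    raise ValueError("trailing text after element")
                return tag, rest[:i]
    raise ValueError("unclosed matcher")
```

For example, split_tagged("outer[inner[a] inner[b]]") returns the tag "outer" and the content "inner[a] inner[b]", which the outer extension is then free to interpret however it likes. The tags outer and inner are placeholders, not XSON extensions.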

JSON-Equivalent Syntax Extensions

Null

Boolean

The full syntax of a boolean value is either bool[true] or bool[false], which may be abbreviated to simply true or false when there is no potential ambiguity that requires syntax tagging.
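The two forms can be sketched in Python as below. The function names and the ambiguous flag are hypothetical; in practice the encoder framework, not the caller, would decide whether tagging is required.

```python
# All accepted spellings of a boolean: full tagged form and shorthand.
_BOOL_FORMS = {
    "bool[true]": True, "true": True,
    "bool[false]": False, "false": False,
}

def decode_bool(text):
    """Decode either the tagged or the shorthand form of a boolean."""
    try:
        return _BOOL_FORMS[text]
    except KeyError:
        raise ValueError(f"not a boolean: {text!r}") from None

def encode_bool(value, ambiguous=False):
    """Encode a boolean, using the tagged form when shorthand is ambiguous."""
    word = "true" if value else "false"
    return f"bool[{word}]" if ambiguous else word
```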

Natural

Integer

Number

String

Supports the standard JSON-defined character escapes \", \\, \/, \b backspace, \f form feed, \n line feed, \r carriage return, \t tab.

Unicode character escapes have the form \u[hex] where hex is the Unicode code point in hexadecimal. Learning from Swift, this notation avoids JSON’s need to use surrogate pairs to express code points above 0xffff, and the need to have two separate forms like the 4-digit \u and 8-digit \U as in Go.
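A small Python sketch of decoding these escapes inside a string body is shown below. The function name unescape is hypothetical, and the limit of one to six hexadecimal digits is an assumption; this document does not state a digit limit.

```python
import re

# Single-character escapes taken from the JSON-defined set listed above.
SIMPLE = {'"': '"', "\\": "\\", "/": "/",
          "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

# Either \u[hex] (1 to 6 hex digits, an assumed limit) or a backslash
# followed by any single character.
ESCAPE = re.compile(r"\\(?:u\[([0-9A-Fa-f]{1,6})\]|(.))", re.DOTALL)

def unescape(body):
    """Replace escape sequences in a string body with the characters they denote."""
    def replace(match):
        if match.group(1) is not None:
            # chr() raises ValueError for code points above 0x10FFFF.
            return chr(int(match.group(1), 16))
        try:
            return SIMPLE[match.group(2)]
        except KeyError:
            raise ValueError(f"unknown escape \\{match.group(2)}") from None
    return ESCAPE.sub(replace, body)
```

Because the code point is written directly, an escape such as \u[1F600] decodes to a single character with no surrogate pair involved.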

Array

Map

Maps may in principle have any values as keys, not just strings. However, JSON restricts map keys to be strings.

Further Syntax Extensions

Binary integer

Octal Integer

This syntax is defined for consistency with the octal notation supported in many programming languages such as Python, Haskell, OCaml, Raku, Ruby, and Tcl. It consciously avoids the C language tradition of using a single leading 0 to indicate octal, which is easily confused with a decimal number with leading zeros.

Hexadecimal Integer

This definition is compatible with that proposed in JSON5.

Infinity/NaN

The infinity-nan extension builds on number to add support for the special infinity and not-a-number (NaN) values that the IEEE 754 floating-point standard defines but JSON omits.

This definition is compatible with that proposed in JSON5.

Rational numbers



Bryan Ford