
JSON Binary Schema

JBS allows the specification of customized binary representations via extensions to the JSON Schema language.

(or JBIN? or BINSON?)

XXX allow specifying BESO (or CBOR or …) as a default binary encoding

XXX related: Kaitai Struct, The Next 700 Data Description Languages, binpac, IDRIS, etc.

Variable-width unsigned integers

When the schema type is integer, an encoding property of binary indicates that values greater than or equal to zero are to be encoded in a variable-length binary representation:

{
	"type": "integer",
	"encoding": "binary"
}

A byteOrder property may specify the byte order for the encoded integer, either bigEndian (most-significant byte first) or littleEndian (least-significant byte first).

{
	"type": "integer",
	"encoding": "binary",
	"byteOrder": "littleEndian"
}

In this encoding, the integer 65 is encoded as the one-byte stream [0x41], while the integer 256 is encoded as the byte-stream [0x00,0x01].
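As a rough illustration of the two examples above, the following Python sketch encodes a nonnegative integer in the fewest whole bytes that can hold it. The function name encode_varuint is hypothetical. Two points are assumptions rather than part of this draft: zero is emitted as a single 0x00 byte, and the length of the encoded value is taken to be known from context, since no length delimiter is defined here.

```python
def encode_varuint(value: int, byte_order: str = "bigEndian") -> bytes:
    """Encode a nonnegative integer in the minimum number of whole bytes."""
    if value < 0:
        raise ValueError("this encoding covers only values >= 0")
    # Assumption: zero still occupies one byte rather than zero bytes.
    length = max(1, (value.bit_length() + 7) // 8)
    order = {"bigEndian": "big", "littleEndian": "little"}[byte_order]
    return value.to_bytes(length, order)


print(encode_varuint(65, "littleEndian").hex())   # 41
print(encode_varuint(256, "littleEndian").hex())  # 0001
```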

In strict schema validation, the byteOrder property is required. In permissive schema validation, the byteOrder property is optional and defaults to bigEndian (network byte order).

Variable-width signed integers

When the schema type is integer, an encoding property of zigzag indicates that signed integer values are first zigzag-encoded into unsigned integer values, which are then encoded as binary unsigned integers as described above.

{
	"type": "integer",
	"encoding": "zigzag"
}

In zigzag encoding, a nonnegative integer value v is encoded as the unsigned integer 2v, while a negative integer value v is encoded as -2v - 1. That is:

0	0
-1	1
1	2
-2	3
2	4
-3	5
etc.
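The mapping can be sketched in Python as follows. The function names zigzag_encode and zigzag_decode are hypothetical. The sketch covers only the signed-to-unsigned step, not the subsequent byte encoding.

```python
def zigzag_encode(v: int) -> int:
    """Map a signed integer to an unsigned one: 2v if v >= 0, else -2v - 1."""
    return 2 * v if v >= 0 else -2 * v - 1


def zigzag_decode(u: int) -> int:
    """Invert zigzag_encode: even values are nonnegative, odd values negative."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


print([zigzag_encode(v) for v in (0, -1, 1, -2, 2, -3)])  # [0, 1, 2, 3, 4, 5]
```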

Unsigned integers

When the schema type is integer and has a minimum value greater than or equal to zero, the schema represents an unsigned integer or natural number.

Fixed-length unsigned integers

An encoding of unsigned indicates that an integer is to be encoded in unsigned binary representation a given number of bits wide. For example, this type indicates an integer encoded in exactly one byte as an unsigned integer:

{
	"type": "integer",
	"encoding": "unsigned",
	"bits": 8
}

The bits property's value must be a nonnegative integer indicating the bit-width of this representation.

If bits is greater than 8, then the schema must also have a byteOrder property specifying the endianness of the representation: either bigEndian (network byte order) or littleEndian.

XXX or should encoding default to network byte order (bigEndian)? Perhaps it should for purposes of defining new formats, but a strict-specification mode for schema validators should require endianness to be specified?

An example of a 32-bit unsigned integer encoding in network byte-order:

{
	"type": "integer",
	"encoding": "unsigned",
	"byteOrder": "bigEndian",
	"bits": 32
}

The bits field need not be a power of two. For example, this integer is encoded in three bytes:

{
	"type": "integer",
	"encoding": "unsigned",
	"byteOrder": "littleEndian",
	"bits": 24
}
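A minimal Python sketch of how an encoder might apply such a schema, assuming the schema has already been parsed into a dict. The function name encode_unsigned is hypothetical, and the error handling shown is one possible choice rather than behavior defined by this draft.

```python
def encode_unsigned(value: int, schema: dict) -> bytes:
    """Encode value as a fixed-width unsigned integer described by schema."""
    bits = schema["bits"]
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    if bits > 8 and "byteOrder" not in schema:
        raise ValueError("byteOrder is required when bits is greater than 8")
    if not 0 <= value < (1 << bits):
        raise OverflowError(f"{value} does not fit in {bits} unsigned bits")
    order = "little" if schema.get("byteOrder") == "littleEndian" else "big"
    return value.to_bytes(bits // 8, order)


u24 = {"type": "integer", "encoding": "unsigned",
       "byteOrder": "littleEndian", "bits": 24}
print(encode_unsigned(1, u24).hex())  # 010000
```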

Note that the above examples did not specify a semantic value range via minimum or maximum, although the specified encoding can represent only a certain range of values. This is not necessarily an error: it simply means that not all semantically permitted values are representable in the specified binary representation. This corresponds to the arguably lazy but common practice in designing software, programming languages, and binary formats to postulate that some number of bits “should be enough” for all expected values even though a specific value range has not been clearly defined. Semantically allowing unrepresentable values may arguably be considered bad practice for designing new formats and schemas, however, so schema validators may want to provide a configuration option that yields warnings or errors when a fixed-length representation is specified without a specified minimum and maximum that is representable.

The following variant of the above 24-bit integer example restricts the semantic value to a range of 1 through 16,000,000, a subset of the representable value range of 0 through 16,777,215 (2^24-1).

{
	"type": "integer",
	"minimum": 1,
	"maximum": 16000000,
	"encoding": "unsigned",
	"byteOrder": "littleEndian",
	"bits": 24
}
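The optional validator check described above could look roughly like the following Python sketch. The function name range_warnings and the wording of the warnings are hypothetical. The sketch handles only the fixed-length unsigned case.

```python
def range_warnings(schema: dict) -> list:
    """Return warnings for a fixed-length unsigned integer schema."""
    lo, hi = 0, (1 << schema["bits"]) - 1  # representable value range
    warnings = []
    for key in ("minimum", "maximum"):
        if key not in schema:
            warnings.append(f"{key} is not specified")
        elif not lo <= schema[key] <= hi:
            warnings.append(f"{key} {schema[key]} is outside {lo}..{hi}")
    return warnings


bounded = {"type": "integer", "minimum": 1, "maximum": 16000000,
           "encoding": "unsigned", "byteOrder": "littleEndian", "bits": 24}
print(range_warnings(bounded))  # []
```

A validator in strict mode might treat a non-empty result as an error, while a permissive one might only report it.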

For now, bits must be a multiple of 8. We might later want to relax this to support packed bit fields and bit streams. But then we will probably also need to define a bitOrder property specifying the bit order within bytes (i.e., whether we start filling bytes from the most-significant or the least-significant bit).

Fixed-length signed integers

An encoding property of signed indicates that an integer is to be encoded as a fixed-width integer in standard binary two’s-complement encoding, with the most-significant bit serving as the sign bit. As with unsigned integers, a bits property is required and must be a multiple of 8 for now, and a byteOrder property is required if bits is greater than 8.

An example of an 8-bit signed integer:

{
	"type": "integer",
	"encoding": "signed",
	"bits": 8
}

Similarly, a 64-bit little-endian signed integer:

{
	"type": "integer",
	"encoding": "signed",
	"bits": 64,
	"byteOrder": "littleEndian"
}
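For illustration, Python's built-in integer conversions already implement two's-complement encoding, so a sketch of the signed case is short. The function names encode_signed and decode_signed are hypothetical.

```python
ORDERS = {"bigEndian": "big", "littleEndian": "little"}


def encode_signed(value: int, bits: int, byte_order: str) -> bytes:
    """Encode value as a fixed-width two's-complement integer."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    # to_bytes raises OverflowError if value does not fit in the given width.
    return value.to_bytes(bits // 8, ORDERS[byte_order], signed=True)


def decode_signed(data: bytes, byte_order: str) -> int:
    """Decode a fixed-width two's-complement integer."""
    return int.from_bytes(data, ORDERS[byte_order], signed=True)


print(encode_signed(-1, 8, "bigEndian").hex())      # ff
print(encode_signed(-2, 64, "littleEndian").hex())  # feffffffffffffff
```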

Offset-encoded integers

Integers may have an offset property to indicate that the semantic value is the offset plus the encoded value. For example, this schema type represents semantic values from 200 through 300 inclusive, encoded as single-byte values from 0 through 100:

{
	"type": "integer",
	"minimum": 200,
	"maximum": 300,
	"offset": 200,
	"encoding": "unsigned",
	"bits": 8
}

When an offset is specified and is less than or equal to the minimum, an unsigned-integer representation may be used even if minimum is less than zero. This example encodes values from -100 through 100 in one byte, with byte value 0 representing semantic value -100, 100 representing 0, and 200 representing 100:

{
	"type": "integer",
	"minimum": -100,
	"maximum": 100,
	"offset": -100,
	"encoding": "unsigned",
	"bits": 8
}
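The offset arithmetic can be sketched in Python as follows, assuming an unsigned fixed-length encoding and a schema parsed into a dict. The function names encode_offset and decode_offset are hypothetical.

```python
def _order(schema: dict) -> str:
    return "little" if schema.get("byteOrder") == "littleEndian" else "big"


def encode_offset(value: int, schema: dict) -> bytes:
    """Subtract the offset, then encode the result as an unsigned integer."""
    encoded = value - schema.get("offset", 0)
    bits = schema["bits"]
    if not 0 <= encoded < (1 << bits):
        raise OverflowError(f"{value} is not representable with this offset")
    return encoded.to_bytes(bits // 8, _order(schema))


def decode_offset(data: bytes, schema: dict) -> int:
    """Semantic value is the offset plus the encoded value."""
    return int.from_bytes(data, _order(schema)) + schema.get("offset", 0)


centered = {"type": "integer", "minimum": -100, "maximum": 100,
            "offset": -100, "encoding": "unsigned", "bits": 8}
print([encode_offset(v, centered)[0] for v in (-100, 0, 100)])  # [0, 100, 200]
```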

Floating-point numbers

Fixed-length binary floating-point numbers

When the schema type is number, an associated encoding property may name one of the IEEE 754 standard fixed-length floating-point number representations to indicate that this representation is to be used as the binary encoding of the number. For example:

{
	"type": "number",
	"encoding": "binary32"
}

The floating-point representations currently defined in IEEE 754-2008 are binary16, binary32, binary64, and binary128 for binary floating-point representations, and decimal32, decimal64, and decimal128 for decimal floating-point representations. More standard floating-point encodings may be defined in the future, of course.
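As a sketch, three of the binary formats map directly onto format codes in Python's standard struct module. The function name encode_float is hypothetical. The big-endian default is an assumption, since this section does not say how byte order applies to floating-point encodings. The struct module has no codes for binary128 or the decimal formats, so they are omitted here.

```python
import struct

# struct format codes for the IEEE 754 binary formats that struct supports.
FLOAT_CODES = {"binary16": "e", "binary32": "f", "binary64": "d"}


def encode_float(value: float, encoding: str, byte_order: str = "bigEndian") -> bytes:
    """Encode value in the named IEEE 754 binary format."""
    # Assumption: big-endian unless littleEndian is requested.
    prefix = "<" if byte_order == "littleEndian" else ">"
    return struct.pack(prefix + FLOAT_CODES[encoding], value)


print(encode_float(1.0, "binary32").hex())  # 3f800000
print(encode_float(1.0, "binary16").hex())  # 3c00
```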

Variable-length binary floating-point numbers

Variable-length strings

Packed arrays with fixed-length items

A schema of type array may specify an encoding of packed to indicate a packed array of fixed-length elements. This example expresses a packed array of 8-bit unsigned integers:

{
	"type": "array",
	"encoding": "packed",
	"items": {
		"type": "integer",
		"encoding": "unsigned",
		"bits": 8
	}
}

The array element schema specified in the items property must have a fixed-length binary representation for packed array encoding.

If the array also has a fixed length, i.e., minItems and maxItems values both specified and equal, then the resulting packed array has a fixed-length binary representation, which is exactly the element type length times the array length.
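The length rule and the packing itself can be sketched in Python as follows. The function names packed_length and pack_array are hypothetical, and the sketch assumes the items are fixed-length unsigned integers. Other fixed-length item types would need their own encoders.

```python
from typing import Optional


def packed_length(schema: dict) -> Optional[int]:
    """Return the total byte length if the packed array is fixed-length, else None."""
    min_items, max_items = schema.get("minItems"), schema.get("maxItems")
    if min_items is None or min_items != max_items:
        return None
    # Element length times array length.
    return (schema["items"]["bits"] // 8) * min_items


def pack_array(values: list, schema: dict) -> bytes:
    """Concatenate the fixed-length encodings of the items with no padding."""
    item = schema["items"]
    order = "little" if item.get("byteOrder") == "littleEndian" else "big"
    width = item["bits"] // 8
    return b"".join(v.to_bytes(width, order) for v in values)


rgb = {"type": "array", "encoding": "packed", "minItems": 3, "maxItems": 3,
       "items": {"type": "integer", "encoding": "unsigned", "bits": 8}}
print(packed_length(rgb), pack_array([255, 128, 0], rgb).hex())  # 3 ff8000
```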

CBE-encoded arrays with variable-length items

A schema of type array with an encoding of cbe indicates that the array has a binary representation consisting of a sequence of variable-length items, each individually CBE-encoded.



Bryan Ford