JBS allows the specification of customized binary representations via extensions to the JSON Schema language.
(or JBIN? or BINSON?)
XXX allow specifying BESO (or CBOR or …) as a default binary encoding
XXX related: Kaitai Struct, The Next 700 Data Description Languages, binpac, IDRIS, etc.
When the schema type is integer,
an encoding property of binary indicates
that values greater than or equal to zero are to be encoded
in a variable-length binary representation:
{
"type": "integer",
"encoding": "binary"
}
A byteOrder
property may specify
the byte order for the encoded integer,
either bigEndian
(most-significant byte first)
or littleEndian
(least-significant byte first).
{
"type": "integer",
"encoding": "binary",
"byteOrder": "littleEndian"
}
In this encoding, the integer 65 is encoded as the one-byte stream [0x41], while the integer 256 is encoded as the byte-stream [0x00,0x01].
In strict schema validation,
the byteOrder property is required.
In permissive schema validation,
the byteOrder property is optional
and defaults to bigEndian
(network byte order).
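The variable-length encoding described above can be sketched in Python as follows. This is only an illustration: the function name `encode_binary` is hypothetical, and the choice to encode zero as a single 0x00 byte is an assumption, since the text does not say how zero is represented.

```python
def encode_binary(value: int, byte_order: str = "bigEndian") -> bytes:
    """Encode a nonnegative integer in the fewest whole bytes that hold it."""
    if value < 0:
        raise ValueError("the binary encoding covers only values >= 0")
    # Assumption: zero occupies one 0x00 byte (not stated in the text).
    length = max(1, (value.bit_length() + 7) // 8)
    # Map the schema's byteOrder names onto Python's names.
    order = {"bigEndian": "big", "littleEndian": "little"}[byte_order]
    return value.to_bytes(length, order)


# The two examples from the text, using littleEndian byte order.
sixty_five = encode_binary(65, "littleEndian")
two_fifty_six = encode_binary(256, "littleEndian")
```

The default of bigEndian mirrors the permissive-validation default; a strict validator would instead reject a schema that omits byteOrder.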
When the schema type is integer,
an encoding property of zigzag indicates
that signed integer values are to be zigzag-encoded first
into unsigned integer values,
then the latter encoded as a binary unsigned integer
as described above.
{
"type": "integer",
"encoding": "zigzag"
}
In zigzag encoding, a nonnegative integer value v is encoded as the unsigned integer 2v, while a negative integer value v is encoded as -2v - 1. That is:
0 0
-1 1
1 2
-2 3
2 4
-3 5
etc.
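A minimal Python sketch of the zigzag mapping, following the formulas above (2v for v >= 0, and -2v - 1 for v < 0). The function names are hypothetical and chosen only for illustration.

```python
def zigzag_encode(v: int) -> int:
    """Map a signed integer to an unsigned one: 2v if v >= 0, else -2v - 1."""
    return 2 * v if v >= 0 else -2 * v - 1


def zigzag_decode(u: int) -> int:
    """Invert zigzag_encode: even codes are nonnegative, odd codes negative."""
    if u < 0:
        raise ValueError("zigzag codes are unsigned")
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


# Round-trip a few small values.
codes = [zigzag_encode(v) for v in (0, -1, 1, -2, 2, -3)]
values = [zigzag_decode(u) for u in codes]
```

The resulting unsigned code would then be passed to the variable-length binary encoding, so small magnitudes of either sign occupy few bytes.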
When the schema type is integer
and has a minimum value greater than or equal to zero,
the schema represents an unsigned integer or natural number.
An encoding
of unsigned
indicates that an integer
is to be encoded in unsigned binary representation
a given number of bits wide.
For example, this type indicates an integer
encoded in exactly one byte as an unsigned integer:
{
"type": "integer",
"encoding": "unsigned",
"bits": 8
}
The bits property's value must be a nonnegative integer indicating the bit-width of this representation.
If bits is greater than 8,
then the schema must also have a byteOrder property
specifying the endianness of the representation:
either bigEndian (network byte order) or littleEndian.
XXX or should encoding
default to network byte order (bigEndian
)?
Perhaps it should for purposes of defining new formats,
but a strict-specification mode for schema validators
should require endianness to be specified?
An example of a 32-bit unsigned integer encoding in network byte-order:
{
"type": "integer",
"encoding": "unsigned",
"byteOrder": "bigEndian",
"bits": 32
}
The bits
field need not be a power of two.
For example, this integer is encoded in three bytes:
{
"type": "integer",
"encoding": "unsigned",
"byteOrder": "littleEndian",
"bits": 24
}
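As a sketch of how an encoder might apply the fixed-width unsigned rules above, the following Python function takes the bits and byteOrder values directly as arguments. The name `encode_unsigned` and the error handling are assumptions made for illustration, not part of the schema language.

```python
from typing import Optional


def encode_unsigned(value: int, bits: int, byte_order: Optional[str] = None) -> bytes:
    """Encode value as a fixed-width unsigned integer of the given bit width."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    if bits > 8 and byte_order is None:
        raise ValueError("byteOrder is required when bits is greater than 8")
    if not 0 <= value < (1 << bits):
        raise ValueError("value is not representable in %d bits" % bits)
    # A single byte has no byte order, so "big" is an arbitrary choice there.
    order = "little" if byte_order == "littleEndian" else "big"
    return value.to_bytes(bits // 8, order)


# The 24-bit little-endian example: three bytes, least-significant first.
three_bytes = encode_unsigned(0x010203, 24, "littleEndian")
```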
Note that the above examples did not specify a semantic value range
via minimum or maximum,
although the specified encoding can represent only a certain range of values.
This is not necessarily an error:
it simply means that not all semantically permitted values
are representable in the specified binary representation.
This corresponds to the arguably lazy but common practice in designing
software, programming languages, and binary formats to postulate
that some number of bits “should be enough” for all expected values
even though a specific value range has not been clearly defined.
Semantically allowing unrepresentable values may arguably be considered
bad practice for designing new formats and schemas, however,
so schema validators may want to provide a configuration option
that yields warnings or errors when a fixed-length representation is specified
without a minimum and maximum that are both representable.
The following variant of the above 24-bit integer example restricts the semantic value to a range of 1 through 16,000,000, a subset of the representable value range of 0 through 16,777,215 (2^24-1).
{
"type": "integer",
"minimum": 1,
"maximum": 16000000,
"encoding": "unsigned",
"byteOrder": "littleEndian",
"bits": 24
}
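The optional validator check discussed above might look roughly like this in Python. The function name `range_is_representable` is hypothetical, and the sketch covers only the unsigned encoding without an offset.

```python
def range_is_representable(schema: dict) -> bool:
    """Return True if minimum and maximum are both given and fit in `bits` unsigned bits."""
    lo = schema.get("minimum")
    hi = schema.get("maximum")
    if lo is None or hi is None:
        # No declared semantic range: a strict validator could warn here.
        return False
    largest = (1 << schema["bits"]) - 1
    return 0 <= lo <= hi <= largest


# The 24-bit example with a semantic range of 1 through 16,000,000.
schema = {
    "type": "integer",
    "minimum": 1,
    "maximum": 16000000,
    "encoding": "unsigned",
    "byteOrder": "littleEndian",
    "bits": 24,
}
fits = range_is_representable(schema)
```

Whether a failed check produces a warning or an error would be a configuration choice of the validator.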
For now, bits
must be a multiple of 8.
We might later want to relax this to support
packed bit fields and bit streams.
But then we will probably also need to define a bitOrder
property
specifying the bit order within bytes
(i.e., whether we start filling bytes
from the most-significant or the least-significant bit).
An encoding property of signed indicates
that an integer is to be encoded as a fixed-width integer
in standard binary two's-complement encoding,
with the most-significant bit serving as the sign bit.
As with unsigned integers,
a bits
property is required and must be a multiple of 8 for now,
and a byteOrder
property is required if bits
is greater than 8.
An example of an 8-bit signed integer:
{
"type": "integer",
"encoding": "signed",
"bits": 8
}
Similarly, a 64-bit little-endian signed integer:
{
"type": "integer",
"encoding": "signed",
"bits": 64,
"byteOrder": "littleEndian"
}
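A hedged Python sketch of the fixed-width two's-complement encoding, relying on the `signed=True` option of `int.to_bytes` and `int.from_bytes`. The function names are hypothetical.

```python
from typing import Optional


def encode_signed(value: int, bits: int, byte_order: Optional[str] = None) -> bytes:
    """Encode value as a fixed-width two's-complement integer."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    if bits > 8 and byte_order is None:
        raise ValueError("byteOrder is required when bits is greater than 8")
    order = "little" if byte_order == "littleEndian" else "big"
    # to_bytes raises OverflowError if value does not fit in the width.
    return value.to_bytes(bits // 8, order, signed=True)


def decode_signed(data: bytes, byte_order: Optional[str] = None) -> int:
    """Decode a fixed-width two's-complement integer."""
    order = "little" if byte_order == "littleEndian" else "big"
    return int.from_bytes(data, order, signed=True)


minus_one = encode_signed(-1, 8)                    # all bits set
minus_two = encode_signed(-2, 64, "littleEndian")   # low byte first
```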
Integers may have an offset
property to indicate
that the semantic value is the offset plus the encoded value.
For example, this schema type represents semantic values
from 200 through 300 inclusive,
encoded as single-byte values from 0 through 100:
{
"type": "integer",
"minimum": 200,
"maximum": 300,
"offset": 200,
"encoding": "unsigned",
"bits": 8
}
When an offset is specified and is less than or equal to the minimum,
an unsigned-integer representation may be used
even if minimum is less than zero.
This example encodes values from -100 through 100 in one byte,
with byte value 0 representing semantic value -100,
100 representing 0,
and 200 representing 100:
{
"type": "integer",
"minimum": -100,
"maximum": 100,
"offset": -100,
"encoding": "unsigned",
"bits": 8
}
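The offset rule (semantic value = offset + encoded value) can be illustrated with a small Python sketch. The helper names are hypothetical, and defaulting a missing byteOrder to bigEndian is an assumption that only matters for widths above 8 bits.

```python
def encode_with_offset(value: int, schema: dict) -> bytes:
    """Subtract the schema's offset, then encode the result as unsigned."""
    if not schema["minimum"] <= value <= schema["maximum"]:
        raise ValueError("value is outside the semantic range")
    encoded = value - schema.get("offset", 0)
    order = "little" if schema.get("byteOrder") == "littleEndian" else "big"
    return encoded.to_bytes(schema["bits"] // 8, order)


def decode_with_offset(data: bytes, schema: dict) -> int:
    """Decode the unsigned value, then add the schema's offset back."""
    order = "little" if schema.get("byteOrder") == "littleEndian" else "big"
    return int.from_bytes(data, order) + schema.get("offset", 0)


# The -100 through 100 example, stored in one unsigned byte.
schema = {
    "type": "integer",
    "minimum": -100,
    "maximum": 100,
    "offset": -100,
    "encoding": "unsigned",
    "bits": 8,
}
lowest = encode_with_offset(-100, schema)
middle = encode_with_offset(0, schema)
highest = encode_with_offset(100, schema)
```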
When the schema type is number,
an associated encoding property may name one of the
IEEE 754 standard fixed-length floating-point number representations
to indicate that representation be used
as the binary encoding of this number.
For example:
{
"type": "number",
"encoding": "binary32"
}
The floating-point representations currently defined in IEEE 754-2008
are binary16, binary32, binary64, and binary128
for binary floating-point representations,
and decimal32, decimal64, and decimal128
for decimal floating-point representations.
More standard floating-point encodings may be defined in the future,
of course.
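For the three binary formats that Python's standard `struct` module supports directly, an encoder could be sketched as below. The function name `encode_float` is hypothetical. The text does not state a byte order for floating-point values, so the bigEndian default here is purely an assumption, and binary128 and the decimal formats are left out because `struct` has no format codes for them.

```python
import struct

# struct format characters for the IEEE 754 binary formats it supports.
_STRUCT_CODES = {"binary16": "e", "binary32": "f", "binary64": "d"}


def encode_float(value: float, encoding: str, byte_order: str = "bigEndian") -> bytes:
    """Encode a number in the named IEEE 754 binary format."""
    if encoding not in _STRUCT_CODES:
        raise ValueError("unsupported encoding in this sketch: " + encoding)
    # Assumption: byte order is chosen the same way as for integers.
    prefix = "<" if byte_order == "littleEndian" else ">"
    return struct.pack(prefix + _STRUCT_CODES[encoding], value)


single = encode_float(1.0, "binary32")
double = encode_float(1.0, "binary64")
```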
A schema of type array
may specify an encoding
of packed
to indicate a packed array of fixed-length elements.
This example expresses a packed array of 8-bit unsigned integers:
{
"type": "array",
"encoding": "packed",
"items": {
"type": "integer",
"encoding": "unsigned",
"bits": 8
}
}
The array element schema specified in the items
property
must have a fixed-length binary representation for packed array encoding.
If the array also has a fixed length,
i.e., minItems
and maxItems
values both specified and equal,
then the resulting packed array has a fixed-length binary representation,
which is exactly the element type length times the array length.
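A rough Python sketch of packed array encoding for unsigned integer elements, together with the fixed-length calculation described above. Both function names are hypothetical, and the sketch assumes the items schema uses the unsigned encoding with a bits property.

```python
from typing import List, Optional


def encode_packed(items: List[int], schema: dict) -> bytes:
    """Concatenate the fixed-width unsigned encodings of all elements."""
    element = schema["items"]
    width = element["bits"] // 8
    order = "little" if element.get("byteOrder") == "littleEndian" else "big"
    return b"".join(item.to_bytes(width, order) for item in items)


def packed_length(schema: dict) -> Optional[int]:
    """Return the fixed byte length if minItems equals maxItems, else None."""
    lo, hi = schema.get("minItems"), schema.get("maxItems")
    if lo is None or lo != hi:
        return None
    return (schema["items"]["bits"] // 8) * lo


# A packed array of exactly four 16-bit little-endian unsigned integers.
schema = {
    "type": "array",
    "encoding": "packed",
    "minItems": 4,
    "maxItems": 4,
    "items": {"type": "integer", "encoding": "unsigned",
              "byteOrder": "littleEndian", "bits": 16},
}
data = encode_packed([1, 258, 0, 65535], schema)
```

Here the encoded length equals the element length (2 bytes) times the array length (4), matching the rule stated above.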
A schema of type array
with an encoding
of cbe
indicates that the array has a binary representation consisting of
a sequence of variable-length items, each individually CBE-encoded.
Bryan Ford |