
Composable Text Syntax

XXX to be written.

Goal: composability. A valid CTS text inserted into another valid CTS text should yield a valid CTS text.

CTS is built on matched open/close punctuation pairs, which we call matchers.

Basic principle: matchers must match. This dominates all other more specialized syntaxes.

We could define a variant of CTS for any particular set of matchers we want; all the principles would be the same.
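
As a minimal sketch of the basic principle, the following Python checks that matchers match for an arbitrary set of matchers. The names `matchers_match` and `ASCII_MATCHERS` are hypothetical, chosen only for illustration, and the default set of square brackets, parentheses, and curly braces is an assumption of this sketch. It checks matcher balance only, not any more specialized syntax.

```python
# Illustrative sketch only: these names are hypothetical, not part of any
# CTS specification. The matcher set is a parameter, so the same check
# works for any variant of CTS.
ASCII_MATCHERS = {"(": ")", "[": "]", "{": "}"}  # assumed default set


def matchers_match(text, matchers=ASCII_MATCHERS):
    """Return True if every opener in text is closed by its own closer,
    in properly nested order. All non-matcher characters are ignored."""
    closers = set(matchers.values())
    expected = []  # stack of closers we are still waiting for
    for ch in text:
        if ch in matchers:
            expected.append(matchers[ch])
        elif ch in closers:
            if not expected or expected.pop() != ch:
                return False  # stray closer, or closer of the wrong kind
    return not expected  # leftover entries mean unclosed openers


# Composability: inserting one balanced text anywhere into another
# balanced text leaves the result balanced.
outer = "f(a[0]) { x }"
inner = "[y, (z)]"
composed_ok = all(
    matchers_match(outer[:i] + inner + outer[i:])
    for i in range(len(outer) + 1)
)
```

Because the check ignores everything except the matchers themselves, it holds regardless of what other syntax the two texts use, which is the sense in which the principle dominates more specialized syntaxes.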

To avoid further confusion, however, it would be best to have one “standardized” set of matchers. But which? The choice is not obvious, since Unicode provides seventy-some matched pairs, some of which are typically used in different combinations.

Since the ASCII subset of the Unicode character set is still the one typically used in mechanically parsed languages, we choose as our matchers the ASCII characters formally defined as open/close punctuation characters: namely the square brackets “[]”, parentheses “()”, and curly braces “{}”.

Why not the so-called “angle brackets”? Because these ASCII characters are formally defined as less-than and greater-than signs, and are ubiquitously used as such, unmatched, in arithmetic expressions and the like. Because of this ambiguous overloading of the less-than and greater-than signs, their use as angle brackets is problematic in numerous ways, which has led to their sometimes being called brokets. Unicode defines a separate pair of angle brackets, ⟨⟩, specifically as open/close punctuation.
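
The formal definitions referred to above can be checked against the Unicode general categories, where `Ps` is open punctuation, `Pe` is close punctuation, and `Sm` is a math symbol. The sketch below uses Python's standard `unicodedata` module. The helper name `ascii_with_category` is hypothetical, and the code assumes that the angle brackets shown above are U+27E8 and U+27E9.

```python
import unicodedata


def ascii_with_category(category):
    """Return all ASCII characters with the given Unicode general category."""
    return [chr(c) for c in range(128)
            if unicodedata.category(chr(c)) == category]


# The only ASCII characters classified as open or close punctuation.
print(ascii_with_category("Ps"))   # ['(', '[', '{']
print(ascii_with_category("Pe"))   # [')', ']', '}']

# The less-than and greater-than signs are math symbols, not punctuation.
print(unicodedata.category("<"), unicodedata.category(">"))   # Sm Sm

# Assumption: the dedicated angle brackets are U+27E8 and U+27E9.
print(unicodedata.category("\u27e8"), unicodedata.category("\u27e9"))   # Ps Pe
```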

Few existing text languages just happen to be CTS-compliant already, but many of them are extremely close. Programming languages typically use the ASCII open/close punctuation in matching pairs anyway whenever they are syntactically relevant.



Bryan Ford