Oils Reference — Chapter JSON / J8 Notation

This chapter describes JSON, and its J8 Notation superset.

See the J8 Notation doc for more background. This doc is a quick reference, not the official spec.

(in progress)

J8 Strings

J8 strings are an upgrade of JSON strings that solve the JSON-Unix Mismatch.

That is, Unix deals with byte strings, but JSON can't represent byte strings.

json-string "hi"

All JSON strings are valid J8 strings!

This is important for compatibility. Encoders may prefer to emit JSON-style "" strings rather than u'' or b'' strings.

Example:

"hi μ \n"

To be explicit, you can prefix JSON strings with j:

j"hi μ \n"  # same as above

Of course, the j"" prefix is accepted by our json8 builtin, but not the json builtin.

json-escape \" \n \u1234

As a reminder, the backslash escapes valid in JSON strings are:

\" \\
\b \f \n \r \t
\u1234

Additional J8 escapes are valid in u'' and b'' strings, described below.

surrogate-pair \ud83e\udd26

JSON's \u1234 escapes can't represent code points at or above U+10000 (2^16), so JSON also has a "surrogate pair hack".

That is, there are special code points in the "surrogate range" that can be paired to represent larger numbers.

See the Surrogate Pair Blog Post for an example:

"\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8 notation. Decoders must accept them, but encoders should avoid them.

You can emit u'\u{1f926}' or b'\u{1f926}' instead of "\ud83e\udd26".
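The pairing arithmetic can be sketched in Python. The helper name combine_surrogates is hypothetical; this illustrates the general UTF-16 rule, not how Oils implements it.

```python
import json

def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a high and a low surrogate into one code point (UTF-16 rule)."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

code_point = combine_surrogates(0xD83E, 0xDD26)
print(hex(code_point))  # 0x1f926

# Python's standard JSON decoder performs the same pairing.
decoded = json.loads('"\\ud83e\\udd26"')
print(decoded == chr(code_point))  # True
```

The result U+1F926 is the code point that the \u{1f926} escape names directly, which is why encoders don't need surrogate pairs.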

u-prefix u'hi'

A type of J8 string.

u'hi μ \n'

It's never necessary to emit, but it can be used to express that a string is valid Unicode. JSON strings can represent strings that aren't Unicode because they may contain surrogate halves.

In contrast, u'' strings can only have escapes like \u{1f926}, with no surrogate pairs or halves.
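As an illustration of the surrogate-half problem, here is a small Python sketch. It uses Python's standard json module as a stand-in for a generic JSON decoder; it says nothing about how the Oils builtins behave.

```python
import json

# A lone surrogate half is accepted by the JSON decoder ...
half = json.loads('"\\ud83e"')
print(len(half))  # 1

# ... but the resulting string is not valid Unicode text, so it has no UTF-8 encoding.
try:
    half.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```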

Escaping:

b-prefix b'hi'

Another J8 string. These b'' strings are identical to u'' strings, but they can also contain \yff escapes.

Examples:

b'hi μ \n'
b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'

j8-escape \u{1f926} \yff

To summarize, the valid J8 escapes are:

\'
\yff   # only valid in b'' strings
\u{3bc} \u{1f926} etc.
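The escape rules above can be sketched as a small Python decoder. This is a minimal illustration, not the Oils implementation: the function name decode_j8_body is hypothetical, it takes only the text between the quotes, and it assumes that the single-letter JSON escapes are accepted while JSON-style \u1234 is rejected inside u'' and b'' strings.

```python
import re

# Single-character escapes: \' plus the JSON ones listed earlier (assumption).
_SIMPLE = {"'": b"'", '"': b'"', "\\": b"\\", "b": b"\b", "f": b"\f",
           "n": b"\n", "r": b"\r", "t": b"\t"}

# Matches \yHH, \u{H...}, or a backslash followed by any single character.
_ESCAPE = re.compile(r"\\(?:y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|(.))",
                     re.DOTALL)

def decode_j8_body(body: str, allow_bytes: bool) -> bytes:
    """Decode the text between the quotes of a u'' or b'' string to bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")  # literal text before the escape
        pos = m.end()
        byte_hex, code_hex, simple = m.groups()
        if byte_hex is not None:
            if not allow_bytes:
                raise ValueError("\\yHH is only valid in b'' strings")
            out.append(int(byte_hex, 16))
        elif code_hex is not None:
            cp = int(code_hex, 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError("invalid code point")  # no surrogates allowed
            out += chr(cp).encode("utf-8")
        elif simple in _SIMPLE:
            out += _SIMPLE[simple]
        else:
            raise ValueError("unknown escape: \\" + simple)
    if "\\" in body[pos:]:
        raise ValueError("dangling backslash")
    out += body[pos:].encode("utf-8")
    return bytes(out)

raw = decode_j8_body(r"isn\'t valid unicode \yff\yfe \u{3bc}", allow_bytes=True)
print(raw)
```

Passing allow_bytes=False models a u'' string, where a \yff escape is an error; the result is returned as bytes because a b'' string need not be valid UTF-8.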

no-prefix 'hi'

Single-quoted strings without a u or b prefix are implicitly u''.

u'hi μ \n'  
 'hi μ \n'  # same as above, no \yff escapes accepted

They should be avoided in contexts where "" strings may also appear, because it's easy to confuse single quotes and double quotes.

J8 Lines

"J8 Lines" is a format built on top of J8 strings. Each line is either:

  1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but not other ASCII control chars.
  2. A quoted J8 string (JSON style "" or J8-style b'' u'' '')
  3. An ignored empty line

In all cases, leading and trailing whitespace is ignored.

unquoted-line

Any line that doesn't begin with " or b' or u' is an unquoted line. Examples:

foo bar
C:\Program Files\
internal "quotes" aren't special

In contrast, these are quoted lines, and must be valid J8 strings:

"json-style J8 string"
b'this is b style'
u'this is u style'
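The line classification can be sketched in Python as follows. The function name classify_j8_line is hypothetical, the set of whitespace characters stripped is an assumption, and the sketch follows the rule stated in this section (a quoted line begins with " or b' or u'), so it does not treat a prefix-less '' string as quoted.

```python
def classify_j8_line(line: str) -> str:
    """Return 'empty', 'quoted', or 'unquoted' for one line of J8 Lines input."""
    # Leading and trailing whitespace is ignored (assumed: space, tab, CR, LF).
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return "empty"
    if stripped.startswith(('"', "b'", "u'")):
        return "quoted"  # must then be parsed as a J8 string
    return "unquoted"    # taken literally; internal quotes aren't special

for example in ['foo bar', 'C:\\Program Files\\', '  "json-style J8 string"', '']:
    print(repr(example), classify_j8_line(example))
```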

JSON8

JSON8 is JSON with 4 more things allowed:

  1. J8 strings in addition to JSON strings
  2. Comments
  3. Unquoted keys (TODO)
  4. Trailing commas (TODO)

json8-num

JSON8 numbers are identical to JSON numbers.

Here is a decoding detail, specific to Oils:

If there's a decimal point or an exponent suffix such as e-10, then it's decoded into a YSH Float. Otherwise it's a YSH Int.

42       # decoded to Int
42.0     # decoded to Float
42e1     # decoded to Float
42.0e1   # decoded to Float
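A rough Python analogue of this rule, using Python's int and float in place of YSH Int and Float. The function name decode_json8_number is hypothetical, and the sketch assumes the text has already been validated as a JSON number.

```python
def decode_json8_number(text: str):
    """Pick int or float based on the presence of a decimal point or exponent."""
    if any(ch in text for ch in ".eE"):
        return float(text)
    return int(text)

for text in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_json8_number(text)
    print(text, type(value).__name__, value)
```

Note that 42e1 has an integral value (420) but is still decoded as a float, because the decision is made from the spelling rather than the value.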

json8-str

JSON8 strings are J8 strings:

"hi 🤦 \u03bc"
u'hi 🤦 \u{3bc}'
b'hi 🤦 \u{3bc} \yff'

json8-list

TODO:

Like JSON lists, but can have a trailing comma. Examples:

[42, 43]
[42, 43,]   # same as above

json8-dict

TODO:

Like JSON "objects", but keys can be unquoted and a trailing comma is allowed.

Examples:

{"json8": "message"}
{json8: "message"}     # same as above
{json8: "message",}    # same as above

json8-comment

End-of-line comments in the same style as shell:

{"json8": "message"}   # comment

TSV8

These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.

column-attrs

!tsv8    name    age
!type    Str     Int
!other   x       y
         Alice   42
         Bob     25

column-types

The primitives: Bool, Int, Float, Str.

Note: Can null be in all cells? Maybe except Bool?

It can stand in for NA?

Generated on Thu, 25 Jul 2024 04:04:16 +0000