---
title: JSON / J8 Notation (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) —
Chapter **JSON / J8 Notation**

</div>

This chapter describes [JSON]($xref), and its **J8 Notation** superset.

See the [J8 Notation doc](../j8-notation.html) for more background.  This doc
is a quick reference, not the official spec.

<span class="in-progress">(in progress)</span>

<div id="dense-toc">
</div>

## J8 Strings

J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.

That is, Unix deals with byte strings, but JSON can't represent byte strings.

<h3 id="json-string">json-string <code>"hi"</code></h3>

All JSON strings are valid J8 strings!

This is important for compatibility.  Encoders may prefer to emit JSON-style
`""` strings rather than `u''` or `b''` strings.

Example:

    "hi μ \n"

To be explicit, you can prefix JSON strings with `j`:

    j"hi μ \n"  # same as above

The `j""` prefix is accepted by the `json8` builtin, but not by the `json`
builtin.

<h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>

As a reminder, the backslash escapes valid in [JSON]($xref) strings are:

    \" \\ \/
    \b \f \n \r \t
    \u1234

Additional J8 escapes are valid in `u''` and `b''` strings, described below.

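As a quick check of what each JSON escape decodes to, here is Python's standard `json` module used as a reference JSON decoder.  It stands in for any conforming decoder; it isn't how Oils decodes JSON.

```python
import json

# One instance of every JSON escape, written as JSON source text.
encoded = r'"\" \\ \/ \b \f \n \r \t \u03bc"'
decoded = json.loads(encoded)

# \b is backspace (U+0008), \f is form feed (U+000C), \u03bc is μ.
print(repr(decoded))  # '" \\ / \x08 \x0c \n \r \t μ'
```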
<h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>

JSON's `\u1234` escapes have only four hex digits, so they can't represent code
points at or above `U+10000` (2<sup>16</sup>).  JSON works around this with a
"surrogate pair hack".

That is, code points in the "surrogate range" (`U+D800` to `U+DFFF`) can be
paired to represent larger code points.

See the [Surrogate Pair Blog
Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
example:

    "\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
notation.  Decoders must accept them, but encoders should avoid them.

You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.

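The arithmetic behind a surrogate pair is short.  This Python sketch (the helper name `combine_surrogates` is ours, for illustration) combines a high and a low surrogate into one code point, checks the result against Python's `json` module, and prints the equivalent J8 escape.

```python
import json

def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError('not a high surrogate')
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError('not a low surrogate')
    # Each half carries 10 bits; the combined value is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = combine_surrogates(0xD83E, 0xDD26)
print(hex(cp))                                   # 0x1f926
print(json.loads(r'"\ud83e\udd26"') == chr(cp))  # True
print(f"u'\\u{{{cp:x}}}'")                       # u'\u{1f926}'
```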
<h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>

A type of J8 string.

    u'hi μ \n'

Encoders never need to **emit** `u''` strings, but they can be used to express
that a string is **valid Unicode**.  JSON strings can represent strings that
aren't valid Unicode, because they may contain unpaired surrogate halves.

In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
surrogate pairs or halves.

- The **encoded** bytes must be valid UTF-8, like JSON strings.
- The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.

Escaping:

- `u''` strings may **not** contain `\u1234` escapes.  They must use the braced
  form, like `\u{1234}` or `\u{1f926}`.
- They may not contain `\yff` escapes, because those would represent a string
  that's not UTF-8 or Unicode.
- Surrogate pairs are never necessary in `u''` or `b''` strings.  Use the
  longer form `\u{1f926}`.
- You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
  necessary.  Decoders must accept these escapes.
- A literal single quote is escaped with `\'`.
  - Decoders still accept `\"`, but encoders don't emit it.

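To see why a JSON string isn't always valid Unicode, here is Python's standard `json` module decoding a lone surrogate half.  The JSON is syntactically valid, but the decoded string can't be encoded as UTF-8.  A `u''` string can't express this value at all, since `\u{d83e}` would be a surrogate half.

```python
import json

# A lone high surrogate: legal JSON syntax, accepted by Python's decoder.
s = json.loads(r'"\ud83e"')
print(len(s))  # 1

# The decoded string isn't valid Unicode text, so UTF-8 encoding fails.
try:
    s.encode('utf-8')
except UnicodeEncodeError as err:
    print('cannot encode as UTF-8:', err.reason)
```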
<h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>

Another J8 string.  These `b''` strings are identical to `u''` strings, but
they can also contain `\yff` escapes.

Examples:

    b'hi μ \n'
    b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'

<h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>

To summarize, the valid J8 escapes are:

    \'
    \yff   # only valid in b'' strings
    \u{3bc} \u{1f926} etc.

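The escaping rules for `u''` and `b''` strings can be sketched as a decoder for the text between the quotes.  This is a hypothetical Python helper, `decode_j8_body`, not Oils' implementation.  It assumes the prefix and quotes are already stripped, accepts upper- or lowercase hex digits, and allows one to six digits inside `\u{...}`; those limits are assumptions, so check the J8 Notation doc for the exact grammar.

```python
import re

# One-character escapes: JSON's, plus J8's \'.  Decoders also accept \".
SIMPLE_ESCAPES = {'"': '"', "'": "'", '\\': '\\', 'b': '\b', 'f': '\f',
                  'n': '\n', 'r': '\r', 't': '\t'}
Y_ESCAPE = re.compile(r'\\y([0-9a-fA-F]{2})')        # \yff: one raw byte
U_ESCAPE = re.compile(r'\\u\{([0-9a-fA-F]{1,6})\}')  # \u{1f926}: one code point

def decode_j8_body(body: str, allow_y: bool) -> bytes:
    """Decode a u'' body (allow_y=False) or b'' body (allow_y=True) to bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != '\\':
            out += c.encode('utf-8')  # literal UTF-8 is always allowed
            i += 1
            continue
        esc = body[i + 1:i + 2]
        if esc in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[esc].encode('utf-8')
            i += 2
        elif esc == 'y':
            m = Y_ESCAPE.match(body, i)
            if m is None or not allow_y:
                raise ValueError(f'invalid \\y escape at index {i}')
            out.append(int(m.group(1), 16))
            i = m.end()
        elif esc == 'u':
            # Braces are required, so JSON's 4-digit \u1234 form is rejected.
            m = U_ESCAPE.match(body, i)
            if m is None:
                raise ValueError(f'invalid \\u escape at index {i}')
            code_point = int(m.group(1), 16)
            # No surrogate halves, and nothing beyond U+10FFFF.
            if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
                raise ValueError(f'not a Unicode scalar value at index {i}')
            out += chr(code_point).encode('utf-8')
            i = m.end()
        else:
            raise ValueError(f'invalid escape at index {i}')
    return bytes(out)
```

For example, the `b''` body `isn\'t \yff\yfe \u{3bc}` decodes to the bytes `b"isn't \xff\xfe \xce\xbc"`, while decoding the same body with `allow_y=False`, as for a `u''` string, raises `ValueError`.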
<h3 id="no-prefix">no-prefix <code>'hi'</code></h3>

Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.

    u'hi μ \n'
     'hi μ \n'  # same as above, no \yff escapes accepted

They should be avoided in contexts where `""` strings may also appear, because
it's easy to confuse single quotes and double quotes.

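The prefix rules so far fit in a few lines.  This hypothetical Python helper, `j8_string_kind`, only inspects the prefix of a quoted string; it doesn't validate the rest of it.

```python
def j8_string_kind(token: str) -> str:
    """Classify a quoted J8 string by its prefix.

    Returns 'json' for "..." and j"...", 'u' for u'...' and bare '...',
    and 'b' for b'...'.
    """
    if token.startswith(('"', 'j"')):
        return 'json'
    if token.startswith(("u'", "'")):
        return 'u'  # no prefix means implicitly u''
    if token.startswith("b'"):
        return 'b'
    raise ValueError(f'not a quoted J8 string: {token!r}')
```

Only a `b` result permits `\yff` escapes, and both `u` and `b` strings require the braced `\u{...}` form.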
## J8 Lines

"J8 Lines" is a format built on top of J8 strings.  Each line is one of:

1. An unquoted string, which must be valid UTF-8.  Whitespace is allowed, but
   not other ASCII control chars.
2. A quoted J8 string (JSON style `""` or J8-style `b'' u'' ''`)
3. An **ignored** empty line

In all cases, leading and trailing whitespace is ignored.

### unquoted-line

Any line that doesn't begin with `"` or `b'` or `u'` is an unquoted line.
Examples:

    foo bar
    C:\Program Files\
    internal "quotes" aren't special

In contrast, these are quoted lines, and must be valid J8 strings:

    "json-style J8 string"
    b'this is b style'
    u'this is u style'

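A reader for J8 Lines might classify each line as follows.  This is a Python sketch with a hypothetical `classify_j8_line` helper, not Oils' implementation.  It assumes the input is already-decoded text (so it doesn't re-check UTF-8), treats space, tab, and carriage return as the whitespace to strip, uses the three quote prefixes from the rule above, and leaves decoding quoted lines to a J8 string decoder.

```python
from typing import Optional, Tuple

# Per the rule above, only these prefixes start a quoted line.
QUOTE_PREFIXES = ('"', "b'", "u'")
# Whitespace stripped from both ends (an assumption for this sketch).
STRIP_CHARS = ' \t\r'

def classify_j8_line(line: str) -> Optional[Tuple[str, str]]:
    """Return ('quoted', text), ('unquoted', text), or None for an empty line."""
    text = line.strip(STRIP_CHARS)
    if not text:
        return None  # empty lines are ignored
    if text.startswith(QUOTE_PREFIXES):
        return ('quoted', text)  # must then decode as a valid J8 string
    for ch in text:
        # Spaces and tabs are fine; other ASCII control characters are not.
        if (ord(ch) < 0x20 and ch != '\t') or ch == '\x7f':
            raise ValueError(f'control character in unquoted line: {ch!r}')
    return ('unquoted', text)
```

With the example lines above, `C:\Program Files\` is unquoted because it starts with `C`, while `b'this is b style'` is quoted.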
## JSON8

JSON8 is JSON with 4 more things allowed:

1. J8 strings in addition to JSON strings
1. Comments
1. Unquoted keys (TODO)
1. Trailing commas (TODO)

### json8-num

JSON8 numbers are identical to JSON numbers.

Here is a decoding detail, specific to Oils:

If there's a decimal point or an exponent (like `e1` or `e-10`), then it's
decoded into a YSH `Float`.  Otherwise it's a YSH `Int`.

    42       # decoded to Int
    42.0     # decoded to Float
    42e1     # decoded to Float
    42.0e1   # decoded to Float

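The Int/Float rule above can be sketched in Python with a regular expression for the JSON number grammar.  The name `decode_json8_num` is hypothetical, and Python's `int` and `float` stand in for YSH's `Int` and `Float`.

```python
import re
from typing import Union

# JSON number grammar: optional minus, integer part without leading zeros,
# optional fraction, optional exponent.
JSON_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?')

def decode_json8_num(text: str) -> Union[int, float]:
    m = JSON_NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f'not a JSON number: {text!r}')
    fraction, exponent = m.group(1), m.group(2)
    if fraction is None and exponent is None:
        return int(text)    # no decimal point or exponent: Int
    return float(text)      # otherwise: Float

for s in ['42', '42.0', '42e1', '42.0e1']:
    n = decode_json8_num(s)
    print(s, type(n).__name__, n)
```

The loop prints `int` for `42` and `float` for the other three, matching the examples above.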
### json8-str

JSON8 strings are J8 strings:

<pre>
"hi 🤦 \u03bc"
u'hi 🤦 \u{3bc}'
b'hi 🤦 \u{3bc} \yff'
</pre>

### json8-list

TODO:

Like JSON lists, but may have a trailing comma.  Examples:

    [42, 43]
    [42, 43,]   # same as above

### json8-dict

TODO:

Like JSON "objects", but:

- May have a trailing comma.
- May have unquoted keys, as long as each key is an identifier.

Examples:

    {"json8": "message"}
    {json8: "message"}     # same as above
    {json8: "message",}    # same as above

### json8-comment

End-of-line comments in the same style as shell:

    {"json8": "message"}   # comment

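Because `#` may also appear inside strings, a decoder can't simply cut each line at the first `#`.  This Python sketch, a hypothetical `strip_json8_comments` preprocessor rather than Oils' decoder, removes end-of-line comments that occur outside double-quoted strings, then hands the result to the standard `json` module.  It only understands JSON-style `""` strings, not `u''` or `b''`.

```python
import json

def strip_json8_comments(text: str) -> str:
    """Remove '#' end-of-line comments outside of "..." strings."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        c = text[i]
        if in_string:
            out.append(c)
            if escaped:
                escaped = False  # this char was escaped; don't interpret it
            elif c == '\\':
                escaped = True
            elif c == '"':
                in_string = False
            i += 1
        elif c == '#':
            # Skip to the end of the line, but keep the newline itself.
            while i < len(text) and text[i] != '\n':
                i += 1
        else:
            if c == '"':
                in_string = True
            out.append(c)
            i += 1
    return ''.join(out)

doc = '{"json8": "message", "tag": "#1"}   # comment\n'
print(json.loads(strip_json8_comments(doc)))  # {'json8': 'message', 'tag': '#1'}
```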
## TSV8

TSV8 consists of J8 primitives (Bool, Int, Float, Str), separated by tabs.

### column-attrs

<!-- Consider #.tsv8 and 'type' perhaps

#.tsv8   name    age
type     Str     Int
other    x       y
         Alice   42

Also consider alignment.
-->

```
!tsv8    name    age
!type    Str     Int
!other   x       y
         Alice   42
         Bob     25
```

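TSV8 is still a design sketch, so this Python reader is only one hypothetical reading of the table above, not a spec.  It assumes cells are separated by single tabs; a row whose first cell starts with `!` is an attribute row; `!tsv8` names the columns and `!type` gives their types; data rows have an empty first cell; `Bool` cells are written `true` or `false`; and `Str` cells are unquoted.

```python
from typing import Dict, List, Tuple

# Converters for the primitive column types (Str cells assumed unquoted).
CONVERTERS = {
    'Bool': lambda s: {'true': True, 'false': False}[s],
    'Int': int,
    'Float': float,
    'Str': str,
}

def parse_tsv8(text: str) -> Tuple[Dict[str, List[str]], List[dict]]:
    """Return (attribute rows, typed data rows) for a hypothetical TSV8 table."""
    attrs: Dict[str, List[str]] = {}
    rows: List[dict] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        first, *cells = line.split('\t')
        if first.startswith('!'):
            attrs[first[1:]] = cells  # e.g. 'tsv8', 'type', 'other'
            continue
        names = attrs['tsv8']
        types = attrs.get('type', ['Str'] * len(names))
        rows.append({name: CONVERTERS[t](cell)
                     for name, t, cell in zip(names, types, cells)})
    return attrs, rows

table = ('!tsv8\tname\tage\n'
         '!type\tStr\tInt\n'
         '!other\tx\ty\n'
         '\tAlice\t42\n'
         '\tBob\t25\n')
attrs, rows = parse_tsv8(table)
print(rows)  # [{'name': 'Alice', 'age': 42}, {'name': 'Bob', 'age': 25}]
```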
### column-types

The primitives:

- Bool
- Int
- Float
- Str

Note: Can `null` appear in all cells, maybe except `Bool` cells?  Could it
stand in for `NA`?

[JSON]: https://json.org