1 | ---
|
2 | title: JSON / J8 Notation (Oils Reference)
|
3 | all_docs_url: ..
|
4 | body_css_class: width40
|
5 | default_highlighter: oils-sh
|
6 | preserve_anchor_case: yes
|
7 | ---
|
8 |
|
9 | <div class="doc-ref-header">
|
10 |
|
11 | [Oils Reference](index.html) —
|
12 | Chapter **JSON / J8 Notation**
|
13 |
|
14 | </div>
|
15 |
|
16 | This chapter describes [JSON]($xref), and its **J8 Notation** superset.
|
17 |
|
18 | See the [J8 Notation doc](../j8-notation.html) for more background. This doc
|
19 | is a quick reference, not the official spec.
|
20 |
|
21 | <span class="in-progress">(in progress)</span>
|
22 |
|
23 | <div id="dense-toc">
|
24 | </div>
|
25 |
|
26 |
|
27 | ## J8 Strings
|
28 |
|
29 | J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.
|
30 |
|
31 | That is, Unix deals with byte strings, but JSON can't represent byte strings.
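As a rough illustration of the mismatch, here is a Python sketch (Python's standard `json` module stands in for "a JSON encoder"; the byte string is a made-up example, not taken from Oils):

```python
import json

# Hypothetical example: bytes such as a Unix filename that isn't valid UTF-8
raw = b"\xff\xfe"

# JSON has no byte-string type, so the encoder rejects bytes outright
try:
    json.dumps(raw)
    bytes_ok = True
except TypeError:
    bytes_ok = False

# Converting to a Unicode string first fails too, because the bytes aren't UTF-8
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(bytes_ok, utf8_ok)
```

Both attempts fail, which is the gap that `b''` strings are meant to fill.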
|
32 |
|
33 | <h3 id="json-string">json-string <code>"hi"</code></h3>
|
34 |
|
35 | All JSON strings are valid J8 strings!
|
36 |
|
37 | This is important for compatibility. Encoders may prefer to emit JSON-style
|
38 | `""` strings rather than `u''` or `b''` strings.
|
39 |
|
40 | Example:
|
41 |
|
42 | "hi μ \n"
|
43 |
|
44 | To be explicit, you can prefix JSON strings with `j`:
|
45 |
|
46 | j"hi μ \n" # same as above
|
47 |
|
48 | Note that the `j""` prefix is accepted by our `json8` builtin, but not by the
|
49 | `json` builtin.
|
50 |
|
51 | <h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>
|
52 |
|
53 | As a reminder, the backslash escapes valid in [JSON]($xref) strings are:
|
54 |
|
55 | \" \\
|
56 | \b \f \n \r \t
|
57 | \u1234
|
58 |
|
59 | Additional J8 escapes are valid in `u''` and `b''` strings, described below.
|
60 |
|
61 | <h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>
|
62 |
|
63 | JSON's `\u1234` escapes can't represent code points at or above `U+10000`
|
64 | (2<sup>16</sup>), so JSON also has a "surrogate pair hack".
|
65 |
|
66 | That is, there are special code points in the "surrogate range" that can be
|
67 | paired to represent larger numbers.
|
68 |
|
69 | See the [Surrogate Pair Blog
|
70 | Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
|
71 | example:
|
72 |
|
73 | "\ud83e\udd26"
|
74 |
|
75 | Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
|
76 | notation. Decoders must accept them, but encoders should avoid them.
|
77 |
|
78 | You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.
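The pairing arithmetic can be sketched in Python. The helper name `combine_surrogates` is hypothetical, and the comparison uses Python's standard `json` module rather than the Oils decoder:

```python
import json

def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a high and a low surrogate into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

code_point = combine_surrogates(0xD83E, 0xDD26)

# Python's JSON decoder performs the same pairing when it sees two escapes
decoded = json.loads(r'"\ud83e\udd26"')

print(hex(code_point), len(decoded))
```

The two escapes decode to the single code point `U+1F926`, the same one written as `\u{1f926}` in a J8 string.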
|
79 |
|
80 | <h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>
|
81 |
|
82 | A type of J8 string.
|
83 |
|
84 | u'hi μ \n'
|
85 |
|
86 | It's never necessary to **emit**, but it can be used to express that a string
|
87 | is **valid Unicode**. JSON strings can represent strings that aren't Unicode
|
88 | because they may contain surrogate halves.
|
89 |
|
90 | In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
|
91 | surrogate pairs or halves.
|
92 |
|
93 | - The **encoded** bytes must be valid UTF-8, like JSON strings.
|
94 | - The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.
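To see why the decoded form of a JSON string isn't always valid Unicode, here is a small Python sketch. It shows the behavior of Python's standard `json` module, which is an assumption about one common decoder, not a description of Oils:

```python
import json

# A lone high surrogate: the encoded text is plain ASCII, so it is valid UTF-8
half = json.loads(r'"\ud83e"')

# The decoded string holds a surrogate half, which UTF-8 can't represent
try:
    half.encode("utf-8")
    decodes_to_utf8 = True
except UnicodeEncodeError:
    decodes_to_utf8 = False

print(len(half), decodes_to_utf8)
```

A `u''` string rules this case out, because it has no way to spell a surrogate half.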
|
95 |
|
96 | Escaping:
|
97 |
|
98 | - `u''` strings may **not** contain `\u1234` escapes. Use the braced form
|
99 | instead, like `\u{1234}` or `\u{1f926}`.
|
100 | - They may not contain `\yff` escapes, because those would represent a string
|
101 | that's not UTF-8 or Unicode.
|
102 | - Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
|
103 | longer form `\u{1f926}`.
|
104 | - You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
|
105 | necessary. Decoders must accept these escapes.
|
106 | - A literal single quote is escaped with `\'`
|
107 | - Decoders still accept `\"`, but encoders don't emit it.
|
108 |
|
109 | <h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>
|
110 |
|
111 | Another type of J8 string. These `b''` strings are identical to `u''` strings, but
|
112 | they can also contain `\yff` escapes.
|
113 |
|
114 | Examples:
|
115 |
|
116 | b'hi μ \n'
|
117 | b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
|
118 |
|
119 | <h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>
|
120 |
|
121 | To summarize, the valid J8 escapes are:
|
122 |
|
123 | \'
|
124 | \yff # only valid in b'' strings
|
125 | \u{3bc} \u{1f926} etc.
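As an illustration of how these escapes fit together, here is a minimal Python sketch of a `b''` encoder. The function name `encode_b_string` and the choice of which characters to escape are assumptions made for this example; this is not the Oils encoder:

```python
def encode_b_string(data: bytes) -> str:
    """Sketch: encode arbitrary bytes as a b'' J8 string."""
    # surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN
    text = data.decode("utf-8", errors="surrogateescape")
    simple = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch in simple:
            parts.append(simple[ch])
        elif 0xDC80 <= cp <= 0xDCFF:
            # A byte that wasn't part of valid UTF-8: emit a \yXX escape
            parts.append("\\y%02x" % (cp - 0xDC00))
        elif cp < 0x20:
            # Remaining ASCII control chars use the braced code point form
            parts.append("\\u{%x}" % cp)
        else:
            parts.append(ch)  # valid UTF-8 is emitted literally
    return "b'" + "".join(parts) + "'"

print(encode_b_string(b"isn't \xff\xfe"))
```

Only bytes that aren't valid UTF-8 become `\yXX` escapes here, so input that is entirely valid UTF-8 produces a body that would also be legal in a `u''` string.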
|
126 |
|
127 | <h3 id="no-prefix">no-prefix <code>'hi'</code></h3>
|
128 |
|
129 | Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.
|
130 |
|
131 | u'hi μ \n'
|
132 | 'hi μ \n' # same as above, no \yff escapes accepted
|
133 |
|
134 | They should be avoided in contexts where `""` strings may also appear, because
|
135 | it's easy to confuse single quotes and double quotes.
|
136 |
|
137 | ## J8 Lines
|
138 |
|
139 | "J8 Lines" is a format built on top of J8 strings. Each line is either:
|
140 |
|
141 | 1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but
|
142 | not other ASCII control chars.
|
143 | 2. A quoted J8 string (JSON style `""` or J8-style `b'' u'' ''`)
|
144 | 3. An **ignored** empty line
|
145 |
|
146 | In all cases, leading and trailing whitespace is ignored.
|
147 |
|
148 | ### unquoted-line
|
149 |
|
150 | Any line that doesn't begin with `"` or `b'` or `u'` is an unquoted line.
|
151 | Examples:
|
152 |
|
153 | foo bar
|
154 | C:\Program Files\
|
155 | internal "quotes" aren't special
|
156 |
|
157 | In contrast, these are quoted lines, and must be valid J8 strings:
|
158 |
|
159 | "json-style J8 string"
|
160 | b'this is b style'
|
161 | u'this is u style'
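The line rules above can be sketched in Python. The function name `classify_j8_lines` is hypothetical, the sketch only classifies lines (it does not decode or validate the quoted strings), and treating Python's `str.strip()` as the definition of "whitespace" is an assumption:

```python
def classify_j8_lines(text: str):
    """Sketch: label each non-empty line as 'quoted' or 'unquoted'."""
    result = []
    for line in text.split("\n"):
        line = line.strip()  # leading and trailing whitespace is ignored
        if not line:
            continue  # empty lines are ignored
        if line.startswith(('"', "b'", "u'")):
            result.append(("quoted", line))
        else:
            result.append(("unquoted", line))
    return result

sample = "\n".join([
    "foo bar",
    "",
    "  C:\\Program Files\\  ",
    '"json-style J8 string"',
    "b'this is b style'",
])
for kind, line in classify_j8_lines(sample):
    print(kind, line)
```

A real decoder would go on to parse each quoted line as a J8 string and reject it if it is malformed.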
|
162 |
|
163 | ## JSON8
|
164 |
|
165 | JSON8 is JSON with 4 more things allowed:
|
166 |
|
167 | 1. J8 strings in addition to JSON strings
|
168 | 1. Comments
|
169 | 1. Unquoted keys (TODO)
|
170 | 1. Trailing commas (TODO)
|
171 |
|
172 | ### json8-num
|
173 |
|
174 | JSON8 numbers are identical to JSON numbers.
|
175 |
|
176 | Here is a decoding detail, specific to Oils:
|
177 |
|
178 | If the number has a decimal point or an exponent (like `e-10`), then it's
|
179 | decoded into a YSH `Float`. Otherwise it's a YSH `Int`.
|
180 |
|
181 | 42 # decoded to Int
|
182 | 42.0 # decoded to Float
|
183 | 42e1 # decoded to Float
|
184 | 42.0e1 # decoded to Float
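The same rule can be sketched in Python, with `int` and `float` standing in for the YSH `Int` and `Float` types. The function name `decode_json8_num` is hypothetical, and the sketch assumes the input has already been validated as a JSON number:

```python
def decode_json8_num(text: str):
    """Sketch: pick Int or Float based on the number's spelling."""
    # A decimal point or an exponent marker means Float
    if any(c in text for c in ".eE"):
        return float(text)
    return int(text)

for s in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_json8_num(s)
    print(s, type(value).__name__)
```

Note that the choice depends on how the number is written, not on its value: `42e1` becomes a float even though it denotes a whole number.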
|
185 |
|
186 | ### json8-str
|
187 |
|
188 | JSON8 strings are J8 strings:
|
189 |
|
190 | <pre>
|
191 | "hi 🤦 \u03bc"
|
192 | u'hi 🤦 \u{3bc}'
|
193 | b'hi 🤦 \u{3bc} \yff'
|
194 | </pre>
|
195 |
|
196 | ### json8-list
|
197 |
|
198 | TODO:
|
199 |
|
200 | Like JSON lists, but can have a trailing comma. Examples:
|
201 |
|
202 | [42, 43]
|
203 | [42, 43,] # same as above
|
204 |
|
205 | ### json8-dict
|
206 |
|
207 | TODO:
|
208 |
|
209 | Like JSON "objects", but:
|
210 |
|
211 | - Can have a trailing comma.
|
212 | - Can have unquoted keys, as long as they're an identifier.
|
213 |
|
214 | Examples:
|
215 |
|
216 | {"json8": "message"}
|
217 | {json8: "message"} # same as above
|
218 | {json8: "message",} # same as above
|
219 |
|
220 | ### json8-comment
|
221 |
|
222 | End-of-line comments in the same style as shell:
|
223 |
|
224 | {"json8": "message"} # comment
|
225 |
|
226 | ## TSV8
|
227 |
|
228 | These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.
|
229 |
|
230 | ### column-attrs
|
231 |
|
232 | <!-- Consider #.tsv8 and 'type' perhaps
|
233 |
|
234 | #.tsv8 name age
|
235 | type Str Int
|
236 | other x y
|
237 | Alice 42
|
238 |
|
239 | Also consider alignment.
|
240 | -->
|
241 |
|
242 |
|
243 | ```
|
244 | !tsv8 name age
|
245 | !type Str Int
|
246 | !other x y
|
247 | Alice 42
|
248 | Bob 25
|
249 | ```
|
250 |
|
251 | ### column-types
|
252 |
|
253 | The primitives:
|
254 |
|
255 | - Bool
|
256 | - Int
|
257 | - Float
|
258 | - Str
|
259 |
|
260 | Note: Can `null` be in all cells? Maybe except `Bool`?
|
261 |
|
262 | It can stand in for `NA`?
|
263 |
|
264 | [JSON]: https://json.org
|
265 |
|