---
title: JSON / J8 Notation (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) &mdash;
Chapter **JSON / J8 Notation**

</div>

This chapter describes [JSON]($xref), and its **J8 Notation** superset.

See the [J8 Notation doc](../j8-notation.html) for more background. This doc
is a quick reference, not the official spec.

<span class="in-progress">(in progress)</span>

<div id="dense-toc">
</div>

## J8 Strings

J8 strings are an upgrade of JSON strings that solve the *JSON-Unix Mismatch*.

That is, Unix deals with byte strings, but JSON strings are sequences of
Unicode code points, so JSON can't represent arbitrary byte strings.

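The mismatch can be seen with a short Python sketch. The byte string below is a made-up example of a name that a Unix file system would accept but that is not valid UTF-8. The only way to push it through plain JSON here is a lossy replacement, so the original bytes don't survive the round trip.

```python
import json

# Hypothetical Unix byte string: valid UTF-8 for "café ", then a stray 0xff byte.
raw = b"caf\xc3\xa9 \xff"

try:
    raw.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Lossy workaround: the invalid byte becomes U+FFFD (the replacement character).
lossy_text = raw.decode("utf-8", errors="replace")
round_trip = json.loads(json.dumps(lossy_text)).encode("utf-8")

print(decodable)          # whether the bytes are valid UTF-8
print(round_trip == raw)  # whether the bytes survived the JSON round trip
```

The `b''` strings described later in this chapter are meant for exactly this kind of data.
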
<h3 id="json-string">json-string <code>"hi"</code></h3>

All JSON strings are valid J8 strings!

This is important for compatibility. Encoders may prefer to emit JSON-style
`""` strings rather than `u''` or `b''` strings.

Example:

    "hi μ \n"

To be explicit, you can prefix JSON strings with `j`:

    j"hi μ \n"  # same as above

The `j""` prefix is accepted by our `json8` builtin, but not by the `json`
builtin.

<h3 id="json-escape">json-escape <code>\" \n \u1234</code></h3>

As a reminder, the backslash escapes valid in [JSON]($xref) strings are:

    \" \\ \/
    \b \f \n \r \t
    \u1234

Additional J8 escapes are valid in `u''` and `b''` strings, described below.

<h3 id="surrogate-pair">surrogate-pair <code>\ud83e\udd26</code></h3>

JSON's `\u1234` escapes can't represent code points at or above `U+10000`
(2<sup>16</sup>), so JSON also has a "surrogate pair hack".

That is, there are special code points in the "surrogate range" (`U+D800` to
`U+DFFF`) that can be paired to represent larger numbers.

See the [Surrogate Pair Blog
Post](https://www.oilshell.org/blog/2023/06/surrogate-pair.html) for an
example:

    "\ud83e\udd26"

Because JSON strings are valid J8 strings, surrogate pairs are also part of J8
notation. Decoders must accept them, but encoders should avoid them.

You can emit `u'\u{1f926}'` or `b'\u{1f926}'` instead of `"\ud83e\udd26"`.

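The pairing arithmetic is the standard UTF-16 rule, sketched below in Python for illustration. The helper name `to_surrogate_pair` is made up for this example and is not part of Oils. The sketch splits `U+1F926` into its two halves and checks that Python's `json` module joins them back into one code point.

```python
import json
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    offset = code_point - 0x10000          # a 20-bit number
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F926)
json_escaped = '"\\u%04x\\u%04x"' % (high, low)   # the JSON text, with backslashes
decoded = json.loads(json_escaped)                # the decoder joins the pair

print(json_escaped)
print(hex(ord(decoded)))
```

The J8 form `\u{1f926}` names the code point directly, so an encoder that emits it never needs this arithmetic.
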
<h3 id="u-prefix">u-prefix <code>u'hi'</code></h3>

A type of J8 string.

    u'hi μ \n'

It's never necessary to **emit**, but it can be used to express that a string
is **valid Unicode**. JSON strings can represent strings that aren't Unicode
because they may contain surrogate halves.

In contrast, `u''` strings can only have escapes like `\u{1f926}`, with no
surrogate pairs or halves.

- The **encoded** bytes must be valid UTF-8, like JSON strings.
- The **decoded** bytes must be valid UTF-8, **unlike** JSON strings.

Escaping:

- `u''` strings may **not** contain `\u1234` escapes. They must use the braced
  form, like `\u{1234}` or `\u{1f926}`.
- They may not contain `\yff` escapes, because those would represent a string
  that's not UTF-8 or Unicode.
- Surrogate pairs are never necessary in `u''` or `b''` strings. Use the
  longer form `\u{1f926}`.
- You can always emit literal UTF-8, so `\u{1f926}` escapes aren't strictly
  necessary. Decoders must accept these escapes.
- A literal single quote is escaped with `\'`
  - Decoders still accept `\"`, but encoders don't emit it.

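The difference between encoded and decoded validity can be illustrated with Python's `json` module, which accepts a lone surrogate half. This is only a sketch of the JSON side of the problem. It does not use an Oils decoder.

```python
import json

# The encoded JSON text is plain ASCII, so it is trivially valid UTF-8.
encoded = '"\\ud83e"'

# The decoded value is a single surrogate half, which is not a valid
# Unicode string and cannot be written out as UTF-8.
lone_half = json.loads(encoded)

try:
    lone_half.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(hex(ord(lone_half)))
print(encodable)
```

A `u''` string rules this case out, because it has no way to spell a surrogate half.
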
<h3 id="b-prefix">b-prefix <code>b'hi'</code></h3>

Another J8 string. These `b''` strings are identical to `u''` strings, but
they can also contain `\yff` escapes.

Examples:

    b'hi μ \n'
    b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'

<h3 id="j8-escape">j8-escape <code>\u{1f926} \yff</code></h3>

To summarize, the valid J8 escapes are:

    \'
    \yff   # only valid in b'' strings
    \u{3bc} \u{1f926} etc.

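As a rough illustration of how an encoder might choose between the string styles, here is a minimal Python sketch. The names `encode_j8` and `_encode_b_string` are hypothetical, and this is not the Oils implementation. It assumes that JSON-style escapes such as `\\` and `\n` are accepted inside `b''` strings, and it prefers JSON-style `""` output whenever the bytes are valid UTF-8.

```python
import json

# JSON-style escapes reused inside b'' strings (an assumption of this sketch),
# plus the J8 escape for a single quote.
_SIMPLE_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def _encode_b_string(data: bytes) -> str:
    # surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN,
    # which lets us walk valid characters and invalid bytes in one pass.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            parts.append("\\y%02x" % (code - 0xDC00))   # raw byte escape
        elif ch in _SIMPLE_ESCAPES:
            parts.append(_SIMPLE_ESCAPES[ch])
        elif code < 0x20:
            parts.append("\\u{%x}" % code)              # other control chars
        else:
            parts.append(ch)                            # literal UTF-8
    return "b'" + "".join(parts) + "'"


def encode_j8(data: bytes) -> str:
    """Emit a JSON-style string if possible, otherwise a b'' string."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return _encode_b_string(data)
    return json.dumps(text, ensure_ascii=False)


print(encode_j8("hi μ\n".encode("utf-8")))
print(encode_j8(b"isn't \xff\xfe"))
```

Only the second call needs a `b''` string, because only its input contains bytes that are not valid UTF-8.
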
<h3 id="no-prefix">no-prefix <code>'hi'</code></h3>

Single-quoted strings without a `u` or `b` prefix are implicitly `u''`.

    u'hi μ \n'
     'hi μ \n'  # same as above, no \yff escapes accepted

They should be avoided in contexts where `""` strings may also appear, because
it's easy to confuse single quotes and double quotes.

## J8 Lines

"J8 Lines" is a format built on top of J8 strings. Each line is either:

1. An unquoted string, which must be valid UTF-8. Whitespace is allowed, but
   not other ASCII control chars.
2. A quoted J8 string (JSON style `""` or J8-style `b'' u'' ''`)
3. An **ignored** empty line

In all cases, leading and trailing whitespace is ignored.

### unquoted-line

Any line that doesn't begin with `"`, `'`, `b'`, or `u'` is an unquoted line.
Examples:

    foo bar
    C:\Program Files\
    internal "quotes" aren't special

In contrast, these are quoted lines, and must be valid J8 strings:

    "json-style J8 string"
    b'this is b style'
    u'this is u style'

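The three kinds of line listed at the start of this section can be told apart by looking at the first characters after stripping whitespace. The Python sketch below shows one way to do that. The function name `classify_j8_line` is made up, and the exact set of whitespace characters that gets stripped is an assumption, since this reference doesn't list them.

```python
def classify_j8_line(line: str) -> str:
    """Return 'empty', 'quoted', or 'unquoted' for one line of J8 Lines."""
    # Assumption: "whitespace" means space, tab, CR and LF.
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return "empty"       # ignored by readers
    if stripped.startswith(('"', "'", "u'", "b'")):
        return "quoted"      # must then parse as a valid J8 string
    return "unquoted"


for example in ["  foo bar  ", "C:\\Program Files\\", "b'this is b style'", "   "]:
    print(repr(example), "->", classify_j8_line(example))
```

This only classifies a line. A real reader would still have to decode quoted lines as J8 strings and reject control characters in unquoted ones.
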
## JSON8

JSON8 is JSON with 4 more things allowed:

1. J8 strings in addition to JSON strings
1. Comments
1. Unquoted keys (TODO)
1. Trailing commas (TODO)

### json8-num

JSON8 numbers are identical to JSON numbers.

Here is a decoding detail, specific to Oils:

If the number has a decimal point or an exponent (such as `e-10`), it's
decoded into a YSH `Float`. Otherwise it's decoded into a YSH `Int`.

    42       # decoded to Int
    42.0     # decoded to Float
    42e1     # decoded to Float
    42.0e1   # decoded to Float

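The Int versus Float rule can be mirrored in a few lines of Python. This is only an illustration of the rule, using Python's `int` and `float` as stand-ins for the YSH types. The helper name `decode_json8_number` is hypothetical, and the sketch assumes the token has already been validated as a JSON number.

```python
def decode_json8_number(token: str):
    """Decode a JSON number token to int or float, following the rule above."""
    # A decimal point or an exponent marker means Float, otherwise Int.
    if any(ch in token for ch in ".eE"):
        return float(token)
    return int(token)


for token in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_json8_number(token)
    print(token, "->", type(value).__name__)
```

Python's own `json.loads` follows the same convention, so `42e1` becomes a float there too, even though its value is a whole number.
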
### json8-str

JSON8 strings are J8 strings:

<pre>
"hi &#x1f926; \u03bc"
u'hi &#x1f926; \u{3bc}'
b'hi &#x1f926; \u{3bc} \yff'
</pre>

### json8-list

TODO:

Like JSON lists, but can have a trailing comma. Examples:

    [42, 43]
    [42, 43,]  # same as above

### json8-dict

TODO:

Like JSON "objects", but:

- Can have a trailing comma.
- Can have unquoted keys, as long as they're identifiers.

Examples:

    {"json8": "message"}
    {json8: "message"}   # same as above
    {json8: "message",}  # same as above

### json8-comment

End-of-line comments in the same style as shell:

    {"json8": "message"}  # comment

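One subtlety is that a `#` inside a string is not a comment. The Python sketch below strips an end-of-line comment while skipping over quoted strings. It is a simplified illustration, not the Oils lexer. The function name is made up, and the sketch assumes that any `#` outside a string starts a comment.

```python
import json


def strip_json8_comment(line: str) -> str:
    """Remove a trailing '# comment', ignoring '#' inside quoted strings."""
    quote = None   # the quote character of the string we're inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                i += 2          # skip the backslash and the escaped character
                continue
            if ch == quote:
                quote = None    # end of the string
        elif ch in "\"'":
            quote = ch          # start of a "" or '' string (u'' and b'' too)
        elif ch == "#":
            return line[:i].rstrip()
        i += 1
    return line


stripped = strip_json8_comment('{"json8": "message"}  # comment')
print(stripped)
print(json.loads(stripped))
```

After stripping, a line that uses only JSON features can be handed to an ordinary JSON parser, as the last line shows.
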
## TSV8

These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.

### column-attrs

<!-- Consider #.tsv8 and 'type' perhaps

#.tsv8 name age
type Str Int
other x y
 Alice 42

Also consider alignment.
-->


```
!tsv8 name age
!type Str Int
!other x y
 Alice 42
 Bob 25
```

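To make the layout concrete, here is a small Python sketch that reads the example above. Since the format is still in progress, several things are assumptions: that cells are separated by single tab characters, that attribute rows start with `!` in the first cell, and that data rows leave the first cell empty. The function name `parse_tsv8` is hypothetical. `Bool` is left out because this reference doesn't say how it is spelled in a cell.

```python
# The example table, with the tabs written out explicitly.
TSV8_TEXT = (
    "!tsv8\tname\tage\n"
    "!type\tStr\tInt\n"
    "!other\tx\ty\n"
    "\tAlice\t42\n"
    "\tBob\t25\n"
)

# Converters for the column types used in this sketch.
CONVERTERS = {"Str": str, "Int": int, "Float": float}


def parse_tsv8(text: str):
    attrs = {}   # attribute name -> list of per-column values
    rows = []    # raw data cells, one list per row
    for line in text.splitlines():
        first, *cells = line.split("\t")
        if first.startswith("!"):
            attrs[first[1:]] = cells
        else:
            rows.append(cells)
    names, types = attrs["tsv8"], attrs["type"]
    return [
        {name: CONVERTERS[typ](cell) for name, typ, cell in zip(names, types, row)}
        for row in rows
    ]


records = parse_tsv8(TSV8_TEXT)
print(records)
```

The `!other` row is collected but unused here. It stands for any extra per-column attribute.
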
### column-types

The primitives:

- Bool
- Int
- Float
- Str

Note: Can `null` be in all cells? Maybe except `Bool`?

It can stand in for `NA`?

[JSON]: https://json.org
