---
title: Errors (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) &mdash;
Chapter **Errors**

</div>

This chapter describes **errors** for data languages. An error checklist is
often a nice, concise way to describe a language.

Related: [Oils Error Catalog, With Hints](../error-catalog.html) describes
errors in code.

<span class="in-progress">(in progress)</span>

<div id="dense-toc">
</div>

## UTF8

J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.

### err-utf8-encode

Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.

But it may have a function to encode UTF-8 from a `List[Int]`. These errors
would be handled:

1. Integer greater than the max code point, `0x10FFFF`
1. Code point in the surrogate range, `0xD800` to `0xDFFF`
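
The two checks above can be sketched in Python. This is a minimal
illustration, not Oils code: the name `encode_utf8` is hypothetical, and
rejecting negative integers is an extra assumption that the list above doesn't
mention.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF


def encode_utf8(code_points):
    """Encode a list of integers as UTF-8 bytes, or raise ValueError.

    Hypothetical helper for illustration; not an Oils API.
    """
    out = bytearray()
    for cp in code_points:
        # Assumption: negative integers are rejected along with large ones.
        if cp < 0 or cp > MAX_CODE_POINT:
            raise ValueError('integer %#x is not a code point' % cp)
        if SURROGATE_FIRST <= cp <= SURROGATE_LAST:
            raise ValueError('code point %#x is in the surrogate range' % cp)

        if cp < 0x80:           # 1 byte:  0xxxxxxx
            out.append(cp)
        elif cp < 0x800:        # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

Because each branch picks the shortest form that fits the code point, this
encoder can't produce an overlong encoding.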

### err-utf8-decode

A UTF-8 decoder should handle these errors:

1. Overlong encoding. In UTF-8, each code point should be represented with the
   fewest possible bytes.
   - Overlong encodings are the equivalent of writing the integer `42` as
     `042`, `0042`, `00042`, etc. This is not allowed.
1. Surrogate code point. The sequence decodes to a code point in the surrogate
   range, which is used only for the UTF-16 encoding, not for string data.
1. Exceeds max code point. The sequence decodes to an integer that's larger
   than the maximum code point.
1. Bad encoding. A byte is not encoded like a UTF-8 start byte or a
   continuation byte.
1. Incomplete sequence. Too few continuation bytes appeared after the start
   byte.
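
To make the five categories concrete, here is a small strict decoder in
Python. It's a sketch under stated assumptions, not the Oils decoder: the name
`decode_utf8` and the error strings are made up for this example, and it
reports a continuation byte that appears where a start byte is expected as
"bad encoding".

```python
def decode_utf8(data):
    """Decode bytes to a list of code points, or raise ValueError.

    Hypothetical helper; the error labels are chosen for this sketch.
    """
    # (start byte mask, start byte pattern, continuation bytes, min code point)
    forms = [
        (0x80, 0x00, 0, 0x0),      # 0xxxxxxx
        (0xE0, 0xC0, 1, 0x80),     # 110xxxxx
        (0xF0, 0xE0, 2, 0x800),    # 1110xxxx
        (0xF8, 0xF0, 3, 0x10000),  # 11110xxx
    ]
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        for mask, pattern, num_cont, min_cp in forms:
            if b & mask == pattern:
                break
        else:
            # Not a start byte: 10xxxxxx or 0xF8..0xFF
            raise ValueError('bad encoding at byte %d' % i)

        cp = b & ~mask & 0xFF  # payload bits of the start byte
        for j in range(1, num_cont + 1):
            if i + j >= n or data[i + j] & 0xC0 != 0x80:
                raise ValueError('incomplete sequence at byte %d' % i)
            cp = (cp << 6) | (data[i + j] & 0x3F)

        if cp < min_cp:
            raise ValueError('overlong encoding at byte %d' % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError('surrogate code point at byte %d' % i)
        if cp > 0x10FFFF:
            raise ValueError('exceeds max code point at byte %d' % i)

        out.append(cp)
        i += 1 + num_cont
    return out
```

For example, `b'\xc0\x80'` is an overlong encoding of code point 0,
`b'\xed\xa0\x80'` decodes to the surrogate `0xD800`, `b'\xf4\x90\x80\x80'`
decodes to `0x110000`, `b'\xff'` is not a start byte, and `b'\xe2\x82'` is
missing its last continuation byte.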

## J8 String

J8 strings extend [JSON]($xref) strings, and are a primary building block of J8
Notation.

### err-j8-str-encode

J8 strings can represent any string &mdash; bytes or unicode &mdash; so there
are **no encoding errors**.

### err-j8-str-decode

1. Escape sequence like `\u{dc00}` should not be in the surrogate range.
   - This means it doesn't represent a real character. Byte escapes like
     `\yff` should be used instead.
1. Escape sequence like `\u{110000}` is greater than the maximum Unicode code
   point.
1. Byte escapes like `\yff` should not be in a `u''` string.
   - By design, they're only valid in `b''` strings.
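
The three rules above can be sketched as a check over the escapes in a string
body. This is an illustration only, not the Oils lexer: `check_j8_escapes` is a
hypothetical name, it looks only at `\u{...}` and `\yHH` escapes, and it
assumes the body has already been separated from its `u''` or `b''` quotes.

```python
import re

# An escaped backslash, a \u{hex} code point escape, or a \yHH byte escape.
# Matching the escaped backslash first keeps r'\\yff' from looking like \yff.
ESCAPE_RE = re.compile(r'\\(?:\\|u\{([0-9a-fA-F]+)\}|y([0-9a-fA-F]{2}))')


def check_j8_escapes(prefix, body):
    """Return a list of error strings for the escapes in a J8 string body.

    prefix is 'u' or 'b', as in u'' and b''.  Hypothetical helper, not an
    Oils API.
    """
    errors = []
    for m in ESCAPE_RE.finditer(body):
        code_point_hex, byte_hex = m.groups()
        if code_point_hex is not None:
            cp = int(code_point_hex, 16)
            if 0xD800 <= cp <= 0xDFFF:
                errors.append('%s is in the surrogate range' % m.group(0))
            elif cp > 0x10FFFF:
                errors.append('%s exceeds the max code point' % m.group(0))
        elif byte_hex is not None and prefix != 'b':
            errors.append("%s is only valid in b'' strings" % m.group(0))
    return errors
```

With this sketch, `check_j8_escapes('u', r'\yff')` reports one error, while
`check_j8_escapes('b', r'\yff')` reports none.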

## J8 Lines

Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In
[YSH]($xref), they're used by `@(split command sub)`.

### err-j8-lines-encode

Like J8 strings, J8 Lines have no encoding errors by design.

### err-j8-lines-decode

1. Any error in a J8 quoted string.
   - e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
1. A line with a quoted string has extra text after it.
   - e.g. `"mystr" extra`.
1. An unquoted line is not valid UTF-8.

## JSON

### err-json-encode

JSON encoding has these errors:

1. Object of this type can't be serialized.
   - For example, `Str List Dict` are Oils objects that can be serialized, but
     `Eggex Func Range` can't.
1. Circular reference.
   - e.g. a Dict that points to itself, a List that points to itself, and other
     permutations
1. Float values of NaN, Inf, and -Inf can't be encoded.
   - TODO: option to use `null` like JavaScript.

Note that invalid UTF-8 bytes like `0xfe` produce a Unicode replacement
character, not a hard error.
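
Python's standard `json` module has analogous failure modes, so it's a
convenient way to see the three numbered categories in action. This shows
Python's behavior, not Oils': the exception classes are Python's, `try_encode`
is a helper made up for this example, and `allow_nan=False` is needed because
Python otherwise emits the non-standard tokens `NaN` and `Infinity`.

```python
import json
import math


def try_encode(obj):
    """Return ('ok', text) or ('error', exception class name)."""
    try:
        # allow_nan=False turns NaN, Inf, and -Inf into an error
        return ('ok', json.dumps(obj, allow_nan=False))
    except (TypeError, ValueError) as e:
        return ('error', type(e).__name__)


# A dict that points to itself
circular = {}
circular['self'] = circular

examples = [
    {'k': [1, 'two']},  # serializable
    range(3),           # a type that can't be serialized
    circular,           # circular reference
    math.nan,           # float that can't be encoded
]
for obj in examples:
    print(try_encode(obj))
```

In Python, the unserializable type raises `TypeError`, while the circular
reference and the NaN both raise `ValueError`.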

### err-json-decode

1. The encoded message itself is not valid UTF-8.
   - (Typically, you need to check the unescaped bytes in string literals
     `"abc\n"`).
1. Lexical error, like
   - the message `+`
   - an invalid escape `"\z"` or a truncated escape `"\u1"`
   - a single quoted string like `u''`
1. Grammatical error
   - like the message `}{`
1. Unexpected trailing input
   - like the message `42]` or `{}]`
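
The example messages above can be tried against any strict JSON decoder. The
sketch below uses Python's standard `json` module; it isn't the Oils decoder,
and `decode_error` is a hypothetical helper. Note that Python reports lexical
and grammatical errors with the same exception class, so it doesn't separate
the categories the way the list does.

```python
import json

MESSAGES = [
    b'"\xff"',  # not valid UTF-8
    '+',        # lexical: not a token
    r'"\z"',    # lexical: invalid escape
    r'"\u1"',   # lexical: truncated escape
    "u''",      # lexical: single quoted string
    '}{',       # grammatical
    '42]',      # unexpected trailing input
    '{}]',      # unexpected trailing input
]


def decode_error(message):
    """Return None if message is valid JSON, else a short error string."""
    if isinstance(message, bytes):
        # Check the encoding first, as in item 1 of the list
        try:
            message = message.decode('utf-8')
        except UnicodeDecodeError:
            return 'not valid UTF-8'
    try:
        json.loads(message)
    except json.JSONDecodeError as e:
        return e.msg
    return None


for message in MESSAGES:
    print(repr(message), '->', decode_error(message))
```

Every message in `MESSAGES` is rejected, while a valid message such as
`'{"k": [42]}'` makes `decode_error` return `None`.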

## JSON8

### err-json8-encode

JSON8 has the same encoding errors as JSON.

However, the encoding is lossless by design. Instead of invalid UTF-8 being
turned into a Unicode replacement character, it can use J8 strings with byte
escapes like `b'byte \yfe\yff'`.
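
The difference between the lossy and lossless approaches can be sketched in
Python. This is not how Oils implements it: `encode_b_string` is a hypothetical
name, it always emits a `b''` string, it uses Python's `surrogateescape` error
handler as a shortcut for finding invalid bytes, and writing control bytes as
`\yHH` is a simplification chosen for this sketch.

```python
def encode_b_string(data):
    """Encode bytes as a J8 b'' string, keeping invalid UTF-8 as \\yHH.

    Hypothetical helper for illustration; not an Oils API.
    """
    # surrogateescape maps each undecodable byte 0xNN to the code point
    # 0xDC00 + 0xNN, so invalid bytes can be recognized after decoding.
    text = data.decode('utf-8', errors='surrogateescape')
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:   # byte that wasn't valid UTF-8
            parts.append('\\y%02x' % (cp - 0xDC00))
        elif cp < 0x20:              # control byte, simplified to \yHH
            parts.append('\\y%02x' % cp)
        elif ch == '\\':
            parts.append('\\\\')
        elif ch == "'":
            parts.append("\\'")
        else:
            parts.append(ch)
    return "b'%s'" % ''.join(parts)


data = b'byte \xfe\xff'
print(encode_b_string(data))                   # lossless
print(data.decode('utf-8', errors='replace'))  # lossy
```

The first line printed keeps both bytes as `\yfe\yff`, so the original data can
be recovered. The second replaces each of them with U+FFFD, so the original
bytes are lost.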

### err-json8-decode

JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.

See [err-j8-str-decode](#err-j8-str-decode).

<!--

## Packle

TODO: Not implemented!

### err-packle-encode

Packle has no encoding errors!

1. TODO: Unserializable `Eggex Func Range` can be turned into "wire Tuple"
   `(type_name: Str, heap_id: Int)`.
   - When you read a packle into Python, you'll get a tuple.
   - When you read a packle back into YSH, you'll get a `value.Tombstone`?
1. Circular references are allowed. Packle data expresses a **graph**, not a
   tree.
1. Float values NaN, Inf, and -Inf use their binary representations.
1. Both Unicode and binary data are allowed.

### err-packle-decode

TODO

-->
