doc/ref/chap-errors.md

1	---
2	title: Errors (Oils Reference)
3	all_docs_url: ..
4	body_css_class: width40
5	default_highlighter: oils-sh
6	preserve_anchor_case: yes
7	---
8
9	<div class="doc-ref-header">
10
11	[Oils Reference](index.html) —
12	Chapter Errors
13
14	</div>
15
16	This chapter describes errors for data languages. An error checklist is
17	often a nice, concise way to describe a language.
18
19	Related: [Oils Error Catalog, With Hints](../error-catalog.html) describes
20	errors in code.
21
22	<span class="in-progress">(in progress)</span>
23
24	<div id="dense-toc">
25	</div>
26
27	## UTF8
28
29	J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.
30
31	### err-utf8-encode
32
33	Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.
34
35	But it may have a function to encode UTF-8 from a `List[Int]`. These errors
36	would be handled:
37
38	1. Integer greater than max code point
39	1. Code point in the surrogate range
40
41	### err-utf8-decode
42
43	A UTF-8 decoder should handle these errors:
44
45	1. Overlong encoding. In UTF-8, each code point should be represented with the
46	fewest possible bytes.
47	- Overlong encodings are the equivalent of writing the integer `42` as
48	`042`, `0042`, `00042`, etc. This is not allowed.
49	1. Surrogate code point. The sequence decodes to a code point in the surrogate
50	range, which is used only for the UTF-16 encoding, not for string data.
51	1. Exceeds max code point. The sequence decodes to an integer that's larger
52	than the maximum code point.
53	1. Bad encoding. A byte is not encoded like a UTF-8 start byte or a
54	continuation byte.
55	1. Incomplete sequence. Too few continuation bytes appeared after the start
56	byte.
57
58	## J8 String
59
60	J8 strings extend [JSON]($xref) strings, and are a primary building block of J8
61	Notation.
62
63	### err-j8-str-encode
64
65	J8 strings can represent any string — bytes or unicode — so there
66	are no encoding errors.
67
68	### err-j8-str-decode
69
70	1. Escape sequence like `\u{dc00}` should not be in the surrogate range.
71	- This means it doesn't represent a real character. Byte escapes like
72	`\yff` should be used instead.
73	1. Escape sequence like `\u{110000}` is greater than the maximum Unicode code
74	point.
75	1. Byte escapes like `\yff` should not be in a `u''` string.
76	- By design, they're only valid in `b''` strings.
77
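The three escape checks above might look like the following Python sketch. It is deliberately not a full J8 string parser: it assumes the quotes and the `u` / `b` prefix have already been split off, and it checks only these three conditions. The name `check_j8_escapes` is hypothetical.

```python
import re

MAX_CODE_POINT = 0x10FFFF

# Matches \u{...}, \yHH, or any other backslash escape. The last alternative
# consumes pairs like \\ so they are never mistaken for the start of \u or \y.
_ESCAPE = re.compile(
    r"\\(?:u\{([0-9a-fA-F]+)\}|y([0-9a-fA-F]{2})|(.))", re.DOTALL)

def check_j8_escapes(prefix, body):
    """Return error messages for escapes in a J8 string body.

    prefix is 'u' or 'b'; body is the text between the quotes.
    """
    errors = []
    for m in _ESCAPE.finditer(body):
        code_point, byte, _other = m.groups()
        if code_point is not None:
            cp = int(code_point, 16)
            if cp > MAX_CODE_POINT:
                errors.append("code point above max: " + m.group(0))
            elif 0xD800 <= cp <= 0xDFFF:
                errors.append("code point in surrogate range: " + m.group(0))
        elif byte is not None and prefix != "b":
            errors.append("byte escape outside b'' string: " + m.group(0))
    return errors
```

Here `check_j8_escapes('u', r'\yff')` reports one error, while `check_j8_escapes('b', r'\yff')` reports none.
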
78	Implementation-defined limit:
79
80	4. Max string length (NYI)
81	- e.g. more than 4 billion bytes could overflow a length field, in some
82	implementations
83
84	## J8 Lines
85
86	Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In
87	[YSH]($xref), it's used by `@(split command sub)`.
88
89	### err-j8-lines-encode
90
91	Like J8 strings, J8 Lines have no encoding errors by design.
92
93	### err-j8-lines-decode
94
95	1. Any error in a J8 quoted string.
96	- e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
97	1. A line with a quoted string has extra text after it.
98	- e.g. `"mystr" extra`.
99	1. An unquoted line is not valid UTF-8.
100
101	## JSON
102
103	### err-json-encode
104
105	JSON encoding has these errors:
106
107	1. Object of this type can't be serialized.
108	- For example, Oils objects of type `Str List Dict` can be serialized, but
109	`Eggex Func Range` can't.
110	1. Circular reference.
111	- e.g. a Dict that points to itself, a List that points to itself, and other
112	permutations
113	1. Float values of NaN, Inf, and -Inf can't be encoded.
114	- TODO: option to use `null` like JavaScript.
115
116	Note that invalid UTF-8 bytes like `0xfe` produce a Unicode replacement
117	character, not a hard error.
118
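For comparison only, Python's standard `json` module has analogous encoding errors. The sketch below shows Python's behavior, not Oils'; the helper `encode_or_error` is a hypothetical name, and `range` stands in for an unserializable type.

```python
import json

def encode_or_error(obj):
    """Return the JSON text, or the name of the exception Python raised."""
    try:
        # allow_nan=False turns NaN, Inf, and -Inf into an error, instead of
        # emitting the non-standard tokens NaN and Infinity.
        return json.dumps(obj, allow_nan=False)
    except (TypeError, ValueError) as e:
        return type(e).__name__

# A dict that points to itself
cycle = {}
cycle["self"] = cycle

print(encode_or_error({"a": [1, "two"]}))  # {"a": [1, "two"]}
print(encode_or_error(range(3)))           # TypeError: type can't be serialized
print(encode_or_error(cycle))              # ValueError: circular reference
print(encode_or_error(float("nan")))       # ValueError: NaN not allowed

# Invalid UTF-8 becomes U+FFFD when decoded with errors="replace",
# similar to the replacement-character behavior described above.
replaced = b"\xfe".decode("utf-8", errors="replace")
print(replaced == "\ufffd")                # True
```
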
119	### err-json-decode
120
121	1. The encoded message itself is not valid UTF-8.
122	- (Typically, you need to check the unescaped bytes in string literals
123	`"abc\n"`).
124	1. Lexical error, like
125	- the message `+`
126	- an invalid escape `"\z"` or a truncated escape `"\u1"`
127	- a single-quoted string like `u''`
128	1. Grammatical error
129	- like the message `}{`
130	1. Unexpected trailing input
131	- like the message `42]` or `{}]`
132
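The example messages above can be tried against any JSON decoder. As a sketch, here is how Python's standard `json` module reacts to them. It does not distinguish lexical errors from grammatical ones, so the hypothetical helper `decode_error` returns only Python's own message.

```python
import json

def decode_error(raw):
    """Return None if raw (bytes) decodes as JSON, else a short error label."""
    try:
        # Check first that the message itself is valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid UTF-8"
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        return e.msg
    return None

MESSAGES = [
    b'"\xff"',    # message is not valid UTF-8
    b'+',         # lexical error
    b'"\\z"',     # invalid escape
    b'"\\u1"',    # truncated escape
    b"u''",       # single-quoted string
    b'}{',        # grammatical error
    b'42]',       # unexpected trailing input
    b'{}]',       # unexpected trailing input
]

for raw in MESSAGES:
    print(raw, "->", decode_error(raw))
```

Every message in `MESSAGES` is rejected, while a well-formed message like `b'{"a": [42]}'` returns `None`.
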
133	Implementation-defined limits, i.e. outside the grammar:
134
135	5. Integer too big
136	- implementations may decode to a 64-bit integer
137	1. Floats that are too big
138	- may decode to `Inf`
139	1. Max array length (NYI)
140	- e.g. more than 4 billion objects in an array could overflow a length
141	field, in some implementations
142	1. Max object length (NYI)
143	1. Max depth for arrays and objects (NYI)
144	- to avoid a recursive parser blowing the stack
145
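The first two limits are easy to observe in practice. In this sketch, Python's `json` module is used as the example implementation: it decodes integers to arbitrary precision, so the 64-bit check is written out by hand. The helper `fits_int64` is a hypothetical name.

```python
import json
import math

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def fits_int64(n):
    """True if n can be stored in a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX

# 2**63 is one past INT64_MAX. Python decodes it exactly, but an
# implementation that decodes to a 64-bit integer has to reject it.
big = json.loads("9223372036854775808")
print(fits_int64(big))    # False

# A float literal that is too big for a double decodes to Inf in Python.
huge = json.loads("1e999")
print(math.isinf(huge))   # True
```
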
146	## JSON8
147
148	### err-json8-encode
149
150	JSON8 has the same encoding errors as JSON.
151
152	However, the encoding is lossless by design. Instead of invalid UTF-8 being
153	turned into a Unicode replacement character, it can use J8 strings with byte
154	escapes like `b'byte \yfe\yff'`.
155
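A lossless encoder along these lines could fall back to a `b''` string only when the data is not valid UTF-8. The Python sketch below is an illustration under several assumptions, not the Oils encoder: the name `encode_j8_bytes` is hypothetical, valid UTF-8 is emitted as a JSON-style double-quoted string, and in the `b''` case control bytes are also written as `\yHH` escapes for simplicity.

```python
import json

def encode_j8_bytes(data):
    """Encode bytes without losing information.

    Valid UTF-8 becomes a JSON-style "" string; anything else becomes a
    b'' string with \\yHH byte escapes.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        return json.dumps(text, ensure_ascii=False)

    # surrogateescape maps each undecodable byte 0xNN to U+DCNN, so the
    # original bytes can be recovered one by one.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("\\y%02x" % (cp - 0xDC00))  # the original byte
        elif ch == "\\":
            parts.append("\\\\")
        elif ch == "'":
            parts.append("\\'")
        elif cp < 0x20:
            parts.append("\\y%02x" % cp)  # control byte, written as a byte escape
        else:
            parts.append(ch)
    return "b'" + "".join(parts) + "'"

print(encode_j8_bytes(b"byte \xfe\xff"))  # b'byte \yfe\yff'
```
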
156	### err-json8-decode
157
158	JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.
159
160	See [err-j8-str-decode](#err-j8-str-decode).
161
162	<!--
163
164	## Packle
165
166	TODO: Not implemented!
167
168	### err-packle-encode
169
170	Packle has no encoding errors!
171
172	1. TODO: Unserializable `Eggex Func Range` can be turned into "wire Tuple"
173	`(type_name: Str, heap_id: Int)`.
174	- When you read a packle into Python, you'll get a tuple.
175	- When you read a packle back into YSH, you'll get a `value.Tombstone`?
176	1. Circular references are allowed. Packle data expresses a graph, not a
177	tree.
178	1. Float values NaN, Inf, and -Inf use their binary representations.
179	1. Both Unicode and binary data are allowed.
180
181	### err-packle-decode
182
183	TODO
184
185	-->
186