doc/ref/chap-errors.md

---
title: Errors (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) &mdash;
Chapter Errors

</div>

This chapter describes errors for data languages. An error checklist is
often a nice, concise way to describe a language.

Related: [Oils Error Catalog, With Hints](../error-catalog.html) describes
errors in code.

<span class="in-progress">(in progress)</span>

<div id="dense-toc">
</div>

## UTF8

J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.

### err-utf8-encode

Oils stores strings as UTF-8 in memory, so it rarely needs to encode UTF-8.

However, it may provide a function that encodes a `List[Int]` of code points
as UTF-8. Such a function would handle these errors:

1. Integer greater than max code point
1. Code point in the surrogate range

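As a minimal sketch of the two checks above, the following Python function encodes a list of integers as UTF-8. It is for illustration only and is not Oils code: the name `encode_code_points` is hypothetical, and rejecting negative integers is an extra assumption that the list above doesn't mention.

```python
MAX_CODE_POINT = 0x10FFFF


def encode_code_points(code_points):
    """Encode a list of integers as UTF-8 bytes, or raise ValueError."""
    out = bytearray()
    for cp in code_points:
        # Error 1: integer greater than the max code point.
        # (Negative integers are rejected too; that is an assumption.)
        if cp < 0 or cp > MAX_CODE_POINT:
            raise ValueError(f"code point out of range: {cp:#x}")
        # Error 2: code point in the surrogate range U+D800..U+DFFF.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point: {cp:#x}")
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

For example, `encode_code_points([0x41, 0xE9])` yields the three bytes `41 c3 a9`, while `[0x110000]` and `[0xD800]` are both rejected.
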
### err-utf8-decode

A UTF-8 decoder should handle these errors:

1. Overlong encoding. In UTF-8, each code point should be represented with the
   fewest possible bytes.
   - Overlong encodings are the equivalent of writing the integer `42` as
     `042`, `0042`, `00042`, etc. This is not allowed.
1. Surrogate code point. The sequence decodes to a code point in the surrogate
   range, which is used only for the UTF-16 encoding, not for string data.
1. Exceeds max code point. The sequence decodes to an integer that's larger
   than the maximum code point.
1. Bad encoding. A byte is not encoded like a UTF-8 start byte or a
   continuation byte.
1. Incomplete sequence. Too few continuation bytes appeared after the start
   byte.

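The five checks above can be sketched as a small hand-written decoder. This is an illustrative Python sketch, not the decoder Oils uses. The names `decode_utf8` and `utf8_error` are hypothetical, and the "unexpected continuation byte" case is an extra assumption that is not one of the five listed errors.

```python
MAX_CODE_POINT = 0x10FFFF


def decode_utf8(data):
    """Decode bytes into a list of code points; raise ValueError on the first error."""
    result = []
    i = 0
    while i < len(data):
        b0 = data[i]
        # Classify the start byte: sequence length, payload bits, and the
        # smallest code point that genuinely needs that many bytes.
        if b0 < 0x80:
            length, cp, min_cp = 1, b0, 0x0
        elif b0 <= 0xBF:
            # Assumption, not in the list above: a continuation byte
            # appeared where a start byte was expected.
            raise ValueError("unexpected continuation byte")
        elif b0 <= 0xDF:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif b0 <= 0xEF:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif b0 <= 0xF7:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            # 0xF8..0xFF is neither a start byte nor a continuation byte.
            raise ValueError("bad encoding")

        for k in range(1, length):
            if i + k >= len(data) or data[i + k] & 0xC0 != 0x80:
                raise ValueError("incomplete sequence")
            cp = (cp << 6) | (data[i + k] & 0x3F)

        if cp < min_cp:
            raise ValueError("overlong encoding")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point")
        if cp > MAX_CODE_POINT:
            raise ValueError("exceeds max code point")

        result.append(cp)
        i += length
    return result


def utf8_error(data):
    """Return the error message for data, or None if it decodes cleanly."""
    try:
        decode_utf8(data)
    except ValueError as e:
        return str(e)
    return None
```

With this sketch, `utf8_error(b"\xc0\xaf")` reports an overlong encoding (a two-byte spelling of `/`), and `utf8_error(b"\xed\xa0\x80")` reports a surrogate code point.
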
## J8 String

J8 strings extend [JSON]($xref) strings, and are a primary building block of J8
Notation.

### err-j8-str-encode

J8 strings can represent any string, whether bytes or Unicode, so there
are no encoding errors.

### err-j8-str-decode

1. An escape sequence like `\u{dc00}` is in the surrogate range.
   - This means it doesn't represent a real character. Byte escapes like
     `\yff` should be used instead.
1. An escape sequence like `\u{110000}` is greater than the maximum Unicode
   code point.
1. A byte escape like `\yff` appears in a `u''` string.
   - By design, byte escapes are only valid in `b''` strings.

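To make the three checks above concrete, here is a rough Python sketch that scans the body of a J8 string for `\u{...}` and `\yXX` escapes. It is not the Oils decoder. The function name `check_j8_escapes` and the error labels are hypothetical, and other problems such as unknown escapes or missing quotes are deliberately ignored.

```python
import re

# Matches an escaped backslash (skipped), a \u{...} escape, or a \yXX escape.
_ESCAPE = re.compile(r"\\\\|\\u\{([0-9a-fA-F]+)\}|\\y([0-9a-fA-F]{2})")


def check_j8_escapes(prefix, body):
    """Return a list of error labels for the escapes in a J8 string body.

    prefix is 'u' or 'b'; body is the text between the single quotes.
    """
    errors = []
    for m in _ESCAPE.finditer(body):
        code_point, byte = m.group(1), m.group(2)
        if code_point is not None:
            cp = int(code_point, 16)
            if 0xD800 <= cp <= 0xDFFF:
                errors.append("surrogate")
            elif cp > 0x10FFFF:
                errors.append("too large")
        elif byte is not None and prefix == "u":
            errors.append("byte escape in u'' string")
    return errors
```

Here `check_j8_escapes("u", r"\yff")` flags the byte escape, while the same body with prefix `"b"` produces no errors.
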
## J8 Lines

Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In
[YSH]($xref), they're used by `@(split command sub)`.

### err-j8-lines-encode

Like J8 strings, J8 Lines have no encoding errors by design.

### err-j8-lines-decode

1. Any error in a J8 quoted string.
   - e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
1. A line with a quoted string has extra text after it.
   - e.g. `"mystr" extra`.
1. An unquoted line is not valid UTF-8.

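The three error classes above can be illustrated with a simplified per-line decoder in Python. This is a sketch under stated assumptions, not the Oils implementation: `decode_j8_line` is a hypothetical name, only JSON-style `"double quoted"` strings are recognized (not `u''` or `b''`), and ignoring surrounding spaces and tabs is an assumption.

```python
import json


def decode_j8_line(raw):
    """Decode one line (bytes, without its newline) to a str, or raise ValueError."""
    quoted = raw.lstrip(b" \t").startswith(b'"')
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        kind = "quoted string" if quoted else "unquoted line"
        raise ValueError(f"{kind} is not valid UTF-8") from None

    # Assumption: whitespace around the line's content is not significant.
    text = text.strip(" \t")
    if not quoted:
        return text

    try:
        # raw_decode() parses one JSON value and reports where it ended.
        value, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"bad quoted string: {e.msg}") from None

    if text[end:].strip(" \t"):
        raise ValueError("extra text after quoted string")
    return value
```

For instance, the line `"mystr" extra` from the list above is rejected because text remains after the closing quote.
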
## JSON

### err-json-encode

JSON encoding has these errors:

1. Object of this type can't be serialized.
   - For example, `Str List Dict` are Oils objects that can be serialized, but
     `Eggex Func Range` can't.
1. Circular reference.
   - e.g. a Dict that points to itself, a List that points to itself, and other
     permutations
1. Float values of NaN, Inf, and -Inf can't be encoded.
   - TODO: option to use `null` like JavaScript.

Note that invalid UTF-8 bytes like `0xfe` produce a Unicode replacement
character, not a hard error.

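The same three error classes show up in other JSON encoders. As an analogy only, the sketch below uses Python's standard `json` module, which is not Oils and differs in detail. Python accepts NaN and Inf unless `allow_nan=False` is passed, and a Python `range` stands in for an unserializable type. The helper name `try_encode` is hypothetical.

```python
import json


def try_encode(obj):
    """Return (text, None) on success, or (None, exception name) on failure."""
    try:
        # allow_nan=False makes NaN, Inf and -Inf a hard error.
        return json.dumps(obj, allow_nan=False), None
    except (TypeError, ValueError) as e:
        return None, type(e).__name__


# 1. An object of this type can't be serialized.
unserializable = range(3)

# 2. Circular reference: a dict that points to itself.
circular = {}
circular["self"] = circular

# 3. Float values that JSON can't represent.
bad_floats = [float("nan"), float("inf"), float("-inf")]

for value in [unserializable, circular] + bad_floats:
    text, err = try_encode(value)
    print(err)
```

In Python the first case raises `TypeError` and the other two raise `ValueError`.
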
### err-json-decode

1. The encoded message itself is not valid UTF-8.
   - Typically, this means checking the unescaped bytes in string literals
     like `"abc\n"`.
1. Lexical error, like
   - the message `+`
   - an invalid escape `"\z"` or a truncated escape `"\u1"`
   - a single quoted string like `u''`
1. Grammatical error
   - like the message `}{`
1. Unexpected trailing input
   - like the message `42]` or `{}]`

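The example messages above can be tried against any strict JSON decoder. The sketch below uses Python's standard `json.loads` as a stand-in. It is not the Oils decoder, so error wording and categories will differ, and the helper name `decode_error` is hypothetical.

```python
import json


def decode_error(message):
    """Return None if message parses as JSON, else a short error description."""
    try:
        json.loads(message)
    except UnicodeDecodeError:
        # Raised when a bytes message is not valid UTF-8.
        return "invalid UTF-8"
    except json.JSONDecodeError as e:
        return e.msg
    return None


bad_messages = [
    b'"\xff"',   # not valid UTF-8
    "+",         # lexical error
    '"\\z"',     # invalid escape
    '"\\u1"',    # truncated escape
    "u''",       # single quoted string
    "}{",        # grammatical error
    "42]",       # unexpected trailing input
    "{}]",       # unexpected trailing input
]

for m in bad_messages:
    print(repr(m), "->", decode_error(m))
```

Every message in `bad_messages` is rejected, while a well-formed message such as `{"k": [42, "abc\n"]}` parses without error.
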
## JSON8

### err-json8-encode

JSON8 has the same encoding errors as JSON.

However, the encoding is lossless by design. Instead of invalid UTF-8 being
turned into a Unicode replacement character, it can use J8 strings with byte
escapes like `b'byte \yfe\yff'`.

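To illustrate the difference between lossy and lossless output, here is a hedged Python sketch that writes arbitrary bytes as a `b''` string with `\yXX` escapes. It is not the Oils encoder: `to_b_string` is a hypothetical name, and only a handful of simple escapes are handled.

```python
# Simplification: only these characters get a backslash escape.
_SIMPLE_ESCAPES = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\t": "\\t"}


def to_b_string(data):
    """Render bytes as a b'' string, writing invalid UTF-8 bytes as \\yXX."""
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate
    # U+DCNN, so invalid bytes can be recognized after decoding.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("\\y%02x" % (cp - 0xDC00))
        else:
            parts.append(_SIMPLE_ESCAPES.get(ch, ch))
    return "b'" + "".join(parts) + "'"


raw = b"byte \xfe\xff"
lossy = raw.decode("utf-8", errors="replace")  # both bad bytes become U+FFFD
lossless = to_b_string(raw)                    # both bad bytes are kept as escapes
```

The lossy form can't be turned back into the original bytes, because every invalid byte collapses to the same replacement character. The escaped form keeps each byte value.
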
### err-json8-decode

JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.

See [err-j8-str-decode](#err-j8-str-decode).

<!--

## Packle

TODO: Not implemented!

### err-packle-encode

Packle has no encoding errors!

1. TODO: Unserializable `Eggex Func Range` can be turned into "wire Tuple"
   `(type_name: Str, heap_id: Int)`.
   - When you read a packle into Python, you'll get a tuple.
   - When you read a packle back into YSH, you'll get a `value.Tombstone`?
1. Circular references are allowed. Packle data expresses a graph, not a
   tree.
1. Float values NaN, Inf, and -Inf use their binary representations.
1. Both Unicode and binary data are allowed.

### err-packle-decode

TODO

-->