This chapter describes errors for data languages.  An error checklist is
often a nice, concise way to describe a language.
Related: Oils Error Catalog, With Hints describes
errors in code.
(in progress)
UTF8
J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.
err-utf8-encode
Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.
But it may have a function to encode UTF-8 from a List[Int].  Such a function
would have to handle these errors:
- Integer greater than the max code point, U+10FFFF
- Code point in the surrogate range, U+D800 to U+DFFF
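A minimal Python sketch of such an encoder, assuming the code points arrive as a plain list of ints.  The names encode_utf8 and Utf8EncodeError are hypothetical, not part of Oils.  The sketch also rejects negative integers, which a List[Int] could contain.

```python
MAX_CODE_POINT = 0x10FFFF


class Utf8EncodeError(ValueError):
    """Hypothetical error type for this sketch."""


def encode_utf8(code_points: list[int]) -> bytes:
    out = bytearray()
    for cp in code_points:
        # Error 1: the integer isn't a code point at all.
        if cp < 0 or cp > MAX_CODE_POINT:
            raise Utf8EncodeError(f"integer {cp:#x} is outside the code point range")
        # Error 2: surrogates only exist for UTF-16, not for string data.
        if 0xD800 <= cp <= 0xDFFF:
            raise Utf8EncodeError(f"{cp:#x} is in the surrogate range")
        # Use the shortest form: 1, 2, 3, or 4 bytes.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)
```

For example, encode_utf8([0x41, 0xE9]) gives b'A\xc3\xa9', while encode_utf8([0xD800]) raises.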
err-utf8-decode
A UTF-8 decoder should handle these errors:
- Overlong encoding.  In UTF-8, each code point should be represented with the
  fewest possible bytes.
  - Overlong encodings are the equivalent of writing the integer 42 as 042,
    0042, 00042, etc.  This is not allowed.
- Surrogate code point.  The sequence decodes to a code point in the surrogate
  range, which is used only for the UTF-16 encoding, not for string data.
- Exceeds max code point.  The sequence decodes to an integer that's larger
  than the maximum code point.
- Bad encoding.  A byte is not encoded like a UTF-8 start byte or a
  continuation byte.
- Incomplete sequence.  Too few continuation bytes appeared after the start
  byte.
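The five checks can be sketched as a hand-written decoder in Python.  This is only an illustration: decode_utf8 and Utf8DecodeError are hypothetical names, and a production decoder would more likely be table-driven or use a library.

```python
MAX_CODE_POINT = 0x10FFFF


class Utf8DecodeError(ValueError):
    """Hypothetical error type for this sketch."""


def decode_utf8(data: bytes) -> list[int]:
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII byte
            code_points.append(b)
            i += 1
            continue
        # Start byte: number of continuation bytes, payload bits, and the
        # smallest code point that needs this many bytes.
        if 0xC0 <= b <= 0xDF:
            count, cp, minimum = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            count, cp, minimum = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:
            count, cp, minimum = 3, b & 0x07, 0x10000
        else:
            # A lone continuation byte (0x80-0xBF) or 0xF8-0xFF
            raise Utf8DecodeError(f"bad encoding: byte {b:#04x} at offset {i}")
        for j in range(i + 1, i + 1 + count):
            if j >= n or data[j] & 0xC0 != 0x80:
                raise Utf8DecodeError(f"incomplete sequence at offset {i}")
            cp = (cp << 6) | (data[j] & 0x3F)
        if cp < minimum:
            raise Utf8DecodeError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise Utf8DecodeError(f"surrogate code point {cp:#x} at offset {i}")
        if cp > MAX_CODE_POINT:
            raise Utf8DecodeError(f"code point {cp:#x} exceeds the max at offset {i}")
        code_points.append(cp)
        i += 1 + count
    return code_points
```

For example, b'\xc0\x80' is an overlong encoding of 0, and b'\xed\xa0\x80' decodes to the surrogate U+D800, so both are rejected.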
J8 String
J8 strings extend JSON strings, and are a primary building block of J8
Notation.
err-j8-str-encode
J8 strings can represent any string, whether bytes or Unicode text, so there
are no encoding errors.
err-j8-str-decode
- An escape sequence like \u{dc00} should not be in the surrogate range.
  - Such a code point doesn't represent a real character.  Byte escapes like
    \yff should be used instead.
- An escape sequence like \u{110000} is greater than the maximum Unicode code
  point.
- Byte escapes like \yff should not be in a u'' string.
  - By design, they're only valid in b'' strings.
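A Python sketch of how a decoder might check these escapes.  It rests on assumptions: decode_j8_body is a hypothetical helper that receives the text between the quotes plus the prefix ("u" or "b"), it handles only a subset of escapes, and it assumes \u{...} takes 1 to 6 hex digits.

```python
class J8StrError(ValueError):
    """Hypothetical error type for this sketch."""


HEX = "0123456789abcdefABCDEF"
# Subset of simple escapes handled by this sketch
SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}


def decode_j8_body(body: str, kind: str) -> bytes:
    """Decode the text between the quotes of a u'' or b'' string."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(body):
            raise J8StrError("truncated escape")
        esc = body[i + 1]
        if esc in SIMPLE:
            out += SIMPLE[esc].encode("utf-8")
            i += 2
        elif esc == "u" and body.startswith("{", i + 2):
            end = body.find("}", i + 3)
            digits = body[i + 3:end] if end != -1 else ""
            if not 1 <= len(digits) <= 6 or any(c not in HEX for c in digits):
                raise J8StrError(f"bad \\u{{}} escape at offset {i}")
            cp = int(digits, 16)
            if 0xD800 <= cp <= 0xDFFF:
                raise J8StrError(f"\\u{{{digits}}} is in the surrogate range; use \\y escapes")
            if cp > 0x10FFFF:
                raise J8StrError(f"\\u{{{digits}}} exceeds the max code point")
            out += chr(cp).encode("utf-8")
            i = end + 1
        elif esc == "y":
            if kind != "b":
                raise J8StrError("byte escapes like \\yff are only valid in b'' strings")
            pair = body[i + 2:i + 4]
            if len(pair) != 2 or any(c not in HEX for c in pair):
                raise J8StrError(f"bad \\y escape at offset {i}")
            out.append(int(pair, 16))
            i += 4
        else:
            raise J8StrError(f"invalid escape \\{esc}")
    return bytes(out)
```

With this sketch, the body byte \yfe\yff decodes to b'byte \xfe\xff' when kind is "b", but the same body is an error when kind is "u".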
Implementation-defined limit:
- Max string length (NYI)
  - e.g. more than 4 billion bytes could overflow a length field, in some
    implementations
J8 Lines
Roughly speaking, J8 Lines are an encoding for a stream of J8 strings.  In
YSH, they're used by @(split command sub).
err-j8-lines-encode
Like J8 strings, J8 Lines have no encoding errors by design.
err-j8-lines-decode
- Any error in a J8 quoted string.
  - e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
- A line with a quoted string has extra text after it.
- An unquoted line is not valid UTF-8.
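A simplified Python sketch of the three decoding errors.  Several assumptions are made for illustration: quoted lines are parsed only as JSON-style "..." strings with the standard json module (u'' and b'' forms aren't handled), surrounding whitespace is stripped, and blank lines are skipped.  decode_j8_lines and J8LinesError are hypothetical names.

```python
import json


class J8LinesError(ValueError):
    """Hypothetical error type for this sketch."""


_DECODER = json.JSONDecoder()


def decode_j8_lines(data: bytes) -> list[str]:
    result = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        # Both quoted and unquoted lines must be valid UTF-8.
        try:
            line = raw.decode("utf-8")
        except UnicodeDecodeError as e:
            raise J8LinesError(f"line {lineno}: invalid UTF-8") from e
        line = line.strip()  # assumption: surrounding whitespace is ignored
        if not line:
            continue  # assumption: blank lines are skipped
        if line.startswith('"'):
            # raw_decode reports where the quoted string ends.
            try:
                s, end = _DECODER.raw_decode(line)
            except json.JSONDecodeError as e:
                raise J8LinesError(f"line {lineno}: bad quoted string") from e
            if end != len(line):
                raise J8LinesError(f"line {lineno}: extra text after quoted string")
            result.append(s)
        else:
            result.append(line)
    return result
```

For example, the two lines foo and "bar\n" decode to ["foo", "bar\n"], while the line "a" b is rejected because of the text after the closing quote.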
JSON
err-json-encode
JSON encoding has these errors:
- Object of this type can't be serialized.
  - For example, Str, List, and Dict are Oils objects that can be serialized,
    but Eggex, Func, and Range can't.
- Circular reference.
  - e.g. a Dict that points to itself, a List that points to itself, and other
    permutations
- Float values of NaN, Inf, and -Inf can't be encoded.
  - TODO: option to use null like JavaScript, whose JSON.stringify() encodes
    these values as null.
Note that invalid UTF-8 bytes like 0xfe produce a Unicode replacement
character, not a hard error.
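The encoding errors above can be sketched as a small recursive encoder in Python.  Assumptions: Python bytes stand in for Oils strings (which may hold invalid UTF-8), list and dict stand in for List and Dict, and any other type, such as a function, plays the role of an unserializable Eggex, Func, or Range.  encode_json and JsonEncodeError are hypothetical names.

```python
import json
import math


class JsonEncodeError(Exception):
    """Hypothetical error type for this sketch."""


def encode_json(value, _path=None) -> str:
    if _path is None:
        _path = set()  # ids of containers on the current path
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            raise JsonEncodeError(f"can't encode float {value!r}")
        return repr(value)
    if isinstance(value, bytes):
        # Invalid UTF-8 becomes U+FFFD, not a hard error.
        value = value.decode("utf-8", errors="replace")
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, (list, dict)):
        if id(value) in _path:
            raise JsonEncodeError("circular reference")
        _path.add(id(value))
        try:
            if isinstance(value, list):
                return "[" + ",".join(encode_json(v, _path) for v in value) + "]"
            items = []
            for k, v in value.items():
                if not isinstance(k, str):
                    raise JsonEncodeError("object keys must be strings")
                items.append(json.dumps(k, ensure_ascii=False) + ":" + encode_json(v, _path))
            return "{" + ",".join(items) + "}"
        finally:
            _path.discard(id(value))
    raise JsonEncodeError(f"can't serialize object of type {type(value).__name__}")
```

Tracking only the containers on the current path, rather than every container seen so far, means a shared but acyclic value like [x, x] still encodes, while a List that contains itself is rejected.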
err-json-decode
- The encoded message itself is not valid UTF-8.
  - Typically, you need to check the unescaped bytes in string literals like
    "abc\n", since everything outside string literals must be ASCII.
- Lexical error, like
  - the message +
  - an invalid escape "\z" or a truncated escape "\u1"
  - a single-quoted string like u''
- Grammatical error
- Unexpected trailing input
  - like the message 42] or {}]
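One way to sketch these checks in Python is to validate UTF-8 first, then lean on the standard json module.  This is an illustration under assumptions: decode_json and JsonDecodeError are hypothetical names, the stdlib parser doesn't distinguish lexical from grammatical errors, and parse_constant is used because Python accepts NaN and Infinity by default, which strict JSON does not.

```python
import json


class JsonDecodeError(ValueError):
    """Hypothetical error type for this sketch."""


def _reject_constant(name: str):
    # Called by json.loads for NaN, Infinity, and -Infinity.
    raise JsonDecodeError(f"invalid token {name}")


def decode_json(data: bytes):
    # 1. The whole message must be valid UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as e:
        raise JsonDecodeError(f"invalid UTF-8 at byte {e.start}") from e
    # 2. Lexical errors, grammatical errors, and trailing input.
    try:
        return json.loads(text, parse_constant=_reject_constant)
    except json.JSONDecodeError as e:
        if e.msg == "Extra data":
            raise JsonDecodeError(f"unexpected trailing input at offset {e.pos}") from e
        raise JsonDecodeError(f"{e.msg} at offset {e.pos}") from e
```

With this sketch, decode_json(b'[1, "a"]') returns [1, "a"], and each example from the list above, such as + or 42], raises JsonDecodeError.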
Implementation-defined limits, i.e. outside the grammar:
- Integer too big
  - implementations may decode to a 64-bit integer
- Floats that are too big
- Max array length (NYI)
  - e.g. more than 4 billion objects in an array could overflow a length
    field, in some implementations
- Max object length (NYI)
- Max depth for arrays and objects (NYI)
  - to avoid a recursive parser blowing the stack
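Because these limits sit outside the grammar, a decoder has to enforce them separately.  A Python sketch, assuming a signed 64-bit integer range, rejecting floats that overflow to infinity, and using a hypothetical depth limit of 64 that's checked by a pre-scan over the text.  It uses the real parse_int and parse_float hooks of Python's json.loads; decode_with_limits, MAX_DEPTH, and JsonLimitError are hypothetical.

```python
import json
import math

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
MAX_DEPTH = 64  # hypothetical limit, chosen for illustration


class JsonLimitError(ValueError):
    """Hypothetical error type for this sketch."""


def _parse_int(s: str) -> int:
    n = int(s)  # Python ints are unbounded, so check the range explicitly
    if not INT64_MIN <= n <= INT64_MAX:
        raise JsonLimitError(f"integer too big: {s}")
    return n


def _parse_float(s: str) -> float:
    f = float(s)  # e.g. "1e400" overflows to inf
    if math.isinf(f):
        raise JsonLimitError(f"float too big: {s}")
    return f


def _check_depth(text: str, max_depth: int) -> None:
    # Count [ and { outside of string literals.
    depth = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > max_depth:
                raise JsonLimitError(f"nesting deeper than {max_depth}")
        elif ch in "]}":
            depth -= 1


def decode_with_limits(text: str, max_depth: int = MAX_DEPTH):
    _check_depth(text, max_depth)
    return json.loads(text, parse_int=_parse_int, parse_float=_parse_float)
```

Checking depth before parsing means a deeply nested message is rejected without ever recursing into it; brackets inside string literals are ignored.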
JSON8
err-json8-encode
JSON8 has the same encoding errors as JSON.
However, the encoding is lossless by design.  Instead of turning invalid UTF-8
into a Unicode replacement character, the encoder can use J8 strings with byte
escapes like b'byte \yfe\yff'.
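The lossless fallback can be sketched in Python.  Assumptions: valid UTF-8 is emitted as an ordinary JSON string, and anything else becomes a b'' string in which each invalid byte is written as \yXX, while \n, \r, \t, \\ and \' use backslash escapes.  encode_j8_str is a hypothetical name, and the real Oils encoder's escape choices may differ.  Python's surrogateescape error handler does the work of locating the invalid bytes.

```python
import json


def encode_j8_str(data: bytes) -> str:
    # Valid UTF-8: a plain JSON string is enough.
    try:
        return json.dumps(data.decode("utf-8"), ensure_ascii=False)
    except UnicodeDecodeError:
        pass
    # surrogateescape maps each undecodable byte 0xXX to the code point U+DCXX.
    text = data.decode("utf-8", errors="surrogateescape")
    out = ["b'"]
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(f"\\y{cp - 0xDC00:02x}")  # invalid byte -> \yXX
        elif ch == "\\":
            out.append("\\\\")
        elif ch == "'":
            out.append("\\'")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif cp < 0x20:
            out.append(f"\\y{cp:02x}")  # other control bytes
        else:
            out.append(ch)
    out.append("'")
    return "".join(out)
```

For example, encode_j8_str(b'byte \xfe\xff') returns b'byte \yfe\yff', matching the example above, while encode_j8_str(b'hi') returns the plain JSON string "hi".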
err-json8-decode
JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.
See err-j8-str-decode.
    
      Generated on Wed, 24 Jul 2024 05:19:10 +0000