Oils Reference — Chapter Errors

This chapter describes errors for data languages. An error checklist is often a nice, concise way to describe a language.

Related: "Oils Error Catalog, With Hints" describes errors in code.

(in progress)

UTF8

J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.

err-utf8-encode

Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.

But it may provide a function that encodes a List[Int] of code points as UTF-8. Such a function would handle these errors:

  1. Integer greater than max code point
  2. Code point in the surrogate range
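
As an illustration, the two checks above could look like this in Python. This is a minimal sketch, not the Oils implementation; the name encode_utf8 and its error messages are hypothetical.

```python
MAX_CODE_POINT = 0x10FFFF

def encode_utf8(code_points):
    """Encode a list of ints as UTF-8 bytes (hypothetical helper)."""
    out = bytearray()
    for i, cp in enumerate(code_points):
        # Error 1: integer outside the code point range
        if not 0 <= cp <= MAX_CODE_POINT:
            raise ValueError("item %d: %#x is not in 0..0x10FFFF" % (i, cp))
        # Error 2: surrogates are reserved for UTF-16
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("item %d: %#x is in the surrogate range" % (i, cp))

        if cp < 0x80:              # 1 byte
            out.append(cp)
        elif cp < 0x800:           # 2 bytes
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:         # 3 bytes
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                      # 4 bytes
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```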

err-utf8-decode

A UTF-8 decoder should handle these errors:

  1. Overlong encoding. In UTF-8, each code point should be represented with the fewest possible bytes.
  2. Surrogate code point. The sequence decodes to a code point in the surrogate range, which is used only for the UTF-16 encoding, not for string data.
  3. Exceeds max code point. The sequence decodes to an integer that's larger than the maximum code point.
  4. Bad encoding. A byte is not encoded like a UTF-8 start byte or a continuation byte.
  5. Incomplete sequence. Too few continuation bytes appeared after the start byte.
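
The five cases can be told apart with a small hand-written decoder. The following Python sketch is for illustration only and is not how Oils decodes UTF-8; the function name and messages are hypothetical.

```python
def decode_utf8(data):
    """Decode bytes to a list of code points, naming the first error found."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        # min_cp is the smallest code point that needs this many bytes
        if b < 0x80:
            length, cp, min_cp = 1, b, 0
        elif b & 0xE0 == 0xC0:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif b & 0xF0 == 0xE0:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif b & 0xF8 == 0xF0:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            # Stray continuation byte, or a byte in 0xF8..0xFF
            raise ValueError("bad encoding at byte %d" % i)

        for j in range(i + 1, i + length):
            if j >= n or data[j] & 0xC0 != 0x80:
                raise ValueError("incomplete sequence at byte %d" % i)
            cp = (cp << 6) | (data[j] & 0x3F)

        if cp < min_cp:
            raise ValueError("overlong encoding at byte %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point at byte %d" % i)
        if cp > 0x10FFFF:
            raise ValueError("exceeds max code point at byte %d" % i)

        out.append(cp)
        i += length
    return out
```

For example, b'\xc0\x80' is an overlong encoding of U+0000, and b'\xed\xa0\x80' decodes to the surrogate U+D800.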

J8 String

J8 strings extend JSON strings, and are a primary building block of J8 Notation.

err-j8-str-encode

J8 strings can represent any string — bytes or unicode — so there are no encoding errors.

err-j8-str-decode

  1. Escape sequence like \u{dc00} should not be in the surrogate range.
  2. Escape sequence like \u{110000} is greater than the maximum Unicode code point.
  3. Byte escapes like \yff should not be in a u'' string.
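
A rough Python sketch of these three checks follows. It assumes the body between the quotes has already been extracted, and it only looks at \u{...} and \yHH escapes; the function name and its prefix argument are hypothetical, not part of Oils.

```python
import re

# Match an escaped backslash first, so that \\yff is not seen as a byte escape
_ESCAPE = re.compile(r"\\\\|\\u\{([0-9a-fA-F]+)\}|\\y([0-9a-fA-F]{2})")

def check_j8_escapes(body, prefix):
    """Check escapes in the body of a J8 string with prefix 'u' or 'b'."""
    for m in _ESCAPE.finditer(body):
        code_hex, byte_hex = m.group(1), m.group(2)
        if code_hex is not None:
            cp = int(code_hex, 16)
            if 0xD800 <= cp <= 0xDFFF:
                raise ValueError("%s is in the surrogate range" % m.group(0))
            if cp > 0x10FFFF:
                raise ValueError("%s exceeds the max code point" % m.group(0))
        elif byte_hex is not None and prefix == "u":
            raise ValueError("byte escape %s in u'' string" % m.group(0))
```

With this sketch, check_j8_escapes(r"\yff", "b") passes, while the same body with prefix "u" raises ValueError.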

Implementation-defined limit:

  1. Max string length (NYI)

J8 Lines

Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In YSH, it's used by @(split command sub).

err-j8-lines-encode

Like J8 strings, J8 Lines have no encoding errors by design.

err-j8-lines-decode

  1. Any error in a J8 quoted string.
  2. A line with a quoted string has extra text after it.
  3. An unquoted line is not valid UTF-8.
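
The three cases can be sketched in Python as below. This is a simplified illustration under several assumptions that are not stated in this chapter: only JSON-style double-quoted strings are handled (not u'' or b''), surrounding whitespace is stripped, and blank lines are skipped. The function name is hypothetical.

```python
import json

def decode_j8_lines(data):
    """Decode bytes into a list of strings (simplified, hypothetical)."""
    decoder = json.JSONDecoder()
    result = []
    for num, raw in enumerate(data.split(b"\n"), 1):
        line = raw.strip(b" \t\r")
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(b'"'):
            try:
                text = line.decode("utf-8")
                value, end = decoder.raw_decode(text)
            except ValueError as e:
                # Case 1: any error in the quoted string
                raise ValueError("line %d: error in quoted string: %s" % (num, e))
            if end != len(text):
                # Case 2: extra text after the closing quote
                raise ValueError("line %d: extra text after quoted string" % num)
            result.append(value)
        else:
            try:
                result.append(line.decode("utf-8"))
            except UnicodeDecodeError:
                # Case 3: unquoted line is not valid UTF-8
                raise ValueError("line %d: unquoted line is not valid UTF-8" % num)
    return result
```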

JSON

err-json-encode

JSON encoding has these errors:

  1. Object of this type can't be serialized.
  2. Circular reference.
  3. Float values of NaN, Inf, and -Inf can't be encoded.

Note that invalid UTF-8 bytes like 0xfe produce a Unicode replacement character, not a hard error.
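
For comparison, Python's standard json module reports the same three kinds of encoding error. The sketch below shows Python's behavior, not Oils; the helper try_encode is hypothetical. The last line shows how a decoder in "replace" mode turns the invalid byte 0xfe into U+FFFD.

```python
import json

def try_encode(value):
    """Return the JSON text, or the name of the exception that was raised."""
    try:
        # allow_nan=False makes NaN, Inf, and -Inf an error instead of output
        return json.dumps(value, allow_nan=False)
    except (TypeError, ValueError) as e:
        return type(e).__name__

cycle = []
cycle.append(cycle)                 # a list that contains itself

print(try_encode({1, 2}))           # a set can't be serialized
print(try_encode(cycle))            # circular reference
print(try_encode(float("nan")))     # NaN can't be encoded
print(try_encode({"ok": [1, 2.5]}))

# Invalid UTF-8 byte becomes the Unicode replacement character
print(b"\xfe".decode("utf-8", errors="replace") == "\ufffd")
```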

err-json-decode

  1. The encoded message itself is not valid UTF-8.
  2. Lexical error, like
  3. Grammatical error
  4. Unexpected trailing input

Implementation-defined limits, i.e. outside the grammar:

  1. Integer too big
  2. Floats that are too big
  3. Max array length (NYI)
  4. Max object length (NYI)
  5. Max depth for arrays and objects (NYI)
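
To make the decoding errors concrete, here is how they surface in Python's standard json module. This illustrates Python's behavior rather than Oils; the helper try_decode is hypothetical. It also shows why limits are implementation-defined: Python silently decodes the too-big float 1e999 as infinity, where another implementation might reject it.

```python
import json

def try_decode(data):
    """Decode JSON from bytes; return the value or a short error description."""
    try:
        text = data.decode("utf-8")   # the message itself must be valid UTF-8
        return json.loads(text)
    except UnicodeDecodeError:
        return "not valid UTF-8"
    except json.JSONDecodeError as e:
        return "JSONDecodeError: " + e.msg

print(try_decode(b'"\xff"'))    # invalid UTF-8 in the message
print(try_decode(b'[1, @]'))    # lexical error
print(try_decode(b'[1, 2'))     # grammatical error
print(try_decode(b'{} x'))      # unexpected trailing input
print(try_decode(b'1e999'))     # too-big float; Python yields infinity
```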

JSON8

err-json8-encode

JSON8 has the same encoding errors as JSON.

However, the encoding is lossless by design. Instead of turning invalid UTF-8 into a Unicode replacement character, JSON8 can use J8 strings with byte escapes like b'byte \yfe\yff'.
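
A minimal Python sketch of the lossless idea follows. It is not the Oils encoder, and the name encode_j8 is hypothetical. It assumes that valid UTF-8 is written as an ordinary JSON string, and that anything else becomes a b'' string in which each invalid byte is written as a \yHH escape. Escaping control bytes with \yHH as well is a simplification of this sketch.

```python
import json

def encode_j8(data):
    """Encode bytes as a JSON string if valid UTF-8, else as a b'' string."""
    try:
        return json.dumps(data.decode("utf-8"), ensure_ascii=False)
    except UnicodeDecodeError:
        pass

    parts = []
    # surrogateescape maps each invalid byte 0xHH to the code point U+DCHH
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("\\y%02x" % (cp - 0xDC00))   # invalid byte
        elif cp < 0x20:
            parts.append("\\y%02x" % cp)              # control byte
        elif ch in "\\'":
            parts.append("\\" + ch)                   # backslash or quote
        else:
            parts.append(ch)
    return "b'" + "".join(parts) + "'"

print(encode_j8(b"byte \xfe\xff"))
```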

err-json8-decode

JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.

See err-j8-str-decode.

Generated on Thu, 25 Jul 2024 23:38:53 +0000