| 1 | ---
|
| 2 | default_highlighter: oils-sh
|
| 3 | ---
|
| 4 |
|
| 5 | Up to 4 Pretty Printers?
|
| 6 | ========================
|
| 7 |
|
| 8 | *(March 2024)*
|
| 9 |
|
| 10 | Oils **parses** a lot of text, and it's becoming apparent than we need a
|
| 11 | **print** a lot too! The text should be nicely formatted because a shell is a
|
| 12 | user interface.
|
| 13 |
|
| 14 | This doc describes 4 possible pretty printers in Oils. Traditional shells
|
| 15 | don't have appear to have any pretty printing.
|
| 16 |
|
| 17 | [OSH]: $xref
|
| 18 | [YSH]: $xref
|
| 19 |
|
| 20 | <div id="toc">
|
| 21 | </div>
|
| 22 |
|
| 23 | ## Screenshots
|
| 24 |
|
| 25 | Let's be concrete first, because there's a lot of brainstorming below.
|
| 26 |
|
| 27 | [YSH]($xref) prints its JSON-like data structures with the `=` keyword, which
|
| 28 | takes a Python-like expression on the right:
|
| 29 |
|
| 30 | 
|
| 31 |
|
| 32 | Right now, it looks bad on big data structures. It should look something like
|
| 33 | `nodejs` or `jq`:
|
| 34 |
|
| 35 | 
|
| 36 |
|
| 37 | 
|
| 38 |
|
| 39 | We may want to omit the quotes, like `nodejs`. (This syntax isn't meant to be
|
| 40 | parsed. [JSON8]($xref) may have unquoted dict keys, although it's not
|
| 41 | essential).
|
| 42 |
|
| 43 | ## Background - `go fmt` style versus PPL style
|
| 44 |
|
| 45 | To back up a bit, I'm writing this doc organize my thoughts, and to explain
|
| 46 | problems and requirements to contributors.
|
| 47 |
|
| 48 | There are at least two schools of thought on pretty printers, which
|
| 49 | this lobste.rs thread has a good discussion of:
|
| 50 |
|
| 51 | - *Why is Prettier Rock Solid?* <https://lobste.rs/s/aevptj/why_is_prettier_rock_solid>
|
| 52 | - HN comments: <https://news.ycombinator.com/item?id=39437424>. Aside: a top
|
| 53 | comment shows why we don't want to take responsibility for *all* formatting
|
| 54 | decisions in our OSH-YSH printer! Users are opinionated. The JavaScript
|
| 55 | formatter Prettier is criticized for a bug, and for being **slow**.
|
| 56 |
|
| 57 | More comments on a blog post by Justin Pombrio (which I circulated):
|
| 58 |
|
| 59 | - *A Twist on Wadler's Printer*
|
| 60 | <https://lobste.rs/s/1r0aak/twist_on_wadler_s_printer>
|
| 61 |
|
| 62 | Let's call the two styles the "`go fmt` style" and the "PPL style" (functional
|
| 63 | pretty printing language).
|
| 64 |
|
| 65 | I was probably "biased" toward `go fmt`, because the two formatters we actually
|
| 66 | **use** in Oils are influenced by it:
|
| 67 |
|
| 68 | 1. `clang-format` for our C++ code.
|
| 69 | - This is the best formatter I've used. It's fast, and e.g. does a good job
|
| 70 | on C macros.
|
| 71 | 1. `yapf` for our Python code.
|
| 72 | - It's intentionally a "clone" for `clang-format`. (It's slow, mostly due
|
| 73 | to being written in Python, and creating lots of tiny objects!)
|
| 74 |
|
| 75 | (Details: they have line wrapping algorithms, while `go fmt` doesn't. I'm not
|
| 76 | calling them "graph search" printers, because I think of line wrapping as a
|
| 77 | **subproblem** of pretty printing.)
|
| 78 |
|
| 79 | <!--
|
| 80 |
|
| 81 | Related: a bunch of Zulip threads where I was learning about pretty printing:
|
| 82 |
|
| 83 | - TODO: Link to threads
|
| 84 |
|
| 85 | (I've written many parsers before, but only one ad hoc pretty printer, which I
|
| 86 | want to get rid of. Discussed below.)
|
| 87 |
|
| 88 | -->
|
| 89 |
|
| 90 | ### Why PPL style?
|
| 91 |
|
| 92 | However, Justin's post helped me understand Wadler's printer, a popular example
|
| 93 | of the PPL style. This style might have some advantages for Oils:
|
| 94 |
|
| 95 | - There's no user-provided layout for data structures — either
|
| 96 | [YSH]($xref) data or [Zephyr ASDL][ASDL] data. So we need to synthesize a
|
| 97 | layout from scratch.
|
| 98 | - We have multiple languages to format, and the PPL style separates **policy**
|
| 99 | (language rules) and **mechanism** (line wrapping). So we should try a more
|
| 100 | principled architecture, hopefully without sacrificing quality.
|
| 101 | - The two styles may not be as distinct as they seem at first. They may be
|
| 102 | complementary.
|
| 103 | - We can probably use a PPL for the **expression subproblem** of the OSH-YSH
|
| 104 | shell formatter. The rest of the formatter will have rules that **don't**
|
| 105 | have to do with line wrapping (aligning EOL comments like `go fmt` does,
|
| 106 | etc.)
|
| 107 | - I think a **non-wrapping** pretty printer — an "indenter" — can
|
| 108 | use something similar to the PPL IRs. Notes below.
|
| 109 | - PPLs *could be* slower (asymptotically and in practice) than the custom
|
| 110 | algorithms, but I think that can be solved with a simple rule in practice:
|
| 111 | 1. Compute the total size of the data structure/doc up front.
|
| 112 | 1. If it's small, spend extra time to make it look pretty, by using an
|
| 113 | **expressive and slow** PPL. We can be quadratic or worse, perhaps. We
|
| 114 | might want `node.js`-like columnar layout.
|
| 115 | 1. If it's big, use a **simple and fast** PPL subset.
|
| 116 |
|
| 117 | (Does that last idea work?)
|
| 118 |
|
| 119 | [ASDL]: $xref:zephyr-asdl
|
| 120 |
|
| 121 |
|
| 122 | ### Warning
|
| 123 |
|
| 124 | Sometimes I pile on too many requirements, which I mentioned in the latest
|
| 125 | release announcement:
|
| 126 |
|
| 127 | - [Oils 0.20.0 > Why I'm Working on JSON](https://www.oilshell.org/blog/2024/02/release-0.20.0.html#zulip-why-am-i-working-on-json)
|
| 128 |
|
| 129 | We should do the simplest things first, and I think the PPL approach will allow
|
| 130 | that.
|
| 131 |
|
| 132 | BTW there are many threads on `#data-languages` on [Zulip]($xref) where I'm
|
| 133 | brainstorming and learning about pretty printing.
|
| 134 |
|
| 135 | ## Four Printers - What and Why?
|
| 136 |
|
| 137 | Here's a sketch of what I think we need. It goes from **concrete** →
|
| 138 | **experimental** and research-y.
|
| 139 |
|
| 140 | ### Print YSH data types in a JSON-like format
|
| 141 |
|
| 142 | **What is it?** This is the `=` keyword shown in the screenshots above. (BTW,
|
| 143 | Lua uses the same syntax `=` to evaluate expressions.)
|
| 144 |
|
| 145 | **Motivation**: We should look as good as `node.js` or `jq`.
|
| 146 |
|
| 147 | ---
|
| 148 |
|
| 149 | Currently we use our JSON printer with the options `SHOW_NON_DATA |
|
| 150 | SHOW_CYCLES`.
|
| 151 |
|
| 152 | - `SHOW_NOW_DATA` prints non-data objects, like `<Func 0x123>`. This syntax
|
| 153 | can't be parsed.
|
| 154 | - `SHOW_CYCLES` prints cycles with `-->`, instead of raising an error, like
|
| 155 | JSON does.
|
| 156 |
|
| 157 | Example:
|
| 158 |
|
| 159 | ysh$ var d = {}
|
| 160 | ysh$ setvar d.eggex = /dot+/ # Eggex object
|
| 161 |
|
| 162 | ysh$ = d
|
| 163 | (Dict 0x7feb87cb4050) {"eggex":<Eggex 0x7feb87dbfd00>}
|
| 164 |
|
| 165 | ysh$ setvar d.cycle = d
|
| 166 |
|
| 167 | ysh$ = d
|
| 168 | (Dict 0x7feb87cb4050) {"eggex":<Eggex 0x7feb87dbfd00>,"cycle":{ --> 0x7feb87cb4050 }}
|
| 169 |
|
| 170 | We should **replace** this with a new pretty printer that wraps lines, and has
|
| 171 | <span style="color: darkcyan; font-weight: bold">COLOR</span>.
|
| 172 |
|
| 173 | ### Replace our ad hoc Zephyr ASDL pretty printer
|
| 174 |
|
| 175 | **What is it?** [Zephyr ASDL][ASDL] is the statically typed schema language we
|
| 176 | use to implement Oils. It's "one level down" from the shell.
|
| 177 |
|
| 178 | We used it to define the syntax of shell with **algebraic data types**. We
|
| 179 | create a [lossless syntax tree]($xref:LST), which is also an **IR** for shell.
|
| 180 |
|
| 181 | **Motivation**: We already wrote an ad hoc pretty printer, and it should be
|
| 182 | replaced! It tries to fit records on a single line, and if that fails, it uses
|
| 183 | multiple lines. I think it's slow.
|
| 184 |
|
| 185 | If we already have a YSH data printer, this printer should "obviously" be
|
| 186 | unified with it. We should have a nice separation of policy and mechanism.
|
| 187 |
|
| 188 | ---
|
| 189 |
|
| 190 | Use `osh -n myscript.sh` to see what it does:
|
| 191 |
|
| 192 | 
|
| 193 |
|
| 194 | Notes:
|
| 195 |
|
| 196 | - The algorithm is in [asdl/format.py]($oils-src), and the "homogeneous IR" is
|
| 197 | in [asdl/hnode.asdl]($oils-src).
|
| 198 | - Each generated class definition has a `PrettyTree()` method, which converts
|
| 199 | the **typed** `self` or `this` to the homogeneous `hnode` tree.
|
| 200 | - `AbbreviatedTree()` is a bit like the **modular** printers discussed in the
|
| 201 | `lobste.rs` thread. It makes certain common data structures more
|
| 202 | readable, with type-specific logic. It's in Python only — can that
|
| 203 | logic also be in C++?
|
| 204 | - The syntax tree is actually a **graph**, and I recently added logic to
|
| 205 | **omit duplicate** nodes. This is unlike the JSON printer, which prints
|
| 206 | duplicate nodes, as Python and node.js do.
|
| 207 | - The slowness hasn't really mattered, because this format isn't exposed to
|
| 208 | users. It's only for **debugging** Oils itself. But it's useful, and we may
|
| 209 | want to expose it.
|
| 210 | - Also used in `pp asdl (obj)`, another debugging feature.
|
| 211 | - TODO: Add a simple benchmark? The new printer will probably be faster than
|
| 212 | the old one.
|
| 213 | - `osh -n benchmarks/testdata/configure-coreutils` tests a huge shell file
|
| 214 |
|
| 215 | <!--
|
| 216 | - current state of `osh -n`
|
| 217 | - timing of it -- I think this may take awhile. It was never optimized. It
|
| 218 | produces MANY allocations.
|
| 219 | - to be honest -- allocations and GC will probably **dominate** in Oils.
|
| 220 | -->
|
| 221 |
|
| 222 | ### OSH-YSH Code Formatter
|
| 223 |
|
| 224 | **What is it?** A formatter for shell code. I think it's more essential to
|
| 225 | have a [YSH]($xref) formatter, but an [OSH]($xref) formatter is also possible.
|
| 226 | They both use the same [lossless syntax tree]($xref:LST).
|
| 227 |
|
| 228 | **Motivation** - Formatters make a new language easier to use, and there are
|
| 229 | many users who don't know shell well.
|
| 230 |
|
| 231 | For example, I don't know TypeScript well, and I had a good experience with
|
| 232 | `deno fmt`. It reduced the **mental load** of adopting a new tool.
|
| 233 |
|
| 234 | ---
|
| 235 |
|
| 236 | Justin had a nice idea on on `lobste.rs` - we should create **coarse tree**
|
| 237 | with these elements:
|
| 238 |
|
| 239 | - `{ }` affect indentation in [YSH]($xref)
|
| 240 | - In [OSH]($xref), we should also understand `then elif else fi`, `do done`,
|
| 241 | etc.
|
| 242 | - `( )` in [YSH]($xref) changes the lexer from [command mode to expression
|
| 243 | mode](command-vs-expression-mode.html)
|
| 244 | - **Newlines** can't appear in atomic `text()` / chunks
|
| 245 | - Comments need to be preserved at the end of lines
|
| 246 | - They may also be aligned on consecutive lines (with heuristics)
|
| 247 | - Keywords like `while for if` begin blocks of code
|
| 248 |
|
| 249 | Why? We don't don't want to take responsibility for every formatting decision!
|
| 250 |
|
| 251 | I actually think the command mode / statement formatter should be
|
| 252 | **non-wrapping**, while expressions can wrap. Commands will likely be more
|
| 253 | common than expressions in most YSH code.
|
| 254 |
|
| 255 | ---
|
| 256 |
|
| 257 | The formatter will be invoked with `ysh --tool format myfile.ysh`.
|
| 258 |
|
| 259 | It can also be used with the output of `osh --tool ysh-ify`, which roughly
|
| 260 | translates [OSH]($xref) to [YSH]($xref) (doesn't preserve semantics). This may
|
| 261 | help generate **test data**, since there's plenty of shell code in the wild,
|
| 262 | but not much [YSH][] code.
|
| 263 |
|
| 264 | ### Experimental: Export the Oils "Syntax Graph" to Users with "NIL8"
|
| 265 |
|
| 266 | **What is it?** A more human AND machine-readable format for the syntax tree,
|
| 267 | which is actually a **graph**.
|
| 268 |
|
| 269 | **Motivation**: The pretty-printed AST could be a **parseable** format. Allow
|
| 270 | users to reuse all the hard work we did [parsing
|
| 271 | shell]($blog-tag:parsing-shell)!
|
| 272 |
|
| 273 | ---
|
| 274 |
|
| 275 | Printing raw [ASDL][] data is useful, but it could be more readable with custom
|
| 276 | logic to print the natural **layers** of the graph. There are 4 logical layers
|
| 277 | in [frontend/syntax.asdl]($oils-src):
|
| 278 |
|
| 279 | 1. `source_t` describes whether shell code comes from `foo.sh` or `ysh -c 'echo
|
| 280 | mycode'`
|
| 281 | 2. `SourceLine` represents physical lines of code
|
| 282 | 3. `Token` refers to portions of lines
|
| 283 | 4. The syntax tree of `command_t word_t word_part_t expr_t`. The leaves are
|
| 284 | tokens.
|
| 285 |
|
| 286 | And instead of a pretty format meant for humans, we may want to print an
|
| 287 | s-expression-like format I'm calling **"NIL8"**.
|
| 288 |
|
| 289 | NIL8 can be parsed. You may want to write tree-shaking code to deploy
|
| 290 | [YSH][] into containers.
|
| 291 |
|
| 292 | More on NIL8 below. Again, it's experimental.
|
| 293 |
|
| 294 | ## Implementation Notes
|
| 295 |
|
| 296 | ### Do the printers depend on each other?
|
| 297 |
|
| 298 | - The [YSH][] printer (1) naturally comes before the [ASDL][] printer (2).
|
| 299 | - The code formatter (3) is concrete and useful to end users.
|
| 300 | - The NIL8 printer (4) comes after the [ASDL][] printer (2), but it's experimental.
|
| 301 | - It depends on a bunch of other work in Oils/YSH.
|
| 302 |
|
| 303 | Risks:
|
| 304 |
|
| 305 | - No real risks for (1) and (2)
|
| 306 | - They're "engineering" — Justin's blog post is very close! It could
|
| 307 | be ported almost literally to typed Python. It will translate
|
| 308 | automatically to C++. (And it would be interesting to compare our
|
| 309 | garbage-collected C++ with Rust's `Rc<T>`)
|
| 310 | - [ASDL][] involves code generation in both Python and C++. We have a custom
|
| 311 | build system for this (using Ninja for C++).
|
| 312 | - The OSH/YSH formatter has non-trivial decisions
|
| 313 | - End-of-line comments. (Shell doesn't have block comments, which simplifies
|
| 314 | things.)
|
| 315 | - Multi-line strings in YSH have a special rule -- the indentation of the
|
| 316 | closing `'''` is significant
|
| 317 | - Shell here docs may be tricky
|
| 318 | - This formatter probably requires the most "elbow grease". This is why I
|
| 319 | said that the statement formatter should initially be **non-wrapping** —
|
| 320 | it reduces the scope of the problem.
|
| 321 |
|
| 322 | ### Code Skeleton
|
| 323 |
|
| 324 | I added some stubs in the code:
|
| 325 |
|
| 326 | - [data_lang/pretty.asdl]($oils-src) - How we would express the IR
|
| 327 | - [data_lang/pretty.py]($oils-src) - YSH conversion.
|
| 328 | - [data_lang/pretty-benchmark.sh]($oils-src) - Our naive ASDL pretty printer is
|
| 329 | slow. It can take more than 3 seconds on a big file, vs. ~100ms to parse it.
|
| 330 | (It does print over 100 MB of text though.)
|
| 331 |
|
| 332 | To generate Python code from the ASDL schema, run `build/py.sh all`.
|
| 333 | Otherwise, Oils is a plain Python 2 program, with a few C extensions.
|
| 334 |
|
| 335 | C++ translation is a separate step, and it's now pretty polished.
|
| 336 |
|
| 337 | For new contributors:
|
| 338 |
|
| 339 | - [Contributing]($wiki) on the wiki
|
| 340 | - [Where Contributors Have Problems]($wiki)
|
| 341 |
|
| 342 | There is also a stub for the formatter:
|
| 343 |
|
| 344 | - [tools/fmt.py]($oils-src) - Stub file for the formatter.
|
| 345 | - Code copied from [tools/ysh_ify.py]($oils-src).
|
| 346 |
|
| 347 | ## Design Questions
|
| 348 |
|
| 349 | This section has some less concrete thoughts.
|
| 350 |
|
| 351 | ### Non-Wrapping Printers aka "Indenters" - same PPL IR?
|
| 352 |
|
| 353 | I think the PPL IRs are also useful if you're not line wrapping? You can just
|
| 354 | fix indentation.
|
| 355 |
|
| 356 | ### Columnar Layouts (spending more time)
|
| 357 |
|
| 358 | `nodejs` does this in its `console.log()`.
|
| 359 |
|
| 360 | 
|
| 361 |
|
| 362 | Future work? We should get the basic pretty printer working first.
|
| 363 |
|
| 364 | ### Two Levels of Coarse Parsing for YSH? (not at first)
|
| 365 |
|
| 366 | Making the coarse tree has some similarity to syntax highlighting. I wrote a
|
| 367 | simple syntax highlighter for 5+ languages called `micro-syntax`, and it should
|
| 368 | support [YSH]($xref) too.
|
| 369 |
|
| 370 | Sketch:
|
| 371 |
|
| 372 | 1. First make the **really coarse tree**, something like: `Comment | Code |
|
| 373 | StringLiteral`
|
| 374 | 2. Then make a **less coarse** tree for pretty printing:
|
| 375 | - Lex code into `{} ()`
|
| 376 | - Categorize comments into `EndLineComment` | `BeginLineComment`
|
| 377 |
|
| 378 | Then do a trivial linear pass to fix up indentation. The `{ }` or `do done`
|
| 379 | tokens determine indentation.
|
| 380 |
|
| 381 | ---
|
| 382 |
|
| 383 | Though honestly it's probably better to just **reuse** our elaborate parser at
|
| 384 | first. I like to "compress" different algorithms together, but it may not be
|
| 385 | worth it here.
|
| 386 |
|
| 387 | ### NIL8 - Uses cases for both Code and Data
|
| 388 |
|
| 389 | What is "NIL8"? We don't know if it's a good idea yet, but it may be part of
|
| 390 | [J8 Notation](j8-notation.html).
|
| 391 |
|
| 392 | Think:
|
| 393 |
|
| 394 | - A mash-up of [JSON][] and S-expressions
|
| 395 | - *NIL8 Isn't Lisp*
|
| 396 | - *Narrow Intermediate Language*
|
| 397 | - WebAssembly text format
|
| 398 | - An IR for an **imperative** language, with a Lisp-y syntax.
|
| 399 | - An **exterior** S-expression format
|
| 400 | - Blog: [Oils is Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html)
|
| 401 | - I posted POSE (portable s-expressions) on lobste.rs for this reason:
|
| 402 | <https://lobste.rs/s/lwf4jv/pose_portable_s_expressions_pose_spec> (no
|
| 403 | comments)
|
| 404 |
|
| 405 | [JSON]: $xref
|
| 406 |
|
| 407 | If NIL8 can represent both the [lossless syntax tree]($xref:LST) and a new IR
|
| 408 | for a [mycpp]($xref) rewrite ("yaks"), that's a good test of the design.
|
| 409 |
|
| 410 | Note that the AST is a **statically typed** data structure, which means we may
|
| 411 | also want to export the [ASDL][] **schema** as NIL8!
|
| 412 |
|
| 413 | Links:
|
| 414 |
|
| 415 | - [#data-languages > NIL8 Infix Rule](https://oilshell.zulipchat.com/#narrow/stream/403584-data-languages/topic/NIL8.20Infix.20Rule)
|
| 416 | - [Commit describing infix rule](https://github.com/oilshell/oil/commit/b9cecf88838d4c89ce1dbd8f4bcdd8e92e10d442)
|
| 417 |
|
| 418 | At a high level, we're trying to nudge users toward a **small** set of syntaxes
|
| 419 | for shell-like programming, rather than inventing ad hoc syntax every time.
|
| 420 | String literals are a pain point: people often implement them badly, or not at
|
| 421 | all.
|
| 422 |
|
| 423 | ## Conclusion
|
| 424 |
|
| 425 | I think we should have shared infrastructure for 3 printers:
|
| 426 |
|
| 427 | 1. [YSH]($xref) data structures
|
| 428 | 1. [ASDL][] data structures
|
| 429 | 1. [OSH]($xref) and [YSH]($xref) code
|
| 430 |
|
| 431 | And then there's this idea of "replacing" or rationalizing the [ASDL][] syntax
|
| 432 | tree with "NIL8":
|
| 433 |
|
| 434 | - It can be parsed, not just printed.
|
| 435 | - To parse, you can reuse an existing [JSON]($xref) string lexer. IMO, this
|
| 436 | is the fiddliest part of parsing.
|
| 437 | - It can export a graph shape, in natural "layers".
|
| 438 |
|
| 439 | ## Related
|
| 440 |
|
| 441 | ### Docs
|
| 442 |
|
| 443 | - [Parser Architecture](parser-architecture.html) - describes issues like the
|
| 444 | **"lossless invariant"**, which is affected by *re-parsing*.
|
| 445 | - I recently updated it, and tested the invariant with
|
| 446 | [test/lossless.sh]($oils-src).
|
| 447 | - The repo [README]($oils-doc:oils-repo/README.html) has an overview of the
|
| 448 | code.
|
| 449 |
|
| 450 | ### Fun Computer Science Problems in Oils
|
| 451 |
|
| 452 | Pretty printing is adjacent to other fun problems in Oils, like GC performance,
|
| 453 | "boxless" optimization, etc.
|
| 454 |
|
| 455 | - [#help-wanted > Fun Computer Science Problems](https://oilshell.zulipchat.com/#narrow/stream/417617-help-wanted/topic/Fun.20Computer.20Science.20Problems)
|
| 456 |
|
| 457 | Things to think about:
|
| 458 |
|
| 459 | (1) Unified Code Representation for Oils
|
| 460 |
|
| 461 | We want to pack all these tools into Oils:
|
| 462 |
|
| 463 | - The interpreter, for **execution**
|
| 464 | - prints precise errors, ignores comment tokens
|
| 465 | - `ysh-ify` - a VERY rough **translation** of [OSH]($xref) to [YSH]($xref)
|
| 466 | - doesn't respect semantics, because you really need static types for that
|
| 467 | - uses "span ID"
|
| 468 | - **Pretty Printing** (this doc)
|
| 469 | - will also use "span ID" to detect comment positions, I think
|
| 470 | - Syntax **highlighting**
|
| 471 | - Interactive highlighting will help users learn the language
|
| 472 | - It's **recursive**, because sublanguages are mutually recursive: string
|
| 473 | literals, commands, expressions
|
| 474 |
|
| 475 | Related: Retrospective on the Go `ast` package.
|
| 476 |
|
| 477 | (2) Yaks - *mycpp from the bottom up*
|
| 478 |
|
| 479 | NIL8 leads into **"Yaks"**, which is an IR for garbage collected C++.
|
| 480 |
|
| 481 | - Yaks is of course a big yak shave, but it's **concrete** because we recently
|
| 482 | **completed** the translation with [mycpp]($xref). (mycpp is a crappy
|
| 483 | program which produces good results!)
|
| 484 |
|
| 485 | (3) Pretty printing will cause many small **allocations**.
|
| 486 |
|
| 487 | I think that naive implementations should be fast enough. If not, any slowness
|
| 488 | may be due to allocation, not necessarily the pretty printing algorithm itself.
|
| 489 |
|
| 490 | Some solutions:
|
| 491 |
|
| 492 | - We want to move towards a **tagged pointer** runtime, an ambitious
|
| 493 | transformation of the entire interpreter.
|
| 494 | - But we'll do it in steps. First step: Small String Optimization.
|
| 495 | - Yaks goes along with a tagged pointer runtime. Yaks also has a principled
|
| 496 | IR, which may be represented with many small objects.
|
| 497 | - GC rooting optimization should give a speed boost
|
| 498 |
|
| 499 | ## Appendix: Older Notes
|
| 500 |
|
| 501 | This is about ref counting for printing **graph-shaped** data.
|
| 502 |
|
| 503 | I no longer think this is as important. I think we should **manually**
|
| 504 | construct 4 layers of the graph, as described in the section on formatter (4).
|
| 505 |
|
| 506 | ### Dynamically Typed YSH Data (`value_t`)
|
| 507 |
|
| 508 | Similar to JSON / JSON8 printing, except we
|
| 509 |
|
| 510 | 1. count references, and then print `...` instead of repeating
|
| 511 | 1. line wrap
|
| 512 | 1. assign colors
|
| 513 | - for atoms, and possibly for balanced parens, to make it more readable
|
| 514 |
|
| 515 | #### Step 1: Count References
|
| 516 |
|
| 517 | This is a global pass that computes a Dict[int, int]
|
| 518 |
|
| 519 | object ID -> number of times referenced in the graph
|
| 520 |
|
| 521 | The graph is specified by single root node, e.g. the argument to
|
| 522 |
|
| 523 | pp line (obj)
|
| 524 |
|
| 525 | Pass this dict into the second step.
|
| 526 |
|
| 527 | #### Step 2: Convert To Homogeneous Representation
|
| 528 |
|
| 529 | value.List -> hnode.Compound with []
|
| 530 | value.Dict -> hnode.Compound with {}
|
| 531 |
|
| 532 | null, true, false -> Atom
|
| 533 | Cycle detected -> Atom, with { --> 4beef2 }
|
| 534 | [ --> 4beef2 ]
|
| 535 |
|
| 536 | Repetition:
|
| 537 |
|
| 538 | { key: { ... 4beef2 }, key2: { ... 4beef2 }
|
| 539 |
|
| 540 | Or maybe omit the type, since strings don't have that:
|
| 541 |
|
| 542 | { key: ... 4beef2, key2: ... 4beef2 }
|
| 543 |
|
| 544 | #### hnode Schema
|
| 545 |
|
| 546 | The schema looks like this now?
|
| 547 |
|
| 548 | hnode =
|
| 549 | Atom(str s, color color) - # External objects can use this?
|
| 550 | | Compound(hnode* items)
|
| 551 |
|
| 552 | The length of 'str s' is the input to line wrapping.
|
| 553 |
|
| 554 | #### Step 3: Figure out what's on each line
|
| 555 |
|
| 556 | TODO: what's the heuristic here? Is it global?
|
| 557 |
|
| 558 | Dynamic programming?
|
| 559 |
|
| 560 | do we insert hnode.Newline() or something?
|
| 561 |
|
| 562 | ### Statically Typed ASDL Data
|
| 563 |
|
| 564 | Reduce it to the case above.
|
| 565 |
|
| 566 | #### Step 1 - Ref Counting / Cycle Detection?
|
| 567 |
|
| 568 | We do this all at once?
|
| 569 |
|
| 570 | Because to convert to value.Record, you have to do cycle detection anyway.
|
| 571 |
|
| 572 | And that's similar to ref counting.
|
| 573 |
|
| 574 | #### Step 2 - ASDL records -> value.Record
|
| 575 |
|
| 576 | value =
|
| 577 | ...
|
| 578 | | Record(str type_name, Dict[str, value_t] fields)
|
| 579 |
|
| 580 | The special "-" key can be used for JSON:
|
| 581 |
|
| 582 | {"-": "command.Simple, "name": "hi"}
|
| 583 |
|
| 584 | Though this loses some information, and it doesn't solve the problem with
|
| 585 | shared references. We would need Packle for that.
|
| 586 |
|
| 587 | #### Step 2a: Optional Abbreviation?
|
| 588 |
|
| 589 | Is this separate? Or part of step 2.
|
| 590 |
|
| 591 | We need something between value.Record and hnode.Compound
|
| 592 | to do ABBREVIATION:
|
| 593 |
|
| 594 | - Abbreviate type name, or omit it
|
| 595 | - Omit some field names (requires schema to record it)
|
| 596 | - Change () to <>
|
| 597 |
|
| 598 | Also need nodes for
|
| 599 |
|
| 600 | - `...` means already printed
|
| 601 | - `---` means CANNOT print, because it's a cycle
|
| 602 | - `@1f23` - ID if already printed, or in a cycle
|
| 603 |
|
| 604 | #### Steps 3 and 4 - Homogeneous Representation, Line Wrapping
|
| 605 |
|
| 606 | Identical to the dynamically typed case above.
|
| 607 |
|