| 1 | ---
 | 
| 2 | default_highlighter: oils-sh
 | 
| 3 | ---
 | 
| 4 | 
 | 
| 5 | Up to 4 Pretty Printers?
 | 
| 6 | ========================
 | 
| 7 | 
 | 
| 8 | *(March 2024)*
 | 
| 9 | 
 | 
| 10 | Oils **parses** a lot of text, and it's becoming apparent than we need a
 | 
| 11 | **print** a lot too!  The text should be nicely formatted because a shell is a
 | 
| 12 | user interface.
 | 
| 13 | 
 | 
| 14 | This doc describes 4 possible pretty printers in Oils.  Traditional shells
 | 
| 15 | don't have appear to have any pretty printing.
 | 
| 16 | 
 | 
| 17 | [OSH]: $xref
 | 
| 18 | [YSH]: $xref
 | 
| 19 | 
 | 
| 20 | <div id="toc">
 | 
| 21 | </div> 
 | 
| 22 | 
 | 
| 23 | ## Screenshots
 | 
| 24 | 
 | 
| 25 | Let's be concrete first, because there's a lot of brainstorming below.
 | 
| 26 | 
 | 
| 27 | [YSH]($xref) prints its JSON-like data structures with the `=` keyword, which
 | 
| 28 | takes a Python-like expression on the right:
 | 
| 29 | 
 | 
| 30 | 
 | 
| 31 | 
 | 
| 32 | Right now, it looks bad on big data structures.  It should look something like
 | 
| 33 | `nodejs` or `jq`:
 | 
| 34 | 
 | 
| 35 | 
 | 
| 36 | 
 | 
| 37 | 
 | 
| 38 | 
 | 
| 39 | We may want to omit the quotes, like `nodejs`.  (This syntax isn't meant to be
 | 
| 40 | parsed.  [JSON8]($xref) may have unquoted dict keys, although it's not
 | 
| 41 | essential).
 | 
| 42 | 
 | 
| 43 | ## Background - `go fmt` style versus PPL style
 | 
| 44 | 
 | 
| 45 | To back up a bit, I'm writing this doc organize my thoughts, and to explain
 | 
| 46 | problems and requirements to contributors.
 | 
| 47 | 
 | 
| 48 | There are at least two schools of thought on pretty printers, which 
 | 
| 49 | this lobste.rs thread has a good discussion of:
 | 
| 50 | 
 | 
| 51 | - *Why is Prettier Rock Solid?* <https://lobste.rs/s/aevptj/why_is_prettier_rock_solid>
 | 
| 52 |   - HN comments: <https://news.ycombinator.com/item?id=39437424>.  Aside: a top
 | 
| 53 |     comment shows why we don't want to take responsibility for *all* formatting
 | 
| 54 |     decisions in our OSH-YSH printer!   Users are opinionated.  The JavaScript
 | 
| 55 |     formatter Prettier is criticized for a bug, and for being **slow**.
 | 
| 56 | 
 | 
| 57 | More comments on a blog post by Justin Pombrio (which I circulated):
 | 
| 58 | 
 | 
| 59 | - *A Twist on Wadler's Printer*
 | 
| 60 |   <https://lobste.rs/s/1r0aak/twist_on_wadler_s_printer>
 | 
| 61 | 
 | 
| 62 | Let's call the two styles the "`go fmt` style" and the "PPL style" (functional
 | 
| 63 | pretty printing language).
 | 
| 64 | 
 | 
| 65 | I was probably "biased" toward `go fmt`, because the two formatters we actually
 | 
| 66 | **use** in Oils are influenced by it:
 | 
| 67 | 
 | 
| 68 | 1. `clang-format` for our C++ code.
 | 
| 69 |    - This is the best formatter I've used.  It's fast, and e.g. does a good job
 | 
| 70 |      on C macros.
 | 
| 71 | 1. `yapf` for our Python code.
 | 
| 72 |    - It's intentionally a "clone" for `clang-format`.  (It's slow, mostly due
 | 
| 73 |      to being written in Python, and creating lots of tiny objects!)
 | 
| 74 | 
 | 
| 75 | (Details: they have line wrapping algorithms, while `go fmt` doesn't.  I'm not
 | 
| 76 | calling them "graph search" printers, because I think of line wrapping as a
 | 
| 77 | **subproblem** of pretty printing.)
 | 
| 78 | 
 | 
| 79 | <!--
 | 
| 80 | 
 | 
| 81 | Related: a bunch of Zulip threads where I was learning about pretty printing:
 | 
| 82 | 
 | 
| 83 | - TODO: Link to threads
 | 
| 84 | 
 | 
| 85 | (I've written many parsers before, but only one ad hoc pretty printer, which I
 | 
| 86 | want to get rid of.  Discussed below.)
 | 
| 87 | 
 | 
| 88 | -->
 | 
| 89 | 
 | 
| 90 | ### Why PPL style?
 | 
| 91 | 
 | 
| 92 | However, Justin's post helped me understand Wadler's printer, a popular example
 | 
| 93 | of the PPL style.  This style might have some advantages for Oils:
 | 
| 94 | 
 | 
| 95 | - There's no user-provided layout for data structures — either
 | 
| 96 |   [YSH]($xref) data or [Zephyr ASDL][ASDL] data.  So we need to synthesize a
 | 
| 97 |   layout from scratch.
 | 
| 98 | - We have multiple languages to format, and the PPL style separates **policy**
 | 
| 99 |   (language rules) and **mechanism** (line wrapping).  So we should try a more
 | 
| 100 |   principled architecture, hopefully without sacrificing quality.
 | 
| 101 | - The two styles may not be as distinct as they seem at first.  They may be
 | 
| 102 |   complementary.
 | 
| 103 |   - We can probably use a PPL for the **expression subproblem** of the OSH-YSH
 | 
| 104 |     shell formatter.  The rest of the formatter will have rules that **don't**
 | 
| 105 |     have to do with line wrapping (aligning EOL comments like `go fmt` does,
 | 
| 106 |     etc.)
 | 
| 107 |   - I think a **non-wrapping** pretty printer — an "indenter" — can
 | 
| 108 |     use something similar to the PPL IRs.  Notes below.
 | 
| 109 | - PPLs *could be* slower (asymptotically and in practice) than the custom
 | 
| 110 |   algorithms, but I think that can be solved with a simple rule in practice:
 | 
| 111 |   1. Compute the total size of the data structure/doc up front.
 | 
| 112 |   1. If it's small, spend extra time to make it look pretty, by using an
 | 
| 113 |      **expressive and slow** PPL.  We can be quadratic or worse, perhaps.  We
 | 
| 114 |      might want `node.js`-like columnar layout.
 | 
| 115 |   1. If it's big, use a **simple and fast** PPL subset.
 | 
| 116 | 
 | 
| 117 | (Does that last idea work?)
 | 
| 118 | 
 | 
| 119 | [ASDL]: $xref:zephyr-asdl
 | 
| 120 | 
 | 
| 121 | 
 | 
| 122 | ### Warning
 | 
| 123 | 
 | 
| 124 | Sometimes I pile on too many requirements, which I mentioned in the latest
 | 
| 125 | release announcement:
 | 
| 126 | 
 | 
| 127 | - [Oils 0.20.0 > Why I'm Working on JSON](https://www.oilshell.org/blog/2024/02/release-0.20.0.html#zulip-why-am-i-working-on-json)
 | 
| 128 | 
 | 
| 129 | We should do the simplest things first, and I think the PPL approach will allow
 | 
| 130 | that.
 | 
| 131 | 
 | 
| 132 | BTW there are many threads on `#data-languages` on [Zulip]($xref) where I'm
 | 
| 133 | brainstorming and learning about pretty printing.
 | 
| 134 | 
 | 
| 135 | ## Four Printers - What and Why?
 | 
| 136 | 
 | 
| 137 | Here's a sketch of what I think we need.  It goes from **concrete** →
 | 
| 138 | **experimental** and research-y.
 | 
| 139 | 
 | 
| 140 | ### Print YSH data types in a JSON-like format
 | 
| 141 | 
 | 
| 142 | **What is it?**  This is the `=` keyword shown in the screenshots above.  (BTW,
 | 
| 143 | Lua uses the same syntax `=` to evaluate expressions.)
 | 
| 144 | 
 | 
| 145 | **Motivation**: We should look as good as `node.js` or `jq`.
 | 
| 146 | 
 | 
| 147 | ---
 | 
| 148 | 
 | 
| 149 | Currently we use our JSON printer with the options `SHOW_NON_DATA |
 | 
| 150 | SHOW_CYCLES`.
 | 
| 151 | 
 | 
| 152 | - `SHOW_NOW_DATA` prints non-data objects, like `<Func 0x123>`.  This syntax
 | 
| 153 |   can't be parsed.
 | 
| 154 | - `SHOW_CYCLES` prints cycles with `-->`, instead of raising an error, like
 | 
| 155 |   JSON does.
 | 
| 156 | 
 | 
| 157 | Example:
 | 
| 158 | 
 | 
| 159 |     ysh$ var d = {}
 | 
| 160 |     ysh$ setvar d.eggex = /dot+/  # Eggex object
 | 
| 161 | 
 | 
| 162 |     ysh$ = d
 | 
| 163 |     (Dict 0x7feb87cb4050)   {"eggex":<Eggex 0x7feb87dbfd00>}
 | 
| 164 | 
 | 
| 165 |     ysh$ setvar d.cycle = d
 | 
| 166 | 
 | 
| 167 |     ysh$ = d
 | 
| 168 |     (Dict 0x7feb87cb4050)   {"eggex":<Eggex 0x7feb87dbfd00>,"cycle":{ --> 0x7feb87cb4050 }}
 | 
| 169 | 
 | 
| 170 | We should **replace** this with a new pretty printer that wraps lines, and has
 | 
| 171 | <span style="color: darkcyan; font-weight: bold">COLOR</span>.
 | 
| 172 | 
 | 
| 173 | ### Replace our ad hoc Zephyr ASDL pretty printer
 | 
| 174 | 
 | 
| 175 | **What is it?**  [Zephyr ASDL][ASDL] is the statically typed schema language we
 | 
| 176 | use to implement Oils.  It's "one level down" from the shell.
 | 
| 177 | 
 | 
| 178 | We used it to define the syntax of shell with **algebraic data types**.  We
 | 
| 179 | create a [lossless syntax tree]($xref:LST), which is also an **IR** for shell.
 | 
| 180 | 
 | 
| 181 | **Motivation**: We already wrote an ad hoc pretty printer, and it should be
 | 
| 182 | replaced!  It tries to fit records on a single line, and if that fails, it uses
 | 
| 183 | multiple lines.  I think it's slow.
 | 
| 184 | 
 | 
| 185 | If we already have a YSH data printer, this printer should "obviously" be
 | 
| 186 | unified with it.  We should have a nice separation of policy and mechanism.
 | 
| 187 | 
 | 
| 188 | ---
 | 
| 189 | 
 | 
| 190 | Use `osh -n myscript.sh` to see what it does:
 | 
| 191 | 
 | 
| 192 | 
 | 
| 193 | 
 | 
| 194 | Notes:
 | 
| 195 | 
 | 
| 196 | - The algorithm is in [asdl/format.py]($oils-src), and the "homogeneous IR" is
 | 
| 197 |   in [asdl/hnode.asdl]($oils-src).
 | 
| 198 |   - Each generated class definition has a `PrettyTree()` method, which converts
 | 
| 199 |     the **typed** `self` or `this` to the homogeneous `hnode` tree.
 | 
| 200 |   - `AbbreviatedTree()` is a bit like the **modular** printers discussed in the
 | 
| 201 |     `lobste.rs` thread.  It makes certain common data structures more
 | 
| 202 |     readable, with type-specific logic.  It's in Python only — can that
 | 
| 203 |     logic also be in C++?
 | 
| 204 |   - The syntax tree is actually a **graph**, and I recently added logic to
 | 
| 205 |     **omit duplicate** nodes.  This is unlike the JSON printer, which prints
 | 
| 206 |     duplicate nodes, as Python and node.js do.
 | 
| 207 | - The slowness hasn't really mattered, because this format isn't exposed to
 | 
| 208 |   users.  It's only for **debugging** Oils itself.  But it's useful, and we may
 | 
| 209 |   want to expose it.
 | 
| 210 | - Also used in `pp asdl (obj)`, another debugging feature.
 | 
| 211 | - TODO: Add a simple benchmark?  The new printer will probably be faster than
 | 
| 212 |   the old one.
 | 
| 213 |   - `osh -n benchmarks/testdata/configure-coreutils` tests a huge shell file
 | 
| 214 | 
 | 
| 215 | <!--
 | 
| 216 | - current state of `osh -n`
 | 
| 217 |   - timing of it -- I think this may take awhile.  It was never optimized.  It
 | 
| 218 |     produces MANY allocations.
 | 
| 219 |   - to be honest -- allocations and GC will probably **dominate** in Oils.
 | 
| 220 | -->
 | 
| 221 | 
 | 
| 222 | ### OSH-YSH Code Formatter
 | 
| 223 | 
 | 
| 224 | **What is it?** A formatter for shell code.  I think it's more essential to
 | 
| 225 | have a [YSH]($xref) formatter, but an [OSH]($xref) formatter is also possible.
 | 
| 226 | They both use the same [lossless syntax tree]($xref:LST).
 | 
| 227 | 
 | 
| 228 | **Motivation** - Formatters make a new language easier to use, and there are
 | 
| 229 | many users who don't know shell well. 
 | 
| 230 | 
 | 
| 231 | For example, I don't know TypeScript well, and I had a good experience with
 | 
| 232 | `deno fmt`.  It reduced the **mental load** of adopting a new tool.
 | 
| 233 | 
 | 
| 234 | ---
 | 
| 235 | 
 | 
| 236 | Justin had a nice idea on on `lobste.rs` - we should create **coarse tree**
 | 
| 237 | with these elements:
 | 
| 238 | 
 | 
| 239 | - `{ }` affect indentation in [YSH]($xref)
 | 
| 240 |   - In [OSH]($xref), we should also understand `then elif else fi`, `do done`,
 | 
| 241 |     etc.
 | 
| 242 | - `( )` in [YSH]($xref) changes the lexer from [command mode to expression
 | 
| 243 |   mode](command-vs-expression-mode.html)
 | 
| 244 | - **Newlines** can't appear in atomic `text()` / chunks
 | 
| 245 | - Comments need to be preserved at the end of lines
 | 
| 246 |   - They may also be aligned on consecutive lines (with heuristics)
 | 
| 247 | - Keywords like `while for if` begin blocks of code
 | 
| 248 | 
 | 
| 249 | Why?  We don't don't want to take responsibility for every formatting decision!
 | 
| 250 | 
 | 
| 251 | I actually think the command mode / statement formatter should be
 | 
| 252 | **non-wrapping**, while expressions can wrap.  Commands will likely be more
 | 
| 253 | common than expressions in most YSH code.
 | 
| 254 | 
 | 
| 255 | ---
 | 
| 256 | 
 | 
| 257 | The formatter will be invoked with `ysh --tool format myfile.ysh`.
 | 
| 258 | 
 | 
| 259 | It can also be used with the output of `osh --tool ysh-ify`, which roughly
 | 
| 260 | translates [OSH]($xref) to [YSH]($xref) (doesn't preserve semantics).  This may
 | 
| 261 | help generate **test data**, since there's plenty of shell code in the wild,
 | 
| 262 | but not much [YSH][] code.
 | 
| 263 | 
 | 
| 264 | ### Experimental: Export the Oils "Syntax Graph" to Users with "NIL8"
 | 
| 265 | 
 | 
| 266 | **What is it?** A more human AND machine-readable format for the syntax tree,
 | 
| 267 | which is actually a **graph**.
 | 
| 268 | 
 | 
| 269 | **Motivation**: The pretty-printed AST could be a **parseable** format.  Allow
 | 
| 270 | users to reuse all the hard work we did [parsing
 | 
| 271 | shell]($blog-tag:parsing-shell)!
 | 
| 272 | 
 | 
| 273 | ---
 | 
| 274 | 
 | 
| 275 | Printing raw [ASDL][] data is useful, but it could be more readable with custom
 | 
| 276 | logic to print the natural **layers** of the graph.  There are 4 logical layers
 | 
| 277 | in [frontend/syntax.asdl]($oils-src):
 | 
| 278 | 
 | 
| 279 | 1. `source_t` describes whether shell code comes from `foo.sh` or `ysh -c 'echo
 | 
| 280 |    mycode'`
 | 
| 281 | 2. `SourceLine` represents physical lines of code
 | 
| 282 | 3. `Token` refers to portions of lines
 | 
| 283 | 4. The syntax tree of `command_t word_t word_part_t expr_t`.  The leaves are
 | 
| 284 |    tokens.
 | 
| 285 | 
 | 
| 286 | And instead of a pretty format meant for humans, we may want to print an
 | 
| 287 | s-expression-like format I'm calling **"NIL8"**.
 | 
| 288 | 
 | 
| 289 | NIL8 can be parsed.  You may want to write tree-shaking code to deploy
 | 
| 290 | [YSH][] into containers.
 | 
| 291 | 
 | 
| 292 | More on NIL8 below.  Again, it's experimental.
 | 
| 293 | 
 | 
| 294 | ## Implementation Notes
 | 
| 295 | 
 | 
| 296 | ### Do the printers depend on each other?
 | 
| 297 | 
 | 
| 298 | - The [YSH][] printer (1) naturally comes before the [ASDL][] printer (2).
 | 
| 299 | - The code formatter (3) is concrete and useful to end users.
 | 
| 300 | - The NIL8 printer (4) comes after the [ASDL][] printer (2), but it's experimental.
 | 
| 301 |   - It depends on a bunch of other work in Oils/YSH.
 | 
| 302 | 
 | 
| 303 | Risks:
 | 
| 304 | 
 | 
| 305 | - No real risks for (1) and (2)
 | 
| 306 |   - They're "engineering" — Justin's blog post is very close!  It could
 | 
| 307 |     be ported almost literally to typed Python.  It will translate
 | 
| 308 |     automatically to C++.  (And it would be interesting to compare our
 | 
| 309 |     garbage-collected C++ with Rust's `Rc<T>`)
 | 
| 310 |   - [ASDL][] involves code generation in both Python and C++.  We have a custom
 | 
| 311 |     build system for this (using Ninja for C++).
 | 
| 312 | - The OSH/YSH formatter has non-trivial decisions
 | 
| 313 |   - End-of-line comments.  (Shell doesn't have block comments, which simplifies
 | 
| 314 |     things.)
 | 
| 315 |   - Multi-line strings in YSH have a special rule -- the indentation of the
 | 
| 316 |     closing `'''` is significant
 | 
| 317 |   - Shell here docs may be tricky
 | 
| 318 |   - This formatter probably requires the most "elbow grease".  This is why I
 | 
| 319 |     said that the statement formatter should initially be **non-wrapping** —
 | 
| 320 |     it reduces the scope of the problem.
 | 
| 321 | 
 | 
| 322 | ### Code Skeleton
 | 
| 323 | 
 | 
| 324 | I added some stubs in the code:
 | 
| 325 | 
 | 
| 326 | - [data_lang/pretty.asdl]($oils-src) - How we would express the IR
 | 
| 327 | - [data_lang/pretty.py]($oils-src) - YSH conversion.
 | 
| 328 | - [data_lang/pretty-benchmark.sh]($oils-src) - Our naive ASDL pretty printer is
 | 
| 329 |   slow.  It can take more than 3 seconds on a big file, vs. ~100ms to parse it.
 | 
| 330 |   (It does print over 100 MB of text though.)
 | 
| 331 | 
 | 
| 332 | To generate Python code from the ASDL schema, run `build/py.sh all`.
 | 
| 333 | Otherwise, Oils is a plain Python 2 program, with a few C extensions.
 | 
| 334 | 
 | 
| 335 | C++ translation is a separate step, and it's now pretty polished.
 | 
| 336 | 
 | 
| 337 | For new contributors:
 | 
| 338 | 
 | 
| 339 | - [Contributing]($wiki) on the wiki
 | 
| 340 | - [Where Contributors Have Problems]($wiki)
 | 
| 341 | 
 | 
| 342 | There is also a stub for the formatter:
 | 
| 343 | 
 | 
| 344 | - [tools/fmt.py]($oils-src) - Stub file for the formatter.
 | 
| 345 |   - Code copied from [tools/ysh_ify.py]($oils-src).
 | 
| 346 | 
 | 
| 347 | ## Design Questions
 | 
| 348 | 
 | 
| 349 | This section has some less concrete thoughts.
 | 
| 350 | 
 | 
| 351 | ### Non-Wrapping Printers aka "Indenters" - same PPL IR?
 | 
| 352 | 
 | 
| 353 | I think the PPL IRs are also useful if you're not line wrapping?  You can just
 | 
| 354 | fix indentation.
 | 
| 355 | 
 | 
| 356 | ### Columnar Layouts (spending more time)
 | 
| 357 | 
 | 
| 358 | `nodejs` does this in its `console.log()`.
 | 
| 359 | 
 | 
| 360 | 
 | 
| 361 | 
 | 
| 362 | Future work?  We should get the basic pretty printer working first.
 | 
| 363 | 
 | 
| 364 | ### Two Levels of Coarse Parsing for YSH?  (not at first)
 | 
| 365 | 
 | 
| 366 | Making the coarse tree has some similarity to syntax highlighting.  I wrote a 
 | 
| 367 | simple syntax highlighter for 5+ languages called `micro-syntax`, and it should
 | 
| 368 | support [YSH]($xref) too.
 | 
| 369 | 
 | 
| 370 | Sketch:
 | 
| 371 | 
 | 
| 372 | 1. First make the **really coarse tree**, something like: `Comment | Code |
 | 
| 373 |    StringLiteral`
 | 
| 374 | 2. Then make a **less coarse** tree for pretty printing:
 | 
| 375 |    - Lex code into `{} ()`
 | 
| 376 |    - Categorize comments into `EndLineComment` | `BeginLineComment`
 | 
| 377 | 
 | 
| 378 | Then do a trivial linear pass to fix up indentation.  The `{ }` or `do done`
 | 
| 379 | tokens determine indentation.
 | 
| 380 | 
 | 
| 381 | ---
 | 
| 382 | 
 | 
| 383 | Though honestly it's probably better to just **reuse** our elaborate parser at
 | 
| 384 | first.  I like to "compress" different algorithms together, but it may not be
 | 
| 385 | worth it here.
 | 
| 386 | 
 | 
| 387 | ### NIL8 - Uses cases for both Code and Data
 | 
| 388 | 
 | 
| 389 | What is "NIL8"?  We don't know if it's a good idea yet, but it may be part of
 | 
| 390 | [J8 Notation](j8-notation.html).
 | 
| 391 | 
 | 
| 392 | Think:
 | 
| 393 | 
 | 
| 394 | - A mash-up of [JSON][] and S-expressions
 | 
| 395 |   - *NIL8 Isn't Lisp*
 | 
| 396 |   - *Narrow Intermediate Language*
 | 
| 397 | - WebAssembly text format
 | 
| 398 |   - An IR for an **imperative** language, with a Lisp-y syntax.
 | 
| 399 | - An **exterior** S-expression format
 | 
| 400 |   - Blog: [Oils is Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html)
 | 
| 401 |   - I posted POSE (portable s-expressions) on lobste.rs for this reason:
 | 
| 402 |     <https://lobste.rs/s/lwf4jv/pose_portable_s_expressions_pose_spec> (no
 | 
| 403 |     comments)
 | 
| 404 | 
 | 
| 405 | [JSON]: $xref
 | 
| 406 | 
 | 
| 407 | If NIL8 can represent both the [lossless syntax tree]($xref:LST) and a new IR
 | 
| 408 | for a [mycpp]($xref) rewrite ("yaks"), that's a good test of the design.
 | 
| 409 | 
 | 
| 410 | Note that the AST is a **statically typed** data structure, which means we may
 | 
| 411 | also want to export the [ASDL][] **schema** as NIL8!
 | 
| 412 | 
 | 
| 413 | Links:
 | 
| 414 | 
 | 
| 415 | - [#data-languages > NIL8 Infix Rule](https://oilshell.zulipchat.com/#narrow/stream/403584-data-languages/topic/NIL8.20Infix.20Rule)
 | 
| 416 | - [Commit describing infix rule](https://github.com/oilshell/oil/commit/b9cecf88838d4c89ce1dbd8f4bcdd8e92e10d442)
 | 
| 417 | 
 | 
| 418 | At a high level, we're trying to nudge users toward a **small** set of syntaxes
 | 
| 419 | for shell-like programming, rather than inventing ad hoc syntax every time.
 | 
| 420 | String literals are a pain point: people often implement them badly, or not at
 | 
| 421 | all.
 | 
| 422 | 
 | 
| 423 | ## Conclusion 
 | 
| 424 | 
 | 
| 425 | I think we should have shared infrastructure for 3 printers:
 | 
| 426 | 
 | 
| 427 | 1. [YSH]($xref) data structures
 | 
| 428 | 1. [ASDL][] data structures
 | 
| 429 | 1. [OSH]($xref) and [YSH]($xref) code
 | 
| 430 | 
 | 
| 431 | And then there's this idea of "replacing" or rationalizing the [ASDL][] syntax
 | 
| 432 | tree with "NIL8":
 | 
| 433 | 
 | 
| 434 | - It can be parsed, not just printed.
 | 
| 435 |   - To parse, you can reuse an existing [JSON]($xref) string lexer.  IMO, this
 | 
| 436 |     is the fiddliest part of parsing.
 | 
| 437 | - It can export a graph shape, in natural "layers".
 | 
| 438 | 
 | 
| 439 | ## Related 
 | 
| 440 | 
 | 
| 441 | ### Docs
 | 
| 442 | 
 | 
| 443 | - [Parser Architecture](parser-architecture.html) - describes issues like the
 | 
| 444 |   **"lossless invariant"**, which is affected by *re-parsing*.
 | 
| 445 |   - I recently updated it, and tested the invariant with
 | 
| 446 |     [test/lossless.sh]($oils-src).
 | 
| 447 | - The repo [README]($oils-doc:oils-repo/README.html) has an overview of the
 | 
| 448 |   code.
 | 
| 449 | 
 | 
| 450 | ### Fun Computer Science Problems in Oils
 | 
| 451 | 
 | 
| 452 | Pretty printing is adjacent to other fun problems in Oils, like GC performance,
 | 
| 453 | "boxless" optimization, etc.
 | 
| 454 | 
 | 
| 455 | - [#help-wanted > Fun Computer Science Problems](https://oilshell.zulipchat.com/#narrow/stream/417617-help-wanted/topic/Fun.20Computer.20Science.20Problems)
 | 
| 456 | 
 | 
| 457 | Things to think about:
 | 
| 458 | 
 | 
| 459 | (1) Unified Code Representation for Oils
 | 
| 460 | 
 | 
| 461 | We want to pack all these tools into Oils:
 | 
| 462 | 
 | 
| 463 | - The interpreter, for **execution**
 | 
| 464 |   - prints precise errors, ignores comment tokens
 | 
| 465 | - `ysh-ify` - a VERY rough **translation** of [OSH]($xref) to [YSH]($xref)
 | 
| 466 |   - doesn't respect semantics, because you really need static types for that
 | 
| 467 |   - uses "span ID"
 | 
| 468 | - **Pretty Printing** (this doc)
 | 
| 469 |   - will also use "span ID" to detect comment positions, I think
 | 
| 470 | - Syntax **highlighting**
 | 
| 471 |   - Interactive highlighting will help users learn the language
 | 
| 472 |   - It's **recursive**, because sublanguages are mutually recursive: string
 | 
| 473 |     literals, commands, expressions
 | 
| 474 | 
 | 
| 475 | Related: Retrospective on the Go `ast` package.
 | 
| 476 | 
 | 
| 477 | (2) Yaks - *mycpp from the bottom up*
 | 
| 478 | 
 | 
| 479 | NIL8 leads into **"Yaks"**, which is an IR for garbage collected C++.  
 | 
| 480 | 
 | 
| 481 | - Yaks is of course a big yak shave, but it's **concrete** because we recently
 | 
| 482 |   **completed** the translation with [mycpp]($xref).  (mycpp is a crappy
 | 
| 483 |   program which produces good results!)
 | 
| 484 | 
 | 
| 485 | (3) Pretty printing will cause many small **allocations**.
 | 
| 486 | 
 | 
| 487 | I think that naive implementations should be fast enough.  If not, any slowness
 | 
| 488 | may be due to allocation, not necessarily the pretty printing algorithm itself.
 | 
| 489 | 
 | 
| 490 | Some solutions:
 | 
| 491 | 
 | 
| 492 | - We want to move towards a **tagged pointer** runtime, an ambitious
 | 
| 493 |   transformation of the entire interpreter.
 | 
| 494 |   - But we'll do it in steps.  First step: Small String Optimization.
 | 
| 495 |   - Yaks goes along with a tagged pointer runtime.  Yaks also has a principled
 | 
| 496 |     IR, which may be represented with many small objects.
 | 
| 497 | - GC rooting optimization should give a speed boost
 | 
| 498 | 
 | 
| 499 | ## Appendix: Older Notes
 | 
| 500 | 
 | 
| 501 | This is about ref counting for printing **graph-shaped** data.
 | 
| 502 | 
 | 
| 503 | I no longer think this is as important.  I think we should **manually**
 | 
| 504 | construct 4 layers of the graph, as described in the section on formatter (4).
 | 
| 505 | 
 | 
| 506 | ### Dynamically Typed YSH Data (`value_t`)
 | 
| 507 | 
 | 
| 508 | Similar to JSON / JSON8 printing, except we 
 | 
| 509 | 
 | 
| 510 | 1. count references, and then print `...` instead of repeating
 | 
| 511 | 1. line wrap
 | 
| 512 | 1. assign colors
 | 
| 513 |    - for atoms, and possibly for balanced parens, to make it more readable
 | 
| 514 | 
 | 
| 515 | #### Step 1: Count References
 | 
| 516 | 
 | 
| 517 | This is a global pass that computes a Dict[int, int]
 | 
| 518 | 
 | 
| 519 |     object ID -> number of times referenced in the graph
 | 
| 520 | 
 | 
| 521 | The graph is specified by single root node, e.g. the argument to
 | 
| 522 | 
 | 
| 523 |     pp line (obj)
 | 
| 524 | 
 | 
| 525 | Pass this dict into the second step.
 | 
| 526 | 
 | 
| 527 | #### Step 2: Convert To Homogeneous Representation
 | 
| 528 | 
 | 
| 529 |     value.List   -> hnode.Compound with []
 | 
| 530 |     value.Dict   -> hnode.Compound with {}
 | 
| 531 | 
 | 
| 532 |     null, true, false -> Atom
 | 
| 533 |     Cycle detected -> Atom, with { --> 4beef2 }
 | 
| 534 |                                  [ --> 4beef2 ]
 | 
| 535 | 
 | 
| 536 | Repetition:
 | 
| 537 | 
 | 
| 538 |     { key: { ... 4beef2 }, key2: { ... 4beef2 }
 | 
| 539 | 
 | 
| 540 | Or maybe omit the type, since strings don't have that:
 | 
| 541 | 
 | 
| 542 |     { key: ... 4beef2, key2: ... 4beef2 }
 | 
| 543 | 
 | 
| 544 | #### hnode Schema
 | 
| 545 | 
 | 
| 546 | The schema looks like this now?
 | 
| 547 | 
 | 
| 548 |     hnode = 
 | 
| 549 |       Atom(str s, color color) - # External objects can use this?
 | 
| 550 |     | Compound(hnode* items)
 | 
| 551 | 
 | 
| 552 | The length of 'str s' is the input to line wrapping.
 | 
| 553 | 
 | 
| 554 | #### Step 3: Figure out what's on each line
 | 
| 555 | 
 | 
| 556 | TODO: what's the heuristic here?  Is it global?
 | 
| 557 | 
 | 
| 558 | Dynamic programming?
 | 
| 559 | 
 | 
| 560 | do we insert hnode.Newline() or something?
 | 
| 561 | 
 | 
| 562 | ### Statically Typed ASDL Data
 | 
| 563 | 
 | 
| 564 | Reduce it to the case above.
 | 
| 565 | 
 | 
| 566 | #### Step 1 - Ref Counting / Cycle Detection?
 | 
| 567 | 
 | 
| 568 | We do this all at once?
 | 
| 569 | 
 | 
| 570 | Because to convert to value.Record, you have to do cycle detection anyway.
 | 
| 571 | 
 | 
| 572 | And that's similar to ref counting.
 | 
| 573 | 
 | 
| 574 | #### Step 2 - ASDL records -> value.Record
 | 
| 575 | 
 | 
| 576 |     value = 
 | 
| 577 |         ...
 | 
| 578 |       | Record(str type_name, Dict[str, value_t] fields)
 | 
| 579 | 
 | 
| 580 | The special "-" key can be used for JSON:
 | 
| 581 | 
 | 
| 582 |     {"-": "command.Simple, "name": "hi"}
 | 
| 583 | 
 | 
| 584 | Though this loses some information, and it doesn't solve the problem with
 | 
| 585 | shared references.  We would need Packle for that.
 | 
| 586 | 
 | 
| 587 | #### Step 2a: Optional Abbreviation?
 | 
| 588 | 
 | 
| 589 | Is this separate?  Or part of step 2.
 | 
| 590 | 
 | 
| 591 | We need something between value.Record and hnode.Compound
 | 
| 592 | to do ABBREVIATION:
 | 
| 593 | 
 | 
| 594 | - Abbreviate type name, or omit it
 | 
| 595 | - Omit some field names (requires schema to record it)
 | 
| 596 | - Change () to <>
 | 
| 597 | 
 | 
| 598 | Also need nodes for
 | 
| 599 | 
 | 
| 600 | - `...` means already printed
 | 
| 601 | - `---` means CANNOT print, because it's a cycle
 | 
| 602 | - `@1f23` - ID if already printed, or in a cycle
 | 
| 603 | 
 | 
| 604 | #### Steps 3 and 4 - Homogeneous Representation, Line Wrapping
 | 
| 605 | 
 | 
| 606 | Identical to the dynamically typed case above.
 | 
| 607 | 
 |