1 | ---
|
2 | default_highlighter: oils-sh
|
3 | ---
|
4 |
|
5 | Up to 4 Pretty Printers?
|
6 | ========================
|
7 |
|
8 | *(March 2024)*
|
9 |
|
10 | Oils **parses** a lot of text, and it's becoming apparent than we need a
|
11 | **print** a lot too! The text should be nicely formatted because a shell is a
|
12 | user interface.
|
13 |
|
14 | This doc describes 4 possible pretty printers in Oils. Traditional shells
|
15 | don't have appear to have any pretty printing.
|
16 |
|
17 | [OSH]: $xref
|
18 | [YSH]: $xref
|
19 |
|
20 | <div id="toc">
|
21 | </div>
|
22 |
|
23 | ## Screenshots
|
24 |
|
25 | Let's be concrete first, because there's a lot of brainstorming below.
|
26 |
|
27 | [YSH]($xref) prints its JSON-like data structures with the `=` keyword, which
|
28 | takes a Python-like expression on the right:
|
29 |
|
30 | ![ysh issues.json](https://app.oilshell.org/picdir/resize?name=14qke97__ysh-issues.png&max-width=600)
|
31 |
|
32 | Right now, it looks bad on big data structures. It should look something like
|
33 | `nodejs` or `jq`:
|
34 |
|
35 | ![node.js issues.json](https://app.oilshell.org/picdir/resize?name=13b35jj__nodejs-issues.png&max-width=600)
|
36 |
|
37 | ![jq issues.json](https://app.oilshell.org/picdir/resize?name=11wsgpm__jq-issues.png&max-width=600)
|
38 |
|
39 | We may want to omit the quotes, like `nodejs`. (This syntax isn't meant to be
|
40 | parsed. [JSON8]($xref) may have unquoted dict keys, although it's not
|
41 | essential).
|
42 |
|
43 | ## Background - `go fmt` style versus PPL style
|
44 |
|
45 | To back up a bit, I'm writing this doc organize my thoughts, and to explain
|
46 | problems and requirements to contributors.
|
47 |
|
48 | There are at least two schools of thought on pretty printers, which
|
49 | this lobste.rs thread has a good discussion of:
|
50 |
|
51 | - *Why is Prettier Rock Solid?* <https://lobste.rs/s/aevptj/why_is_prettier_rock_solid>
|
52 | - HN comments: <https://news.ycombinator.com/item?id=39437424>. Aside: a top
|
53 | comment shows why we don't want to take responsibility for *all* formatting
|
54 | decisions in our OSH-YSH printer! Users are opinionated. The JavaScript
|
55 | formatter Prettier is criticized for a bug, and for being **slow**.
|
56 |
|
57 | More comments on a blog post by Justin Pombrio (which I circulated):
|
58 |
|
59 | - *A Twist on Wadler's Printer*
|
60 | <https://lobste.rs/s/1r0aak/twist_on_wadler_s_printer>
|
61 |
|
62 | Let's call the two styles the "`go fmt` style" and the "PPL style" (functional
|
63 | pretty printing language).
|
64 |
|
65 | I was probably "biased" toward `go fmt`, because the two formatters we actually
|
66 | **use** in Oils are influenced by it:
|
67 |
|
68 | 1. `clang-format` for our C++ code.
|
69 | - This is the best formatter I've used. It's fast, and e.g. does a good job
|
70 | on C macros.
|
71 | 1. `yapf` for our Python code.
|
72 | - It's intentionally a "clone" for `clang-format`. (It's slow, mostly due
|
73 | to being written in Python, and creating lots of tiny objects!)
|
74 |
|
75 | (Details: they have line wrapping algorithms, while `go fmt` doesn't. I'm not
|
76 | calling them "graph search" printers, because I think of line wrapping as a
|
77 | **subproblem** of pretty printing.)
|
78 |
|
79 | <!--
|
80 |
|
81 | Related: a bunch of Zulip threads where I was learning about pretty printing:
|
82 |
|
83 | - TODO: Link to threads
|
84 |
|
85 | (I've written many parsers before, but only one ad hoc pretty printer, which I
|
86 | want to get rid of. Discussed below.)
|
87 |
|
88 | -->
|
89 |
|
90 | ### Why PPL style?
|
91 |
|
92 | However, Justin's post helped me understand Wadler's printer, a popular example
|
93 | of the PPL style. This style might have some advantages for Oils:
|
94 |
|
95 | - There's no user-provided layout for data structures — either
|
96 | [YSH]($xref) data or [Zephyr ASDL][ASDL] data. So we need to synthesize a
|
97 | layout from scratch.
|
98 | - We have multiple languages to format, and the PPL style separates **policy**
|
99 | (language rules) and **mechanism** (line wrapping). So we should try a more
|
100 | principled architecture, hopefully without sacrificing quality.
|
101 | - The two styles may not be as distinct as they seem at first. They may be
|
102 | complementary.
|
103 | - We can probably use a PPL for the **expression subproblem** of the OSH-YSH
|
104 | shell formatter. The rest of the formatter will have rules that **don't**
|
105 | have to do with line wrapping (aligning EOL comments like `go fmt` does,
|
106 | etc.)
|
107 | - I think a **non-wrapping** pretty printer — an "indenter" — can
|
108 | use something similar to the PPL IRs. Notes below.
|
109 | - PPLs *could be* slower (asymptotically and in practice) than the custom
|
110 | algorithms, but I think that can be solved with a simple rule in practice:
|
111 | 1. Compute the total size of the data structure/doc up front.
|
112 | 1. If it's small, spend extra time to make it look pretty, by using an
|
113 | **expressive and slow** PPL. We can be quadratic or worse, perhaps. We
|
114 | might want `node.js`-like columnar layout.
|
115 | 1. If it's big, use a **simple and fast** PPL subset.
|
116 |
|
117 | (Does that last idea work?)
|
118 |
|
119 | [ASDL]: $xref:zephyr-asdl
|
120 |
|
121 |
|
122 | ### Warning
|
123 |
|
124 | Sometimes I pile on too many requirements, which I mentioned in the latest
|
125 | release announcement:
|
126 |
|
127 | - [Oils 0.20.0 > Why I'm Working on JSON](https://www.oilshell.org/blog/2024/02/release-0.20.0.html#zulip-why-am-i-working-on-json)
|
128 |
|
129 | We should do the simplest things first, and I think the PPL approach will allow
|
130 | that.
|
131 |
|
132 | BTW there are many threads on `#data-languages` on [Zulip]($xref) where I'm
|
133 | brainstorming and learning about pretty printing.
|
134 |
|
135 | ## Four Printers - What and Why?
|
136 |
|
137 | Here's a sketch of what I think we need. It goes from **concrete** →
|
138 | **experimental** and research-y.
|
139 |
|
140 | ### Print YSH data types in a JSON-like format
|
141 |
|
142 | **What is it?** This is the `=` keyword shown in the screenshots above. (BTW,
|
143 | Lua uses the same syntax `=` to evaluate expressions.)
|
144 |
|
145 | **Motivation**: We should look as good as `node.js` or `jq`.
|
146 |
|
147 | ---
|
148 |
|
149 | Currently we use our JSON printer with the options `SHOW_NON_DATA |
|
150 | SHOW_CYCLES`.
|
151 |
|
152 | - `SHOW_NOW_DATA` prints non-data objects, like `<Func 0x123>`. This syntax
|
153 | can't be parsed.
|
154 | - `SHOW_CYCLES` prints cycles with `-->`, instead of raising an error, like
|
155 | JSON does.
|
156 |
|
157 | Example:
|
158 |
|
159 | ysh$ var d = {}
|
160 | ysh$ setvar d.eggex = /dot+/ # Eggex object
|
161 |
|
162 | ysh$ = d
|
163 | (Dict 0x7feb87cb4050) {"eggex":<Eggex 0x7feb87dbfd00>}
|
164 |
|
165 | ysh$ setvar d.cycle = d
|
166 |
|
167 | ysh$ = d
|
168 | (Dict 0x7feb87cb4050) {"eggex":<Eggex 0x7feb87dbfd00>,"cycle":{ --> 0x7feb87cb4050 }}
|
169 |
|
170 | We should **replace** this with a new pretty printer that wraps lines, and has
|
171 | <span style="color: darkcyan; font-weight: bold">COLOR</span>.
|
172 |
|
173 | ### Replace our ad hoc Zephyr ASDL pretty printer
|
174 |
|
175 | **What is it?** [Zephyr ASDL][ASDL] is the statically typed schema language we
|
176 | use to implement Oils. It's "one level down" from the shell.
|
177 |
|
178 | We used it to define the syntax of shell with **algebraic data types**. We
|
179 | create a [lossless syntax tree]($xref:LST), which is also an **IR** for shell.
|
180 |
|
181 | **Motivation**: We already wrote an ad hoc pretty printer, and it should be
|
182 | replaced! It tries to fit records on a single line, and if that fails, it uses
|
183 | multiple lines. I think it's slow.
|
184 |
|
185 | If we already have a YSH data printer, this printer should "obviously" be
|
186 | unified with it. We should have a nice separation of policy and mechanism.
|
187 |
|
188 | ---
|
189 |
|
190 | Use `osh -n myscript.sh` to see what it does:
|
191 |
|
192 | ![osh -n](https://app.oilshell.org/picdir/resize?name=1lwb0bf__osh-n.png&max-width=600)
|
193 |
|
194 | Notes:
|
195 |
|
196 | - The algorithm is in [asdl/format.py]($oils-src), and the "homogeneous IR" is
|
197 | in [asdl/hnode.asdl]($oils-src).
|
198 | - Each generated class definition has a `PrettyTree()` method, which converts
|
199 | the **typed** `self` or `this` to the homogeneous `hnode` tree.
|
200 | - `AbbreviatedTree()` is a bit like the **modular** printers discussed in the
|
201 | `lobste.rs` thread. It makes certain common data structures more
|
202 | readable, with type-specific logic. It's in Python only — can that
|
203 | logic also be in C++?
|
204 | - The syntax tree is actually a **graph**, and I recently added logic to
|
205 | **omit duplicate** nodes. This is unlike the JSON printer, which prints
|
206 | duplicate nodes, as Python and node.js do.
|
207 | - The slowness hasn't really mattered, because this format isn't exposed to
|
208 | users. It's only for **debugging** Oils itself. But it's useful, and we may
|
209 | want to expose it.
|
210 | - Also used in `pp asdl (obj)`, another debugging feature.
|
211 | - TODO: Add a simple benchmark? The new printer will probably be faster than
|
212 | the old one.
|
213 | - `osh -n benchmarks/testdata/configure-coreutils` tests a huge shell file
|
214 |
|
215 | <!--
|
216 | - current state of `osh -n`
|
217 | - timing of it -- I think this may take awhile. It was never optimized. It
|
218 | produces MANY allocations.
|
219 | - to be honest -- allocations and GC will probably **dominate** in Oils.
|
220 | -->
|
221 |
|
222 | ### OSH-YSH Code Formatter
|
223 |
|
224 | **What is it?** A formatter for shell code. I think it's more essential to
|
225 | have a [YSH]($xref) formatter, but an [OSH]($xref) formatter is also possible.
|
226 | They both use the same [lossless syntax tree]($xref:LST).
|
227 |
|
228 | **Motivation** - Formatters make a new language easier to use, and there are
|
229 | many users who don't know shell well.
|
230 |
|
231 | For example, I don't know TypeScript well, and I had a good experience with
|
232 | `deno fmt`. It reduced the **mental load** of adopting a new tool.
|
233 |
|
234 | ---
|
235 |
|
236 | Justin had a nice idea on on `lobste.rs` - we should create **coarse tree**
|
237 | with these elements:
|
238 |
|
239 | - `{ }` affect indentation in [YSH]($xref)
|
240 | - In [OSH]($xref), we should also understand `then elif else fi`, `do done`,
|
241 | etc.
|
242 | - `( )` in [YSH]($xref) changes the lexer from [command mode to expression
|
243 | mode](command-vs-expression-mode.html)
|
244 | - **Newlines** can't appear in atomic `text()` / chunks
|
245 | - Comments need to be preserved at the end of lines
|
246 | - They may also be aligned on consecutive lines (with heuristics)
|
247 | - Keywords like `while for if` begin blocks of code
|
248 |
|
249 | Why? We don't don't want to take responsibility for every formatting decision!
|
250 |
|
251 | I actually think the command mode / statement formatter should be
|
252 | **non-wrapping**, while expressions can wrap. Commands will likely be more
|
253 | common than expressions in most YSH code.
|
254 |
|
255 | ---
|
256 |
|
257 | The formatter will be invoked with `ysh --tool format myfile.ysh`.
|
258 |
|
259 | It can also be used with the output of `osh --tool ysh-ify`, which roughly
|
260 | translates [OSH]($xref) to [YSH]($xref) (doesn't preserve semantics). This may
|
261 | help generate **test data**, since there's plenty of shell code in the wild,
|
262 | but not much [YSH][] code.
|
263 |
|
264 | ### Experimental: Export the Oils "Syntax Graph" to Users with "NIL8"
|
265 |
|
266 | **What is it?** A more human AND machine-readable format for the syntax tree,
|
267 | which is actually a **graph**.
|
268 |
|
269 | **Motivation**: The pretty-printed AST could be a **parseable** format. Allow
|
270 | users to reuse all the hard work we did [parsing
|
271 | shell]($blog-tag:parsing-shell)!
|
272 |
|
273 | ---
|
274 |
|
275 | Printing raw [ASDL][] data is useful, but it could be more readable with custom
|
276 | logic to print the natural **layers** of the graph. There are 4 logical layers
|
277 | in [frontend/syntax.asdl]($oils-src):
|
278 |
|
279 | 1. `source_t` describes whether shell code comes from `foo.sh` or `ysh -c 'echo
|
280 | mycode'`
|
281 | 2. `SourceLine` represents physical lines of code
|
282 | 3. `Token` refers to portions of lines
|
283 | 4. The syntax tree of `command_t word_t word_part_t expr_t`. The leaves are
|
284 | tokens.
|
285 |
|
286 | And instead of a pretty format meant for humans, we may want to print an
|
287 | s-expression-like format I'm calling **"NIL8"**.
|
288 |
|
289 | NIL8 can be parsed. You may want to write tree-shaking code to deploy
|
290 | [YSH][] into containers.
|
291 |
|
292 | More on NIL8 below. Again, it's experimental.
|
293 |
|
294 | ## Implementation Notes
|
295 |
|
296 | ### Do the printers depend on each other?
|
297 |
|
298 | - The [YSH][] printer (1) naturally comes before the [ASDL][] printer (2).
|
299 | - The code formatter (3) is concrete and useful to end users.
|
300 | - The NIL8 printer (4) comes after the [ASDL][] printer (2), but it's experimental.
|
301 | - It depends on a bunch of other work in Oils/YSH.
|
302 |
|
303 | Risks:
|
304 |
|
305 | - No real risks for (1) and (2)
|
306 | - They're "engineering" — Justin's blog post is very close! It could
|
307 | be ported almost literally to typed Python. It will translate
|
308 | automatically to C++. (And it would be interesting to compare our
|
309 | garbage-collected C++ with Rust's `Rc<T>`)
|
310 | - [ASDL][] involves code generation in both Python and C++. We have a custom
|
311 | build system for this (using Ninja for C++).
|
312 | - The OSH/YSH formatter has non-trivial decisions
|
313 | - End-of-line comments. (Shell doesn't have block comments, which simplifies
|
314 | things.)
|
315 | - Multi-line strings in YSH have a special rule -- the indentation of the
|
316 | closing `'''` is significant
|
317 | - Shell here docs may be tricky
|
318 | - This formatter probably requires the most "elbow grease". This is why I
|
319 | said that the statement formatter should initially be **non-wrapping** —
|
320 | it reduces the scope of the problem.
|
321 |
|
322 | ### Code Skeleton
|
323 |
|
324 | I added some stubs in the code:
|
325 |
|
326 | - [data_lang/pretty.asdl]($oils-src) - How we would express the IR
|
327 | - [data_lang/pretty.py]($oils-src) - YSH conversion.
|
328 | - [data_lang/pretty-benchmark.sh]($oils-src) - Our naive ASDL pretty printer is
|
329 | slow. It can take more than 3 seconds on a big file, vs. ~100ms to parse it.
|
330 | (It does print over 100 MB of text though.)
|
331 |
|
332 | To generate Python code from the ASDL schema, run `build/py.sh all`.
|
333 | Otherwise, Oils is a plain Python 2 program, with a few C extensions.
|
334 |
|
335 | C++ translation is a separate step, and it's now pretty polished.
|
336 |
|
337 | For new contributors:
|
338 |
|
339 | - [Contributing]($wiki) on the wiki
|
340 | - [Where Contributors Have Problems]($wiki)
|
341 |
|
342 | There is also a stub for the formatter:
|
343 |
|
344 | - [tools/fmt.py]($oils-src) - Stub file for the formatter.
|
345 | - Code copied from [tools/ysh_ify.py]($oils-src).
|
346 |
|
347 | ## Design Questions
|
348 |
|
349 | This section has some less concrete thoughts.
|
350 |
|
351 | ### Non-Wrapping Printers aka "Indenters" - same PPL IR?
|
352 |
|
353 | I think the PPL IRs are also useful if you're not line wrapping? You can just
|
354 | fix indentation.
|
355 |
|
356 | ### Columnar Layouts (spending more time)
|
357 |
|
358 | `nodejs` does this in its `console.log()`.
|
359 |
|
360 | ![python node.js](https://app.oilshell.org/picdir/resize?name=uscdu6__python-nodejs.png&max-width=600)
|
361 |
|
362 | Future work? We should get the basic pretty printer working first.
|
363 |
|
364 | ### Two Levels of Coarse Parsing for YSH? (not at first)
|
365 |
|
366 | Making the coarse tree has some similarity to syntax highlighting. I wrote a
|
367 | simple syntax highlighter for 5+ languages called `micro-syntax`, and it should
|
368 | support [YSH]($xref) too.
|
369 |
|
370 | Sketch:
|
371 |
|
372 | 1. First make the **really coarse tree**, something like: `Comment | Code |
|
373 | StringLiteral`
|
374 | 2. Then make a **less coarse** tree for pretty printing:
|
375 | - Lex code into `{} ()`
|
376 | - Categorize comments into `EndLineComment` | `BeginLineComment`
|
377 |
|
378 | Then do a trivial linear pass to fix up indentation. The `{ }` or `do done`
|
379 | tokens determine indentation.
|
380 |
|
381 | ---
|
382 |
|
383 | Though honestly it's probably better to just **reuse** our elaborate parser at
|
384 | first. I like to "compress" different algorithms together, but it may not be
|
385 | worth it here.
|
386 |
|
387 | ### NIL8 - Uses cases for both Code and Data
|
388 |
|
389 | What is "NIL8"? We don't know if it's a good idea yet, but it may be part of
|
390 | [J8 Notation](j8-notation.html).
|
391 |
|
392 | Think:
|
393 |
|
394 | - A mash-up of [JSON][] and S-expressions
|
395 | - *NIL8 Isn't Lisp*
|
396 | - *Narrow Intermediate Language*
|
397 | - WebAssembly text format
|
398 | - An IR for an **imperative** language, with a Lisp-y syntax.
|
399 | - An **exterior** S-expression format
|
400 | - Blog: [Oils is Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html)
|
401 | - I posted POSE (portable s-expressions) on lobste.rs for this reason:
|
402 | <https://lobste.rs/s/lwf4jv/pose_portable_s_expressions_pose_spec> (no
|
403 | comments)
|
404 |
|
405 | [JSON]: $xref
|
406 |
|
407 | If NIL8 can represent both the [lossless syntax tree]($xref:LST) and a new IR
|
408 | for a [mycpp]($xref) rewrite ("yaks"), that's a good test of the design.
|
409 |
|
410 | Note that the AST is a **statically typed** data structure, which means we may
|
411 | also want to export the [ASDL][] **schema** as NIL8!
|
412 |
|
413 | Links:
|
414 |
|
415 | - [#data-languages > NIL8 Infix Rule](https://oilshell.zulipchat.com/#narrow/stream/403584-data-languages/topic/NIL8.20Infix.20Rule)
|
416 | - [Commit describing infix rule](https://github.com/oilshell/oil/commit/b9cecf88838d4c89ce1dbd8f4bcdd8e92e10d442)
|
417 |
|
418 | At a high level, we're trying to nudge users toward a **small** set of syntaxes
|
419 | for shell-like programming, rather than inventing ad hoc syntax every time.
|
420 | String literals are a pain point: people often implement them badly, or not at
|
421 | all.
|
422 |
|
423 | ## Conclusion
|
424 |
|
425 | I think we should have shared infrastructure for 3 printers:
|
426 |
|
427 | 1. [YSH]($xref) data structures
|
428 | 1. [ASDL][] data structures
|
429 | 1. [OSH]($xref) and [YSH]($xref) code
|
430 |
|
431 | And then there's this idea of "replacing" or rationalizing the [ASDL][] syntax
|
432 | tree with "NIL8":
|
433 |
|
434 | - It can be parsed, not just printed.
|
435 | - To parse, you can reuse an existing [JSON]($xref) string lexer. IMO, this
|
436 | is the fiddliest part of parsing.
|
437 | - It can export a graph shape, in natural "layers".
|
438 |
|
439 | ## Related
|
440 |
|
441 | ### Docs
|
442 |
|
443 | - [Parser Architecture](parser-architecture.html) - describes issues like the
|
444 | **"lossless invariant"**, which is affected by *re-parsing*.
|
445 | - I recently updated it, and tested the invariant with
|
446 | [test/lossless.sh]($oils-src).
|
447 | - The repo [README]($oils-doc:oils-repo/README.html) has an overview of the
|
448 | code.
|
449 |
|
450 | ### Fun Computer Science Problems in Oils
|
451 |
|
452 | Pretty printing is adjacent to other fun problems in Oils, like GC performance,
|
453 | "boxless" optimization, etc.
|
454 |
|
455 | - [#help-wanted > Fun Computer Science Problems](https://oilshell.zulipchat.com/#narrow/stream/417617-help-wanted/topic/Fun.20Computer.20Science.20Problems)
|
456 |
|
457 | Things to think about:
|
458 |
|
459 | (1) Unified Code Representation for Oils
|
460 |
|
461 | We want to pack all these tools into Oils:
|
462 |
|
463 | - The interpreter, for **execution**
|
464 | - prints precise errors, ignores comment tokens
|
465 | - `ysh-ify` - a VERY rough **translation** of [OSH]($xref) to [YSH]($xref)
|
466 | - doesn't respect semantics, because you really need static types for that
|
467 | - uses "span ID"
|
468 | - **Pretty Printing** (this doc)
|
469 | - will also use "span ID" to detect comment positions, I think
|
470 | - Syntax **highlighting**
|
471 | - Interactive highlighting will help users learn the language
|
472 | - It's **recursive**, because sublanguages are mutually recursive: string
|
473 | literals, commands, expressions
|
474 |
|
475 | Related: Retrospective on the Go `ast` package.
|
476 |
|
477 | (2) Yaks - *mycpp from the bottom up*
|
478 |
|
479 | NIL8 leads into **"Yaks"**, which is an IR for garbage collected C++.
|
480 |
|
481 | - Yaks is of course a big yak shave, but it's **concrete** because we recently
|
482 | **completed** the translation with [mycpp]($xref). (mycpp is a crappy
|
483 | program which produces good results!)
|
484 |
|
485 | (3) Pretty printing will cause many small **allocations**.
|
486 |
|
487 | I think that naive implementations should be fast enough. If not, any slowness
|
488 | may be due to allocation, not necessarily the pretty printing algorithm itself.
|
489 |
|
490 | Some solutions:
|
491 |
|
492 | - We want to move towards a **tagged pointer** runtime, an ambitious
|
493 | transformation of the entire interpreter.
|
494 | - But we'll do it in steps. First step: Small String Optimization.
|
495 | - Yaks goes along with a tagged pointer runtime. Yaks also has a principled
|
496 | IR, which may be represented with many small objects.
|
497 | - GC rooting optimization should give a speed boost
|
498 |
|
499 | ## Appendix: Older Notes
|
500 |
|
501 | This is about ref counting for printing **graph-shaped** data.
|
502 |
|
503 | I no longer think this is as important. I think we should **manually**
|
504 | construct 4 layers of the graph, as described in the section on formatter (4).
|
505 |
|
506 | ### Dynamically Typed YSH Data (`value_t`)
|
507 |
|
508 | Similar to JSON / JSON8 printing, except we
|
509 |
|
510 | 1. count references, and then print `...` instead of repeating
|
511 | 1. line wrap
|
512 | 1. assign colors
|
513 | - for atoms, and possibly for balanced parens, to make it more readable
|
514 |
|
515 | #### Step 1: Count References
|
516 |
|
517 | This is a global pass that computes a Dict[int, int]
|
518 |
|
519 | object ID -> number of times referenced in the graph
|
520 |
|
521 | The graph is specified by single root node, e.g. the argument to
|
522 |
|
523 | pp line (obj)
|
524 |
|
525 | Pass this dict into the second step.
|
526 |
|
527 | #### Step 2: Convert To Homogeneous Representation
|
528 |
|
529 | value.List -> hnode.Compound with []
|
530 | value.Dict -> hnode.Compound with {}
|
531 |
|
532 | null, true, false -> Atom
|
533 | Cycle detected -> Atom, with { --> 4beef2 }
|
534 | [ --> 4beef2 ]
|
535 |
|
536 | Repetition:
|
537 |
|
538 | { key: { ... 4beef2 }, key2: { ... 4beef2 }
|
539 |
|
540 | Or maybe omit the type, since strings don't have that:
|
541 |
|
542 | { key: ... 4beef2, key2: ... 4beef2 }
|
543 |
|
544 | #### hnode Schema
|
545 |
|
546 | The schema looks like this now?
|
547 |
|
548 | hnode =
|
549 | Atom(str s, color color) - # External objects can use this?
|
550 | | Compound(hnode* items)
|
551 |
|
552 | The length of 'str s' is the input to line wrapping.
|
553 |
|
554 | #### Step 3: Figure out what's on each line
|
555 |
|
556 | TODO: what's the heuristic here? Is it global?
|
557 |
|
558 | Dynamic programming?
|
559 |
|
560 | do we insert hnode.Newline() or something?
|
561 |
|
562 | ### Statically Typed ASDL Data
|
563 |
|
564 | Reduce it to the case above.
|
565 |
|
566 | #### Step 1 - Ref Counting / Cycle Detection?
|
567 |
|
568 | We do this all at once?
|
569 |
|
570 | Because to convert to value.Record, you have to do cycle detection anyway.
|
571 |
|
572 | And that's similar to ref counting.
|
573 |
|
574 | #### Step 2 - ASDL records -> value.Record
|
575 |
|
576 | value =
|
577 | ...
|
578 | | Record(str type_name, Dict[str, value_t] fields)
|
579 |
|
580 | The special "-" key can be used for JSON:
|
581 |
|
582 | {"-": "command.Simple, "name": "hi"}
|
583 |
|
584 | Though this loses some information, and it doesn't solve the problem with
|
585 | shared references. We would need Packle for that.
|
586 |
|
587 | #### Step 2a: Optional Abbreviation?
|
588 |
|
589 | Is this separate? Or part of step 2.
|
590 |
|
591 | We need something between value.Record and hnode.Compound
|
592 | to do ABBREVIATION:
|
593 |
|
594 | - Abbreviate type name, or omit it
|
595 | - Omit some field names (requires schema to record it)
|
596 | - Change () to <>
|
597 |
|
598 | Also need nodes for
|
599 |
|
600 | - `...` means already printed
|
601 | - `---` means CANNOT print, because it's a cycle
|
602 | - `@1f23` - ID if already printed, or in a cycle
|
603 |
|
604 | #### Steps 3 and 4 - Homogeneous Representation, Line Wrapping
|
605 |
|
606 | Identical to the dynamically typed case above.
|
607 |
|