---
default_highlighter: oils-sh
---

Hay - Custom Languages for Unix Systems
=======================================

*Hay* lets you use the syntax of YSH to declare **data** and
interleaved **code**. It allows the shell to better serve its role as
essential **glue**. For example, these systems all combine Unix processes in
various ways:

- local build systems (Ninja, CMake, Debian package builds, Docker/OCI builds)
- remote build services (VM-based continuous integration like sourcehut, Github
  Actions)
- local process supervisors (SysV init, systemd)
- remote process supervisors / cluster managers (Slurm, Kubernetes)

Slogans:

- *Hay Ain't YAML*.
  - It evaluates to [JSON][] + Shell Scripts.
- *We need a better **control plane** language for the cloud*.
- *YSH adds the missing declarative part to shell*.

This doc describes how to use Hay, with motivating examples.

As of 2022, this is a new feature of YSH, and **it needs user feedback**.
Nothing is set in stone, so you can influence the language and its features!


[JSON]: $xref:JSON

<!--
  - although also Tcl, Lua, Python, Ruby
- DSLs, Config Files, and More
- For Dialects of YSH

Use case examples
-->

<!-- cmark.py expands this -->
<div id="toc">
</div>

## Example

Hay could be used to configure a hypothetical Linux package manager:

    # cpython.hay -- A package definition

    hay define Package/TASK  # define a tree of Hay node types

    Package cpython {        # a node with attributes, and children

      version = '3.9'
      url = 'https://python.org'

      TASK build {           # a child node, with YSH code
        ./configure
        make
      }
    }

This program evaluates to a JSON tree, which you can consume from programs in
any language, including YSH:

    { "type": "Package",
      "args": [ "cpython" ],
      "attrs": { "version": "3.9", "url": "https://python.org" },
      "children": [
        { "type": "TASK",
          "args": [ "build" ],
          "code_str": " ./configure\n make\n"
        }
      ]
    }

That is, a package manager can use the attributes to create a build
environment, then execute shell code within it. This is a *staged evaluation
model*.
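
To make the second stage concrete, here is a minimal Python sketch of a consumer for the JSON tree above. It assumes the tree has exactly the shape shown. The helper `plan_build` is hypothetical, and it only gathers the attributes and the unevaluated shell code rather than running anything.

```python
import json

# The JSON tree from the example above, as the first stage might emit it.
TREE = '''
{ "type": "Package",
  "args": [ "cpython" ],
  "attrs": { "version": "3.9", "url": "https://python.org" },
  "children": [
    { "type": "TASK",
      "args": [ "build" ],
      "code_str": " ./configure\\n make\\n"
    }
  ]
}
'''


def plan_build(package: dict) -> dict:
    """Hypothetical second stage: turn a Package node into a build plan.

    Attributes describe the build environment; TASK children carry
    unevaluated shell code to run later, inside that environment.
    """
    plan = {
        "name": package["args"][0],
        "env": dict(package.get("attrs", {})),
        "tasks": {},
    }
    for child in package.get("children", []):
        if child["type"] == "TASK":
            plan["tasks"][child["args"][0]] = child["code_str"]
    return plan


plan = plan_build(json.loads(TREE))
print(plan["name"], plan["env"]["version"])  # cpython 3.9
print(list(plan["tasks"]))                   # ['build']
```

A real package manager would go on to create the environment from `env` and run each task's code inside it. That part is application-specific.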

## Understanding Hay

A goal of Hay is to restore the **simplicity** of Unix to distributed systems.
It's all just **code and data**!

This means that it's a bit abstract, so here are a few ways of understanding
it.

### Analogies

The relation between Hay and YSH is like the relationship between these pairs
of languages:

- [YAML][] / [Go templates][], which are used in Helm config for Kubernetes.
  - YAML data specifies a **service**, and templates specify **variants**.
- Two common ways of building C and C++ code:
  - [Make]($xref:make) / [Autotools]($xref:autotools)
  - [Ninja]($xref:ninja) / [CMake][]
  - Make and Ninja specify a **build graph**, while autotools and CMake detect
    a **configured variant** with respect to your system.

Each of these is *70's-style macro programming* &mdash; a stringly-typed
language generating another stringly-typed language, with all the associated
problems.

In contrast, Hay and YSH are really the same language, with the same syntax,
and the same Python- and JavaScript-like dynamic **types**. Hay is just YSH
that **builds up data** instead of executing commands.

(Counterpoint: Ninja is intended for code generation, and it makes sense for
YSH to generate simple languages.)


[Go templates]: https://pkg.go.dev/text/template
[CMake]: https://cmake.org

### Prior Art

See the [Survey of Config Languages]($wiki) on the wiki, which puts them in
these categories:

1. Languages for String Data
   - INI, XML, [YAML][], ...
1. Languages for Typed Data
   - [JSON][], TOML, ...
1. Programmable String-ish Languages
   - Go templates, CMake, autotools/m4, ...
1. Programmable Typed Data
   - Nix expressions, Starlark, Cue, ...
1. Internal DSLs in General Purpose Languages
   - Hay, Guile Scheme for Guix, Ruby blocks, ...

Excerpts:

[YAML][] is a data format that is (surprisingly) the de-facto control plane
language for the cloud. It's an approximate superset of [JSON][].

[UCL][] (universal config language) and [HCL][] (HashiCorp config language) are
influenced by the [Nginx][] config file syntax. If you can read any of these
languages, you can read Hay.

[Nix][] has a [functional language][nix-lang] to configure Linux distros. In
contrast, Hay is multi-paradigm and imperative.

[nix-lang]: https://wiki.nixos.org/wiki/Nix_Expression_Language

The [Starlark][] language is a dialect of Python used by the [Bazel][] build
system. It uses imperative code to specify build graph variants, and you can
use this same pattern in Hay. That is, if statements, for loops, and functions
are useful in Starlark and Hay.

[Ruby][]'s use of [first-class
blocks](http://radar.oreilly.com/2014/04/make-magic-with-ruby-dsls.html)
inspired YSH. They're used in systems like Vagrant (VM dev environments) and
Rake (a build system).

In [Lisp][], code and data are expressed with the same syntax, and can be
interleaved.
[G-Expressions](https://guix.gnu.org/manual/en/html_node/G_002dExpressions.html)
in Guix use a *staged evaluation model*, like Hay.

[YAML]: $xref:YAML
[UCL]: https://github.com/vstakhov/libucl
[Nginx]: https://en.wikipedia.org/wiki/Nginx
[HCL]: https://github.com/hashicorp/hcl
[Nix]: $xref:nix

[Starlark]: https://github.com/bazelbuild/starlark
[Bazel]: https://bazel.build/

[Ruby]: https://www.ruby-lang.org/en/
[Lisp]: https://en.wikipedia.org/wiki/Lisp_(programming_language)


### Comparison

The biggest difference between Hay and [UCL][] / [HCL][] is that it's
**embedded in a shell**. In other words, Hay languages are *internal DSLs*,
while those languages are *external*.

This means:

1. You can **interleave** shell code with Hay data. We'll discuss the many
   uses of this below.
   - On the other hand, it's OK to configure simple systems with plain data
     like [JSON][]. Hay is for when that stops working!
1. Hay isn't a library you embed in another program. Instead, you use
   Unix-style **process-based** composition.
   - For example, [HCL][] is written in Go, which may be hard to embed in a C
     or Rust program.
   - Note that a process is a good **security** boundary. It can be
     additionally run in an OS container or VM.

<!--
   - Code on the **outside** of Hay blocks may use the ["staged programming" / "graph metaprogramming" pattern][build-ci-comments] mentioned above.
   - Code on the **inside** is *unevaluated*. You can execute it in another
     context, like a remote machine, Linux container, or virtual machine.
-->

The sections below elaborate on these points.

[shell-pipelines]: https://www.oilshell.org/blog/2017/01/15.html

<!--
  - YSH has an imperative programming model. It's a little like Starlark.
  - Guile / GNU Make.
  - Tensorflow.
-->


## Overview

Hay nodes have a regular structure:

- They start with a "command", which is called the **type**.
- They accept **string** arguments and **block** arguments. There must be at
  least one argument.

### Two Kinds of Nodes, and Three Kinds of Evaluation

There are two kinds of nodes with this structure.

(1) `SHELL` nodes contain **unevaluated** code, and their type is ALL CAPS.
The code is turned into a string that can be executed elsewhere.

    TASK build {
      ./configure
      make
    }
    # =>
    # ... {"code_str": " ./configure\n make\n"}

(2) `Attr` nodes contain **data**, and their type starts with a capital letter.
They eagerly evaluate a block in a new **stack frame** and turn it into an
**attributes dict**.

    Package cpython {
      version = '3.9'
    }
    # =>
    # ... {"attrs": {"version": "3.9"}} ...

These blocks have a special rule to allow *bare assignments* like `version =
'3.9'`. That is, you don't need keywords like `const` or `var`.

(3) In contrast to these two types of Hay nodes, YSH builtins that take a block
usually evaluate it eagerly:

    cd /tmp {  # run in a new directory
      echo $PWD
    }

Builtins are spelled with `lower` case letters, so `SHELL` and `Attr` nodes
won't be confused with them.
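
The capitalization convention can be written down as a small decision rule. This Python sketch only restates the convention described above. It is not how the YSH interpreter dispatches commands, and `classify_command` is a hypothetical name.

```python
def classify_command(name: str) -> str:
    """Classify a command word by the capitalization convention above.

    Returns 'SHELL' for ALL CAPS node types, 'Attr' for node types that
    start with a capital letter, and 'builtin' for lower-case names.
    """
    if not name or not name[0].isalpha():
        raise ValueError(f"not a command name: {name!r}")
    if name.isupper():
        return "SHELL"    # block kept as unevaluated code
    if name[0].isupper():
        return "Attr"     # block evaluated eagerly into an attributes dict
    return "builtin"      # e.g. cd, which runs its block immediately


for word in ["TASK", "Package", "cd"]:
    print(word, "->", classify_command(word))
```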

### Two Stages of Evaluation

So Hay is designed to be used with a *staged evaluation model*:

1. The first stage follows the rules above:
   - Tree of Hay nodes &rarr; [JSON]($xref) + Unevaluated shell.
   - You can use variables, conditionals, loops, and more.
2. Your app or system controls the second stage. You can invoke YSH again to
   execute shell inside a VM, inside a Linux container, or on a remote machine.

These two stages are conceptually different, but use the **same** syntax and
evaluator! Again, the evaluator runs in a mode where it **builds up data**
rather than executing commands.

### Result Schema

Here's a description of the result of Hay evaluation (the first stage).

    # The source may be "cpython.hay"
    FileResult = (source Str, children List[NodeResult])

    NodeResult =
      # Package cpython { version = '3.9' }
      Attr (type Str,
            args List[Str],
            attrs Map[Str, Any],
            children List[NodeResult])

      # TASK build { ./configure; make }
    | Shell(type Str,
            args List[Str],
            location_str Str,
            location_start_line Int,
            code_str Str)


Notes:

- Except for user-defined attributes, the result is statically typed.
- Shell nodes are always leaf nodes.
- Attr nodes may or may not be leaf nodes.
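
If your second stage is written in Python, the schema above maps naturally onto dataclasses. This is a sketch under two assumptions. First, the JSON keys match the field names in the schema. Second, a node is a `Shell` node exactly when it has a `code_str` key. `Shell` has no `children` field, so the rule that Shell nodes are leaves holds by construction.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Attr:
    type: str
    args: List[str]
    attrs: Dict[str, Any]
    children: List["NodeResult"] = field(default_factory=list)


@dataclass
class Shell:  # no children field: Shell nodes are always leaves
    type: str
    args: List[str]
    location_str: str
    location_start_line: int
    code_str: str


NodeResult = Union[Attr, Shell]


@dataclass
class FileResult:
    source: str
    children: List[NodeResult]


def node_from_json(d: Dict[str, Any]) -> NodeResult:
    # Assumption: the presence of 'code_str' marks a Shell node.
    if "code_str" in d:
        return Shell(
            type=d["type"],
            args=list(d["args"]),
            # Defaults tolerate abbreviated examples that omit locations.
            location_str=d.get("location_str", ""),
            location_start_line=d.get("location_start_line", 0),
            code_str=d["code_str"],
        )
    return Attr(
        type=d["type"],
        args=list(d["args"]),
        attrs=dict(d.get("attrs", {})),
        children=[node_from_json(c) for c in d.get("children", [])],
    )


def file_from_json(d: Dict[str, Any]) -> FileResult:
    return FileResult(
        source=d.get("source", ""),
        children=[node_from_json(c) for c in d.get("children", [])],
    )


tree = file_from_json({
    "source": "cpython.hay",
    "children": [{
        "type": "Package", "args": ["cpython"], "attrs": {"version": "3.9"},
        "children": [{"type": "TASK", "args": ["build"],
                      "code_str": " ./configure\n make\n"}],
    }],
})
pkg = tree.children[0]
print(type(pkg).__name__, pkg.attrs["version"])  # Attr 3.9
```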

## Three Ways to Invoke Hay

### Inline Hay Has No Restrictions

You can put Hay blocks and normal shell code in the same file. Retrieve the
result of Hay evaluation with the `_hay()` function.

    # myscript.ysh

    hay define Rule

    Rule mylib.o {
      inputs = ['mylib.c']

      # not recommended, but allowed
      echo 'hi'
      ls /tmp/$(whoami)
    }

    echo 'bye'  # other shell code

    const result = _hay()
    json write (result)

In this case, there are no restrictions on the commands you can run.

### In Separate Files

You can put hay definitions in their own file:

    # my-config.hay

    Rule mylib.o {
      inputs = ['mylib.c']
    }

    echo 'hi'  # allowed for debugging
    # ls /tmp/$(whoami) would fail due to restrictions on hay evaluation

In this case, you can use `echo` and `write`, but the interpreter is
**restricted** (see below).

Parse it with `parseHay()`, and evaluate it with `evalHay()`:

    # my-evaluator.ysh

    hay define Rule  # node types for the file
    const h = parseHay('my-config.hay')
    const result = evalHay(h)

    json write (result)
    # =>
    # {
    #   "children": [
    #     { "type": "Rule",
    #       "args": ["mylib.o"],
    #       "attrs": {"inputs": ["mylib.c"]}
    #     }
    #   ]
    # }

### In A Block

Instead of creating separate files, you can also use the `hay eval` builtin:

    hay define Rule

    hay eval :result {  # assign to the variable 'result'
      Rule mylib.o {
        inputs = ['mylib.c']
      }
    }

    json write (result)  # same as above

This is mainly for testing and demos.

## Security Model: Restricted != Sandboxed

The "restrictions" are **not** a security boundary! (They could be, but we're
not making promises now.)

Even with `evalHay()` and `hay eval`, the config file is evaluated in the
**same interpreter**. But the following restrictions apply:

- External commands aren't allowed
- Builtins other than `echo` and `write` aren't allowed
  - For example, the `.hay` file can't invoke `shopt` to change global shell
    options
- A new stack frame is created, so the `.hay` file can't mutate your locals
  - However, it can still mutate globals with `setglobal`!

In summary, Hay evaluation is restricted to prevent basic mistakes, but your
code isn't completely separate from the evaluated Hay file.

If you want to evaluate untrusted code, use a **separate process**, and run it
in a container or VM.

## Reference

Here is a list of all the mechanisms mentioned.

### Shell Builtins

- `hay`
  - `hay define` to define node types.
  - `hay pp` to pretty print the node types.
  - `hay reset` to delete both the node types **and** the current evaluation
    result.
  - `hay eval :result { ... }` to evaluate in restricted mode, and put the
    result in a variable.
- Implementation detail: the `haynode` builtin is run when types like
  `Package` and `TASK` are invoked. That is, all node types are aliases for
  this same builtin.

### Functions

- `parseHay()` parses a file, just as `bin/ysh` does.
- `evalHay()` evaluates the parsed file in restricted mode, like `hay eval`.
- `_hay()` retrieves the current result.
  - It's useful for interactive debugging.
  - The name starts with `_` because it's a "register" mutated by the
    interpreter.

### Options

Hay is parsed and evaluated with option group `ysh:all`, which includes
`parse_proc` and `parse_equals`.

<!--

- The `parse_brace` and `parse_equals` options are what let us inside attribute nodes
- `_running_hay`

-->


## Usage: Interleaving Hay and YSH

Why would you want to interleave data and code? One reason is to naturally
express variants of a configuration. Here are some examples.

**Build variants**. There are many variants of the YSH binary:

- `dbg` and `opt`. The compiler optimization level, and whether debug symbols
  are included.
- `asan` and `ubsan`. Dynamic analysis with Clang sanitizers.
- `-D GC_EVERY_ALLOC`. Make a build that helps debug the garbage collector.

So the Ninja build graph to produce these binaries is **shaped** similarly, but
it **varies** with compiler and linker flags.

**Service variants**. A common problem in distributed systems is how to
develop and debug services locally.

Do your service dependencies live in the cloud, or are they run locally? What
about state? Common variants:

- `local`. Part or all of the service runs locally, so you may pass flags like
  `--auth-service localhost:8001` to binaries.
- `staging`. A complete copy of the service, in a different cloud, with a
  different database.
- `prod`. The live instance running with user data.

Again, these collections of services are all **shaped** similarly, but the
flags **vary** based on where binaries are physically running.

---

This model can be referred to as ["graph metaprogramming" or "staged
programming"][build-ci-comments]. In YSH, it's done with dynamically typed
data like integers and dictionaries. In contrast, systems like CMake and
autotools are more stringly typed.

[build-ci-comments]: https://www.oilshell.org/blog/2021/04/build-ci-comments.html
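
To see the "shaped similarly, but varies" idea outside of Hay, here is a Python sketch that generates `Rule` nodes in the JSON shape used throughout this doc. It's an analogy rather than Hay itself. The `VARIANTS` table and the `make_rules` helper are hypothetical, and the table's flags are loosely modeled on the `dbg` and `opt` variants above. The point is that the variants are expressed as typed data (lists and dicts) rather than by pasting strings together.

```python
from typing import Dict, List

# Hypothetical variant table: same graph shape, different compiler flags.
VARIANTS: Dict[str, List[str]] = {
    "dbg": ["-O0", "-g"],  # no optimization, with debug symbols
    "opt": ["-O2"],        # optimized build
}


def make_rules(sources: List[str]) -> List[dict]:
    """Build one Rule node per (variant, source) pair, as typed data."""
    rules = []
    for variant, flags in VARIANTS.items():
        for src in sources:
            stem = src.rsplit(".", 1)[0]  # 'foo.c' -> 'foo'
            rules.append({
                "type": "Rule",
                "args": [f"{variant}/{stem}.o"],
                "attrs": {"inputs": [src], "flags": list(flags)},
            })
    return rules


for rule in make_rules(["foo.c", "bar.c"]):
    print(rule["args"][0], rule["attrs"]["flags"])
```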

The following **examples** are meant to be "evocative"; they're not based on
real code. Again, user feedback can improve them!

### Conditionals

Conditionals can go on the inside of a block:

    Service auth.example.com {    # node taking a block
      if (variant === 'local') {  # condition
        port = 8001
      } else {
        port = 80
      }
    }

Or on the outside:

    Service web {                 # node
      root = '/home/www'
    }

    if (variant === 'local') {    # condition
      Service auth-local {        # node
        port = 8001
      }
    }


### Iteration

Iteration can also go on the inside of a block:

    Rule foo.o {   # node
      inputs = []  # populate with all .cc files except one

      # variables ending with _ are "hidden" from block evaluation
      for name_ in *.cc {
        if (name_ !== 'skipped.cc') {
          call inputs->append(name_)
        }
      }
    }

Or on the outside:

    for name_ in *.cc {                # loop
      Rule $(basename $name_ .cc).o {  # node
        inputs = [name_]
      }
    }


### Remove Duplication with `proc`

Procs can wrap blocks:

    proc myrule(name) {

      # needed for blocks to use variables higher on the stack
      shopt --set dynamic_scope {

        Rule dbg/$name.o {  # node
          inputs = ["$name.c"]
          flags = ['-O0']
        }

        Rule opt/$name.o {  # node
          inputs = ["$name.c"]
          flags = ['-O2']
        }

      }
    }

    myrule foo  # call proc
    myrule bar  # call proc

Or they can be invoked from within blocks:

    proc set-port (port_num; out) {
      call out->setValue("localhost:$port_num")
    }

    Service foo {      # node
      set-port 80 :p1  # call proc
      set-port 81 :p2  # call proc
    }

## More Usage Patterns

### Using YSH for the Second Stage

The general pattern is:

    ./my-evaluator.ysh my-config.hay | json read :result

The evaluator does the following:

1. Sets up the execution context with `hay define`
1. Parses `my-config.hay` with `parseHay()`
1. Evaluates it with `evalHay()`
1. Prints the result as JSON.

Then a separate YSH process reads this JSON and executes application code.

TODO: Show code example.

### Using Python for the Second Stage

In Python, you would:

1. Use the `subprocess` module to invoke `./my-evaluator.ysh my-config.hay`.
2. Use the `json` module to parse the result.
3. Then execute application code using the data.

TODO: Show code example.
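
Here is a minimal sketch of those three steps. It assumes `./my-evaluator.ysh` prints the result tree as JSON on stdout, as in the earlier section. `run_evaluator`, `iter_shell_nodes`, and `second_stage` are hypothetical helpers. Step 3 only lists the tasks, because what to do with them depends on your application.

```python
import json
import subprocess
from typing import Iterator, List


def run_evaluator(argv: List[str], timeout: float = 60.0) -> dict:
    """Steps 1 and 2: run the Hay evaluator process and parse its JSON.

    argv would typically be ['./my-evaluator.ysh', 'my-config.hay'].
    """
    proc = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
        timeout=timeout,      # don't hang forever on a bad config
    )
    return json.loads(proc.stdout)


def iter_shell_nodes(node: dict) -> Iterator[dict]:
    """Yield every SHELL node (one with 'code_str') in the result tree."""
    for child in node.get("children", []):
        if "code_str" in child:
            yield child
        else:
            yield from iter_shell_nodes(child)


def second_stage(argv: List[str]) -> List[str]:
    """Step 3 placeholder: list the tasks an application would run."""
    result = run_evaluator(argv)
    planned = []
    for node in iter_shell_nodes(result):
        # A real app might run node['code_str'] in a container or VM here.
        planned.append(f"{node['type']} {' '.join(node['args'])}")
    return planned
```

With the files from the earlier sections, you would call `second_stage(['./my-evaluator.ysh', 'my-config.hay'])`. Running the evaluator as a child process also keeps it on the other side of a process boundary, which the security section recommends for untrusted configs.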

### Locating Errors in the Original `.hay` File

The YSH interpreter has 2 flags starting with `--location` that give you
control over error messages.

    ysh --location-str 'foo.hay' --location-start-line 42 -- stage2.ysh

Set them to the values of fields `location_str` and `location_start_line` in
the result of `SHELL` node evaluation.
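
Here is a sketch of wiring this up from a second stage written in Python. The helper below writes a `SHELL` node's `code_str` to a temporary script and builds the command line shown above. It assumes that `ysh` is on your `PATH` and that the node carries `location_str` and `location_start_line`, per the result schema. `stage2_argv` is a hypothetical name. Writing a temporary file is just one way to hand the code to `ysh`.

```python
import os
import tempfile
from typing import List


def stage2_argv(node: dict, script_dir: str) -> List[str]:
    """Write a SHELL node's code to a script and build a ysh command line.

    The --location-* flags make error messages point back at the original
    .hay file rather than at the temporary script.
    """
    fd, path = tempfile.mkstemp(suffix=".ysh", dir=script_dir)
    with os.fdopen(fd, "w") as f:
        f.write(node["code_str"])
    return [
        "ysh",
        "--location-str", node["location_str"],
        "--location-start-line", str(node["location_start_line"]),
        "--", path,
    ]
```

You would then run it with something like `subprocess.run(stage2_argv(node, tmpdir), check=True)`, and remove the script afterwards.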

### Debian `.d` Dirs

Debian has a pattern of splitting configuration into a **directory** of
concatenated files. It's easier for shell scripts to add to a directory than
to add to a file.

This can be done with an evaluator that simply enumerates all files:

    var results = []
    for path in myconfig.d/*.hay {
      const code = parseHay(path)
      const result = evalHay(code)
      call results->append(result)
    }

    # Now iterate through results
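
A Python version of the same loop might look like this. It rests on two labeled assumptions. Files are processed in sorted filename order, a common convention for `.d` directories with numeric prefixes like `10-base.hay`. And `evaluate` is a caller-supplied function, for example one that runs the evaluator process on a single path. `load_conf_d` is a hypothetical name.

```python
import glob
import os
from typing import Callable, List


def load_conf_d(conf_dir: str, evaluate: Callable[[str], dict]) -> List[dict]:
    """Evaluate every *.hay file in conf_dir, in sorted filename order.

    Sorting makes the order independent of the filesystem, so
    '10-base.hay' is always loaded before '20-extra.hay'.
    """
    paths = sorted(glob.glob(os.path.join(conf_dir, "*.hay")))
    results = []
    for path in paths:
        result = evaluate(path)
        result.setdefault("source", path)  # remember where each tree came from
        results.append(result)
    return results
```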

### Parallel Loading

TODO: Example of using `xargs -P` to spawn processes with `parseHay()` and
`evalHay()`. Then merge the JSON results.
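
As an alternative to `xargs -P`, here is a sketch that uses Python's standard `concurrent.futures` module. It runs one evaluator process per file in parallel, then merges the results. The merge strategy and the helper names are assumptions, not part of Hay. The merge simply concatenates the `children` lists in input order.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def eval_one(evaluator: List[str], path: str) -> dict:
    """Run one evaluator process for one .hay file and parse its JSON."""
    out = subprocess.run(evaluator + [path], capture_output=True,
                         text=True, check=True).stdout
    return json.loads(out)


def load_parallel(evaluator: List[str], paths: List[str],
                  max_workers: int = 4) -> dict:
    """Evaluate files in parallel processes, then merge their children.

    Threads suffice because each one just waits on a child process, and
    pool.map returns results in input order, so the merge is deterministic.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda p: eval_one(evaluator, p), paths))
    merged = {"children": []}
    for result in results:
        merged["children"].extend(result.get("children", []))
    return merged
```

For example, `load_parallel(['./my-evaluator.ysh'], ['a.hay', 'b.hay'])` would run the evaluator once per file.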

## Style

### Attributes vs. Procs

Assigning attributes and invoking procs can look similar:

    Package grep {
      version = '1.0'  # An attribute?

      version 1.0      # or call proc 'version'?
    }

The first style is better for typed data like integers and dictionaries. The
latter style isn't useful here, but it could be if `version 1.0` created
complex Hay nodes.

### Attributes vs. Flags

Hay nodes shouldn't take flags or `--`. Flags are for key-value pairs, and
blocks are better for expressing such data.

No:

    Package --version 1.0 grep {
      license = 'GPL'
    }

Yes:

    Package grep {
      version = '1.0'
      license = 'GPL'
    }

### Dicts vs. Blocks

Superficially, dicts and blocks are similar:

    Package grep {
      mydict = {name: 'value'}  # a dict

      mynode foo {              # a node taking a block
        name = 'value'
      }
    }

Use dicts in cases where you don't know the names or types up front, like

    files = {'README.md': true, '__init__.py': false}

Use blocks when there's a **schema**. Blocks are also different because:

- You can use `if` statements and `for` loops in them.
- You can call `TASK build; TASK test` within a block, creating multiple
  objects of the same type.
- Later: custom validation

### YSH vs. Shell

Hay files are parsed as YSH, not OSH. That includes `SHELL` nodes:

    TASK build {
      cp @deps /tmp  # YSH splicing syntax
    }

If you want to use POSIX shell or bash, use two arguments, the second of which
is a multi-line string:

    TASK build '''
      cp "${deps[@]}" /tmp
    '''

The YSH style gives you *static parsing*, which catches some errors earlier.

## Future Work

- `hay proc` for arbitrary schema validation, including JSON schema
- Examples of running hay in a secure process / container, in various languages
- Sandboxing:
  - More fine-grained rules?
  - "restricted" could come with a security guarantee. I've avoided making
    such guarantees, but I think it's possible as YSH matures. The
    interpreter uses dependency inversion to isolate I/O.
- More location info, including the source file.

[Please send
feedback](https://github.com/oilshell/oil/wiki/Where-To-Send-Feedback) about
Hay. It will inform and prioritize this work!

## Links

- Blog posts tagged #[hay]($blog-tag). Hay is a general mechanism, so it's
  useful to explain it with concrete examples.
- [Data Definition and Code Generation in Tcl](https://trs.jpl.nasa.gov/bitstream/handle/2014/7660/03-1728.pdf) (2003, PDF)
  - Like Hay, it has the (Type, Name, Attributes) data model.
- <https://github.com/oilshell/oil/wiki/Config-Dialect>. Design notes and related links on the wiki.