doc/ysh-tour.md

1	---
2	default_highlighter: oils-sh
3	---
4
5	A Tour of YSH
6	=============
7
8	<!-- author's note about example names
9
10	- people: alice, bob
11	- nouns: ale, bean
12	- peanut, coconut
13	- 42 for integers
14	-->
15
16	This doc describes the [YSH]($xref) language from a clean-slate
17	perspective. We don't assume you know Unix shell, or the compatible
18	[OSH]($xref). But shell users will see the similarity, with simplifications
19	and upgrades.
20
21	Remember, YSH is for Python and JavaScript users who avoid shell! See the
22	[project FAQ][FAQ] for more color on that.
23
24	[FAQ]: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html
25	[path dependence]: https://en.wikipedia.org/wiki/Path_dependence
26
27	This document is long because it demonstrates nearly every feature of the
28	language. You may want to read it in multiple sittings, or read [The Simplest
29	Explanation of
30	Oil](https://www.oilshell.org/blog/2020/01/simplest-explanation.html) first.
31	(Until 2023, YSH was called the "Oil language".)
32
33
34	Here's a summary of what follows:
35
36	1. YSH has interleaved word, command, and expression languages.
37	- The command language has Ruby-like blocks, and the expression language
38	has Python-like data types.
39	2. YSH has both builtin commands like `cd /tmp`, and builtin functions like
40	`join()`.
41	3. Languages for data, like [JSON][], are complementary to YSH code.
42	4. OSH and YSH share both an interpreter data model and a process model
43	(provided by the Unix kernel). Understanding these common models will make
44	you both a better shell user and YSH user.
45
46	Keep these points in mind as you read the details below.
47
48	[JSON]: https://json.org
49
50	<div id="toc">
51	</div>
52
53	## Preliminaries
54
55	Start YSH just like you start bash or Python:
56
57	<!-- oils-sh below skips code block extraction, since it doesn't run -->
58
59	```sh-prompt
60	bash$ ysh # assuming it's installed
61
62	ysh$ echo 'hello world' # command typed into YSH
63	hello world
64	```
65
66	In the sections below, we'll save space by showing output in comments, with
67	`=>`:
68
69	echo 'hello world' # => hello world
70
71	Multi-line output is shown like this:
72
73	echo one
74	echo two
75	# =>
76	# one
77	# two
78
79	## Examples
80
81	### Hello World Script
82
83	You can also type commands into a file like `hello.ysh`. This is a complete
84	YSH program, which is identical to a shell program:
85
86	echo 'hello world' # => hello world
87
88	### A Taste of YSH
89
90	Unlike shell, YSH has `var` and `const` keywords:
91
92	const name = 'world' # const is rarer, used at the top level
93	echo "hello $name" # => hello world
94
95	They take rich Python-like expressions on the right:
96
97	var x = 0 # an integer, not a string
98	setvar x = x * 2 + 1 # mutate with the 'setvar' keyword
99
100	setvar x += 5 # Increment by 5
101	echo $x # => 6
102
103	var mylist = [x, 7] # two integers [6, 7]
104
105	Expressions are often surrounded by `()`:
106
107	if (x > 0) {
108	echo 'positive'
109	} # => positive
110
111	for i, item in (mylist) { # 'mylist' is a variable, not a string
112	echo "[$i] item $item"
113	}
114	# =>
115	# [0] item 6
116	# [1] item 7
117
118	YSH has Ruby-like blocks:
119
120	cd /tmp {
121	echo hi > greeting.txt # file created inside /tmp
122	echo $PWD # => /tmp
123	}
124	echo $PWD # prints the original directory
125
126	And utilities to read and write JSON:
127
128	var person = {name: 'bob', age: 42}
129	json write (person)
130	# =>
131	# {
132	# "name": "bob",
133	# "age": 42
134	# }
135
136	echo '["str", 42]' | json read # sets '_reply' variable by default
137
138	The `=` keyword evaluates and prints an expression:
139
140	= _reply
141	# => (List) ["str", 42]
142
143	(Think of it like `var x = _reply`, without the `var`.)
144
145	## Word Language: Expressions for Strings (and Arrays)
146
147	Let's describe the word language first, and then talk about commands and
148	expressions. Words are a rich language because strings are a central
149	concept in shell.
150
151	### Three Kinds of String Literals
152
153	You can choose the quoting style that's most convenient to write a given
154	string.
155
156	#### Double-Quoted, Single-Quoted, and J8 strings (like JSON)
157
158	Double-quoted strings allow interpolation with `$`:
159
160	var person = 'alice'
161	echo "hi $person, $(echo bye)" # => hi alice, bye
162
163	Write operators by escaping them with `\`:
164
165	echo "\$ \" \\ " # => $ " \
166
167	In single-quoted strings, all characters are literal (except `'`, which
168	can't be expressed):
169
170	echo 'c:\Program Files\' # => c:\Program Files\
171
172	If you want C-style backslash character escapes, use a J8 string, which is
173	like JSON, but with single quotes:
174
175	echo u' A is \u{41} \n line two, with backslash \\'
176	# =>
177	# A is A
178	# line two, with backslash \
179
180	The `u''` strings are guaranteed to be valid Unicode (unlike JSON), but you can
181	also use `b''` strings:
182
183	echo b'byte \yff' # byte that's not valid unicode, like \xff in other languages
184	# do not confuse with \u{ff}
185
186	#### Multi-line Strings
187
188	Multi-line strings are surrounded with triple quotes. They come in the same
189	three varieties, and leading whitespace is stripped in a convenient way.
190
191	sort <<< """
192	var sub: $x
193	command sub: $(echo hi)
194	expression sub: $[x + 3]
195	"""
196	# =>
197	# command sub: hi
198	# expression sub: 9
199	# var sub: 6
200
201	sort <<< '''
202	$2.00 # literal $, no interpolation
203	$1.99
204	'''
205	# =>
206	# $1.99
207	# $2.00
208
209	sort <<< u'''
210	C\tD
211	A\tB
212	''' # b''' strings also supported
213	# =>
214	# A B
215	# C D
216
217	(Use multiline strings instead of shell's [here docs]($xref:here-doc).)
218
219	### Three Kinds of Substitution
220
221	YSH has syntax for 3 types of substitution, all of which start with `$`. These
222	things can all be converted to a string:
223
224	1. Variables
225	2. The output of commands
226	3. The value of expressions
227
228	#### Variable Sub
229
230	The syntax `$a` or `${a}` converts a variable to a string:
231
232	var a = 'ale'
233	echo $a # => ale
234	echo _${a}_ # => _ale_
235	echo "_ $a _" # => _ ale _
236
237	The shell operator `:-` is occasionally useful in YSH:
238
239	echo ${not_defined:-'default'} # => default
240
241	#### Command Sub
242
243	The `$(echo hi)` syntax runs a command and captures its `stdout`:
244
245	echo $(hostname) # => example.com
246	echo "_ $(hostname) _" # => _ example.com _
247
248	#### Expression Sub
249
250	The `$[myexpr]` syntax evaluates an expression and converts it to a string:
251
252	echo $[a] # => ale
253	echo $[1 + 2 * 3] # => 7
254	echo "_ $[1 + 2 * 3] _" # => _ 7 _
255
256	<!-- TODO: safe substitution with "$[a]"html -->
257
258	### Arrays of Strings: Globs, Brace Expansion, Splicing, and Splitting
259
260	There are four constructs that evaluate to a list of strings, rather than
261	a single string.
262
263	#### Globs
264
265	Globs like `*.py` evaluate to a list of files.
266
267	touch foo.py bar.py # create the files
268	write *.py
269	# =>
270	# bar.py
271	# foo.py
272
273	If no files match, it evaluates to an empty list (`[]`).
274
275	#### Brace Expansion
276
277	The brace expansion mini-language lets you write strings without duplication:
278
279	write {alice,bob}@example.com
280	# =>
281	# alice@example.com
282	# bob@example.com
283
284	#### Splicing
285
286	The `@` operator splices an array into a command:
287
288	var myarray = :| ale bean |
289	write S @myarray E
290	# =>
291	# S
292	# ale
293	# bean
294	# E
295
296	You also have `@[]` to splice an expression that evaluates to a list:
297
298	write -- @[split('ale bean')]
299	# =>
300	# ale
301	# bean
302
303	Each item will be converted to a string.
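
As a rough sketch of how splicing tends to be used, the snippet below keeps flags and paths in lists and passes them to an external command. The names `flags` and `dirs`, and their values, are made up for illustration.

```ysh
var flags = :| -l -a |        # hypothetical flags, written as an array literal
var dirs = ['/tmp', '/bin']   # hypothetical paths, written as a List literal

# Each element of each list becomes exactly one argument of 'ls'
ls @flags @dirs               # same as: ls -l -a /tmp /bin
```

Because each element is passed as its own argument, an element that contains spaces still arrives as a single argument.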
304
305	#### Split Command Sub / Split Builtin Sub
306
307	There's also a variant of command sub that splits first:
308
309	write @(seq 3) # write gets 3 arguments
310	# =>
311	# 1
312	# 2
313	# 3
314
315	<!-- TODO: This should decode J8 notation, which includes "" j"" and b"" -->
316
317	## Command Language: I/O, Control Flow, Abstraction
318
319	### Simple Commands and Redirects
320
321	A simple command is a space-separated list of words, which are often unquoted.
322	YSH looks up the first word to determine if it's a `proc` or shell builtin.
323
324	echo 'hello world' # The shell builtin 'echo'
325
326	proc greet (name) { # A proc is like a procedure or process
327	echo "hello $name"
328	}
329
330	# Now the first word will resolve to the proc
331	greet alice # => hello alice
332
333	If it's neither, then it's assumed to be an external command:
334
335	ls -l /tmp # The external 'ls' command
336
337	Commands accept traditional string arguments, as well as typed arguments in
338	parentheses:
339
340	# 'write' is a string arg; 'x' is a typed expression arg
341	json write (x)
342
343	You can redirect `stdin` and `stdout` of simple commands:
344
345	echo hi > tmp.txt # write to a file
346	sort < tmp.txt
347
348	Idioms for using stderr (identical to shell):
349
350	ls /tmp 2>errors.txt
351	echo 'fatal error' 1>&2
352
353	"Simple" commands in YSH can also have typed `()` and block `{}` args, which
354	we'll see in the section on "procs".
355
356	### Pipelines
357
358	Pipelines are a powerful method of manipulating data streams:
359
360	ls | wc -l # count files in this directory
361	find /bin -type f | xargs wc -l # count lines in the files of a subtree
362
363	The stream may contain (lines of) text, binary data, JSON, TSV, and more.
364	Details below.
365
366	### Multi-line Commands
367
368	The YSH `...` prefix lets you write long commands, pipelines, and `&&` chains
369	without `\` line continuations.
370
371	... find /bin # traverse this directory and
372	-type f -a -executable # print executable files
373	| sort -r # reverse sort
374	| head -n 30 # limit to 30 files
375	;
376
377	When this mode is active:
378
379	- A single newline behaves like a space
380	- A blank line (two newlines in a row) is illegal, but a line that has only a
381	comment is allowed. This prevents confusion if you forget the `;`
382	terminator.
383
384	### `var`, `setvar`, `const` to Declare and Mutate
385
386	Constants can't be modified:
387
388	const myconst = 'mystr'
389	# setvar myconst = 'foo' would be an error
390
391	Modify variables with the `setvar` keyword:
392
393	var num_beans = 12
394	setvar num_beans = 13
395
396	A more complex example:
397
398	var d = {name: 'bob', age: 42} # dict literal
399	setvar d.name = 'alice' # d.name is a synonym for d['name']
400	echo $[d.name] # => alice
401
402	That's most of what you need to know about assignments. Advanced users may
403	want to use `setglobal` or `call myplace->setValue(42)` in certain situations.
404
405	<!--
406	var g = 1
407	var h = 2
408	proc demo(:out) {
409	setglobal g = 42
410	setref out = 43
411	}
412	demo :h # pass a reference to h
413	echo "$g $h" # => 42 43
414	-->
415
416	More details: [Variable Declaration and Mutation](variables.html).
417
418	### `for` Loop
419
420	Shell-style for loops iterate over words:
421
422	for word in 'oils' $num_beans {pea,coco}nut {
423	echo $word
424	}
425	# =>
426	# oils
427	# 13
428	# peanut
429	# coconut
430
431	You can also request the loop index:
432
433	for i, word in README.md *.py {
434	echo "$i - $word"
435	}
436	# =>
437	# 0 - README.md
438	# 1 - __init__.py
439
440	To iterate over typed data, use parentheses around an expression. The
441	expression should evaluate to an integer range, `List`, `Dict`, or `Str`
442	(TODO).
443
444	for i in (3 .. 5) { # range operator ..
445	echo "i = $i"
446	}
447	# =>
448	# i = 3
449	# i = 4
450
451	List:
452
453	var foods = ['ale', 'bean']
454	for item in (foods) {
455	echo $item
456	}
457	# =>
458	# ale
459	# bean
460
461	Again you can request the index:
462
463	for i, item in (foods) {
464	echo "$i - $item"
465	}
466	# =>
467	# 0 - ale
468	# 1 - bean
469
470	Likewise, here's the most general form of the dictionary loop:
471
472	var mydict = {pea: 42, nut: 10}
473	for i, k, v in (mydict) {
474	echo "$i - $k - $v"
475	}
476	# =>
477	# 0 - pea - 42
478	# 1 - nut - 10
479
480	There are two simpler forms:
481
482	- One variable gives you the key: `for k in (mydict)`
483	- Two variables gives you the key and value: `for k, v in (mydict)`
484
485	(One way to think of it: `for` loops in YSH have the functionality of Python's
486	`enumerate()`, `items()`, `keys()`, and `values()`.)
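
The two simpler forms might look like the sketch below. It redefines `mydict` so that it stands alone, and it assumes keys are visited in insertion order, as in the three-variable example.

```ysh
var mydict = {pea: 42, nut: 10}

for k in (mydict) {          # one variable: the key
  echo $k
}
# =>
# pea
# nut

for k, v in (mydict) {       # two variables: key and value
  echo "$k = $v"
}
# =>
# pea = 42
# nut = 10
```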
487
488	<!--
489	TODO: Str loop should give you the (UTF-8 offset, rune)
490	Or maybe just UTF-8 offset? Decoding errors could be exceptions, or Unicode
491	replacement.
492	-->
493
494	### `while` Loop
495
496	While loops can use a command as the termination condition:
497
498	while test --file lock {
499	sleep 1
500	}
501
502	Or an expression, which is surrounded in `()`:
503
504	var i = 3
505	while (i < 6) {
506	echo "i = $i"
507	setvar i += 1
508	}
509	# =>
510	# i = 3
511	# i = 4
512	# i = 5
513
514	### `if elif` Conditional
515
516	If statements test the exit code of a command, and have optional `elif` and
517	`else` clauses:
518
519	if test --file foo {
520	echo 'foo is a file'
521	rm --verbose foo # delete it
522	} elif test --dir foo {
523	echo 'foo is a directory'
524	} else {
525	echo 'neither'
526	}
527
528	Invert the exit code with `!`:
529
530	if ! grep alice /etc/passwd {
531	echo 'alice is not a user'
532	}
533
534	As with `while` loops, the condition can also be an expression wrapped in
535	`()`:
536
537	if (num_beans > 0) {
538	echo 'so many beans'
539	}
540
541	var done = false
542	if (not done) { # negate with 'not' operator (contrast with !)
543	echo "we aren't done"
544	}
545
546	### `case` Conditional
547
548	The case statement is a series of conditionals and executable blocks. The
549	condition can be an unquoted glob pattern like `*.py`, an eggex pattern
550	like `/d+/`, or a typed expression like `(42)`:
551
552	var s = 'README.md'
553	case (s) {
554	*.py { echo 'Python' }
555	*.cc | *.h { echo 'C++' }
556	* { echo 'Other' }
557	}
558	# => Other
559
560	case (s) {
561	/ dot* '.md' / { echo 'Markdown' }
562	(30 + 12) { echo 'the integer 42' }
563	(else) { echo 'neither' }
564	}
565	# => Markdown
566
567	<!-- TODO: document case on typed data -->
568
569	(Shell style like `if foo; then ... fi` and `case $x in ... esac` is also legal,
570	but discouraged in YSH code.)
571
572	### Error Handling
573
574	If statements are also used for error handling. Builtins and external
575	commands use this style:
576
577	if ! test -d /bin {
578	echo 'not a directory'
579	}
580
581	if ! cp foo /tmp {
582	echo 'error copying' # any non-zero status
583	}
584
585	Procs use this style (because of shell's disabled `errexit` quirk):
586
587	try {
588	myproc
589	}
590	if (_status !== 0) {
591	echo 'failed'
592	}
593
594	For a complete list of examples, see [YSH vs. Shell Idioms > Error
595	Handling](idioms.html#error-handling). For design goals and a reference, see
596	[YSH Fixes Shell's Error Handling](error-handling.html).
597
598	#### `break`, `continue`, `return`, `exit`
599
600	The `exit` keyword exits a process (it's not a shell builtin). The other 3
601	control flow keywords behave like they do in Python and JavaScript.
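
Here is a small sketch of `continue` and `break` inside a range loop. It uses only constructs shown earlier in this tour, and the loop bounds are arbitrary.

```ysh
for i in (0 .. 5) {
  if (i === 1) {
    continue            # skip the rest of this iteration
  }
  if (i === 3) {
    break               # leave the loop entirely
  }
  echo "i = $i"
}
# =>
# i = 0
# i = 2
```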
602
603	### Ruby-like Blocks
604
605	Here's a builtin command that takes a literal block argument:
606
607	shopt --unset errexit { # ignore errors
608	cp ale /tmp
609	cp bean /bin
610	}
611
612	Blocks are a special kind of typed argument passed to commands like `shopt`.
613	Their type is `value.Command`.
614
615	### Shell-like `proc`
616
617	You can define units of code with the `proc` keyword.
618
619	proc mycopy (src, dest) {
620	### Copy verbosely
621
622	mkdir -p $dest
623	cp --verbose $src $dest
624	}
625
626	The `###` line is a "doc comment", and can be retrieved with `pp proc`. Simple
627	procs like this are invoked like a shell command:
628
629	touch log.txt
630	mycopy log.txt /tmp # first word 'mycopy' is a proc
631
632	Procs have more features, including four kinds of arguments:
633
634	1. Word args (which are always strings)
635	1. Typed, positional args (aka positional args)
636	1. Typed, named args (aka named args)
637	1. A final block argument, which may be written with `{ }`.
638
639	At the call site, they can look like any of these forms:
640
641	cd /tmp # word arg
642
643	json write (d) # word arg, then positional arg
644
645	# error 'failed' (status=9) # word arg, then named arg
646
647	cd /tmp { echo $PWD } # word arg, then block arg
648
649	var mycmd = ^(echo hi) # expression for a value.Command
650	eval (mycmd) # positional arg
651
652	<!-- TODO: lazy arg list: ls8 | where [age > 10] -->
653
654	At the definition site, the kinds of parameters are separated with `;`, similar
655	to the Julia language:
656
657	proc p2 (word1, word2; pos1, pos2, ...rest_pos) {
658	echo "$word1 $word2 $[pos1 + pos2]"
659	json write (rest_pos)
660	}
661
662	proc p3 (w ; ; named1, named2, ...rest_named; block) {
663	echo "$w $[named1 + named2]"
664	eval (block)
665	json write (rest_named)
666	}
667
668	proc p4 (; ; ; block) {
669	eval (block)
670	}
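
Assuming the definitions of `p2` and `p4` above, calls could look like the following sketch. The argument values are arbitrary.

```ysh
# Two word args, then a typed arg list.
# 1 and 2 bind to pos1 and pos2; the leftover 3 is collected in rest_pos.
p2 ale bean (1, 2, 3)
# The 'echo' in p2 prints: ale bean 3
# 'json write' then prints rest_pos as JSON

# p4 takes only a block, which it runs with 'eval'
p4 { echo 'inside the block' }
# => inside the block
```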
671
672	YSH also has Python-like functions defined with `func`. These are part of the
673	expression language, which we'll see later.
674
675	For more info, see the [Informal Guide to Procs and Funcs](proc-func.html)
676	(under construction).
677
678	#### Builtin Commands
679
680	Shell builtins like `cd` and `read` are the "standard library" of the
681	command language. Each one takes various flags:
682
683	cd -L . # follow symlinks
684
685	echo foo | read --all # read all of stdin
686
687	Here are some categories of builtin:
688
689	- I/O: `echo write read`
690	- File system: `cd test`
691	- Processes: `fork wait forkwait exec`
692	- Interpreter settings: `shopt shvar`
693	- Meta: `command builtin runproc type eval`
694	- Modules: `source module`
695
696	<!-- TODO: Link to a comprehensive list of builtins -->
697
698	## Expression Language: Python-like Types
699
700	YSH expressions look and behave more like Python or JavaScript than shell. For
701	example, we write `if (x < y)` instead of `if [ $x -lt $y ]`. Expressions are
702	usually surrounded by `( )`.
703
704	At runtime, variables like `x` and `y` are bound to typed data, like
705	integers, floats, strings, lists, and dicts.
706
707	<!--
708	[Command vs. Expression Mode](command-vs-expression-mode.html) may help you
709	understand how YSH is parsed.
710	-->
711
712	### Python-like `func`
713
714	At the end of the Command Language, we saw that procs are shell-like units of
715	code. Now let's talk about Python-like functions in YSH, which are
716	different than `procs`:
717
718	- They're defined with the `func` keyword.
719	- They're called in expressions, not in commands.
720	- They're pure, and live in the interior of a process.
721	- In contrast, procs usually perform I/O, and have exterior boundaries.
722
723	Here's a function that mutates its argument:
724
725	func popTwice(mylist) {
726	call mylist->pop()
727	call mylist->pop()
728	}
729
730	var mylist = [3, 4]
731
732	# The call keyword is an "adapter" between commands and expressions,
733	# like the = keyword.
734	call popTwice(mylist)
735
736	Here's a pure function:
737
738	func myRepeat(s, n; special=false) { # positional; named params
739	var parts = []
740	for i in (0 .. n) {
741	append $s (parts)
742	}
743	var result = join(parts)
744
745	if (special) {
746	return ("$result !!") # parens required for typed return
747	} else {
748	return (result)
749	}
750	}
751
752	echo $[myRepeat('z', 3)] # => zzz
753
754	echo $[myRepeat('z', 3, special=true)] # => zzz !!
755
756	Funcs are named using `camelCase`, while procs use `kebab-case`. See the
757	[Style Guide](style-guide.html) for more conventions.
758
759	#### Builtin Functions
760
761	In addition to builtin commands, YSH has Python-like builtin functions.
762	These are like the "standard library" for the expression language. Examples:
763
764	- Functions that take multiple types: `len() type()`
765	- Conversions: `bool() int() float() str() list() ...`
766	- Explicit word evaluation: `split() join() glob() maybe()`
767
768	<!-- TODO: Make a comprehensive list of func builtins. -->
769
770
771	### Data Types: `Int`, `Str`, `List`, `Dict`, ...
772
773	YSH has data types, each with an expression syntax and associated methods.
774
775	### Methods
776
777	Mutating methods are looked up with a thin arrow `->`:
778
779	var foods = ['ale', 'bean']
780	var last = foods->pop() # bean
781	write @foods # => ale
782
783	You can ignore the return value with the `call` keyword:
784
785	call foods->pop()
786
787	Transforming methods use a fat arrow `=>`:
788
789	var line = ' ale bean '
790	var trimmed = line => trim() => upper() # 'ALE BEAN'
791
792	If the `=>` operator doesn't find a method with the given name in the object's
793	type, it looks for free functions:
794
795	# list() is a free function taking one arg
796	# join() is a free function taking two args
797	var x = {k1: 42, k2: 43} => list() => join('/') # 'k1/k2'
798
799	This allows a left-to-right "method chaining" style.
800
801	---
802
803	Now let's go through the data types in YSH. We'll show the syntax for
804	literals, and what methods they have.
805
806	#### Null and Bool
807
808	YSH uses JavaScript-like spellings for these three "atoms":
809
810	var x = null
811
812	var b1, b2 = true, false
813
814	if (b1) {
815	echo 'yes'
816	} # => yes
817
818
819	#### Int
820
821	There are many ways to write integers:
822
823	var small, big = 42, 65_536
824	echo "$small $big" # => 42 65536
825
826	var hex, octal, binary = 0x0001_0000, 0o755, 0b0001_0101
827	echo "$hex $octal $binary" # => 65536 493 21
828
829	<!--
830	"Runes" are integers that represent Unicode code points. They're not common in
831	YSH code, but can make certain string algorithms more readable.
832
833	# Pound rune literals are similar to ord('A')
834	const a = #'A'
835
836	# Backslash rune literals can appear outside of quotes
837	const newline = \n # Remember this is an integer
838	const backslash = \\ # ditto
839
840	# Unicode rune literal is syntactic sugar for 0x3bc
841	const mu = \u{3bc}
842
843	echo "chars $a $newline $backslash $mu" # => chars 65 10 92 956
844	-->
845
846	#### Float
847
848	Floats are written like you'd expect:
849
850	var small = 1.5e-10
851	var big = 3.14
852
853	#### Str
854
855	See the section above called Three Kinds of String Literals. It described
856	`'single quoted'`, `"double ${quoted}"`, and `u'J8-style\n'` strings; as well
857	as their multiline variants.
858
859	Strings are UTF-8 encoded in memory, like strings in the [Go
860	language](https://golang.org). There isn't a separate string and unicode type,
861	as in Python.
862
863	Strings are immutable, as in Python and JavaScript. This means they only
864	have transforming methods:
865
866	var x = s => trim()
867
868	Other methods:
869
870	- `trimLeft() trimRight()`
871	- `trimPrefix() trimSuffix()`
872	- `upper() lower()` (not implemented)
873
874	<!--
875	The syntax `:symbol` could be an interned string.
876	-->
877
878	#### List (and Arrays)
879
880	All lists can be expressed with Python-like literals:
881
882	var foods = ['ale', 'bean', 'corn']
883	var recursive = [1, [2, 3]]
884
885	As a special case, lists of strings are called arrays. It's often more
886	convenient to write them with shell-like literals:
887
888	# No quotes or commas
889	var foods = :| ale bean corn |
890
891	# You can use the word language here
892	var other = :| foo $s *.py {alice,bob}@example.com |
893
894	Lists are mutable, as in Python and JavaScript. So they mainly have
895	mutating methods:
896
897	call foods->reverse()
898	write -- @foods
899	# =>
900	# corn
901	# bean
902	# ale
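
A short sketch of growing and shrinking a list, using the `append` builtin command and the `pop()` method that appear elsewhere in this tour. The values are arbitrary.

```ysh
var foods = :| ale bean |

append corn (foods)          # builtin command: word arg, then the list to mutate
echo $[len(foods)]           # => 3

var last = foods->pop()      # mutating method: removes and returns the last item
echo $last                   # => corn

write -- @foods
# =>
# ale
# bean
```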
903
904	#### Dict
905
906	Dicts use syntax that's more like JavaScript than Python. Here's a dict
907	literal:
908
909	var d = {
910	name: 'bob', # unquoted keys are allowed
911	age: 42,
912	'key with spaces': 'val'
913	}
914
915	There are two syntaxes for key lookup. If the key doesn't exist, it's a fatal
916	error.
917
918	var v1 = d['name']
919	var v2 = d.name # shorthand for the above
920	var v3 = d['key with spaces'] # no shorthand for this
921
922	Key names can be computed with expressions in `[]`:
923
924	var key = 'alice'
925	var d2 = {[key ++ '_z']: 'ZZZ'} # Computed key name
926	echo $[d2.alice_z] # => ZZZ # Reminder: expression sub
927
928	Omitting the value causes it to be taken from a variable of the same name:
929
930	var d3 = {key} # value is taken from the environment
931	echo "name is $[d3.key]" # => name is alice
932
933	More:
934
935	var empty = {}
936	echo $[len(empty)] # => 0
937
938	Dicts are mutable, as in Python and JavaScript. But the `keys()` and `values()`
939	methods return new `List` objects:
940
941	var keys = d2 => keys() # => alice_z
942	# var vals = d3 => values() # => alice
943
944	### `Place` type / "out params"
945
946	The `read` builtin can either set an implicit variable `_reply`:
947
948	whoami | read --all # sets _reply
949
950	Or you can pass a `value.Place`, created with `&`:
951
952	var x # implicitly initialized to null
953	whoami | read --all (&x) # mutate this "place"
954	echo who=$x # => who=andy
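
Your own procs can accept a place too. The sketch below assumes `setValue()` behaves as mentioned in the section on `var` and `setvar`. The names `set-answer` and `answer` are hypothetical.

```ysh
proc set-answer (; out) {      # 'out' is a typed positional param
  call out->setValue(42)       # write through the Place to the caller's variable
}

var answer                     # implicitly initialized to null
set-answer (&answer)
echo "answer = $answer"        # => answer = 42
```

This is how a proc can hand back a typed value without printing it to `stdout`.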
955
956	#### Quotation Types: value.Command (Block) and value.Expr
957
958	These types are for reflection on YSH code. Most YSH programs won't use them
959	directly.
960
961	- `Command`: an unevaluated code block.
962	- rarely-used literal: `^(ls | wc -l)`
963	- `Expr`: an unevaluated expression.
964	- rarely-used literal: `^[42 + a[i]]`
965
966	<!-- TODO: implement Block, Expr, ArgList types (variants of value) -->
967
968	### Operators
969
970	Operators are generally the same as in Python:
971
972	if (10 <= num_beans and num_beans < 20) {
973	echo 'enough'
974	} # => enough
975
976	YSH has a few operators that aren't in Python. Equality can be approximate or
977	exact:
978
979	var n = ' 42 '
980	if (n ~== 42) {
981	echo 'equal after stripping whitespace and type conversion'
982	} # => equal after stripping whitespace and type conversion
983
984	if (n === 42) {
985	echo "not reached because strings and ints aren't equal"
986	}
987
988	<!-- TODO: is n === 42 a type error? -->
989
990	Pattern matching can be done with globs (`~~` and `!~~`)
991
992	const filename = 'foo.py'
993	if (filename ~~ '*.py') {
994	echo 'Python'
995	} # => Python
996
997	if (filename !~~ '*.sh') {
998	echo 'not shell'
999	} # => not shell
1000
1001	or regular expressions (`~` and `!~`). See the Eggex section below for an
1002	example of the latter.
1003
1004	Concatenation is `++` rather than `+` because it avoids confusion in the
1005	presence of type conversion:
1006
1007	var n = 42 + 1 # + always adds numbers; it never concatenates
1008	echo $n # => 43
1009
1010	var y = 'ale ' ++ "bean $n" # concatenation
1011	echo $y # => ale bean 43
1012
1013	<!--
1014	TODO: change example above
1015	var n = '42' + 1 # string plus int does implicit conversion
1016	-->
1017
1018	<!--
1019
1020	#### Summary of Operators
1021
1022	- Arithmetic: `+ - * / // %` and `**` for exponentiation
1023	- `/` always yields a float, and `//` is integer division
1024	- Bitwise: `& | ^ ~`
1025	- Logical: `and or not`
1026	- Comparison: `== < > <= >= in 'not in'`
1027	- Approximate equality: `~==`
1028	- Eggex and glob match: `~ !~ ~~ !~~`
1029	- Ternary: `1 if x else 0`
1030	- Index and slice: `mylist[3]` and `mylist[1:3]`
1031	- `mydict->key` is a shortcut for `mydict['key']`
1032	- Function calls
1033	- free: `f(x, y)`
1034	- transformations and chaining: `s => startWith('prefix')`
1035	- mutating methods: `mylist->pop()`
1036	- String and List: `++` for concatenation
1037	- This is a separate operator because the addition operator `+` does
1038	string-to-int conversion
1039
1040	TODO: What about list comprehensions?
1041	-->
1042
1043	### Egg Expressions (YSH Regexes)
1044
1045	An Eggex is a type of YSH expression that denotes regular expressions. They
1046	translate to POSIX ERE syntax, for use with tools like `egrep`, `awk`, and `sed
1047	--regexp-extended` (GNU only).
1048
1049	They're designed to be readable and composable. Example:
1050
1051	var D = / digit{1,3} /
1052	var ip_pattern = / D '.' D '.' D '.' D /
1053
1054	var z = '192.168.0.1'
1055	if (z ~ ip_pattern) { # Use the ~ operator to match
1056	echo "$z looks like an IP address"
1057	} # => 192.168.0.1 looks like an IP address
1058
1059	if (z !~ / '.255' %end /) {
1060	echo "doesn't end with .255"
1061	} # => doesn't end with .255
1062
1063	See the [Egg Expressions doc](eggex.html) for details.
1064
1065	## Interlude
1066
1067	Let's review what we've seen before moving on to other YSH features.
1068
1069	### Three Interleaved Languages
1070
1071	Here are the languages we saw in the last 3 sections:
1072
1073	1. Words evaluate to a string, or list of strings. This includes:
1074	- literals like `'mystr'`
1075	- substitutions like `${x}` and `$(hostname)`
1076	- globs like `*.sh`
1077	2. Commands are used for
1078	- I/O: pipelines, builtins like `read`
1079	- control flow: `if`, `for`
1080	- abstraction: `proc`
1081	3. Expressions on typed data are borrowed from Python, with some JavaScript
1082	influence.
1083	- Lists: `['ale', 'bean']` or `:| ale bean |`
1084	- Dicts: `{name: 'bob', age: 42}`
1085	- Functions: `split('ale bean')` and `join(['pea', 'nut'])`
1086
1087	### How Do They Work Together?
1088
1089	Here are two examples:
1090
1091	(1) In this command, there are four words. The fourth word is an
1092	expression sub `$[]`.
1093
1094	write hello $name $[d['age'] + 1]
1095	# =>
1096	# hello
1097	# world
1098	# 43
1099
1100	(2) In this assignment, the expression on the right hand side of `=`
1101	concatenates two strings. The first string is a literal, and the second is a
1102	command sub.
1103
1104	var food = 'ale ' ++ $(echo bean | tr a-z A-Z)
1105	write $food # => ale BEAN
1106
1107	So words, commands, and expressions are mutually recursive. If you're a
1108	conceptual person, skimming [Syntactic Concepts](syntactic-concepts.html) may
1109	help you understand this on a deeper level.
1110
1111	<!--
1112	One way to think about these sublanguages is to note that the `|` character
1113	means something different in each context:
1114
1115	- In the command language, it's the pipeline operator, as in `ls | wc -l`
1116	- In the word language, it's only valid in a literal string like `'|'`, `"|"`,
1117	or `\|`. (It's also used in `${x|html}`, which formats a string.)
1118	- In the expression language, it's the bitwise OR operator, as in Python and
1119	JavaScript.
1120	-->
1121
1122	## Languages for Data (Interchange Formats)
1123
1124	In addition to languages for code, YSH also deals with languages for
1125	data. [JSON]($xref) is a prominent example of the latter.
1126
1127	<!-- TODO: Link to slogans, fallacies, and concepts -->
1128
1129	### UTF-8
1130
1131	UTF-8 is the foundation of our textual data languages.
1132
1133	<!-- TODO: there's a runes() iterator which gives integer offsets, usable for
1134	slicing -->
1135
1136	<!-- TODO: write about J8 notation -->
1137
1138	### Lines of Text (traditional), and JSON/J8 Strings
1139
1140	Traditional Unix tools like `grep` and `awk` operate on streams of lines. YSH
1141	supports this style, just like any other shell.
1142
1143	But YSH also has [J8 Notation][], a data format based on [JSON][].
1144
1145	[J8 Notation]: j8-notation.html
1146
1147	It lets you encode arbitrary byte strings into a single (readable) line,
1148	including those with newlines and terminal escape sequences.
1149
1150	Example:
1151
1152	# A line with a tab char in the middle
1153	var mystr = u'pea\t' ++ u'42\n'
1154
1155	# Print it as JSON
1156	write $[toJson(mystr)] # => "pea\t42\n"
1157
1158	# JSON8 is the same, but it's not lossy for binary data
1159	write $[toJson8(mystr)] # => "pea\t42\n"
1160
1161	### Structured: JSON8, TSV8
1162
1163	You can write and read tree-shaped data as [JSON][]:
1164
1165	    var d = {key: 'value'}
1166	    json write (d)  # dump variable d as JSON
1167	    # =>
1168	    # {
1169	    #   "key": "value"
1170	    # }
1171
1172	    echo '["ale", 42]' > example.json
1173
1174	    json read (&d2) < example.json  # parse JSON into var d2
1175	    pp cell d2  # inspect the in-memory value
1176	    # =>
1177	    # ['ale', 42]
1178
1179	[JSON][] will lose information when strings have binary data, but the slight
1180	[JSON8]($xref) upgrade won't:
1181
1182	    var b = {binary: $'\xff'}
1183	    json8 write (b)
1184	    # =>
1185	    # {
1186	    #   "binary": b'\yff'
1187	    # }
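
To see why plain JSON is lossy here, consider this Python sketch. It is an
analogy, not a description of how YSH handles the byte. It assumes one common
strategy, replacing invalid UTF-8 with U+FFFD, and shows that the original byte
can't be recovered after a round trip.

```python
import json

raw = b"\xff"  # a single byte that is not valid UTF-8

# JSON strings hold Unicode text, so the byte has to be decoded first.
# Assumption: invalid bytes are replaced with U+FFFD (the replacement character).
text = raw.decode("utf-8", errors="replace")
encoded = json.dumps({"binary": text})

# Round trip: we get the bytes of U+FFFD back, not the original 0xff
round_trip = json.loads(encoded)["binary"].encode("utf-8")
print(round_trip == raw)  # False
```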
1188
1189	[JSON]: $xref
1190
1191	<!--
1192	TODO:
1193	- Fix pp cell output
1194	- Use json write (d) syntax
1195	-->
1196
1197	Table-shaped data can be read and written as [TSV8]($xref). (TODO: not yet
1198	implemented.)
1199
1200	<!-- Figure out the API. Does it work like JSON?
1201
1202	Or I think we just implement
1203	- rows: 'where' or 'filter' (dplyr)
1204	- cols: 'select' conflicts with shell builtin; call it 'cols'?
1205	- sort: 'sort-by' or 'arrange' (dplyr)
1206	- TSV8 <=> sqlite conversion. Are these drivers or what?
1207	- and then let you pipe output?
1208
1209	Do we also need TSV8 space2tab or something? For writing TSV8 inline.
1210
1211	More later:
1212	- MessagePack (e.g. for shared library extension modules)
1213	- msgpack read, write? I think user-defined function could be like this?
1214	- SASH: Simple and Strict HTML? For easy processing
1215	-->
1216
1217	## The Runtime Shared by OSH and YSH
1218
1219	Although we describe OSH and YSH as different languages, they use the same
1220	interpreter under the hood. This interpreter has various `shopt` flags that
1221	are flipped for different behavior, e.g. with `shopt --set ysh:all`.
1222
1223	Understanding this interpreter and its interface to the Unix kernel will help
1224	you understand both languages!
1225
1226	### Interpreter Data Model
1227
1228	The [Interpreter State](interpreter-state.html) doc is under construction.
1229	It will cover:
1230
1231	- Two separate namespaces (like Lisp 1 vs. 2):
1232	  - proc namespace for procs as the first word
1233	  - variable namespace
1234	- The variable namespace has a call stack, for the local variables of a
1235	  proc.
1236	  - Each stack frame is a `{name -> cell}` mapping.
1237	  - A cell has one of the above data types: `Bool`, `Int`, `Str`, etc.
1238	  - A cell has `readonly`, `export`, and `nameref` flags.
1239	- Boolean shell options with `shopt`: `parse_paren`, `simple_word_eval`, etc.
1240	- String shell options with `shvar`: `IFS`, `PATH`
1241	- Registers that are silently modified by the interpreter
1242	  - `$?` and `_status`
1243	  - `$!` for the last PID
1244	  - `_this_dir`
1245	  - `_reply`
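
The first two bullets can be pictured with a toy model in Python. This is only
a sketch for intuition, not how the Oils interpreter is implemented. The names
`Cell`, `ToyInterp`, `set_var`, and `get_var` are hypothetical, and the lookup
rule (current frame, then global frame) is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A value plus the three flags named in the list above."""
    val: object
    readonly: bool = False
    exported: bool = False
    nameref: bool = False


@dataclass
class ToyInterp:
    # Namespace 1: procs, looked up when a name is the first word of a command
    procs: dict = field(default_factory=dict)
    # Namespace 2: variables, as a stack of {name -> Cell} frames.
    # frames[0] plays the role of the global frame.
    frames: list = field(default_factory=lambda: [{}])

    def push_frame(self):
        """Entering a proc: its locals get a fresh frame."""
        self.frames.append({})

    def pop_frame(self):
        """Leaving a proc: its locals disappear."""
        self.frames.pop()

    def set_var(self, name, val, readonly=False):
        frame = self.frames[-1]
        old = frame.get(name)
        if old is not None and old.readonly:
            raise RuntimeError(f"{name} is readonly")
        frame[name] = Cell(val, readonly=readonly)

    def get_var(self, name):
        # Assumption for this sketch: current frame first, then the global one
        for frame in (self.frames[-1], self.frames[0]):
            if name in frame:
                return frame[name].val
        raise KeyError(name)


interp = ToyInterp()
interp.procs["deploy"] = "<proc body>"  # a proc named 'deploy'
interp.set_var("deploy", 42)            # a variable with the same name: no clash

interp.push_frame()
interp.set_var("food", "ale")           # local to the running proc
local_value = interp.get_var("food")
interp.pop_frame()

print(local_value, interp.get_var("deploy"), "food" in interp.frames[-1])
# ale 42 False
```

The point of the two dictionaries is that a proc and a variable can share a
name without conflict, and that a proc's locals vanish when its frame is popped.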
1246
1247	### Process Model (the kernel)
1248
1249	The [Process Model](process-model.html) doc is under construction. It will cover:
1250
1251	- Simple Commands, `exec`
1252	- Pipelines. #[shell-the-good-parts](#blog-tag)
1253	- `fork`, `forkwait`
1254	- Command and process substitution.
1255	- Related links:
1256	  - [Tracing execution in Oils](xtrace.html) (xtrace), which divides
1257	    process-based concurrency into synchronous and async constructs.
1258	  - [Three Comics For Understanding Unix
1259	    Shell](http://www.oilshell.org/blog/2020/04/comics.html) (blog)
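
The kernel primitives behind a subshell can be sketched in Python, which
exposes the same `fork` and `waitpid` system calls. This is a Unix-only
analogy, and the helper name `run_in_child` is hypothetical. It shows that a
forked child gets its own copy of the interpreter's memory, and that the parent
learns only the exit status.

```python
import os


def run_in_child(child_fn):
    """Run child_fn in a forked child process; return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child process: it works on a copy of the parent's memory
        try:
            code = int(child_fn() or 0)
        except BaseException:
            code = 1
        os._exit(code)  # leave immediately, skipping the parent's cleanup

    # Parent process: block until the child exits, like a synchronous command
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)


counter = 0


def child():
    global counter
    counter += 1  # changes only the child's copy
    return 7


status = run_in_child(child)
print(status, counter)  # 7 0
```

The parent's `counter` is still 0 afterwards, which is the same reason a
variable assigned inside a shell subshell isn't visible to the parent shell.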
1260
1261
1262	<!--
1263	Process model additions: Capers, Headless shell
1264
1265	some optimizations: See YSH starts fewer processes than other shells.
1266	-->
1267
1268	## Summary
1269
1270	YSH is a large language that evolved from Unix shell. It has shell-like
1271	commands, Python-like expressions on typed data, and Ruby-like command blocks.
1272
1273	Even though it's large, you can "forget" the bad parts of shell like `[ $x -lt
1274	$y ]`.
1275
1276	These concepts are central to YSH:
1277
1278	1. Interleaved word, command, and expression languages.
1279	2. A standard library of shell builtins, as well as builtin functions.
1280	3. Languages for data: J8 Notation, including JSON8 and TSV8.
1281	4. A runtime shared by OSH and YSH.
1282
1283	## Related Docs
1284
1285	- [YSH vs. Shell Idioms](idioms.html) - YSH side-by-side with shell.
1286	- [YSH Language Influences](language-influences.html) - In addition to shell,
1287	Python, and JavaScript, YSH is influenced by Ruby, Perl, Awk, PHP, and more.
1288	- [A Feel For YSH Syntax](syntax-feelings.html) - Some thoughts that may help
1289	you remember the syntax.
1290	- [YSH Language Warts](warts.html) documents syntax that may be surprising.
1291
1292	## Appendix: Features Not Shown
1293
1294	### Advanced
1295
1296	These shell features are part of YSH, but aren't shown for brevity.
1297
1298	- The `fork` and `forkwait` builtins, for concurrent execution and subshells.
1299	- Process Substitution: `diff <(sort left.txt) <(sort right.txt)`
1300
1301	### Deprecated Shell Constructs
1302
1303	The shared interpreter supports many shell constructs that are deprecated:
1304
1305	- YSH code uses shell's `||` and `&&` in limited circumstances, since `errexit`
1306	is on by default.
1307	- Assignment builtins like `local` and `declare`. Use YSH keywords.
1308	- Boolean expressions like `[[ x =~ $pat ]]`. Use YSH expressions.
1309	- Shell arithmetic like `$(( x + 1 ))` and `(( y = x ))`. Use YSH expressions.
1310	- The `until` loop can always be replaced with a `while` loop.
1311	- Most of what's in `${}` can be written in other ways. For example,
1312	`${s#/tmp}` could be `s => removePrefix('/tmp')` (TODO).
1313
1314	### Not Yet Implemented
1315
1316	This document mentions a few constructs that aren't yet implemented. Here's a
1317	summary:
1318
1319	```none
1320	# Unimplemented syntax:
1321
1322	echo ${x|html} # formatters
1323
1324	echo ${x %.2f} # statically-parsed printf
1325
1326	var x = j"line\n"
1327	echo j"line\n" # JSON-style string literal
1328
1329	var x = "<p>$x</p>"html
1330	echo "<p>$x</p>"html # tagged string
1331
1332	var x = 15 Mi # units suffix
1333	```
1334
1335	Important builtins that aren't implemented:
1336
1337	- `describe` for testing
1338	- `parseArgs()` to parse flags
1339	- Builtins for [TSV8]($xref) - selection, projection, sorting
1340
1341	<!--
1342
1343	- To document: Method calls
1344	- To implement: Capers: stateless coprocesses
1345	-->
1346
1347	## Appendix: Example of a YSH Module
1348
1349	YSH can be used to write simple "shell scripts" or longer programs. It has
1350	procs and modules to help with the latter.
1351
1352	A module is just a file, like this:
1353
1354	```
1355	#!/usr/bin/env ysh
1356	### Deploy script
1357
1358	module main || return 0  # declaration, "include guard"
1359	use bin cp mkdir  # optionally declare binaries used
1360
1361	source $_this_dir/lib/util.ysh  # defines 'log' helper
1362
1363	const DEST = '/tmp/ysh-tour'
1364
1365	proc my-sync(...files) {
1366	  ### Sync files and show which ones
1367
1368	  cp --verbose @files $DEST
1369	}
1370
1371	proc main {
1372	  mkdir -p $DEST
1373
1374	  touch {foo,bar}.py {build,test}.sh
1375
1376	  log "Copying source files"
1377	  my-sync *.py *.sh
1378
1379	  if test --dir /tmp/logs {
1380	    cd /tmp/logs
1381
1382	    log "Copying logs"
1383	    my-sync *.log
1384	  }
1385	}
1386
1387	if is-main {  # The only top-level statement
1388	  main @ARGV
1389	}
1390	```
1391
1392	<!--
1393	TODO:
1394	- Also show flags parsing?
1395	- Show longer examples where it isn't boilerplate
1396	-->
1397
1398	You wouldn't bother with the boilerplate for something this small. But this
1399	example illustrates the idea, which is that the top level often contains these
1400	words: `proc`, `const`, `module`, `source`, and `use`.
1401