---
default_highlighter: oils-sh
---

Syntactic Concepts in YSH
=========================

These documents introduce the YSH language:

- [The Simplest Explanation of Oil](//www.oilshell.org/blog/2020/01/simplest-explanation.html) (blog)
- [A Feel For YSH Syntax](syntax-feelings.html)

In contrast, the concepts introduced below may help advanced users **remember**
YSH and its syntax. Read on to learn about:

- **Command vs. Expression Mode**. Command mode is like shell, and expression
  mode is like Python.
- **Lexer Modes** help parse different "sublanguages" or dialects.
- **Sigils and Sigil Pairs**. A style of syntax that's familiar to shell and
  Perl users.
- **Parse Options** like `shopt -s parse_paren`, which selectively break
  compatibility so that you can gradually upgrade shell to YSH.
- **Static Parsing**, as opposed to the dynamic parsing of shell. Static
  parsing improves error messages and makes more software engineering tools
  possible.

<!-- TODO: We should talk about word lists: commands, array literals, and for
loops -->

<div id="toc">
</div>

## Command vs. Expression Mode

The YSH parser starts out in command mode:

    echo "hello $name"

    for i in 1 2 3 {
      echo $i
    }

But it switches to expression mode in a few places:

    var x = 42 + a[i]          # the RHS of = is a YSH expression

    echo $[mydict['key']]      # interpolated expressions with $[]

    json write ({key: "val"})  # typed args inside ()

See [Command vs. Expression Mode](command-vs-expression-mode.html) for details.

## Lexer Modes

*Lexer modes* are a technique that YSH uses to manage the complex syntax of
shell, which evolved over many decades.

For example, `:` means something different in each of these lines:

    PATH=/bin:/usr/bin          # literal string
    echo ${x:-default}          # part of an operator
    echo $(( x > y ? 42 : 0 ))  # arithmetic operator
    var myslice = a[3:5]        # YSH expression

To solve this problem, YSH has a lexer that can run in many **modes**.
Multiple parsers read from this single lexer, but they demand different tokens,
depending on the parsing context.
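
As a rough illustration, here are the three shell lines from the example above, wrapped in a bash function. The names `colon_demo` and `path_like` are made up for this sketch, and it runs in bash rather than OSH. The YSH slice `a[3:5]` is left out because it needs a YSH interpreter.

```shell
#!/usr/bin/env bash
# Sketch: the same ':' character plays three different roles.
# Any shell parser must tell these apart; OSH does it with lexer modes.

colon_demo() {
  local x='' y=1
  local path_like=/bin:/usr/bin  # ':' is an ordinary character in a word
  echo "$path_like"
  echo "${x:-default}"           # ':-' is one operator inside ${...}
  echo $(( y > 0 ? 42 : 0 ))     # ':' is half of the ?: operator
}

colon_demo
```

It prints `/bin:/usr/bin`, `default`, and `42`, one per line.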

### More Information

- [How OSH Uses Lexer Modes](//www.oilshell.org/blog/2016/10/19.html)
- [When Are Lexer Modes Useful?](//www.oilshell.org/blog/2017/12/17.html)
- [How to Parse Shell Like a Programming Language](//www.oilshell.org/blog/2019/02/07.html)
  - See the list of 14 lexer modes.
- [Posts tagged #lexing]($blog-tag:lexing)

## Sigils and Sigil Pairs

A **sigil** is a symbol like the `$` in `$mystr`.

A **sigil pair** is a sigil with opening and closing delimiters, like `${var}`
and `@(seq 3)`.

An appendix of [A Feel For YSH Syntax](syntax-feelings.html) lists the sigil
pairs in the YSH language.

### Valid Contexts

Each sigil pair may be available in command mode, expression mode, or both.

For example, command substitution is available in both:

    echo $(hostname)      # command mode
    var x = $(hostname)   # expression mode

So are raw and C-style string literals:

    echo $'foo\n'         # the bash-compatible way to do it
    var s = $'foo\n'

    echo r'c:\Program Files\'
    var raw = r'c:\Program Files\'

But array literals only make sense in expression mode:

    var myarray = :| one two three |

    echo one two three    # no array literal needed

A sigil pair often changes the **lexer mode** to parse what's inside.

## Parse Options to Take Over `()`, `[]`, `@`, and `=`

Most users don't have to worry about parse options. Instead, they run either
`bin/osh` or `bin/ysh`, which are actually aliases for the same binary. The
difference is that `bin/ysh` has the **option group** `ysh:all` on by default.

Nonetheless, here are two examples.

The `parse_at` option (in group `ysh:upgrade`) turns `@` into the **splice
operator** when it's at the front of a word:

```sh-prompt
$ var myarray = :| one two three |

$ echo @myarray          # @ isn't an operator in shell
@myarray

$ shopt -s parse_at      # parse the @ symbol
$ echo @myarray
one two three

$ echo '@myarray'        # quote it to get the old behavior
@myarray
```
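
For comparison, here is a bash sketch of roughly what `@myarray` does once `parse_at` is on. Bash has no `@` splice operator, so `@myarray` stays a literal word. The usual way to splice an array into a command's arguments is `"${myarray[@]}"`. The helper `count_args` is hypothetical and exists only to show how many arguments arrive.

```shell
#!/usr/bin/env bash
# Sketch: splicing an array into a command's arguments in plain bash.
myarray=(one two three)

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

echo @myarray                  # literal word: prints @myarray
echo "${myarray[@]}"           # one argument per element: one two three
count_args "${myarray[@]}"     # prints 3
count_args @myarray            # prints 1
```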

The `parse_bracket` option (also in group `ysh:upgrade`) lets you pass
unevaluated expressions to a command with `[]`:

```sh-prompt
assert (^[42 === x])  # assert is passed an expression, not a value
assert [42 === x]     # syntax sugar with parse_bracket
```

## Static Parsing

POSIX specifies that Unix shell has multiple stages of parsing and evaluation.
For example:

```sh-prompt
$ x=2
$ code='3 * x'
$ echo $(( code ))  # Silent eval of a string. Dangerous!
6
```
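
The re-parsing can nest. In bash, when a variable is referenced inside `$(( ))`, its value is evaluated as an arithmetic expression, so one string can refer to another. A minimal sketch (the name `more` is made up):

```shell
#!/usr/bin/env bash
# Sketch: bash re-parses variable values inside $(( )).
x=2
code='3 * x'
more='code + 1'    # a string that names another string

echo $(( code ))   # '3 * x' is parsed at runtime: prints 6
echo $(( more ))   # 'code + 1' becomes (3 * x) + 1: prints 7
```

Neither string is code when it's assigned. It only becomes code when `$(( ))` reads it, which is why the example above calls this a silent eval.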

YSH expressions are parsed in a single stage and then evaluated, which makes
YSH more like Python or JavaScript:

```sh-prompt
$ setvar code = '3 * x'
$ echo $[ code ]
3 * x
```

Another example: shell assignment builtins like `readonly` and `local` are
dynamically parsed, while YSH assignments like `const` and `var` are statically
parsed.
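
Here is a small bash sketch of what dynamic parsing means for `local` and `readonly`. The argument `"$spec"` is just a string until the builtin runs. Only then is `name=alice` split into a variable name and a value. The function and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: assignment builtins parse their arguments at runtime.
show_local() {
  local spec='name=alice'   # an ordinary string
  local "$spec"             # after expansion, local parses 'name=alice'
  echo "$name"              # prints alice
}
show_local

spec2='LIMIT=10'
readonly "$spec2"           # defines a read-only variable named LIMIT
echo "$LIMIT"               # prints 10
```

A static parser reading `local "$spec"` can't know which variable the line defines. In `var name = 'alice'`, the variable name is right there in the source.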

### Aside: Duplicate Functionality in Bash

It's confusing that [bash]($xref) has **both** statically- and
dynamically-parsed variants of the same functionality.

Boolean expressions:

- `[ -d /tmp ]` is dynamically parsed
- `[[ -d /tmp ]]` is statically parsed
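
One way to see the difference is bash's `-n` flag, which parses code without running it. Because `[` is an ordinary command, bash doesn't interpret its arguments until runtime, so an operator stored in a variable parses fine. `[[` is parsed by the shell itself, so the same code is a syntax error before anything runs. (`op` is just an illustrative name.)

```shell
#!/usr/bin/env bash
# Sketch: `bash -n` parses without executing, so it only reports
# errors that can be found statically.

# [ is a command; its arguments are just words until it runs.
if bash -n -c 'op=-d; [ $op /tmp ]'; then
  echo '[ ... ]: parses'
fi

# [[ is parsed by the shell, which expects a literal operator here.
if ! bash -n -c 'op=-d; [[ $op /tmp ]]' 2>/dev/null; then
  echo '[[ ... ]]: syntax error'
fi
```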

C-style string literals:

- `echo -e '\n'` is dynamically parsed
- `echo $'\n'` is statically parsed
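
A bash sketch of the difference: `$'...'` escapes are decoded once, when the literal is parsed. `echo -e` decodes escapes at runtime in whatever data it's given, including text that came from a variable. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: when are backslash escapes interpreted?

# $'...' is decoded by the parser, once.
nl=$'a\nb'
echo "${#nl}"                                 # prints 3: a, newline, b

# echo -e decodes escapes in its runtime arguments, so data that
# happens to contain a backslash is treated as an escape sequence.
user_input='50\c% off'
printf '[%s]\n' "$(echo -e "$user_input")"    # prints [50]; \c stops output
printf '[%s]\n' "$user_input"                 # prints [50\c% off]
```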

<!--
Remaining dynamic parsing in shell:

- printf: `%.3f`
- glob: `*.py`
- history lexer does another pass ...
-->

### Related Links

- [Parsing Bash is Undecidable](//www.oilshell.org/blog/2016/10/20.html)
- [A 30-year-old Security Problem](//www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
- [Comment on Perl and the rc shell](https://lobste.rs/s/7bpgbl/rc_plan_9_shell#c_mokqrn)

## Related Documents

- [YSH Language Influences](language-influences.html). Where the syntax in YSH
  comes from.

## Appendix: Hand-Written vs. Generated Parsers

The [OSH]($xref) language is parsed "by hand", while the [YSH]($xref)
expression language is parsed with tables generated from a grammar (a modified
version of [Python's pgen]($xref:pgen2)).

This is mostly an implementation detail, but users may notice that OSH gives
more specific error messages!

Hand-written parsers give you more control over errors. Eventually the YSH
language may have a hand-written parser as well. Either way, feel free to file
bugs about error messages that confuse you.
