Simple Word Evaluation in Unix Shell
====================================

This document describes the YSH word evaluation semantics (`shopt -s
simple_word_eval`) for experienced shell users. It may also be useful to
those who want to implement this behavior in another shell.

The main idea is that YSH behaves like a traditional programming language:

1. It's **parsed** from start to end [in a single pass][parsing-shell].
2. It's **evaluated** in a single step too.

That is, parsing and evaluation aren't interleaved, and code and data aren't
confused.

[parsing-shell]: https://www.oilshell.org/blog/2019/02/07.html

[posix-spec]: https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06

<div id="toc">
</div>

## An Analogy: Word Expressions Should Be Like Arithmetic Expressions

In YSH, "word expressions" like

    $x
    "hello $name"
    $(hostname)
    'abc'$x${y:-${z//pat/replace}}"$(echo hi)$((a[i] * 3))"

are parsed and evaluated in a straightforward way, like this expression when
`x == 2`:

```sh-prompt
1 + x / 2 + x * 3        → 8  # Python, JS, Ruby, etc. work this way
```

In contrast, in shell, words are "expanded" in multiple stages, like this:

```sh-prompt
1 + "x / 2 + \"x * 3\""  → 8  # Hypothetical, confusing language
```

That is, it would be odd if Python looked *inside a program's strings* for
expressions to evaluate, but that's exactly what shell does! There are
multiple places where there's a silent `eval`, and you need **quoting** to
inhibit it. Neglecting this can cause security problems due to confusing code
and data (links below).

In other words, the **defaults are wrong**. Programmers are surprised by shell's
behavior, and it leads to incorrect programs.

So in YSH, you can opt out of the multiple "word expansion" stages described in
the [POSIX shell spec][posix-spec]. Instead, there's only **one stage**:
evaluation.

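To make the "multiple stages" concrete, here is a minimal sketch in plain POSIX `sh` (not YSH). The `count_args` helper, the scratch directory, and the file names are assumptions made up for this illustration. A single unquoted `$x` is first substituted, then split, then globbed:

```sh
#!/bin/sh
# Sketch of legacy staged expansion; helper and file names are hypothetical.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)                      # scratch directory for a predictable glob
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/bar"

x='*.txt b*'                               # plain data: one string holding two patterns

# Stage 1: $x is substituted           -> '*.txt b*'
# Stage 2: the result is split on IFS  -> '*.txt' and 'b*'
# Stage 3: each field is globbed       -> a.txt b.txt, then b.txt bar
unquoted=$(cd "$demo_dir" && count_args $x)

# Quoting suppresses stages 2 and 3, leaving exactly one argument.
quoted=$(cd "$demo_dir" && count_args "$x")

echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=4 quoted=1
rm -rf "$demo_dir"
```

The string in `x` was never meant to be code, yet the shell treats its contents as two patterns unless the quotes are present.
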
## Design Goals

The new semantics should be easily adoptable by existing shell scripts.

- Importantly, `bin/osh` is POSIX-compatible and runs real [bash]($xref)
  scripts. You can gradually opt into **stricter and saner** behavior with
  `shopt` options (or by running `bin/ysh`). The most important one is
  [simple_word_eval]($help), and the others are listed below.
- Even after opting in, the new syntax shouldn't break many scripts. If it
  does break, the change to fix it should be small. For example, `echo @foo`
  is not too common, and it can be made bash-compatible by quoting it: `echo
  '@foo'`.

<!--
It's technically incompatible but I think it will break very few scripts.

-->

## Examples

In the following examples, the [argv][] command prints the `argv` array it
receives in a readable format:

```sh-prompt
$ argv one "two three"
['one', 'two three']
```

I also use the YSH [var]($help) keyword for assignments. *(TODO: This could be
rewritten with shell assignment for the benefit of shell implementers)*

[argv]: $oils-src:spec/bin/argv.py

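If you want to follow along without the `argv` helper, a rough stand-in can be written as a shell function. This is only a sketch: the name `show_args` and its output format are assumptions for illustration, and it does not reproduce the Python-style list that `argv` prints.

```sh
#!/bin/sh
# Hypothetical stand-in for the argv helper: prints the argument count,
# then each argument on its own line wrapped in <> so boundaries are visible.
show_args() {
  echo "$# args"
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

show_args one "two three"
# prints:
# 2 args
# <one>
# <two three>
```
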
### No Implicit Splitting, Dynamic Globbing, or Empty Elision

In YSH, the following constructs always evaluate to **one argument**:

- Variable / "parameter" substitution: `$x`, `${y}`
- Command sub: `$(echo hi)` or backticks
- Arithmetic sub: `$(( 1 + 2 ))`

<!--
Related help topics: [command-sub]($help), [var-sub]($help), [arith-sub]($help).
Not shown: [tilde-sub]($help).
-->

That is, quotes aren't necessary to avoid:

- **Word Splitting**, which uses `$IFS`.
- **Empty Elision**. For example, `x=''; ls $x` passes `ls` no arguments.
- **Dynamic Globbing**. Globs are *dynamic* when the pattern comes from
  program data rather than the source code.

<!-- - Tilde Sub: `~bob/src` -->

Here's an example showing that each construct evaluates to one arg in YSH:

```sh-prompt
ysh$ var pic = 'my pic.jpg'  # filename with spaces
ysh$ var empty = ''
ysh$ var pat = '*.py'        # pattern stored in a string

ysh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

In contrast, shell applies splitting, globbing, and empty elision after the
substitutions. Each of these operations returns an indeterminate number of
strings:

```sh-prompt
sh$ pic='my pic.jpg'  # filename with spaces
sh$ empty=
sh$ pat='*.py'        # pattern stored in a string

sh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my', 'pic.jpg', 'a.py', 'b.py', 'contents', 'of', 'foo.txt', '3']
```

To get the desired behavior, you have to use double quotes:

```sh-prompt
sh$ argv "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))"
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

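The three legacy behaviors can also be observed one at a time. The following is a minimal POSIX `sh` sketch, reusing the variable names from the examples above; the `count_args` helper and the scratch directory are assumptions added for the demonstration.

```sh
#!/bin/sh
# Sketch: count the arguments each unquoted substitution produces in a POSIX shell.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)          # scratch directory so the glob result is predictable
touch "$demo_dir/a.py" "$demo_dir/b.py"

pic='my pic.jpg'
empty=
pat='*.py'

split_n=$(count_args $pic)                    # word splitting: 'my' and 'pic.jpg'
elide_n=$(count_args $empty)                  # empty elision: no argument at all
glob_n=$(cd "$demo_dir" && count_args $pat)   # dynamic globbing: a.py and b.py

# With double quotes, each substitution is exactly one argument.
quoted_n=$(cd "$demo_dir" && count_args "$pic" "$empty" "$pat")

echo "split=$split_n elide=$elide_n glob=$glob_n quoted=$quoted_n"
# prints: split=2 elide=0 glob=2 quoted=3
rm -rf "$demo_dir"
```
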
### Splicing, Static Globbing, and Brace Expansion

The constructs in the last section evaluate to a **single argument**. In
contrast, these three constructs evaluate to **0 to N arguments**:

1. **Splicing** an array: `"$@"` and `"${myarray[@]}"`
2. **Static Globbing**: `echo *.py`. Globs are *static* when they occur in the
   program text.
3. **Brace expansion**: `{alice,bob}@example.com`

In YSH, `shopt -s parse_at` enables these shortcuts for splicing:

- `@myarray` for `"${myarray[@]}"`
- `@ARGV` for `"$@"`

Example:

```sh-prompt
ysh$ var myarray = :| 'a b' c |  # array with 2 elements
ysh$ set -- 'd e' f              # 2 arguments

ysh$ argv @myarray @ARGV *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

is just like:

```sh-prompt
bash$ myarray=('a b' c)
bash$ set -- 'd e' f

bash$ argv "${myarray[@]}" "$@" *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

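The "0 to N arguments" property of splicing can be checked in plain POSIX `sh` with the positional parameters. This is a small sketch; `count_args` is a hypothetical helper, not part of Oils.

```sh
#!/bin/sh
# Sketch: how many arguments each way of expanding the positional parameters yields.
count_args() { echo "$#"; }

set -- 'd e' f                 # two positional parameters, one containing a space

spliced=$(count_args "$@")     # each parameter stays one argument
joined=$(count_args "$*")      # all parameters joined into a single string
resplit=$(count_args $@)       # unquoted, so 'd e' is split again

set --                         # no positional parameters at all
none=$(count_args "$@")        # splicing an empty list yields no arguments

echo "spliced=$spliced joined=$joined resplit=$resplit none=$none"
# prints: spliced=2 joined=1 resplit=3 none=0
```

Only the quoted `"$@"` form preserves the element boundaries, which is the behavior that `@ARGV` is described as a shortcut for.
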
Unchanged: quotes disable globbing and brace expansion:

```sh-prompt
$ echo *.py
foo.py bar.py

$ echo "*.py"            # globbing disabled with quotes
*.py

$ echo {spam,eggs}.sh
spam.sh eggs.sh

$ echo "{spam,eggs}.sh"  # brace expansion disabled with quotes
{spam,eggs}.sh
```

<!--
help topics:

- braces
- glob
- splice

More:
- inline-call

-->

## Where These Rules Apply

These rules apply when a **sequence** of words is being evaluated, exactly as
in shell:

1. [Command]($help:simple-command): `echo $x foo`
2. [For loop]($help:for): `for i in $x foo; do ...`
3. [Array Literals]($help:array): `a=($x foo)` and `var a = :| $x foo |` ([ysh-array]($help))

Shell has other word evaluation contexts like:

```sh-prompt
sh$ x="${not_array[@]}"
sh$ echo hi > "${not_array[@]}"
```

which aren't affected by [simple_word_eval]($help).

<!--
EvalWordSequence
-->

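The distinction between a word *sequence* and other contexts already exists in legacy shells. As a minimal POSIX `sh` sketch (variable names are made up for the illustration), the same unquoted `$x` is split in a `for` loop but not on the right-hand side of an assignment:

```sh
#!/bin/sh
# Sketch of legacy POSIX behavior: unquoted $x behaves differently by context.
x='a b'

# Word sequence (for loop): $x is split, so the loop sees 'a', 'b', and 'foo'.
loop_items=0
for item in $x foo; do
  loop_items=$((loop_items + 1))
done

# Assignment: a single value is expected, so no splitting happens even unquoted.
copy=$x

echo "loop_items=$loop_items copy=<$copy>"
# prints: loop_items=3 copy=<a b>
```
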
## Opt In to the Old Behavior With Explicit Expressions

YSH can express everything that shell can.

- Split with `@[split(mystr, IFS?)]`
- Glob with `@[glob(mypat)]`
- Elision with `@[maybe(s)]`

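For comparison, legacy shell also has idioms that make splitting and elision explicit rather than accidental. The sketch below is a POSIX `sh` analogue only: it is not YSH code, it is not how YSH implements `split()` or `maybe()`, and the helper and variable names are assumptions.

```sh
#!/bin/sh
# POSIX sh analogue of explicit elision and explicit splitting (not YSH code).
count_args() { echo "$#"; }

# Explicit elision: ${s:+"$s"} yields no argument when s is empty,
# and exactly one argument otherwise, even if s contains spaces.
s=''
elided=$(count_args ${s:+"$s"})
s='my pic.jpg'
kept=$(count_args ${s:+"$s"})

# Explicit split on ':' only, with globbing disabled (set -f) so '*' stays literal.
# The settings are confined to the command substitution's subshell.
mystr='a:b c:*'
split_n=$(set -f; IFS=:; count_args $mystr)

echo "elided=$elided kept=$kept split=$split_n"
# prints: elided=0 kept=1 split=3
```

The point of the comparison is that the operation is requested at the call site, instead of happening silently on every unquoted substitution.
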
## More Word Evaluation Issues

### More `shopt` Options

- [nullglob]($help) - Globs matching nothing don't evaluate to code, that is,
  to the literal pattern text. They evaluate to no arguments instead.
- [dashglob]($help) is true by default, but **disabled** when YSH is enabled, so that
  files that begin with `-` aren't returned. This avoids [confusing flags and
  files](https://www.oilshell.org/blog/2020/02/dashglob.html).

Strict options cause fatal errors:

- [strict_tilde]($help) - Failed tilde expansions don't evaluate to code. The
  failure is an error rather than being left as literal text.
- [strict_word_eval]($help) - Invalid slices and invalid UTF-8 aren't ignored.

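The two legacy glob defaults that `nullglob` and `dashglob` address can be reproduced in plain POSIX `sh`. This is a sketch under assumptions: the `first_arg` helper, the scratch directory, and the file name `-n` are invented for the demonstration, and the script shows only the legacy behavior, not the YSH options themselves.

```sh
#!/bin/sh
# Sketch of two legacy glob defaults, run against a scratch directory.
first_arg() { printf '%s\n' "$1"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/-n"           # a file whose name looks like a flag

# A glob that matches nothing is passed through as the literal pattern.
no_match=$(cd "$demo_dir" && first_arg *.nomatch)

# A glob can return a name starting with '-', which a command may read as a flag.
dash_name=$(cd "$demo_dir" && first_arg *)

printf 'no_match=%s dash_name=%s\n' "$no_match" "$dash_name"
# prints: no_match=*.nomatch dash_name=-n
rm -rf "$demo_dir"
```
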
### Arithmetic Is Statically Parsed

This is an intentional incompatibility described in the [Known
Differences](known-differences.html#static-parsing) doc.

<!--
TODO: also allow

var parts = @[split(x)]
var python = @[glob('*.py')]
-->

## Summary

YSH word evaluation is enabled with `shopt -s simple_word_eval`, and proceeds
in a single step.

Variable, command, and arithmetic substitutions predictably evaluate to a
**single argument**, regardless of whether they're empty or have spaces.
There's no implicit splitting, globbing, or elision of empty words.

You can opt into those behaviors with explicit expressions like
`@[split(mystr)]`, which evaluates to an array.

YSH also supports shell features that evaluate to **0 to N arguments**:
splicing, globbing, and brace expansion.

There are other options that "clean up" word evaluation. All options are
designed to be gradually adopted by other shells, shell scripts, and eventually
POSIX.

## Notes

### Related Documents

- [The Simplest Explanation of
  Oil](http://www.oilshell.org/blog/2020/01/simplest-explanation.html). Some
  color on the rest of the language.
- [Known Differences Between OSH and Other Shells](known-differences.html).
  Mentioned above: Arithmetic is statically parsed. Arrays and strings are
  kept separate.
- [OSH Word Evaluation Algorithm][wiki-word-eval] on the Wiki. Informally
  describes the data structures, and describes legacy constructs.
- [Security implications of forgetting to quote a variable in bash/POSIX
  shells](https://unix.stackexchange.com/questions/171346/security-implications-of-forgetting-to-quote-a-variable-in-bash-posix-shells)
  by Stéphane Chazelas. Describes the "implicit split+glob" operator, which
  YSH word evaluation removes.
  - This is essentially the same [security
    issue](http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
    I rediscovered in January 2019. It appears in all [ksh]($xref)-derived shells, and some shells
    recently patched it. I wasn't able to exploit it in a "real" context;
    otherwise I'd have made more noise about it.
  - Also described by the Fedora Security team: [Defensive Coding: Shell Double Expansion](https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Shell-Double_Expansion.html)

[wiki-word-eval]: https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm

### Tip: View the Syntax Tree With `-n`

This gives insight into [how Oils parses shell][parsing-shell]:

```sh-prompt
$ osh -n -c 'echo ${x:-default}$(( 1 + 2 ))'
(C {<echo>}
  {
    (braced_var_sub
      token: <Id.VSub_Name x>
      suffix_op: (suffix_op.Unary op_id:Id.VTest_ColonHyphen arg_word:{<default>})
    )
    (word_part.ArithSub
      anode:
        (arith_expr.Binary
          op_id: Id.Arith_Plus
          left: (arith_expr.ArithWord w:{<Id.Lit_Digits 1>})
          right: (arith_expr.ArithWord w:{<Id.Lit_Digits 2>})
        )
    )
  }
)
```

You can pass `--ast-format text` for more details.

Evaluation of the syntax tree is a single step.

<!--

### Elision Without @[maybe()]

The `@[maybe(s)]` function is a shortcut for something like:

```
var x = ''         # empty in this case
var tmp = :| |
if (x) {           # test if string is non-empty
  append $x (tmp)  # appends 'x' to the array variable 'tmp'
}
```

This is how it's used:

-->