Simple Word Evaluation in Unix Shell
====================================

This document describes the YSH word evaluation semantics (`shopt -s
simple_word_eval`) for experienced shell users.  It may also be useful to
those who want to implement this behavior in another shell.

The main idea is that YSH behaves like a traditional programming language:

1. It's **parsed** from start to end [in a single pass][parsing-shell].
2. It's **evaluated** in a single step too.

That is, parsing and evaluation aren't interleaved, and code and data aren't
confused.

[parsing-shell]: https://www.oilshell.org/blog/2019/02/07.html

[posix-spec]: https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06


<div id="toc">
</div>

## An Analogy: Word Expressions Should Be Like Arithmetic Expressions

In YSH, "word expressions" like

    $x
    "hello $name"
    $(hostname)
    'abc'$x${y:-${z//pat/replace}}"$(echo hi)$((a[i] * 3))"

are parsed and evaluated in a straightforward way, like this expression when `x
== 2`:

```sh-prompt
1 + x / 2 + x * 3        → 8  # Python, JS, Ruby, etc. work this way
```

In contrast, in shell, words are "expanded" in multiple stages, like this:

```sh-prompt
1 + "x / 2 + \"x * 3\""  → 8  # Hypothetical, confusing language
```

That is, it would be odd if Python looked *inside a program's strings* for
expressions to evaluate, but that's exactly what shell does!  There are
multiple places where there's a silent `eval`, and you need **quoting** to
inhibit it.  Neglecting this can cause security problems due to confusing code
and data (links below).
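
To make the "silent `eval`" concrete, here is a minimal bash sketch of a string's *contents* being treated as a pattern.  The `count_args` helper, the scratch directory, and the file names are made up for illustration; they aren't part of Oils.

```sh
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Work in a scratch directory so the glob results are predictable.
cd "$(mktemp -d)" || exit 1
touch a.txt b.txt

data='*.txt'   # plain data, e.g. something read from user input

unquoted=$(count_args $data)     # the shell globs the *value* of the variable
quoted=$(count_args "$data")     # quoting inhibits the extra stage

echo "unquoted: $unquoted"   # unquoted: 2
echo "quoted: $quoted"       # quoted: 1
```

The unquoted form passes two file names, even though the program text contains no glob; only the data does.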

In other words, the **defaults are wrong**.  Programmers are surprised by shell's
behavior, and it leads to incorrect programs.

So in YSH, you can opt out of the multiple "word expansion" stages described in
the [POSIX shell spec][posix-spec].  Instead, there's only **one stage**:
evaluation.

## Design Goals

The new semantics should be easily adoptable by existing shell scripts.

- Importantly, `bin/osh` is POSIX-compatible and runs real [bash]($xref)
  scripts.  You can gradually opt into **stricter and saner** behavior with
  `shopt` options (or by running `bin/ysh`).  The most important one is
  [simple_word_eval]($help), and the others are listed below.
- Even after opting in, the new syntax shouldn't break many scripts.  If it
  does break, the change to fix it should be small.  For example, `echo @foo`
  is not too common, and it can be made bash-compatible by quoting it: `echo
  '@foo'`.

<!--
It's technically incompatible but I think it will break very few scripts.

-->

## Examples

In the following examples, the [argv][] command prints the `argv` array it
receives in a readable format:

```sh-prompt
$ argv one "two three"
['one', 'two three']
```
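
If you don't have the Oils source tree handy, a rough stand-in for `argv` can be written as a bash function.  This is only a sketch that mimics the output format shown above; it is not the real `spec/bin/argv.py`, and it assumes arguments contain no single quotes or backslashes, since it does no escaping.

```sh
# Stand-in for the argv command: print the arguments as a Python-like list.
# Assumption: no escaping is needed for the arguments being printed.
argv() {
  local out='' arg
  for arg in "$@"; do
    # Add ", " before every element except the first.
    out+="${out:+, }'$arg'"
  done
  printf '%s\n' "[$out]"
}

argv one "two three"   # ['one', 'two three']
```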

I also use the YSH [var]($help) keyword for assignments.  *(TODO: This could be
rewritten with shell assignment for the benefit of shell implementers)*

[argv]: $oils-src:spec/bin/argv.py

### No Implicit Splitting, Dynamic Globbing, or Empty Elision

In YSH, the following constructs always evaluate to **one argument**:

- Variable / "parameter" substitution: `$x`, `${y}`
- Command sub: `$(echo hi)` or backticks
- Arithmetic sub: `$(( 1 + 2 ))`


<!--
Related help topics: [command-sub]($help), [var-sub]($help), [arith-sub]($help).
Not shown: [tilde-sub]($help).
-->

That is, quotes aren't necessary to avoid:

- **Word Splitting**, which uses `$IFS`.
- **Empty Elision**.  For example, `x=''; ls $x` passes `ls` no arguments.
- **Dynamic Globbing**.  Globs are *dynamic* when the pattern comes from
  program data rather than the source code.

<!-- - Tilde Sub: `~bob/src` -->

Here's an example showing that each construct evaluates to one arg in YSH:

```sh-prompt
ysh$ var pic = 'my pic.jpg'  # filename with spaces
ysh$ var empty = ''
ysh$ var pat = '*.py'        # pattern stored in a string

ysh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

In contrast, shell applies splitting, globbing, and empty elision after the
substitutions.  Each of these operations returns an indeterminate number of
strings:

```sh-prompt
sh$ pic='my pic.jpg'  # filename with spaces
sh$ empty=
sh$ pat='*.py'        # pattern stored in a string

sh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my', 'pic.jpg', 'a.py', 'b.py', 'contents', 'of', 'foo.txt', '3']
```

To get the desired behavior, you have to use double quotes:

```sh-prompt
sh$ argv "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))"
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

### Splicing, Static Globbing, and Brace Expansion

The constructs in the last section evaluate to a **single argument**.  In
contrast, these three constructs evaluate to **0 to N arguments**:

1. **Splicing** an array: `"$@"` and `"${myarray[@]}"`
2. **Static Globbing**: `echo *.py`.  Globs are *static* when they occur in the
   program text.
3. **Brace expansion**: `{alice,bob}@example.com`

In YSH, `shopt -s parse_at` enables these shortcuts for splicing:

- `@myarray` for `"${myarray[@]}"`
- `@ARGV` for `"$@"`

Example:

```sh-prompt
ysh$ var myarray = :| 'a b' c |  # array with 2 elements
ysh$ set -- 'd e' f              # 2 arguments

ysh$ argv @myarray @ARGV *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

is just like:


```sh-prompt
bash$ myarray=('a b' c)
bash$ set -- 'd e' f

bash$ argv "${myarray[@]}" "$@" *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

Unchanged: quotes disable globbing and brace expansion:

```sh-prompt
$ echo *.py
foo.py bar.py

$ echo "*.py"            # globbing disabled with quotes
*.py

$ echo {spam,eggs}.sh
spam.sh eggs.sh

$ echo "{spam,eggs}.sh"  # brace expansion disabled with quotes
{spam,eggs}.sh
```
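
The "0 to N" part is easy to check in plain bash.  The sketch below uses a made-up `count_args` helper to contrast a spliced empty array (zero arguments) with a quoted empty string (one argument); the variable names are illustrative only.

```sh
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

empty_array=()
empty_string=''

n_array=$(count_args "${empty_array[@]}")   # splice of an empty array
n_string=$(count_args "$empty_string")      # quoted substitution
n_braces=$(count_args {a,b,c}.txt)          # brace expansion

echo "$n_array $n_string $n_braces"   # 0 1 3
```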

<!--
help topics:

- braces
- glob
- splice

More:
- inline-call

-->

## Where These Rules Apply

These rules apply when a **sequence** of words is being evaluated, exactly as
in shell:

1. [Command]($help:simple-command): `echo $x foo`
2. [For loop]($help:for): `for i in $x foo; do ...`
3. [Array Literals]($help:array): `a=($x foo)` and `var a = :| $x foo |` ([ysh-array]($help))

Shell has other word evaluation contexts like:

```sh-prompt
sh$ x="${not_array[@]}"
sh$ echo hi > "${not_array[@]}"
```

which aren't affected by [simple_word_eval]($help).
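
As a rough illustration of the distinction, here is how the three word-sequence contexts behave in legacy bash, next to an assignment, which evaluates a single word.  The `count_args` helper and variable names are made up for this sketch.

```sh
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

x='a b'

# Word-sequence contexts: the unquoted $x is split into two words.
n_cmd=$(count_args $x foo)                        # command

n_for=0
for i in $x foo; do n_for=$((n_for + 1)); done    # for loop

arr=($x foo)                                      # array literal
n_arr=${#arr[@]}

# Single-word context: no splitting on the right-hand side of an assignment.
y=$x

echo "$n_cmd $n_for $n_arr [$y]"   # 3 3 3 [a b]
```

By the rules described earlier, each of the three counts would be 2 under `simple_word_eval`, since `$x` evaluates to one argument.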

<!--
EvalWordSequence
-->

## Opt In to the Old Behavior With Explicit Expressions

YSH can express everything that shell can.

- Split with `@[split(mystr, IFS?)]`
- Glob with `@[glob(mypat)]`
- Elision with `@[maybe(s)]`
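
For readers who think in bash, the three operations can be spelled out explicitly there too.  This is only an approximate analogy, not a definition of the YSH functions: the variable names and scratch files are made up, and edge cases (custom `IFS`, patterns that match nothing) may differ.

```sh
cd "$(mktemp -d)" || exit 1
touch a.py b.py

mystr='x y  z'
mypat='*.py'
s=''

# Explicit split: read the fields of one string into an array.
read -r -a parts <<< "$mystr"

# Explicit glob: expand a pattern held in a variable, with splitting turned off.
old_ifs=$IFS; IFS=''
files=($mypat)
IFS=$old_ifs

# Explicit elision: contribute zero elements when the string is empty.
maybe=()
if [ -n "$s" ]; then maybe=("$s"); fi

echo "${#parts[@]} ${#files[@]} ${#maybe[@]}"   # 3 2 0
```

In each case the result is an array that you then splice with `"${parts[@]}"`, which keeps the split, glob, or elision visible at the point where it happens.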

## More Word Evaluation Issues

### More `shopt` Options

- [nullglob]($help) - Globs matching nothing evaluate to zero arguments, rather
  than to the pattern itself (code).
- [dashglob]($help) is true by default, but **disabled** when YSH is enabled, so that
  files that begin with `-` aren't returned.  This avoids [confusing flags and
  files](https://www.oilshell.org/blog/2020/02/dashglob.html).
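
The problems these two options address can be reproduced in stock bash.  The sketch below is illustrative only: it uses bash's own `nullglob`, a made-up `count_args` helper, and a scratch directory containing a single file named `-rf`.  Bash has no `dashglob`, so the `./*` form shown is the usual manual workaround.

```sh
cd "$(mktemp -d)" || exit 1
touch -- -rf

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Default bash: a glob with no matches is passed through as the literal pattern.
shopt -u nullglob
n_default=$(count_args *.nomatch)     # 1, the string '*.nomatch'

# With nullglob: no matches means no arguments.
shopt -s nullglob
n_nullglob=$(count_args *.nomatch)    # 0

# A bare * returns the file named -rf, which a command may read as flags.
dash_files=(*)
# Prefixing ./ keeps the result from looking like a flag.
safe_files=(./*)

echo "$n_default $n_nullglob ${dash_files[0]} ${safe_files[0]}"
```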

Strict options cause fatal errors:

- [strict_tilde]($help) - Failed tilde expansions are errors, rather than
  evaluating to the literal text (code).
- [strict_word_eval]($help) - Invalid slices and invalid UTF-8 aren't ignored.

### Arithmetic Is Statically Parsed

This is an intentional incompatibility described in the [Known
Differences](known-differences.html#static-parsing) doc.
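
For context, this is what *dynamic* arithmetic parsing looks like in bash, where the text that gets parsed depends on a variable's value at runtime.  The snippet only describes bash; the exact rules Oils applies are in the linked doc and aren't reproduced here.

```sh
x='1 + 2'

# The value of x is substituted as text, and the result is parsed:
#   1 + 2 * 3
textual=$(( $x * 3 ))

# The bare name x is looked up, and its value is parsed as an expression:
#   (1 + 2) * 3
recursive=$(( x * 3 ))

echo "$textual $recursive"   # 7 9
```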

<!--
TODO: also allow

var parts = @[split(x)]
var python = @[glob('*.py')]
-->

## Summary

YSH word evaluation is enabled with `shopt -s simple_word_eval`, and proceeds
in a single step.

Variable, command, and arithmetic substitutions predictably evaluate to a
**single argument**, regardless of whether they're empty or have spaces.
There's no implicit splitting, globbing, or elision of empty words.

You can opt into those behaviors with explicit expressions like
`@[split(mystr)]`, which evaluates to an array.

YSH also supports shell features that evaluate to **0 to N arguments**:
splicing, globbing, and brace expansion.

There are other options that "clean up" word evaluation.  All options are
designed to be gradually adopted by other shells, shell scripts, and eventually
POSIX.

## Notes

### Related Documents

- [The Simplest Explanation of
  Oil](http://www.oilshell.org/blog/2020/01/simplest-explanation.html).  Some
  color on the rest of the language.
- [Known Differences Between OSH and Other Shells](known-differences.html).
  Mentioned above: Arithmetic is statically parsed.  Arrays and strings are
  kept separate.
- [OSH Word Evaluation Algorithm][wiki-word-eval] on the Wiki.  Informally
  describes the data structures, and describes legacy constructs.
- [Security implications of forgetting to quote a variable in bash/POSIX
  shells](https://unix.stackexchange.com/questions/171346/security-implications-of-forgetting-to-quote-a-variable-in-bash-posix-shells)
  by Stéphane Chazelas.  Describes the "implicit split+glob" operator, which
  YSH word evaluation removes.
  - This is essentially the same [security
    issue](http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
    I rediscovered in January 2019.  It appears in all [ksh]($xref)-derived shells, and some shells
    recently patched it.  I wasn't able to exploit it in a "real" context;
    otherwise I'd have made more noise about it.
  - Also described by the Fedora Security team: [Defensive Coding: Shell Double Expansion](https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Shell-Double_Expansion.html)

[wiki-word-eval]: https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm

### Tip: View the Syntax Tree With `-n`

This gives insight into [how Oils parses shell][parsing-shell]:

```sh-prompt
$ osh -n -c 'echo ${x:-default}$(( 1 + 2 ))'
(C {<echo>}
  {
    (braced_var_sub
      token: <Id.VSub_Name x>
      suffix_op: (suffix_op.Unary op_id:Id.VTest_ColonHyphen arg_word:{<default>})
    )
    (word_part.ArithSub
      anode:
        (arith_expr.Binary
          op_id: Id.Arith_Plus
          left: (arith_expr.ArithWord w:{<Id.Lit_Digits 1>})
          right: (arith_expr.ArithWord w:{<Id.Lit_Digits 2>})
        )
    )
  }
)
```

You can pass `--ast-format text` for more details.

Evaluation of the syntax tree is a single step.


<!--

### Elision Without @[maybe()]

The `@[maybe(s)]` function is a shortcut for something like:

```
var x = ''          # empty in this case
var tmp = :| |
if (x) {            # test if string is non-empty
  append $x (tmp)   # appends 'x' to the array variable 'tmp'
}
```

This is how it's used:

-->