---
default_highlighter: oils-sh
---

Syntactic Concepts in YSH
=========================

These documents introduce the YSH language:

- [The Simplest Explanation of Oil](//www.oilshell.org/blog/2020/01/simplest-explanation.html) (blog)
- [A Feel For YSH Syntax](syntax-feelings.html)

In contrast, the concepts introduced below may help advanced users **remember**
YSH and its syntax. Read on to learn about:

- **Command vs. Expression Mode**. Command mode is like shell, and expression
  mode is like Python.
- **Lexer Modes** help parse different "sublanguages" or dialects.
- **Sigils and Sigil Pairs**. A style of syntax that's familiar to shell and
  Perl users.
- **Parse Options** like `shopt -s parse_paren`. They selectively break
  compatibility, so you can gradually upgrade shell to YSH.
- **Static Parsing**, as opposed to the dynamic parsing of shell. Static
  parsing improves error messages and makes more software engineering tools
  possible.

<!-- TODO: We should talk about word lists: commands, array literals, and for
loops -->

<div id="toc">
</div>

## Command vs. Expression Mode

The YSH parser starts out in command mode:

    echo "hello $name"

    for i in 1 2 3 {
      echo $i
    }

But it switches to expression mode in a few places:

    var x = 42 + a[i]          # the RHS of = is a YSH expression

    echo $[mydict['key']]      # interpolated expressions with $[]

    json write ({key: "val"})  # typed args inside ()

See [Command vs. Expression Mode](command-vs-expression-mode.html) for details.

## Lexer Modes

*Lexer modes* are a technique that YSH uses to manage the complex syntax of
shell, which evolved over many decades.

For example, `:` means something different in each of these lines:

    PATH=/bin:/usr/bin          # Literal string
    echo ${x:-default}          # Part of an operator
    echo $(( x > y ? 42 : 0 ))  # Arithmetic operator
    var myslice = a[3:5]        # YSH expression

To solve this problem, YSH has a lexer that can run in many **modes**.
Multiple parsers read from this single lexer, but they demand different tokens,
depending on the parsing context.

### More Information

- [How OSH Uses Lexer Modes](//www.oilshell.org/blog/2016/10/19.html)
- [When Are Lexer Modes Useful?](//www.oilshell.org/blog/2017/12/17.html)
- [How to Parse Shell Like a Programming Language](//www.oilshell.org/blog/2019/02/07.html)
  - See the list of 14 lexer modes.
- [Posts tagged #lexing]($blog-tag:lexing)

## Sigils and Sigil Pairs

A **sigil** is a symbol like the `$` in `$mystr`.

A **sigil pair** is a sigil with opening and closing delimiters, like `${var}`
and `@(seq 3)`.

An appendix of [A Feel For YSH Syntax](syntax-feelings.html) lists the sigil
pairs in the YSH language.

### Valid Contexts

Each sigil pair may be available in command mode, expression mode, or both.

For example, command substitution is available in both:

    echo $(hostname)      # command mode
    var x = $(hostname)   # expression mode

So are raw and C-style string literals:

    echo $'foo\n'  # the bash-compatible way to do it
    var s = $'foo\n'

    echo r'c:\Program Files\'
    var raw = r'c:\Program Files\'

But array literals only make sense in expression mode:

    var myarray = :| one two three |

    echo one two three  # no array literal needed

A sigil pair often changes the **lexer mode** to parse what's inside.

## Parse Options to Take Over `()`, `[]`, `@`, and `=`

Most users don't have to worry about parse options. Instead, they run either
`bin/osh` or `bin/ysh`, which are actually aliases for the same binary. The
difference is that `bin/ysh` has the **option group** `ysh:all` on by default.

Nonetheless, here are two examples.

The `parse_at` option (in group `ysh:upgrade`) turns `@` into the **splice
operator** when it's at the front of a word:

```sh-prompt
$ var myarray = :| one two three |

$ echo @myarray       # @ isn't an operator in shell
@myarray

$ shopt -s parse_at   # parse the @ symbol
$ echo @myarray
one two three

$ echo '@myarray'     # quote it to get the old behavior
@myarray
```

The `parse_bracket` option (also in group `ysh:upgrade`) lets you pass
unevaluated expressions to a command with `[]`:

```sh-prompt
assert (^[42 === x])  # assert is passed an expression, not a value
assert [42 === x]     # syntax sugar with parse_bracket
```

## Static Parsing

POSIX specifies that Unix shell has multiple stages of parsing and evaluation.
For example:

```sh-prompt
$ x=2
$ code='3 * x'
$ echo $(( code ))  # Silent eval of a string. Dangerous!
6
```

YSH expressions are parsed in a single stage, and then evaluated, which makes
YSH more like Python or JavaScript:

```sh-prompt
$ setvar code = '3 * x'
$ echo $[ code ]
3 * x
```

Another example: shell assignment builtins like `readonly` and `local` are
dynamically parsed, while YSH assignments like `const` and `var` are statically
parsed.
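
To make the dynamic parsing of assignment builtins concrete, here is a minimal
POSIX shell sketch. The names `pair` and `greeting` are made up for this
example. The parser sees only one quoted word, and the `readonly` builtin
splits it into a name and a value when it runs:

```sh
# Hypothetical names, chosen only for illustration.
pair='greeting=hello'

# The parser sees a single word, "$pair".  The readonly builtin parses
# NAME=VALUE out of that string at runtime.
readonly "$pair"

echo "$greeting"
```

Because the variable name arrives as data, a tool that reads the script can't
tell which variable is assigned without running it. A statically parsed
assignment like `const` names the variable in the source text itself.
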

### Aside: Duplicate Functionality in Bash

It's confusing that [bash]($xref) has **both** statically- and
dynamically-parsed variants of the same functionality.

Boolean expressions:

- `[ -d /tmp ]` is dynamically parsed
- `[[ -d /tmp ]]` is statically parsed

C-style string literals:

- `echo -e '\n'` is dynamically parsed
- `echo $'\n'` is statically parsed
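
The difference shows up when an operator or an escape sequence comes from
data. The sketch below is plain POSIX shell, and its variable names are
hypothetical. `[` is an ordinary command, so it accepts an operator stored in
a variable. `printf '%b'` stands in for `echo -e` here, because `echo -e` isn't
portable across shells, but it likewise interprets backslash escapes at
runtime:

```sh
op='-d'   # to the parser, the operator is just a string

# '[' receives two arguments and decides at runtime that -d is an operator.
if [ "$op" / ]; then
  kind='directory'
else
  kind='other'
fi
echo "$kind"

esc='a\nb'                 # a backslash and an 'n', not a newline
out=$(printf '%b' "$esc")  # the escape is interpreted when printf runs
echo "$out"
```

By contrast, bash rejects `[[ $op / ]]` when it parses the script, because the
operator inside `[[ ]]` has to appear literally in the source.
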

<!--
Remaining dynamic parsing in shell:

- printf: `%.3f`
- glob: `*.py`
- history lexer does another pass ...
-->

### Related Links

- [Parsing Bash is Undecidable](//www.oilshell.org/blog/2016/10/20.html)
- [A 30-year-old Security Problem](//www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
- [Comment on Perl and the rc shell](https://lobste.rs/s/7bpgbl/rc_plan_9_shell#c_mokqrn)

## Related Documents

- [YSH Language Influences](language-influences.html). Where the syntax in YSH
  comes from.

## Appendix: Hand-Written vs. Generated Parsers

The [OSH]($xref) language is parsed "by hand", while the [YSH]($xref)
expression language is parsed with tables generated from a grammar (a modified
version of [Python's pgen]($xref:pgen2)).

This is mostly an implementation detail, but users may notice that OSH gives
more specific error messages!

Hand-written parsers give you more control over errors. Eventually the YSH
language may have a hand-written parser as well. Either way, feel free to file
bugs about error messages that confuse you.