---
default_highlighter: oils-sh
---

Syntactic Concepts in YSH
=========================

These documents introduce the YSH language:

- [The Simplest Explanation of Oil](//www.oilshell.org/blog/2020/01/simplest-explanation.html) (blog)
- [A Feel For YSH Syntax](syntax-feelings.html)

In contrast, the concepts introduced below may help advanced users **remember**
YSH and its syntax.  Read on to learn about:

- **Command vs. Expression Mode**.  Command mode is like shell, and expression
  mode is like Python.
- **Lexer Modes** help parse different "sublanguages" or dialects.
- **Sigils and Sigil Pairs**.  A style of syntax that's familiar to shell and
  Perl users.
- **Parse Options** like `shopt -s parse_paren`.  They selectively break
  compatibility, so that you can gradually upgrade shell to YSH.
- **Static Parsing**, as opposed to the dynamic parsing of shell.  Static
  parsing improves error messages and makes more software engineering tools
  possible.

<!-- TODO: We should talk about word lists: commands, array literals, and for
loops -->

<div id="toc">
</div>

## Command vs. Expression Mode

The YSH parser starts out in command mode:

    echo "hello $name"

    for i in 1 2 3 {
      echo $i
    }

But it switches to expression mode in a few places:

    var x = 42 + a[i]          # the RHS of = is a YSH expression

    echo $[mydict['key']]      # interpolated expressions with $[]

    json write ({key: "val"})  # typed args inside ()

See [Command vs. Expression Mode](command-vs-expression-mode.html) for details.

## Lexer Modes

*Lexer modes* are a technique that YSH uses to manage the complex syntax of
shell, which evolved over many decades.

For example, `:` means something different in each of these lines:

    PATH=/bin:/usr/bin          # Literal string
    echo ${x:-default}          # Part of an operator
    echo $(( x > y ? 42 : 0 ))  # Arithmetic Operator
    var myslice = a[3:5]        # YSH expression

To solve this problem, YSH has a lexer that can run in many **modes**.
Multiple parsers read from this single lexer, but they demand different tokens,
depending on the parsing context.

### More Information

- [How OSH Uses Lexer Modes](//www.oilshell.org/blog/2016/10/19.html)
- [When Are Lexer Modes Useful?](//www.oilshell.org/blog/2017/12/17.html)
- [How to Parse Shell Like a Programming Language](//www.oilshell.org/blog/2019/02/07.html)
  - See the list of 14 lexer modes.
- [Posts tagged #lexing]($blog-tag:lexing)

## Sigils and Sigil Pairs

A **sigil** is a symbol like the `$` in `$mystr`.

A **sigil pair** is a sigil with opening and closing delimiters, like `${var}`
and `@(seq 3)`.

An appendix of [A Feel For YSH Syntax](syntax-feelings.html) lists the sigil
pairs in the YSH language.

### Valid Contexts

Each sigil pair may be available in command mode, expression mode, or both.

For example, command substitution is available in both:

    echo $(hostname)      # command mode
    var x = $(hostname)   # expression mode

So are raw and C-style string literals:

    echo $'foo\n'  # the bash-compatible way to do it
    var s = $'foo\n'

    echo r'c:\Program Files\'
    var raw = r'c:\Program Files\'

But array literals only make sense in expression mode:

    var myarray = :| one two three |

    echo one two three  # no array literal needed

A sigil pair often changes the **lexer mode** to parse what's inside.

## Parse Options to Take Over `()`, `[]`, `@`, and `=`

Most users don't have to worry about parse options.  Instead, they run either
`bin/osh` or `bin/ysh`, which are actually aliases for the same binary.  The
difference is that `bin/ysh` has the **option group** `ysh:all` on by default.

Nonetheless, here are two examples.

The `parse_at` option (in group `ysh:upgrade`) turns `@` into the **splice
operator** when it's at the front of a word:

```sh-prompt
$ var myarray = :| one two three |

$ echo @myarray         # @ isn't an operator in shell
@myarray

$ shopt -s parse_at     # parse the @ symbol
$ echo @myarray
one two three

$ echo '@myarray'       # quote it to get the old behavior
@myarray
```

The `parse_bracket` option (also in group `ysh:upgrade`) lets you pass
unevaluated expressions to a command with `[]`:

```sh-prompt
assert (^[42 === x])   # assert is passed an expression, not value
assert [42 === x]      # syntax sugar with parse_bracket
```

## Static Parsing

POSIX specifies that Unix shell has multiple stages of parsing and evaluation.
For example:

```sh-prompt
$ x=2
$ code='3 * x'
$ echo $(( code ))  # Silent eval of a string.  Dangerous!
6
```

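To make the danger concrete, here is a minimal sketch that assumes `bash` is
on your `PATH`.  The variable names and the `payload` string are hypothetical,
chosen only for illustration.  It runs `bash` explicitly because evaluating a
string inside `$(( ))` like this is bash behavior, which other shells may not
share:

```sh
# The harmless case from above: the string in 'code' is evaluated as arithmetic.
result=$(x=2 code='3 * x' bash -c 'echo $(( code ))')

# A hypothetical attacker-controlled value.  The array subscript contains a
# command substitution, which bash runs while it evaluates the arithmetic.
payload='a[$(echo INJECTED >&2; echo 0)]'

# Capture only stderr, where the injected command writes its message.
side_effect=$(code="$payload" bash -c 'echo $(( code ))' 2>&1 >/dev/null)

echo "result=$result side_effect=$side_effect"
```

Under bash this should print `result=6 side_effect=INJECTED`.  The script
never calls `eval`, yet the command embedded in the data still runs.
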
YSH expressions are parsed in a single stage, and then evaluated, which makes
it more like Python or JavaScript:

```sh-prompt
$ setvar code = '3 * x'
$ echo $[ code ]
3 * x
```

Another example: shell assignment builtins like `readonly` and `local` are
dynamically parsed, while YSH assignments like `const` and `var` are statically
parsed.

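The dynamic half of that comparison can be seen in plain shell.  This is a
minimal sketch with hypothetical variable names: the `name=value` pair is
ordinary data, and `readonly` only parses it when the builtin runs.

```sh
# The assignment is built as a string, so no parser sees it ahead of time.
assignment='greeting=hello'

# readonly splits the string at '=' when it runs and creates the variable.
readonly "$assignment"

echo "$greeting"
```

A tool that reads this script can't tell which variable is being defined
without running it.  A statically parsed assignment avoids that problem,
because the variable name appears literally in the source.
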
### Aside: Duplicate Functionality in Bash

It's confusing that [bash]($xref) has **both** statically- and
dynamically-parsed variants of the same functionality.

Boolean expressions:

- `[ -d /tmp ]` is dynamically parsed
- `[[ -d /tmp ]]` is statically parsed

C-style string literals:

- `echo -e '\n'` is dynamically parsed
- `echo $'\n'` is statically parsed

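The two pairs above can be compared in a short script.  This is a sketch
under assumptions: `bash` is on your `PATH`, the variable names are
hypothetical, and `printf '%b'` stands in for `echo -e` because its escape
handling is more consistent across shells.

```sh
# [ is an ordinary command, so its operator can come from data at run time.
op=-d
if [ "$op" / ]; then dynamic_test=true; else dynamic_test=false; fi

# [[ is syntax.  bash expects the operator to appear literally in the source,
# so it rejects this code when parsing it, before any test is evaluated.
if op="$op" bash -c '[[ $op / ]]' 2>/dev/null; then
  static_test=accepted
else
  static_test=rejected
fi

# The escape sequence lives in data and is only interpreted when printf runs.
escaped='a\tb'
dynamic_string=$(printf '%b' "$escaped")

echo "$dynamic_test $static_test"
```

With bash available, this should print `true rejected`, and `dynamic_string`
holds `a`, a tab character, and `b`.
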
<!--
Remaining dynamic parsing in shell:

- printf: `%.3f`
- glob: `*.py`
- history lexer does another pass ...
-->

### Related Links

- [Parsing Bash is Undecidable](//www.oilshell.org/blog/2016/10/20.html)
- [A 30-year-old Security Problem](//www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
- [Comment on Perl and the rc shell](https://lobste.rs/s/7bpgbl/rc_plan_9_shell#c_mokqrn)

## Related Documents

- [YSH Language Influences](language-influences.html).  Where the syntax in YSH
  comes from.

## Appendix: Hand-Written vs. Generated Parsers

The [OSH]($xref) language is parsed "by hand", while the [YSH]($xref)
expression language is parsed with tables generated from a grammar (a modified
version of [Python's pgen]($xref:pgen2)).

This is mostly an implementation detail, but users may notice that OSH gives
more specific error messages!

Hand-written parsers give you more control over errors.  Eventually the YSH
language may have a hand-written parser as well.  Either way, feel free to file
bugs about error messages that confuse you.