1 | ---
|
2 | default_highlighter: oils-sh
|
3 | ---
|
4 |
|
5 | Syntactic Concepts in YSH
|
6 | =========================
|
7 |
|
8 | These documents introduce the YSH language:
|
9 |
|
10 | - [The Simplest Explanation of Oil](//www.oilshell.org/blog/2020/01/simplest-explanation.html) (blog)
|
11 | - [A Feel For YSH Syntax](syntax-feelings.html)
|
12 |
|
13 | In contrast, the concepts introduced below may help advanced users **remember**
|
14 | YSH and its syntax. Read on to learn about:
|
15 |
|
16 | - **Command vs. Expression Mode**. Command mode is like shell, and expression
|
17 | mode is like Python.
|
18 | - **Lexer Modes** help parse different "sublanguages" or dialects.
|
19 | - **Sigils and Sigil Pairs**. A style of syntax that's familiar to shell and
|
20 | Perl users.
|
21 | - **Parse Options** like `shopt -s parse_paren`. They selectively break
|
22 | compatibility, so you can gradually upgrade shell to YSH.
|
23 | - **Static Parsing**, as opposed to the dynamic parsing of shell. Static
|
24 | parsing improves error messages and makes more software engineering tools
|
25 | possible.
|
26 |
|
27 | <!-- TODO: We should talk about word lists: commands, array literals, and for
|
28 | loops -->
|
29 |
|
30 | <div id="toc">
|
31 | </div>
|
32 |
|
33 | ## Command vs. Expression Mode
|
34 |
|
35 | The YSH parser starts out in command mode:
|
36 |
|
37 | echo "hello $name"
|
38 |
|
39 | for i in 1 2 3 {
|
40 | echo $i
|
41 | }
|
42 |
|
43 | But it switches to expression mode in a few places:
|
44 |
|
45 | var x = 42 + a[i] # the RHS of = is a YSH expression
|
46 |
|
47 | echo $[mydict['key']] # interpolated expressions with $[]
|
48 |
|
49 | json write ({key: "val"}) # typed args inside ()
|
50 |
|
51 | See [Command vs. Expression Mode](command-vs-expression-mode.html) for details.
|
52 |
|
53 | ## Lexer Modes
|
54 |
|
55 | *Lexer modes* are a technique that YSH uses to manage the complex syntax of
|
56 | shell, which evolved over many decades.
|
57 |
|
58 | For example, `:` means something different in each of these lines:
|
59 |
|
60 | PATH=/bin:/usr/bin # Literal string
|
61 | echo ${x:-default} # Part of an operator
|
62 | echo $(( x > y ? 42 : 0 )) # Arithmetic operator
|
63 | var myslice = a[3:5] # YSH expression
|
64 |
|
65 | To solve this problem, YSH has a lexer that can run in many **modes**.
|
66 | Multiple parsers read from this single lexer, but they demand different tokens,
|
67 | depending on the parsing context.
|
68 |
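As a rough illustration in plain POSIX shell (nothing here is YSH-specific), the sketch below puts the same `:` character in three of the contexts listed above. The variable names `dirs`, `maybe`, `fallback`, and `bigger` are made up for this example.

```shell
# The same ':' character in three lexical contexts.
dirs=/bin:/usr/bin             # literal character inside a word
unset maybe
fallback=${maybe:-default}     # part of the ':-' operator
x=5
y=3
bigger=$(( x > y ? 42 : 0 ))   # arithmetic ternary operator
echo "$dirs $fallback $bigger"
```

In each line, the surrounding construct decides what `:` means, which is why a lexer with one fixed set of token rules would struggle here.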
|
69 | ### More Information
|
70 |
|
71 | - [How OSH Uses Lexer Modes](//www.oilshell.org/blog/2016/10/19.html)
|
72 | - [When Are Lexer Modes Useful?](//www.oilshell.org/blog/2017/12/17.html)
|
73 | - [How to Parse Shell Like a Programming Language](//www.oilshell.org/blog/2019/02/07.html)
|
74 | - See the list of 14 lexer modes.
|
75 | - [Posts tagged #lexing]($blog-tag:lexing)
|
76 |
|
77 | ## Sigils and Sigil Pairs
|
78 |
|
79 | A **sigil** is a symbol like the `$` in `$mystr`.
|
80 |
|
81 | A **sigil pair** is a sigil with opening and closing delimiters, like `${var}`
|
82 | and `@(seq 3)`.
|
83 |
|
84 | An appendix of [A Feel For YSH Syntax](syntax-feelings.html) lists the sigil
|
85 | pairs in the YSH language.
|
86 |
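A small sketch of the distinction, using only constructs that YSH shares with POSIX shell. The variable names are hypothetical, and `@(seq 3)` is left out because it is YSH-only.

```shell
mystr=report
plain=$mystr.txt                        # sigil: a bare $ before the name
braced=${mystr}_final.txt               # sigil pair: ${ } marks where the name ends
nested=$(echo "${mystr}" | tr 'r' 'R')  # sigil pair: $( ) encloses a whole command
echo "$plain $braced $nested"
```

Without the braces, `$mystr_final` would be read as a different variable name, which is one reason the paired form exists.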
|
87 | ### Valid Contexts
|
88 |
|
89 | Each sigil pair may be available in command mode, expression mode, or both.
|
90 |
|
91 | For example, command substitution is available in both:
|
92 |
|
93 | echo $(hostname) # command mode
|
94 | var x = $(hostname) # expression mode
|
95 |
|
96 | So are raw and C-style string literals:
|
97 |
|
98 | echo $'foo\n' # the bash-compatible way to do it
|
99 | var s = $'foo\n'
|
100 |
|
101 | echo r'c:\Program Files\'
|
102 | var raw = r'c:\Program Files\'
|
103 |
|
104 | But array literals only make sense in expression mode:
|
105 |
|
106 | var myarray = :| one two three |
|
107 |
|
108 | echo one two three # no array literal needed
|
109 |
|
110 | A sigil pair often changes the **lexer mode** to parse what's inside.
|
111 |
|
112 | ## Parse Options to Take Over `()`, `[]`, `@`, and `=`
|
113 |
|
114 | Most users don't have to worry about parse options. Instead, they run either
|
115 | `bin/osh` or `bin/ysh`, which are actually aliases for the same binary. The
|
116 | difference is that `bin/ysh` has the **option group** `ysh:all` on by default.
|
117 |
|
118 | Nonetheless, here are two examples.
|
119 |
|
120 | The `parse_at` option (in group `ysh:upgrade`) turns `@` into the **splice
|
121 | operator** when it's at the front of a word:
|
122 |
|
123 | ```sh-prompt
|
124 | $ var myarray = :| one two three |
|
125 |
|
126 | $ echo @myarray # @ isn't an operator in shell
|
127 | @myarray
|
128 |
|
129 | $ shopt -s parse_at # parse the @ symbol
|
130 | $ echo @myarray
|
131 | one two three
|
132 |
|
133 | $ echo '@myarray' # quote it to get the old behavior
|
134 | @myarray
|
135 | ```
|
136 |
|
137 | The `parse_bracket` option (also in group `ysh:upgrade`) lets you pass
|
138 | unevaluated expressions to a command with `[]`:
|
139 |
|
140 | ```sh-prompt
|
141 | assert (^[42 === x]) # assert is passed an expression, not a value
|
142 | assert [42 === x] # syntax sugar with parse_bracket
|
143 | ```
|
144 |
|
145 | ## Static Parsing
|
146 |
|
147 | POSIX specifies that Unix shell has multiple stages of parsing and evaluation.
|
148 | For example:
|
149 |
|
150 | ```sh-prompt
|
151 | $ x=2
|
152 | $ code='3 * x'
|
153 | $ echo $(( code )) # Silent eval of a string. Dangerous!
|
154 | 6
|
155 | ```
|
156 |
|
157 | YSH expressions are parsed in a single stage and then evaluated, which makes
|
158 | YSH more like Python or JavaScript:
|
159 |
|
160 | ```sh-prompt
|
161 | $ setvar code = '3 * x'
|
162 | $ echo $[ code ]
|
163 | 3 * x
|
164 | ```
|
165 |
|
166 | Another example: shell assignment builtins like `readonly` and `local` are
|
167 | dynamically parsed, while YSH assignments like `const` and `var` are statically
|
168 | parsed.
|
169 |
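Here is a minimal sketch of what dynamic parsing means for an assignment builtin, in plain shell. The names `name`, `value`, and `greeting` are hypothetical.

```shell
name=greeting
value='hello world'
# The word "$name=$value" is expanded first; readonly then parses the
# resulting string "greeting=hello world" as an assignment at runtime.
readonly "$name=$value"
echo "$greeting"
```

Because the assignment is assembled from strings while the program runs, a tool reading the source cannot tell which variable is being set.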
|
170 | ### Aside: Duplicate Functionality in Bash
|
171 |
|
172 | It's confusing that [bash]($xref) has **both** statically- and
|
173 | dynamically-parsed variants of the same functionality.
|
174 |
|
175 | Boolean expressions:
|
176 |
|
177 | - `[ -d /tmp ]` is dynamically parsed
|
178 | - `[[ -d /tmp ]]` is statically parsed
|
179 |
|
180 | C-style string literals:
|
181 |
|
182 | - `echo -e '\n'` is dynamically parsed
|
183 | - `echo $'\n'` is statically parsed
|
184 |
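The boolean case can be sketched in plain shell. This example is an illustration with made-up variable names, not taken from the Oils test suite. Because `[` is an ordinary command, it parses its arguments only after word splitting has already removed an unquoted empty variable.

```shell
empty=''
# Unquoted: the empty word vanishes, so this runs '[ -n ]', which tests
# whether the one-argument string "-n" is non-empty. That is true.
if [ -n $empty ]; then dynamic_result=true; else dynamic_result=false; fi
# Quoted: the empty argument survives, so the test sees an empty operand.
if [ -n "$empty" ]; then quoted_result=true; else quoted_result=false; fi
echo "$dynamic_result $quoted_result"
```

In bash, `[[ -n $empty ]]` is parsed before expansion, so the operand keeps its position and the test is false even without quotes.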
|
185 | <!--
|
186 | Remaining dynamic parsing in shell:
|
187 |
|
188 | - printf: `%.3f`
|
189 | - glob: `*.py`
|
190 | - history lexer does another pass ...
|
191 | -->
|
192 |
|
193 | ### Related Links
|
194 |
|
195 | - [Parsing Bash is Undecidable](//www.oilshell.org/blog/2016/10/20.html)
|
196 | - [A 30-year-old Security Problem](//www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
|
197 | - [Comment on Perl and the rc shell](https://lobste.rs/s/7bpgbl/rc_plan_9_shell#c_mokqrn)
|
198 |
|
199 | ## Related Documents
|
200 |
|
201 | - [YSH Language Influences](language-influences.html). Where the syntax in YSH
|
202 | comes from.
|
203 |
|
204 | ## Appendix: Hand-Written vs. Generated Parsers
|
205 |
|
206 | The [OSH]($xref) language is parsed "by hand", while the [YSH]($xref)
|
207 | expression language is parsed with tables generated from a grammar (a modified
|
208 | version of [Python's pgen]($xref:pgen2)).
|
209 |
|
210 | This is mostly an implementation detail, but users may notice that OSH gives
|
211 | more specific error messages!
|
212 |
|
213 | Hand-written parsers give you more control over errors. Eventually the YSH
|
214 | language may have a hand-written parser as well. Either way, feel free to file
|
215 | bugs about error messages that confuse you.
|
216 |
|