1 | ---
|
2 | in_progress: yes
|
3 | default_highlighter: oils-sh
|
4 | css_files: ../../web/base.css ../../web/manual.css ../../web/toc.css
|
5 | ---
|
6 |
|
7 | Word Language
|
8 | =============
|
9 |
|
10 | Recall that Oil is composed of three interleaved languages: **words**,
|
11 | [commands](command-language.html), and [expressions](expression-language.html).
|
12 |
|
13 | This doc describes words, but only the things that are **not** in:
|
14 |
|
15 | - [A Tour of the Oil Language](oil-language-tour.html)
|
16 | - The `#word-lang` section of [OSH Help
|
17 | Topics](osh-help-topics.html#word-lang)
|
18 | - The `#word-lang` section of [Oil Help
|
19 | Topics](oil-help-topics.html#word-lang)
|
20 |
|
21 | <div id="toc">
|
22 | </div>
|
23 |
|
24 | ## What's a Word?
|
25 |
|
26 | A word is an expression like `$x`, `"hello $name"`, or `{build,test}/*.py`. It
|
27 | evaluates to a string or an array of strings.
|
28 |
|
29 | Generally speaking, Oil behaves like a simpler version of POSIX shell / bash.
|
30 | Sophisticated users can read [Simple Word Evaluation](simple-word-eval.html)
|
31 | for a comparison.
|
32 |
|
33 | ## Contexts Where Words Are Used
|
34 |
|
35 | ### Words Are Part of Expressions and Commands
|
36 |
|
37 | Part of an expression:
|
38 |
|
39 | var x = ${y:-'default'}
|
40 |
|
41 | Part of a command:
|
42 |
|
43 | echo ${y:-'default'}
|
44 |
|
45 | ### Word Sequences: in for loops and array literals
|
46 |
|
47 | The three contexts where splitting and globbing apply are the ones where a
|
48 | **sequence** of words is evaluated (`EvalWordSequence`):
|
49 |
|
50 | 1. [Command]($help:simple-command): `echo $x foo`
|
51 | 2. [For loop]($help:for): `for i in $x foo; do ...`
|
52 | 3. [Array Literals]($help:array): `a=($x foo)` and `var a = :| $x foo |` ([oil-array]($help))
|
53 |
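As a rough illustration of the first two contexts, the POSIX-compatible sketch below counts the words produced when `$x` is left unquoted. The variable names `cmd_count` and `loop_count` are made up for this example, and it assumes the default `$IFS`.

```sh
x='a b'   # one string containing a space

# 1. Command: the unquoted $x is split into two arguments, plus 'foo'.
set -- $x foo
cmd_count=$#

# 2. For loop: the same word sequence is evaluated, so it iterates 3 times.
loop_count=0
for i in $x foo; do
  loop_count=$((loop_count + 1))
done

echo "command: $cmd_count, loop: $loop_count"   # => command: 3, loop: 3
```

Quoting the substitution (`"$x"`) would yield two words in both contexts, because splitting only applies to unquoted substitutions.
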
|
54 | ### Oil vs. Bash Array Literals
|
55 |
|
56 | Oil has a new array syntax, but it also supports the bash-compatible syntax:
|
57 |
|
58 | ```
|
59 | local myarray=(one two *.py) # bash
|
60 |
|
61 | var myarray = :| one two *.py | # Oil style
|
62 | ```
|
63 |
|
64 | ### Oil Discourages Context-Sensitive Evaluation
|
65 |
|
66 | Shell also has contexts where it evaluates words to a **single string**, rather
|
67 | than a sequence, like:
|
68 |
|
69 | ```sh
|
70 | # RHS of Assignment
|
71 | x="${not_array[@]}"
|
72 | x=*.py # not a glob
|
73 |
|
74 | # Redirect Arg
|
75 | echo foo > "${not_array[@]}"
|
76 | echo foo > *.py # not a glob
|
77 |
|
78 | # Case variables and patterns
|
79 | case "${not_array1[@]}" in
|
80 | "${not_array2[@]}")
|
81 | echo oops
|
82 | ;;
|
83 | esac
|
84 |
|
85 | case *.sh in # not a glob
|
86 | *.py) # a string pattern, not a file system glob
|
87 | echo oops
|
88 | ;;
|
89 | esac
|
90 | ```
|
91 |
|
92 | The behavior of these snippets diverges a lot in existing shells. That is,
|
93 | shells are buggy and poorly specified.
|
94 |
|
95 | Oil disallows most of them. Arrays are considered separate from strings and
|
96 | don't randomly "decay".
|
97 |
|
98 | Related: the RHS of an Oil assignment is an expression, which can be of any
|
99 | type, including an array:
|
100 |
|
101 | ```
|
102 | var parts = split(x) # returns an array
|
103 | var python = glob('*.py') # ditto
|
104 |
|
105 | var s = join(parts) # returns a string
|
106 | ```
|
107 |
|
108 | ## Sigils
|
109 |
|
110 | This is a recap of [A Feel for Oil's Syntax](syntax-feelings.html).
|
111 |
|
112 | ### `$` Means "Returns One String"
|
113 |
|
114 | Examples:
|
115 |
|
116 | - All substitutions: var, command, arith
|
117 | - TODO: Do we have `$[a[x+1]]` as an expression substitution?
|
118 | - Or `$[ /pat+ /]`?
|
119 | - I don't think so.
|
120 |
|
121 | - Inline function calls, an Oil extension: `$[join(myarray)]`
|
122 |
|
123 | (C-style strings like `$'\n'` use `$`, but that's more of a bash anachronism.
|
124 | In Oil, `c'\n'` is preferred.)
|
125 |
|
126 | ### `@` Means "Returns An Array of Strings"
|
127 |
|
128 | Enabled with `shopt -s parse_at`.
|
129 |
|
130 | Examples:
|
131 |
|
132 | - `@myarray`
|
133 | - `@[arrayfunc(x, y)]`
|
134 |
|
135 | These are both Oil extensions.
|
136 |
|
137 | Array literals also produce an array of strings, though they're written with `:| ... |` rather than `@`:
|
138 |
|
139 | ```
|
140 | var myarray = :| 1 2 3 |
|
141 | ```
|
142 |
|
143 | ## OSH Features
|
144 |
|
145 | ### Word Splitting and Empty String Elision
|
146 |
|
147 | OSH uses POSIX behavior for unquoted substitutions like `$x`.
|
148 |
|
149 | - The string value is split into args with `$IFS`.
|
150 | - If the string value is empty, no args are produced.
|
151 |
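The two rules can be observed with a small POSIX-compatible sketch. The helper `count_args` and the variable names are hypothetical, introduced only for this illustration, and the default `$IFS` is assumed.

```sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

x='a b  c'   # note the two spaces before 'c'
empty=''

split_count=$(count_args $x)          # split on $IFS => 3
quoted_count=$(count_args "$x")       # quoting prevents splitting => 1
elided_count=$(count_args $empty)     # empty and unquoted: elided => 0
kept_count=$(count_args "$empty")     # empty but quoted: kept => 1

echo "$split_count $quoted_count $elided_count $kept_count"   # => 3 1 0 1
```

Runs of `$IFS` whitespace count as a single separator, which is why the doubled space does not produce an empty argument.
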
|
152 | ### Implicit Joining
|
153 |
|
154 | Shell has odd "joining" semantics, which are supported in Oil but generally
|
155 | discouraged:
|
156 |
|
157 | set -- 'a b' 'c d'
|
158 | argv.py X"$@"X # => ['Xa b', 'c dX']
|
159 |
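For readers without `argv.py` at hand, the same joining can be checked with a POSIX-compatible sketch. The helper `show` and the variables `count`, `arg1`, and `arg2` are hypothetical names used only here.

```sh
# Hypothetical helper: records its arguments in variables for inspection.
show() { count=$#; arg1=$1; arg2=$2; }

set -- 'a b' 'c d'
show X"$@"X

echo "$count: [$arg1] [$arg2]"   # => 2: [Xa b] [c dX]
```

The leading `X` is joined to the first element of `"$@"` and the trailing `X` to the last. Because `$@` is double-quoted, the elements themselves are not split.
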
|
160 | In Oil, the RHS of an assignment is an expression, and joining only occurs
|
161 | within double quotes:
|
162 |
|
163 | # Oil
|
164 | var joined = $x$y # parse error
|
165 | var joined = "$x$y" # OK
|
166 |
|
167 | # Shell
|
168 | joined=$x$y # OK
|
169 | joined="$x$y" # OK
|
170 |
|
171 | <a name="extended-glob"></a>
|
172 | ### Extended Globs
|
173 |
|
174 | Extended globs in OSH are a "legacy syntax" modelled after the behavior of
|
175 | `bash` and `mksh`. This feature adds alternation, repetition, and negation to
|
176 | globs, giving the power of regexes.
|
177 |
|
178 | You can use them to match strings:
|
179 |
|
180 | $ [[ foo.cc == *.@(cc|h) ]] && echo 'matches' # => matches
|
181 |
|
182 | Or produce lists of filename arguments:
|
183 |
|
184 | $ touch foo.cc foo.h
|
185 | $ echo *.@(cc|h) # => foo.cc foo.h
|
186 |
|
187 | There are some limitations and differences:
|
188 |
|
189 | - Extended globs are supported only when Oil is built with GNU libc.
|
190 | - GNU libc has the `FNM_EXTMATCH` extension to `fnmatch()`. Unlike bash and
|
191 | mksh, Oil doesn't implement its own extended glob matcher.
|
192 | - They're more **static**, like in `mksh`. When an extended glob appears in a
|
193 | word, we evaluate the word, match filenames, and **skip** the rest of the
|
194 | word evaluation pipeline. This means:
|
195 | - Automatic word splitting is skipped in something like
|
196 | `$unquoted/@(*.cc|h)`.
|
197 | - You can't use arrays like `"$@"` and extended globs in the same word, e.g.
|
198 | `"$@"_*.@(cc|h)`. This is usually nonsensical anyway.
|
199 | - OSH only accepts them in **contexts** that make sense.
|
200 | - For example, `echo foo > @(cc|h)` is a runtime error in OSH, but other
|
201 | shells will write a file literally named `@(cc|h)`.
|
202 | - OSH doesn't accept `${undef:-@(cc)}`. But it does accept `${x%@(cc)}`,
|
203 | since string strip operators like `%` accept a glob.
|
204 | - Extended globbing is always on in OSH, regardless of `shopt -s extglob`.
|
205 | - Trivia: `bash` can't parse some extended globs unless `extglob` is on. But
|
206 | it parses others when it's off.
|
207 | - Extended globs can't be used in the `PATTERN` in `${x//PATTERN/replace}`.
|
208 | This is because we only translate normal (non-extended) globs to regexes (in
|
209 | order to get the position information necessary for string replacement).
|
210 | - They're not supported under `shopt --set simple_word_eval` (Oil word
|
211 | evaluation).
|
212 | - For similar reasons, they're also not supported in assignment builtins.
|
213 | (This is a good thing!)
|