---
in_progress: yes
default_highlighter: oils-sh
css_files: ../../web/base.css ../../web/manual.css ../../web/toc.css
---

Word Language
=============

Recall that Oil is composed of three interleaved languages: **words**,
[commands](command-language.html), and [expressions](expression-language.html).

This doc describes words, but only the things that are **not** in:

- [A Tour of the Oil Language](oil-language-tour.html)
- The `#word-lang` section of [OSH Help
  Topics](osh-help-topics.html#word-lang)
- The `#word-lang` section of [Oil Help
  Topics](oil-help-topics.html#word-lang)

<div id="toc">
</div>

## What's a Word?

A word is an expression like `$x`, `"hello $name"`, or `{build,test}/*.py`. It
evaluates to a string or an array of strings.

Generally speaking, Oil behaves like a simpler version of POSIX shell / bash.
Sophisticated users can read [Simple Word Evaluation](simple-word-eval.html)
for a comparison.

## Contexts Where Words Are Used

### Words Are Part of Expressions and Commands

Part of an expression:

    var x = ${y:-'default'}

Part of a command:

    echo ${y:-'default'}

### Word Sequences: in for loops and array literals

The three contexts where splitting and globbing apply are the ones where a
**sequence** of words is evaluated (`EvalWordSequence`):

1. [Command]($help:simple-command): `echo $x foo`
2. [For loop]($help:for): `for i in $x foo; do ...`
3. [Array Literals]($help:array): `a=($x foo)` and `var a = :| $x foo |` ([oil-array]($help))

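As a rough illustration of the first two contexts, here is a minimal POSIX shell sketch. The `count_args` helper is hypothetical and not part of Oil; array literals are left out because they aren't available in plain POSIX `sh`.

```sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

x='a b'

# 1. Command: $x is split into two words, and foo is a third.
n_cmd=$(count_args $x foo)

# 2. For loop: the same word sequence is evaluated, so the body runs 3 times.
n_loop=0
for i in $x foo; do
  n_loop=$((n_loop + 1))
done

echo "$n_cmd $n_loop"  # => 3 3
```
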
### Oil vs. Bash Array Literals

Oil has a new array syntax, but it also supports the bash-compatible syntax:

```
local myarray=(one two *.py)  # bash

var myarray = :| one two *.py |  # Oil style
```

### Oil Discourages Context-Sensitive Evaluation

Shell also has contexts where it evaluates words to a **single string**, rather
than a sequence, like:

```sh
# RHS of Assignment
x="${not_array[@]}"
x=*.py  # not a glob

# Redirect Arg
echo foo > "${not_array[@]}"
echo foo > *.py  # not a glob

# Case variables and patterns
case "${not_array1[@]}" in
  "${not_array2[@]}")
    echo oops
    ;;
esac

case *.sh in   # not a glob
  *.py)        # a string pattern, not a file system glob
    echo oops
    ;;
esac
```

Existing shells disagree widely on how these snippets behave. That is, shells
are buggy and poorly specified.

Oil disallows most of them. Arrays are considered separate from strings and
don't randomly "decay".

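Two of the single-string contexts can be checked in a POSIX shell. This is a minimal sketch, not a statement about any particular shell's edge cases: it assumes `mktemp -d` is available, and `count_args` is a hypothetical helper. It contrasts the assignment RHS, which is not globbed, with a command argument, which is.

```sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

orig_dir=$PWD
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.py"
cd "$tmp" || exit 1

x=*.py                     # RHS of assignment: stored literally, not globbed
n_files=$(count_args $x)   # command argument: globbed to a.py and b.py

case main.py in            # main.py is just a string; no such file exists
  *.py) kind='string match' ;;
  *)    kind='no match' ;;
esac

cd "$orig_dir" && rm -r "$tmp"

echo "$x $n_files $kind"   # => *.py 2 string match
```
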
Related: the RHS of an Oil assignment is an expression, which can be of any
type, including an array:

```
var parts = split(x)       # returns an array
var python = glob('*.py')  # ditto

var s = join(parts)        # returns a string
```

## Sigils

This is a recap of [A Feel for Oil's Syntax](syntax-feelings.html).

### `$` Means "Returns One String"

Examples:

- All substitutions: variable, command, and arithmetic
  - TODO: Do we have `$[a[x+1]]` as an expression substitution?
  - Or `$[ /pat+ /]`?
  - I don't think so.

- Inline function calls, an Oil extension: `$[join(myarray)]`

(C-style strings like `$'\n'` use `$`, but that's more of a bash anachronism.
In Oil, `c'\n'` is preferred.)

### `@` Means "Returns An Array of Strings"

Enabled with `shopt -s parse_at`.

Examples:

- `@myarray`
- `@[arrayfunc(x, y)]`

These are both Oil extensions.

Array literals also evaluate to an array of strings, but the syntax shown in
this doc is delimited by `:|` and `|` rather than `@`:

```
var myarray = :| 1 2 3 |
```

## OSH Features

### Word Splitting and Empty String Elision

OSH uses POSIX behavior for unquoted substitutions like `$x`:

- The string value is split into args with `$IFS`.
- If the string value is empty, no args are produced.

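The two rules above can be sketched in POSIX shell as follows. This assumes the default `$IFS` (space, tab, newline); `count_args` is a hypothetical helper, not an Oil builtin.

```sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

x='a b  c'
empty=''

n_split=$(count_args $x)         # unquoted: split on $IFS into 3 args
n_quoted=$(count_args "$x")      # quoted: stays one arg
n_elided=$(count_args $empty)    # unquoted empty string: no args at all
n_kept=$(count_args "$empty")    # quoted empty string: one empty arg

echo "$n_split $n_quoted $n_elided $n_kept"  # => 3 1 0 1
```

Quoting is what turns both behaviors off, which is why `"$x"` is the usual advice in shell.
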
### Implicit Joining

Shell has odd "joining" semantics, which are supported in Oil but generally
discouraged. The prefix is joined to the first element of `"$@"` and the suffix
to the last, while each element stays a separate, unsplit arg:

    set -- 'a b' 'c d'
    argv.py X"$@"X  # => ['Xa b', 'c dX']

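To try this without `argv.py`, here is a minimal POSIX shell sketch with three positional parameters, so that the untouched middle element is visible. Both `show_args` and `count_args` are hypothetical helpers introduced for illustration.

```sh
# Hypothetical helpers: show_args brackets each argument it receives,
# count_args prints how many arguments it received.
show_args() { for a in "$@"; do printf '[%s]' "$a"; done; echo; }
count_args() { echo "$#"; }

set -- 'a b' 'c d' 'e'

joined=$(show_args X"$@"X)   # prefix joins the first arg, suffix joins the last
n=$(count_args X"$@"X)       # still one arg per positional parameter

echo "$joined $n"            # => [Xa b][c d][eX] 3
```
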
In Oil, the RHS of an assignment is an expression, and joining only occurs
within double quotes:

    # Oil
    var joined = $x$y    # parse error
    var joined = "$x$y"  # OK

    # Shell
    joined=$x$y          # OK
    joined="$x$y"        # OK

<a name="extended-glob"></a>
### Extended Globs

Extended globs in OSH are a "legacy syntax" modelled after the behavior of
`bash` and `mksh`. This feature adds alternation, repetition, and negation to
globs, giving them the power of regexes.

You can use them to match strings:

    $ [[ foo.cc == *.@(cc|h) ]] && echo 'matches'  # => matches

Or produce lists of filename arguments:

    $ touch foo.cc foo.h
    $ echo *.@(cc|h)  # => foo.cc foo.h

There are some limitations and differences:

- Extended globs are supported only when Oil is built with GNU libc.
  - GNU libc has the `FNM_EXTMATCH` extension to `fnmatch()`. Unlike bash and
    mksh, Oil doesn't implement its own extended glob matcher.
- They're more **static**, like in `mksh`. When an extended glob appears in a
  word, we evaluate the word, match filenames, and **skip** the rest of the
  word evaluation pipeline. This means:
  - Automatic word splitting is skipped in something like
    `$unquoted/@(*.cc|h)`.
  - You can't use arrays like `"$@"` and extended globs in the same word, e.g.
    `"$@"_*.@(cc|h)`. This is usually nonsensical anyway.
- OSH only accepts them in **contexts** that make sense.
  - For example, `echo foo > @(cc|h)` is a runtime error in OSH, but other
    shells will write a file literally named `@(cc|h)`.
  - OSH doesn't accept `${undef:-@(cc)}`. But it does accept `${x%@(cc)}`,
    since string strip operators like `%` accept a glob.
- Extended globbing is always on in OSH, regardless of `shopt -s extglob`.
  - Trivia: `bash` can't parse some extended globs unless `extglob` is on. But
    it parses others when it's off.
- Extended globs can't be used in the `PATTERN` in `${x//PATTERN/replace}`.
  This is because we only translate normal (non-extended) globs to regexes (in
  order to get the position information necessary for string replacement).
- They're not supported when `shopt --set simple_word_eval` is on (Oil word
  evaluation).
  - For similar reasons, they're also not supported in assignment builtins.
    (This is a good thing!)
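
The accepted uses described above can be sketched as a small script. It is written for bash, which needs `shopt -s extglob` before it parses the patterns; OSH is described above as having extended globs always on, so treat that line as a bash-only assumption. The `is_source` helper and the file names are hypothetical, and `mktemp -d` is assumed to be available.

```sh
shopt -s extglob  # needed in bash before the patterns below are parsed

# is_source is a hypothetical helper: succeeds if $1 ends in .cc or .h.
is_source() {
  [[ $1 == *.@(cc|h) ]]
}

is_source foo.cc && cc_result='match'
is_source foo.py || py_result='no match'

# Filename generation: only the .cc and .h files are listed.
tmp=$(mktemp -d)
touch "$tmp/foo.cc" "$tmp/foo.h" "$tmp/foo.py"
matched=$(cd "$tmp" && echo *.@(cc|h))
rm -r "$tmp"

# String strip operators like % accept an extended glob.
f=foo.cc
stem=${f%.@(cc|h)}

# Pattern replacement, shown here with a normal (non-extended) glob.
names='a.cc b.cc'
objs=${names//.c?/.o}

echo "$cc_result | $py_result | $matched | $stem | $objs"
# => match | no match | foo.cc foo.h | foo | a.o b.o
```
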