Notes on Word Evaluation
========================

There are a few contexts for word evaluation.

EvalWordSequence:

  echo $s "${a[@]}"
  declare -a a=( $s "${a[@]}" )

EvalWordToAny:

  a="$s ${a[@]}"  (Although bash decays this)

EvalWordToString:

  echo foo > "$s ${a[@]}"  # error because it should be a string

Glossary
--------

Unevaluated:
  word
  word_part

Evaluated:

  part_value

  fragment

  fragment_groups?  Do groups have a one-to-one correspondence with words?
  Should every word be a flat list of fragments?

  frame
  arg

Schema Changes

  | StringPartValue(string s, bool quoted)

  | fragment(string s, bool quoted)

Or maybe it should just be a flat array of StringPartValue?

Simpler Way?
------------

If the goal is just to elide $empty and not $empty"",
and you never elide ANYTHING in "${a[@]}" or "$@",
then this can all be decided within one word.

Logic: if the word consists entirely of unquoted StringPartValues and IFS
splitting gives NOTHING, then omit the whole word?  That check can come first?

I still need _Reframe: fragment groups into frames.

Fragment groups are things not separated by a hard barrier.  "${a[@]}" has
internal barriers that can never be broken (its elements will never be
joined).


EvalWordSequence
----------------

Let's talk about the hard case first.  EvalWordSequence gets a list of words
from the parser (and after brace expansion), and it calls _EvalSplitGlob in a
loop on each word.

So the processing of each word is separate.  Each word results in ZERO or more
argv entries.  I'll call each entry an "arg" from now on.

Each CompoundWord is composed of an array of word_part.  But this is actually
a tree, because of cases like this:

  $ argv x${a:-"1 2" "3 4"}x

The ${a} substitution is a word part, but it might expand into an ARRAY of
word_part:

  [ (DQ '1 2') (LiteralPart ' ') (DQ '3 4') ]

### Step 1: _EvalParts

This evaluates each part, and then flattens out the
part_value.CompoundPartValue instances.  So we're left with a flat list of
StringPartValue and ArrayPartValue.  (Could this be encoded in the type
system?)

The only way to get an ArrayPartValue is "$@" or "${a[@]}".  These are not
split or globbed, so we don't have to worry about them.  We just have to "pass
them through" unchanged.
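
As a rough illustration of the flattening step, here is a minimal Python
sketch.  The class layouts are assumptions: only StringPartValue(s, quoted) is
spelled out in these notes, so the `strs` and `children` fields are
hypothetical names, and `flatten` is not the actual _EvalParts code.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StringPartValue:
    s: str
    quoted: bool


@dataclass
class ArrayPartValue:
    strs: List[str]  # hypothetical field name


@dataclass
class CompoundPartValue:
    children: List["PartValue"]  # hypothetical field name


PartValue = Union[StringPartValue, ArrayPartValue, CompoundPartValue]


def flatten(part_values):
    """Splice nested CompoundPartValue children into one flat list."""
    out = []
    for pv in part_values:
        if isinstance(pv, CompoundPartValue):
            out.extend(flatten(pv.children))
        else:
            out.append(pv)
    return out


# x${a:-"1 2" "3 4"}x with $a unset: the middle part is a nested tree.
parts = [
    StringPartValue("x", False),
    CompoundPartValue([
        StringPartValue("1 2", True),
        StringPartValue(" ", False),
        StringPartValue("3 4", True),
    ]),
    StringPartValue("x", False),
]
flat = flatten(parts)  # five StringPartValue entries, no nesting left
```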

### Step 2: FrameFragments

A fragment is either a StringPartValue or one PIECE of an ArrayPartValue.

You need to do the reframing, but preserve whether each StringPartValue is
quoted.  ArrayPartValues are always quoted.


So now we have an array of fragments.  It should look like:

  [ fragment(s Str, quoted Bool), ... ]

Should we call these Frames?


### Elide Frames if IFS has whitespace

These are elided:

  empty=''
  argv $empty
  argv ${empty:-}

These are not:

  argv "${empty}"
  argv ${empty:-''}
  argv ${empty:-""}
  argv ""
  argv $empty""  # joining two parts
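
The cases above can be captured by a small predicate.  This is only a sketch
under assumptions: a frame is modeled as a list of (s, quoted) tuples, and
`should_elide` is a hypothetical name, not a function from the codebase.

```python
IFS_WHITESPACE = " \t\n"


def should_elide(frame, ifs=" \t\n"):
    """Return True if the frame should produce no arg at all.

    A frame is elided only when every fragment is unquoted and consists
    solely of IFS whitespace (which includes being empty).
    """
    ifs_ws = [c for c in ifs if c in IFS_WHITESPACE]
    for s, quoted in frame:
        if quoted:
            return False  # "" and "${empty}" always survive
        if any(c not in ifs_ws for c in s):
            return False  # something other than IFS whitespace remains
    return True


elided = [
    [("", False)],               # $empty  or  ${empty:-}
]
kept = [
    [("", True)],                # "${empty}"  or  ""
    [("", False), ("", True)],   # $empty""
]
```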


### Step 3: Maybe Glob Escape Frames

Now go over each frame.  If every fragment in the frame is quoted, it's like
this:

  "$s"
  "${a[@]}"

We can just pass these through unchanged.

If any fragment in the frame is not quoted, then we need to both split it and
glob it.

Splitting comes first, and globbing comes second.  So we have to escape in
the OPPOSITE order.
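
To make the ordering concrete, here is a hedged Python sketch.  It assumes
backslash is the escape character for both the splitter and the globber, and
that a frame is a list of (s, quoted) tuples; `glob_escape`, `ifs_escape` and
`join_frame` are hypothetical names.  Unquoted fragments are passed through
untouched, so backslashes inside them are not handled here.

```python
GLOB_META = "*?[]\\"


def glob_escape(s):
    """Backslash-escape glob metacharacters so the globber sees literals."""
    return "".join("\\" + c if c in GLOB_META else c for c in s)


def ifs_escape(s, ifs=" \t\n"):
    """Backslash-escape IFS chars (and backslashes) so the splitter skips them."""
    return "".join("\\" + c if (c in ifs or c == "\\") else c for c in s)


def join_frame(frame, ifs=" \t\n", noglob=False):
    """Join a frame's fragments into one string, protecting quoted text."""
    out = []
    for s, quoted in frame:
        if quoted:
            if not noglob:
                s = glob_escape(s)   # inner layer: consumed LAST, by the globber
            s = ifs_escape(s, ifs)   # outer layer: consumed FIRST, by the splitter
        out.append(s)
    return "".join(out)


# "a b*"$x with x=' c*': the quoted part is protected, the unquoted part is not.
joined = join_frame([("a b*", True), (" c*", False)])
```

The splitter strips the outer layer of backslashes, which leaves the
glob-escaped text for the globber.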

### Step 4: Maybe IFS Escape Frames


### Step 5: Split Frames with IFS

Rules: IFS is split into whitespace and other characters.  State machine is
very complex!

### Step 6: Glob Frames with Globber

Respect options: noglob, failglob, etc.


NOTE: globs inside strings are respected!

  a='*.py'

is the same as:

  a=*.py  # no globbing yet

and the glob is expanded here:

  echo $a
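
A minimal sketch of the final glob step, using Python's standard glob module
as a stand-in for the Globber.  `glob_arg` is a hypothetical helper; it only
models noglob and failglob, and it does not understand the backslash-escaped
form produced by the earlier escaping step.

```python
import glob
import os
import tempfile


def glob_arg(arg, noglob=False, failglob=False):
    """Expand one arg into one or more argv entries."""
    if noglob:
        return [arg]
    matches = sorted(glob.glob(arg))
    if matches:
        return matches
    if failglob:
        raise ValueError("no match: %s" % arg)
    return [arg]  # no match: the pattern is kept literally


with tempfile.TemporaryDirectory() as d:
    for name in ("a.py", "b.py"):
        open(os.path.join(d, name), "w").close()

    pattern = os.path.join(d, "*.py")
    matched = [os.path.basename(p) for p in glob_arg(pattern)]
    noglob_result = glob_arg(pattern, noglob=True)
    missing = os.path.join(d, "*.txt")
    unmatched_result = glob_arg(missing)
```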


Moral of the Story
------------------

This algorithm is horrible!  It's almost impossible to reason about, and the
syntax is bad too.  Oil will have something much simpler.


PROBLEMS:

How to avoid eliding ""?

I think if everything is quoted, then we can just

These become fragments.


Another Algorithm
-----------------

FOR EACH WORD

1. Eval - word_part
2. Flatten - word_part but no CompoundWordPart
   (note: could combine these two steps if
   _EvalWordPart(part, quoted=False) had an accumulator argument.)


3. MakeFrames(word_parts) -- handle StringPartValue and ArrayPartValue

     a=(1 '2 3' 4)
     $x"${a[@]}"$y

   This has three frames.
   The middle frame is just appended.
   The first and last frame have to undergo splitting.

   This has one frame:

     $empty""

   One frame:

     ${empty:-}


   A frame is zero or more args.  It will never be joined with anything else.


   # fragment is the same as StringPartValue though.

     fragment = (string s, bool quoted)
     frame = (fragment* frags)


4. Elide certain frames: IFS is whitespace && everything is unquoted and
   everything is IFS

     $a$b

CHOICE:

(A) 5. If everything in the frame is quoted, just join it end to end, and
       emit it.  Skip the next steps.


(B) 5. Join fragments in a frame, doing glob escaping and IFS escaping,
       depending on "quoted" and depending on "noglob".

    6. Split with IFS

    7. Glob, appending to argv.
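
The MakeFrames step in the list above could be sketched like this.  The
tagged-tuple encoding of part values and the name `make_frames` are
assumptions made for the example, not the real schema.

```python
def make_frames(part_values):
    """Group flattened part values into frames.

    part_values: list of ("str", s, quoted) or ("array", [s, ...]).
    Returns a list of frames; each frame is a list of (s, quoted) fragments.
    """
    frames = [[]]
    for pv in part_values:
        if pv[0] == "str":
            _, s, quoted = pv
            frames[-1].append((s, quoted))
        else:
            _, strs = pv
            for i, s in enumerate(strs):
                if i > 0:
                    frames.append([])  # hard barrier between array elements
                frames[-1].append((s, True))  # array pieces are always quoted
    return frames


# a=(1 '2 3' 4); x=x; y=y;  $x"${a[@]}"$y
three = make_frames([
    ("str", "x", False),
    ("array", ["1", "2 3", "4"]),
    ("str", "y", False),
])

# $empty""
one = make_frames([("str", "", False), ("str", "", True)])
```

The first array element joins the frame already in progress and the last one
stays open for whatever follows, which matches the POSIX wording on "$@".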


POSIX on "$@":
--------------

  Expands to the positional parameters, starting from one.  When the expansion
  occurs within double-quotes, and where field splitting (see Field Splitting)
  is performed, each positional parameter shall expand as a separate field,
  with the provision that the expansion of the first parameter shall still be
  joined with the beginning part of the original word (assuming that the
  expanded parameter was embedded within a word), and the expansion of the last
  parameter shall still be joined with the last part of the original word.  If
  there are no positional parameters, the expansion of '@' shall generate zero
  fields, even when '@' is double-quoted.


POSIX on Field Splitting
------------------------

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05
  https://www.gnu.org/software/bash/manual/bashref.html#Word-Splitting

  Summary:
  1. ' \t\n' is special.  Whitespace is trimmed off the front and back.
  2. If IFS is '', no field splitting is performed.
  3. Otherwise, suppose IFS = ' ,\t'.  Then IFS whitespace is space or tab.
     a. IFS whitespace is ignored at beginning and end.
     b. Any other IFS char delimits the field, along with adjacent IFS
        whitespace.
     c. IFS whitespace shall delimit a field.
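
The summary can be turned into a small splitter.  This is a sketch only:
`ifs_split` is a hypothetical name, it ignores the backslash escaping from the
earlier steps, and it treats a non-whitespace IFS char as a field terminator
(so a trailing delimiter does not create an extra empty field).

```python
def ifs_split(s, ifs=" \t\n"):
    """Split s into fields following the IFS rules summarized above."""
    if ifs == "":
        return [s] if s else []  # rule 2: no field splitting

    ws = {c for c in ifs if c in " \t\n"}         # IFS whitespace
    other = {c for c in ifs if c not in " \t\n"}  # other IFS chars

    if ws:
        s = s.strip("".join(ws))  # rule 3a: ignored at beginning and end

    fields = []
    cur = []
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if c in ws or c in other:
            # One delimiter is: IFS whitespace, at most one other IFS char,
            # then more IFS whitespace (rules 3b and 3c).
            j = i
            while j < n and s[j] in ws:
                j += 1
            if j < n and s[j] in other:
                j += 1
                while j < n and s[j] in ws:
                    j += 1
            fields.append("".join(cur))
            cur = []
            i = j
        else:
            cur.append(c)
            i += 1
    if cur:
        fields.append("".join(cur))
    return fields
```

For example, with IFS = ' ,\t' the string ' a , b ' splits into the two fields
'a' and 'b', while 'a,,b' with IFS = ',' keeps the empty field in the middle.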