1 | Notes on Word Evaluation
|
2 | ========================
|
3 |
|
4 | There are a few contexts for word evaluation.
|
5 |
|
6 | EvalWordSequence:
|
7 |
|
8 | echo $s "${a[@]}"
|
9 | declare -a a=( $s "${a[@]}" )
|
10 |
|
11 | EvalWordToAny
|
12 |
|
13 | a="$s ${a[@]}" (although bash decays the array to a string here)
|
14 |
|
15 | EvalWordToString
|
16 |
|
17 | echo foo > "$s ${a[@]}" # error because it should be a string
|
18 |
|
19 | Glossary
|
20 | --------
|
21 |
|
22 | Unevaluated:
|
23 | word
|
24 | word_part
|
25 |
|
26 | Evaluated:
|
27 |
|
28 | part_value
|
29 |
|
30 | fragment
|
31 |
|
32 | fragment_groups? groups have one-on-one correspondence with words?
|
33 | Every word should be a flat list of fragments?
|
34 |
|
35 | frame
|
36 | arg
|
37 |
|
38 | Schema Changes
|
39 |
|
40 | | StringPartValue(string s, bool quoted)
|
41 |
|
42 | | fragment(string s, bool quoted)
|
43 |
|
44 | Or maybe it should just be a flat array of StringPartValue?
|
45 |
|
46 | Simpler Way?
|
47 | --------
|
48 |
|
49 | If the goal is just to elide $empty and not $empty""
|
50 | And you never elide ANYTHING in "${a[@]}" or "$@"
|
51 |
|
52 | This is just all in one word.
|
53 |
|
54 | Logic: if the word consists of all StringPartValue which are unquoted and
|
55 | IFS splitting gives NOTHING, then omit the whole word? That can come first?
|
56 |
|
57 | I still need _Reframe: fragment groups into frames
|
58 |
|
59 | fragment groups are things not separated by a hard barrier. "${a[@]}" has
|
60 | internal barriers that can never be broken (they will never be joined)
|
61 |
|
62 |
|
63 | EvalWordSequence
|
64 | ----------------
|
65 |
|
66 | Let's talk about the hard case first. EvalWordSequence gets a list of words
|
67 | from the parser (and after brace expansion), and it calls _EvalSplitGlob in a
|
68 | loop on each word.
|
69 |
|
70 | So the processing of each word is separate. Each word results in ZERO or more
|
71 | argv entries. I'll call each entry an "arg" from now on.
|
72 |
|
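A minimal sketch of that loop, assuming a hypothetical `eval_split_glob` callable that stands in for _EvalSplitGlob (the names and signatures here are illustrative, not the real ones):

```python
from typing import Callable, List


def eval_word_sequence(words: List[str],
                       eval_split_glob: Callable[[str], List[str]]) -> List[str]:
    """Evaluate each word on its own and concatenate the resulting args."""
    argv: List[str] = []
    for word in words:
        # Each word contributes ZERO or more args; words are never merged.
        argv.extend(eval_split_glob(word))
    return argv


def toy_eval(word: str) -> List[str]:
    # Stand-in evaluator (assumption): treat the word as already expanded
    # and split it on whitespace.
    return word.split()
```

With the toy evaluator, a word that expands to nothing simply adds no entries to argv.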
73 | Each CompoundWord is composed of an array of word_part. But this is actually
|
74 | a tree, because of cases like this:
|
75 |
|
76 | $ argv x${a:-"1 2" "3 4"}x
|
77 |
|
78 | The ${a} substitution is a word part, but it might expand into an ARRAY of
|
79 | word_part:
|
80 |
|
81 | [ (DQ '1 2') (LiteralPart ' ') (DQ '3 4') ]
|
82 |
|
83 | ### Step 1: _EvalParts
|
84 |
|
85 | This evaluates each part, and then flattens out the part_value.CompoundPartValue
|
86 | instances. So we're left with a flat list of StringPartValue and
|
87 | ArrayPartValue. (Could this be encoded in the type system?)
|
88 |
|
89 | The only way to get an ArrayPartValue is "$@" or "${a[@]}". These are not
|
90 | split or globbed, so we don't have to worry about them. We just have to "pass
|
91 | them through" unchanged.
|
92 |
|
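The flattening in Step 1 could look roughly like this. The class names mirror the ones used in these notes, but the field names (`s`, `quoted`, `strs`, `children`) are assumptions made for the sketch:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StringPartValue:
    s: str
    quoted: bool


@dataclass
class ArrayPartValue:
    strs: List[str]  # only produced by "$@" or "${a[@]}"


@dataclass
class CompoundPartValue:
    children: List["PartValue"]


PartValue = Union[StringPartValue, ArrayPartValue, CompoundPartValue]


def flatten(values: List[PartValue]) -> List[PartValue]:
    """Return a flat list with every CompoundPartValue expanded in place."""
    flat: List[PartValue] = []
    for v in values:
        if isinstance(v, CompoundPartValue):
            flat.extend(flatten(v.children))  # recurse into the nested tree
        else:
            flat.append(v)  # strings and arrays pass through unchanged
    return flat


# x${a:-"1 2" "3 4"}x with a unset: the middle part evaluates to a compound.
parts = [
    StringPartValue("x", False),
    CompoundPartValue([
        StringPartValue("1 2", True),
        StringPartValue(" ", False),
        StringPartValue("3 4", True),
    ]),
    StringPartValue("x", False),
]
flat = flatten(parts)
```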
93 | ### Step 2: FrameFragments
|
94 |
|
95 | A fragment is either a StringPartValue or one PIECE of an ArrayPartValue.
|
96 |
|
97 | You need to do the reframing, but preserve whether each StringPartValue is
|
98 | quoted. ArrayPartValues are always quoted.
|
99 |
|
100 |
|
101 | So now we have an array of fragments. It should be:
|
102 |
|
103 | [ fragment(s Str, quoted Bool), ... ]
|
104 |
|
105 | Should we call these Frames?
|
106 |
|
107 |
|
108 | ### Elide Frames if IFS has whitespace
|
109 |
|
110 | These are elided:
|
111 |
|
112 | empty=''
|
113 | argv $empty
|
114 | argv ${empty:-}
|
115 |
|
116 | These are not:
|
117 |
|
118 | argv "${empty}"
|
119 | argv ${empty:-''}
|
120 | argv ${empty:-""}
|
121 | argv ""
|
122 | argv $empty"" # joining two parts
|
123 |
|
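One way to express the elision rule, treating a frame as a list of `(s, quoted)` pairs. This is a sketch of the condition as these notes state it (everything unquoted, and every character is IFS whitespace); `should_elide` is a made-up name:

```python
from typing import List, Tuple

Frame = List[Tuple[str, bool]]  # (s, quoted) pairs


def should_elide(frame: Frame, ifs: str = " \t\n") -> bool:
    """True if the frame should produce no arg at all."""
    ifs_ws = [c for c in ifs if c in " \t\n"]
    for s, quoted in frame:
        if quoted:
            return False  # any quoted fragment, even '', keeps the frame
        if any(c not in ifs_ws for c in s):
            return False  # some character survives splitting
    return True


# The examples above, written as frames (empty='').
elided = [
    [("", False)],              # $empty
    [("", False)],              # ${empty:-}
]
kept = [
    [("", True)],               # "${empty}"
    [("", True)],               # ${empty:-''}
    [("", True)],               # ""
    [("", False), ("", True)],  # $empty""
]
```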
124 |
|
125 | ### Step 3: Maybe Glob Escape Frames
|
126 |
|
127 | Now go over each frame. If every fragment in the frame is quoted, it's like this:
|
128 |
|
129 | "$s"
|
130 | "${a[@]}"
|
131 |
|
132 | We can just pass these through as-is.
|
133 |
|
134 |
|
135 |
|
136 | If any fragment in the frame is not quoted, then we
|
137 | need to both split it and glob it.
|
138 |
|
139 | Splitting comes first, and globbing comes second. So we have to escape in
|
140 | the OPPOSITE order.
|
141 |
|
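A sketch of the escaping order for a mixed frame. It assumes backslash is the escape character for both layers and that the splitter strips one layer of backslashes before the globber runs; all function names are hypothetical:

```python
GLOB_META = "*?[]\\"


def glob_escape(s: str) -> str:
    """Backslash-escape glob metacharacters so the globber sees literals."""
    return "".join("\\" + c if c in GLOB_META else c for c in s)


def ifs_escape(s: str, ifs: str) -> str:
    """Backslash-escape IFS characters and backslashes for the splitter."""
    return "".join("\\" + c if (c in ifs or c == "\\") else c for c in s)


def join_frame(frame, ifs=" \t\n", noglob=False):
    """Join (s, quoted) fragments into one string ready for split, then glob."""
    pieces = []
    for s, quoted in frame:
        if quoted:
            if not noglob:
                s = glob_escape(s)  # applied first, consumed last (globber)
            s = ifs_escape(s, ifs)  # applied second, consumed first (splitter)
        else:
            # Unquoted text stays splittable and globbable; only protect its
            # backslashes from being eaten by the splitter's unescaping.
            s = s.replace("\\", "\\\\")
        pieces.append(s)
    return "".join(pieces)
```

The glob escapes sit on the inside because the splitter removes the outer layer before the globber ever sees the string.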
142 | ### Step 4: Maybe IFS Escape Frames
|
143 |
|
144 |
|
145 | ### Step 5: Split Frames with IFS
|
146 |
|
147 | Rules: IFS is split into whitespace and other characters. The state machine is very complex!
|
148 |
|
149 | ### Step 6: Glob Frames with Globber
|
150 |
|
151 | Respect options: noglob, failglob, etc.
|
152 |
|
153 |
|
154 | NOTE: globs inside strings are respected!
|
155 |
|
156 | a='*.py'
|
157 | same as:
|
158 | a=*.py # no globbing yet
|
159 |
|
160 | echo $a
|
161 |
|
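A rough sketch of the glob step with the option handling. To stay self-contained it matches against a supplied list of names instead of the filesystem, and it uses Python's `fnmatch.fnmatchcase`, which does not understand backslash escapes and treats dotfiles differently from a shell. `glob_frame` and `GlobError` are made-up names:

```python
import fnmatch
from typing import List


class GlobError(Exception):
    """Raised when failglob is set and a pattern matches nothing."""


def glob_frame(pattern: str, names: List[str], noglob: bool = False,
               failglob: bool = False, nullglob: bool = False) -> List[str]:
    """Expand one split field against `names`, honoring the glob options."""
    if noglob or not any(c in pattern for c in "*?["):
        return [pattern]  # nothing to expand
    matches = sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))
    if matches:
        return matches
    if failglob:
        raise GlobError("no match: " + pattern)
    # With no match, nullglob drops the field; otherwise the pattern is kept.
    return [] if nullglob else [pattern]


names = ["a.py", "b.txt", "c.py"]
a = "*.py"                    # a='*.py' stores the pattern unexpanded
args = glob_frame(a, names)   # echo $a globs at expansion time
```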
162 |
|
163 | Moral of the Story
|
164 | ------------------
|
165 |
|
166 | This algorithm is horrible! It's almost impossible to reason about, and the
|
167 | syntax is bad too. Oil will have something much simpler.
|
168 |
|
169 |
|
170 |
|
171 |
|
172 | PROBLEMS:
|
173 |
|
174 | How to avoid eliding ""?
|
175 |
|
176 | I think if everything is quoted, then we can just
|
177 |
|
178 | These become fragments.
|
179 |
|
180 |
|
181 |
|
182 |
|
183 |
|
184 | Another Algorithm
|
185 | -----------------
|
186 |
|
187 | FOR EACH WORD
|
188 |
|
189 | 1. Eval - word_part
|
190 | 2. Flatten - word_part but no CompoundWordPart
|
191 | (note: could combine these two steps if
|
192 | _EvalWordPart(part, quoted=False) had an accumulator argument.)
|
193 |
|
194 |
|
195 | 3. MakeFrames(word_parts) -- handle StringPartValue and ArrayPartValue
|
196 |
|
197 | a=(1 '2 3' 4)
|
198 | $x"${a[@]}"$y
|
199 |
|
200 | This has three frames.
|
201 | The middle frame is just appended.
|
202 | The first and last frame have to undergo splitting.
|
203 |
|
204 | This has one frame:
|
205 |
|
206 | $empty""
|
207 |
|
208 | One frame:
|
209 |
|
210 | ${empty:-}
|
211 |
|
212 |
|
213 | A frame is zero or more args. It will never be joined with anything else.
|
214 |
|
215 |
|
216 | # fragment is the same as StringPartValue though.
|
217 |
|
218 | fragment = (string s, bool quoted)
|
219 | frame = (fragment* frags)
|
220 |
|
221 |
|
222 | 4. Elide certain frames. IFS is whitespace && everything is unquoted and
|
223 | everything is IFS
|
224 |
|
225 | $a$b
|
226 |
|
227 | CHOICE:
|
228 |
|
229 | (A) 5. If everything in the frame is quoted, just join it end to end, and
|
230 | emit it. Skip the next steps.
|
231 |
|
232 |
|
233 | (B) 5. Join fragments in a frame, doing glob escaping and IFS escaping,
|
234 | depending on "quoted" and depending on "noglob".
|
235 |
|
236 | 6. Split with IFS
|
237 |
|
238 | 7. Glob, appending to argv.
|
239 |
|
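Step 3 (MakeFrames) above could be sketched like this. The types follow the `fragment` / `frame` schema from these notes, while the field names and the decision to drop empty frames are assumptions:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StringPartValue:
    s: str
    quoted: bool


@dataclass
class ArrayPartValue:
    strs: List[str]


@dataclass
class Fragment:
    s: str
    quoted: bool


Frame = List[Fragment]


def make_frames(parts: List[Union[StringPartValue, ArrayPartValue]]) -> List[Frame]:
    """Group flat part values into frames that are never joined together."""
    current: Frame = []
    frames: List[Frame] = [current]
    for p in parts:
        if isinstance(p, ArrayPartValue):
            for i, s in enumerate(p.strs):
                if i > 0:
                    # Hard barrier between array elements: start a new frame.
                    current = []
                    frames.append(current)
                current.append(Fragment(s, True))  # array pieces are quoted
        else:
            current.append(Fragment(p.s, p.quoted))
    # A frame with no fragments (e.g. an empty "$@") yields no args.
    return [f for f in frames if f]


# a=(1 '2 3' 4); x=X; y=Y;  $x"${a[@]}"$y
frames = make_frames([
    StringPartValue("X", False),
    ArrayPartValue(["1", "2 3", "4"]),
    StringPartValue("Y", False),
])
```

The first array element lands in the same frame as `$x` and the last one in the same frame as `$y`, which gives the three frames described in step 3.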
240 |
|
241 |
|
242 | POSIX on "$@":
|
243 | ------------
|
244 |
|
245 | Expands to the positional parameters, starting from one. When the expansion
|
246 | occurs within double-quotes, and where field splitting (see Field Splitting)
|
247 | is performed, each positional parameter shall expand as a separate field,
|
248 | with the provision that the expansion of the first parameter shall still be
|
249 | joined with the beginning part of the original word (assuming that the
|
250 | expanded parameter was embedded within a word), and the expansion of the last
|
251 | parameter shall still be joined with the last part of the original word. If
|
252 | there are no positional parameters, the expansion of '@' shall generate zero
|
253 | fields, even when '@' is double-quoted.
|
254 |
|
255 |
|
256 | POSIX on Field Splitting
|
257 | -------------------------
|
258 |
|
259 | http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05
|
260 | https://www.gnu.org/software/bash/manual/bashref.html#Word-Splitting
|
261 |
|
262 | Summary:
|
263 | 1. ' \t\n' is special. Whitespace is trimmed off the front and back.
|
264 | 2. if IFS is '', no field splitting is performed.
|
265 | 3. Otherwise, suppose IFS = ' ,\t'. Then IFS whitespace is space or tab.
|
266 | a. IFS whitespace is ignored at beginning and end.
|
267 | b. any other IFS char delimits the field, along with adjacent IFS
|
268 | whitespace.
|
269 | c. IFS whitespace shall delimit a field.
|
270 |
|
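The summary above can be turned into a small splitter. This sketch ignores backslash escapes and quoting entirely, so it is much simpler than the real state machine; `ifs_split` is a hypothetical name:

```python
from typing import List


def ifs_split(s: str, ifs: str = " \t\n") -> List[str]:
    """Split s into fields following the summarized POSIX rules."""
    if ifs == "":
        return [s] if s else []  # no field splitting at all
    ws = {c for c in ifs if c in " \t\n"}  # IFS whitespace
    other = set(ifs) - ws                  # other IFS characters
    fields: List[str] = []
    i, n = 0, len(s)
    while i < n and s[i] in ws:            # leading IFS whitespace is ignored
        i += 1
    while i < n:
        start = i
        while i < n and s[i] not in ws and s[i] not in other:
            i += 1
        fields.append(s[start:i])
        # One delimiter: optional whitespace, at most one non-whitespace
        # IFS character, then optional whitespace again.
        while i < n and s[i] in ws:
            i += 1
        if i < n and s[i] in other:
            i += 1
            while i < n and s[i] in ws:
                i += 1
    return fields
```

Two adjacent non-whitespace delimiters produce an empty field between them, while a run of IFS whitespace counts as a single delimiter.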