Notes on Word Evaluation
========================

There are a few contexts for word evaluation.

EvalWordSequence:

  echo $s "${a[@]}"
  declare -a a=( $s "${a[@]}" )

EvalWordToAny:

  a="$s ${a[@]}"   (although bash decays this to a string)

EvalWordToString:

  echo foo > "$s ${a[@]}"  # error because it should be a string

Glossary
--------

Unevaluated:
  word
  word_part

Evaluated:

  part_value

  fragment

  fragment_groups?  groups have one-to-one correspondence with words?
                    Every word should be a flat list of fragments?

  frame
  arg

Schema Changes

  | StringPartValue(string s, bool quoted)

  | fragment(string s, bool quoted)

Or maybe it should just be a flat array of StringPartValue?

Simpler Way?
------------

If the goal is just to elide $empty and not $empty"",
and you never elide ANYTHING in "${a[@]}" or "$@",
then this is all just within one word.

Logic: if the word consists only of unquoted StringPartValues, and IFS
splitting gives NOTHING, then omit the whole word?  That can come first?

I still need _Reframe: fragment groups into frames.

Fragment groups are things not separated by a hard barrier.  "${a[@]}" has
internal barriers that can never be broken (its elements will never be
joined).


EvalWordSequence
----------------

Let's talk about the hard case first.  EvalWordSequence gets a list of words
from the parser (after brace expansion), and it calls _EvalSplitGlob in a
loop on each word.

So the processing of each word is separate.  Each word results in ZERO or more
argv entries.  I'll call each entry an "arg" from now on.

Each CompoundWord is composed of an array of word_part.  But this is actually
a tree, because of cases like this:

    $ argv x${a:-"1 2" "3 4"}x

The ${a:-...} substitution is a word part, but it might expand into an ARRAY
of word_part:

    [ (DQ '1 2') (LiteralPart ' ') (DQ '3 4') ]

### Step 1: _EvalParts

This evaluates each part, and then flattens out the
part_value.CompoundPartValue instances.  So we're left with a flat list of
StringPartValue and ArrayPartValue.  (Could this be encoded in the type
system?)

The only way to get an ArrayPartValue is "$@" or "${a[@]}".  These are not
split or globbed, so we don't have to worry about them.  We just have to "pass
them through" unchanged.
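
A minimal Python sketch of the flattening in Step 1.  The class names come
from these notes, but the field names (`strs`, `children`) and the `flatten`
helper are assumptions made for illustration, not the actual evaluator code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StringPartValue:
    s: str
    quoted: bool


@dataclass
class ArrayPartValue:
    strs: List[str]  # assumed field name; comes from "$@" or "${a[@]}"


@dataclass
class CompoundPartValue:
    children: list  # assumed field name; nested part values


def flatten(part_values):
    """Return a flat list with every CompoundPartValue expanded in place."""
    out = []
    for pv in part_values:
        if isinstance(pv, CompoundPartValue):
            out.extend(flatten(pv.children))
        else:
            out.append(pv)  # StringPartValue and ArrayPartValue pass through
    return out


# x${a:-"1 2" "3 4"}x with a unset: the substitution yields a nested value.
parts = [
    StringPartValue('x', False),
    CompoundPartValue([
        StringPartValue('1 2', True),
        StringPartValue(' ', False),
        StringPartValue('3 4', True),
    ]),
    StringPartValue('x', False),
]
flat = flatten(parts)
print([p.s for p in flat])
```

After this step the list holds five StringPartValues and no nesting, which is
the shape the later steps assume.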

### Step 2: FrameFragments

A fragment is either a StringPartValue or one PIECE of an ArrayPartValue.

You need to do the reframing, but preserve whether each StringPartValue is
quoted.  ArrayPartValues are always quoted.

So now we have an array of fragments.  Should be

    [ fragment(s Str, quoted Bool), ... ]

Should we call these Frames?

### Elide Frames if IFS has whitespace

These are elided:

    empty=''
    argv $empty
    argv ${empty:-}

These are not:

    argv "${empty}"
    argv ${empty:-''}
    argv ${empty:-""}
    argv ""
    argv $empty""  # joining two parts
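
The elision rule can be sketched in Python as a predicate on one frame.  This
is only an illustration: `should_elide` is a hypothetical name, a frame is
modelled as a list of (s, quoted) tuples, and the mapping from the shell
examples above to fragments (for example, that `${empty:-''}` yields a quoted
empty fragment) is an assumption about how the evaluator marks them.

```python
from typing import List, Tuple

Fragment = Tuple[str, bool]  # (s, quoted)


def should_elide(frame: List[Fragment], ifs: str = ' \t\n') -> bool:
    """True if the frame should produce no arg at all.

    A frame is elided when no fragment is quoted and every character
    in it is IFS whitespace (an empty string satisfies this trivially).
    """
    ifs_ws = set(ifs) & set(' \t\n')
    return all(not quoted and set(s) <= ifs_ws for s, quoted in frame)


examples = {
    '$empty': [('', False)],
    '"${empty}"': [('', True)],
    '$empty""': [('', False), ('', True)],
}
for src, frame in examples.items():
    print(src, should_elide(frame))
```

A single quoted fragment, even an empty one, is enough to keep the frame.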


### Step 3: Maybe Glob Escape Frames

Now go over each frame.  If every fragment in the frame is quoted, it's like
this:

    "$s"
    "${a[@]}"

We can just pass these through as is.

If any fragment in the frame is not quoted, then we need to both split it and
glob it.

Splitting comes first, and globbing comes second.  So we have to escape in
the OPPOSITE order.
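
To see why the escaping order is reversed, here is a small Python sketch.  It
assumes that both the splitter and the globber treat backslash as the escape
character and that each one strips a single layer of backslashes; the helper
names are hypothetical.

```python
import re


def glob_escape(s: str) -> str:
    """Backslash-escape glob metacharacters and backslashes."""
    return re.sub(r'([*?\[\]\\])', r'\\\1', s)


def ifs_escape(s: str, ifs: str = ' \t\n') -> str:
    """Backslash-escape IFS characters and backslashes."""
    return ''.join('\\' + c if c in ifs or c == '\\' else c for c in s)


def unescape(s: str) -> str:
    """Remove one layer of backslash escaping."""
    return re.sub(r'\\(.)', r'\1', s, flags=re.DOTALL)


quoted = 'a b*'                         # contents of a quoted fragment
for_globber = glob_escape(quoted)       # a b\*    (applied first)
for_splitter = ifs_escape(for_globber)  # a\ b\\*  (applied second)

# The splitter runs first and peels off the outer (IFS) layer,
# leaving exactly what the globber needs to see.
print(unescape(for_splitter) == for_globber)
print(unescape(for_globber) == quoted)
```

The layer applied last is the one removed first, so glob escaping has to go
on before IFS escaping.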

### Step 4: Maybe IFS Escape Frames


### Step 5: Split Frames with IFS

Rules: IFS chars are divided into whitespace and other.  State machine is very
complex!

### Step 6: Glob Frames with Globber

Respect options: noglob, failglob, etc.

NOTE: globs inside strings are respected!

    a='*.py'

is the same as:

    a=*.py  # no globbing yet

    echo $a  # globbing happens here
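
A rough Python sketch of the globbing step for one arg.  `glob_arg` and its
parameters are hypothetical, and it leans on Python's `fnmatch`, which is only
an approximation of shell globbing (it does not hide dotfiles and does not
treat backslash as an escape).  When nothing matches, the pattern is returned
unchanged, which is bash's default without failglob or nullglob; failglob
itself is not modelled.

```python
import fnmatch
import os
import tempfile


def glob_arg(arg: str, dirname: str, noglob: bool = False):
    """Expand one arg against the entries of dirname."""
    if noglob:
        return [arg]
    matches = sorted(fnmatch.filter(os.listdir(dirname), arg))
    return matches if matches else [arg]


with tempfile.TemporaryDirectory() as d:
    for name in ('a.py', 'b.py', 'c.txt'):
        open(os.path.join(d, name), 'w').close()

    a = '*.py'  # the assignment stores the pattern; nothing is globbed yet
    globbed = glob_arg(a, d)               # like: echo $a
    literal = glob_arg(a, d, noglob=True)  # like: set -o noglob; echo $a
    unmatched = glob_arg('*.rb', d)

print(globbed, literal, unmatched)
```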


Moral of the Story
------------------

This algorithm is horrible!  It's almost impossible to reason about, and the
syntax is bad too.  Oil will have something much simpler.


PROBLEMS:

How to avoid eliding ""?

I think if everything is quoted, then we can just

These become fragments.


Another Algorithm
-----------------

FOR EACH WORD

1. Eval - word_part
2. Flatten - word_part but no CompoundWordPart
   (note: could combine these two steps if
   _EvalWordPart(part, quoted=False) had an accumulator argument.)

3. MakeFrames(word_parts) -- handle StringPartValue and ArrayPartValue

   a=(1 '2 3' 4)
   $x"${a[@]}"$y

   This has three frames.
   The middle frame is just appended.
   The first and last frame have to undergo splitting.

   This has one frame:

   $empty""

   One frame:

   ${empty:-}

   A frame is zero or more args.  It will never be joined with anything else.

   # fragment is the same as StringPartValue though.

   fragment = (string s, bool quoted)
   frame = (fragment* frags)
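
A possible shape for MakeFrames, sketched in Python.  The `fragment` and
`frame` shapes follow the schema above; `make_frames`, the `strs` field and
the tuple encoding are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Fragment = Tuple[str, bool]  # (s, quoted)
Frame = List[Fragment]


@dataclass
class StringPartValue:
    s: str
    quoted: bool


@dataclass
class ArrayPartValue:
    strs: List[str]  # assumed field name


def make_frames(part_vals) -> List[Frame]:
    """Group flat part values into frames.

    Array elements are separated by hard barriers, so each element
    after the first one starts a new frame.
    """
    frames: List[Frame] = [[]]
    for p in part_vals:
        if isinstance(p, StringPartValue):
            frames[-1].append((p.s, p.quoted))
        else:
            for i, s in enumerate(p.strs):
                if i > 0:
                    frames.append([])  # barrier between array elements
                frames[-1].append((s, True))  # array pieces are quoted
    return frames


# $x"${a[@]}"$y with x=X, y=Y, a=(1 '2 3' 4)
frames = make_frames([
    StringPartValue('X', False),
    ArrayPartValue(['1', '2 3', '4']),
    StringPartValue('Y', False),
])
for f in frames:
    print(f)
```

The first and last frames mix unquoted and quoted fragments, so they still
need splitting; the middle frame is fully quoted and can be appended directly.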

4. Elide certain frames: IFS is whitespace && every fragment is unquoted &&
   every character is IFS

     $a$b

CHOICE:

(A)  5. If everything in the frame is quoted, just join it end to end, and
  emit it.  Skip the next steps.

(B)  5. Join fragments in a frame, doing glob escaping and IFS escaping,
  depending on "quoted" and depending on "noglob".

  6. Split with IFS

  7. Glob, appending to argv.


POSIX on "$@":
--------------

  Expands to the positional parameters, starting from one. When the expansion
  occurs within double-quotes, and where field splitting (see Field Splitting)
  is performed, each positional parameter shall expand as a separate field,
  with the provision that the expansion of the first parameter shall still be
  joined with the beginning part of the original word (assuming that the
  expanded parameter was embedded within a word), and the expansion of the last
  parameter shall still be joined with the last part of the original word. If
  there are no positional parameters, the expansion of '@' shall generate zero
  fields, even when '@' is double-quoted.
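
The joining rule in that paragraph, as a small Python sketch.  The function
name and signature are hypothetical.  `prefix` and `suffix` stand for the
already-evaluated text on either side of "$@" in the same word, and the case
of an empty but quoted neighbor (such as ""$@) is not modelled.

```python
from typing import List


def expand_quoted_at(prefix: str, params: List[str], suffix: str) -> List[str]:
    """Fields produced by a word of the form prefix"$@"suffix."""
    if not params:
        # "$@" contributes zero fields; only the rest of the word remains.
        joined = prefix + suffix
        return [joined] if joined else []
    fields = list(params)
    fields[0] = prefix + fields[0]    # first param joins the word's start
    fields[-1] = fields[-1] + suffix  # last param joins the word's end
    return fields


print(expand_quoted_at('x', ['1', '2 3', '4'], 'y'))
print(expand_quoted_at('', [], ''))
```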


POSIX on Field Splitting
------------------------

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05
  https://www.gnu.org/software/bash/manual/bashref.html#Word-Splitting

  Summary:
  1. The default IFS ' \t\n' is special.  Whitespace is trimmed off the front
     and back.
  2. If IFS is '', no field splitting is performed.
  3. Otherwise, suppose IFS = ' ,\t'.  Then IFS whitespace is space or tab, and
     comma is the other (non-whitespace) IFS char.
    a. IFS whitespace is ignored at beginning and end.
    b. Any other IFS char delimits the field, along with adjacent IFS
       whitespace.
    c. IFS whitespace shall delimit a field.
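
The summary can be turned into a short Python sketch.  This is a regex-based
approximation, not the state machine mentioned earlier: `split_fields` is a
hypothetical helper, it works on a plain string, and it ignores quoting and
backslash escapes entirely.

```python
import re
from typing import List


def split_fields(s: str, ifs: str = ' \t\n') -> List[str]:
    """Split s into fields following the IFS rules summarized above."""
    if ifs == '':
        return [s] if s else []  # rule 2: no splitting at all
    ws = ''.join(c for c in ifs if c in ' \t\n')
    other = ''.join(c for c in ifs if c not in ' \t\n')
    if ws:
        s = s.strip(ws)  # rules 1 and 3a: trim IFS whitespace at both ends
    if s == '':
        return []
    ws_cls = '[%s]' % re.escape(ws) if ws else ''
    alternatives = []
    if other:
        # rule 3b: one non-whitespace IFS char plus adjacent IFS whitespace
        ws_opt = ws_cls + '*' if ws else ''
        alternatives.append(ws_opt + '[%s]' % re.escape(other) + ws_opt)
    if ws:
        # rule 3c: a run of IFS whitespace on its own
        alternatives.append(ws_cls + '+')
    fields = re.split('|'.join(alternatives), s)
    if fields[-1] == '':
        fields.pop()  # a trailing delimiter ends the last field
    return fields


print(split_fields('  a  b '))
print(split_fields(' a , b ', ' ,\t'))
print(split_fields('a,,b', ','))
```

With IFS = ' ,\t', the string ' a , b ' gives two fields, while 'a,,b' with
IFS = ',' keeps the empty field between the two commas.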