---
title: YSH Expression Language (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) —
Chapter **YSH Expression Language**

</div>

This chapter describes the YSH expression language, which includes [Egg
Expressions]($xref:eggex).

<div id="dense-toc">
</div>

## Assignment

### assign

The `=` operator is used with assignment keywords:

    var x = 42
    setvar x = 43

    const y = 'k'

    setglobal z = 'g'

### aug-assign

The augmented assignment operators are:

    +=   -=   *=   /=   **=   //=   %=
    &=   |=   ^=   <<=   >>=

They are used with `setvar` and `setglobal`.  For example:

    setvar x += 2

is the same as:

    setvar x = x + 2

Likewise, these are the same:

    setglobal a[i] -= 1

    setglobal a[i] = a[i] - 1

## Literals

### atom-literal

YSH uses JavaScript-like spellings for these three "atoms":

    null           # type Null
    true   false   # type Bool

Note: to signify "no value", you may sometimes use an empty string `''`
instead of `null`.

### int-literal

Examples of integer literals:

    var decimal = 42
    var big = 42_000

    var hex = 0x0010_ffff

    var octal = 0o755

    var binary = 0b0001_0000

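Python accepts these same spellings, so it can show the values they denote. This is a Python check, not YSH code. It assumes YSH reads each spelling the same way Python does:

```python
# The YSH integer literals above, evaluated with Python's identical spellings.
decimal = 42
big = 42_000             # underscores are only visual separators
hex_value = 0x0010_ffff  # hexadecimal
octal = 0o755            # octal, e.g. Unix permission bits
binary = 0b0001_0000     # binary

print(decimal, big, hex_value, octal, binary)  # 42 42000 1114111 493 16
```
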
### float-lit

Examples of float literals:

    var myfloat = 3.14

    var f2 = -1.5e-100

### char-literal

Three kinds of unquoted backslash escapes are allowed in expression mode.  They
match what's available in quoted J8-style strings:

    var backslash = \\
    var quotes = \' ++ \"   # same as u'\'' ++ '"'

    var mu = \u{3bc}        # same as u'\u{3bc}'

    var nul = \y00          # same as b'\y00'

### ysh-string

YSH has single and double-quoted strings borrowed from Bourne shell, and
C-style strings borrowed from J8 Notation.

Double-quoted strings respect `$` interpolation:

    var dq = "hello $world and $(hostname)"

You can add a `$` before the left quote to be explicit: `$"x is $x"` rather
than `"x is $x"`.

Single-quoted strings may be raw:

    var s = r'line\n'      # raw string means \n is literal, NOT a newline

Or *J8 strings* with backslash escapes:

    var s = u'line\n \u{3bc}'        # unicode string means \n is a newline
    var s = b'line\n \u{3bc} \yff'   # same thing, but also allows bytes

Both `u''` and `b''` strings evaluate to the single `Str` type.  The difference
is that `b''` strings allow the `\yff` byte escape.

#### Notes

There's no way to express a single quote in raw strings.  Use one of the other
forms instead:

    var sq = "single quote: ' "
    var sq = u'single quote: \' '

Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
ambiguity:

    echo 'foo'
    echo r'foo'  # same thing

The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
**code** matches JSON-like **data**.

    var strU = u'mu = \u{3bc}'  # J8 string with escapes
    var strB = b'bytes \yff'    # J8 string that can express byte strings

More examples:

    var myRaw = r'[a-z]\n'      # raw strings can be used for regexes (not
                                # eggexes)

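A `Str` holds bytes, so Python's `bytes` type offers one way to picture the difference between `u''` and `b''`. In this Python model (not YSH code), `\u{3bc}` contributes the two-byte UTF-8 encoding of the code point. `\yff` contributes a single byte that is not valid UTF-8 on its own. Reading that as the reason `\yff` is limited to `b''` strings is an interpretation. `is_valid_utf8` is a hypothetical helper:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True

mu_bytes = '\u03bc'.encode('utf-8')  # like the \u{3bc} escape: 2 bytes
ff_byte = b'\xff'                    # like the \yff escape: 1 raw byte

print(mu_bytes.hex(), is_valid_utf8(mu_bytes))  # cebc True
print(ff_byte.hex(), is_valid_utf8(ff_byte))    # ff False
```
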
### triple-quoted

Triple-quoted string literals have leading whitespace stripped on each line.
They come in the same variants:

    var dq = """
        hello $world and $(hostname)
        no leading whitespace
        """

    var myRaw = r'''
        raw string
        no leading whitespace
        '''

    var strU = u'''
        string that happens to be unicode \u{3bc}
        no leading whitespace
        '''

    var strB = b'''
        string that happens to be bytes \u{3bc} \yff
        no leading whitespace
        '''

Again, you can omit the `r` prefix if there's no backslash, because it's not
ambiguous:

    var myRaw = '''
        raw string
        no leading whitespace
        '''

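The stripping rule can be modeled in a few lines of Python. This sketch rests on two assumptions not stated above. First, a newline right after the opening quotes is dropped. Second, the whitespace in front of the closing quotes is the prefix removed from each line. `strip_triple_quoted` is a hypothetical name, not part of YSH:

```python
def strip_triple_quoted(body: str) -> str:
    """Sketch of the assumed YSH rule for triple-quoted strings.

    `body` is the raw text between the opening and closing quotes.
    """
    lines = body.split('\n')
    # Assumption: a newline right after the opening quotes isn't part of the value.
    if lines[0] == '':
        lines = lines[1:]
    if not lines:
        return ''
    # Assumption: the last line holds only the indentation before the closing quotes.
    prefix = lines[-1]
    if prefix.strip(' \t'):
        raise ValueError('closing quotes must start their own line in this sketch')
    # Remove the prefix from each content line; other lines are kept unchanged here.
    content = [line[len(prefix):] if line.startswith(prefix) else line
               for line in lines[:-1]]
    # Every content line was followed by a newline in the source text.
    return ''.join(line + '\n' for line in content)


body = '\n        raw string\n        no leading whitespace\n        '
print(repr(strip_triple_quoted(body)))  # 'raw string\nno leading whitespace\n'
```
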
### str-template

String templates use the same syntax as double-quoted strings:

    var mytemplate = ^"name = $name, age = $age"

Related topics:

- [Str => replace](chap-type-method.html#replace)
- [ysh-string](chap-expr-lang.html#ysh-string)

### list-literal

Lists have a Python-like syntax:

    var mylist = ['one', 'two', [42, 43]]

And a shell-like syntax:

    var list2 = :| one two |

The shell-like syntax accepts the same words as a simple command:

    ls $mystr @ARGV *.py {foo,bar}@example.com

    # Rather than executing ls, evaluate words into a List
    var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |

### dict-literal

Dict literals look like JavaScript object literals:

    var d = {
      key1: 'value',  # key can be unquoted if it looks like a var name
      'key2': 42,     # or quote it

      ['key2' ++ suffix]: 43,   # bracketed expression
    }

Omitting a value means that the corresponding key takes the value of a var of
the same name:

    ysh$ var x = 42
    ysh$ var y = 43

    ysh$ var d = {x, y}  # values omitted
    ysh$ = d
    (Dict)  {x: 42, y: 43}

### range

A range is a sequence of integers that can be iterated over:

    for i in (0 .. 3) {
      echo $i
    }
    => 0
    => 1
    => 2

As with slices, the last number isn't included.  To iterate from 1 to n, you
can use this idiom:

    for i in (1 .. n+1) {
      echo $i
    }

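Python's `range()` is half-open in the same way, which makes it a convenient model of the two loops above. Here `n = 3` is an arbitrary choice:

```python
# Python analog of `for i in (0 .. 3)`: the end point is excluded.
print(list(range(0, 3)))      # [0, 1, 2]

# Python analog of the 1-to-n idiom `for i in (1 .. n+1)`.
n = 3
print(list(range(1, n + 1)))  # [1, 2, 3]
```
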
### block-expr

In YSH expressions, we use `^()` to create a [Command][] object:

    var myblock = ^(echo $PWD; ls *.txt)

It's more common for [Command][] objects to be created with block arguments,
which are not expressions:

    cd /tmp {
      echo $PWD
      ls *.txt
    }

[Command]: chap-type-method.html#Command

### expr-literal

An expression literal creates an [Expr][] object, which holds an unevaluated
expression:

    var myexpr = ^[1 + 2*3]

[Expr]: chap-type-method.html#Expr

## Operators

### op-precedence

YSH operator precedence is identical to Python's operator precedence.

New operators:

- `++` has the same precedence as `+`
- `->` and `=>` have the same precedence as `.`

<!-- TODO: show grammar -->

<h3 id="concat">concat <code>++</code></h3>

The concatenation operator works on `Str` objects:

    ysh$ var s = 'hello'
    ysh$ var t = s ++ ' world'

    ysh$ = t
    (Str)   "hello world"

and `List` objects:

    ysh$ var L = ['one', 'two']
    ysh$ var M = L ++ ['three', '4']

    ysh$ = M
    (List)   ["one", "two", "three", "4"]

String interpolation can be nicer than `++`:

    var t2 = "${s} world"  # same as t

Likewise, splicing lists can be nicer:

    var M2 = :| @L three 4 |  # same as M

### ysh-equals

YSH has strict equality:

    a === b       # Python-like, without type conversion
    a !== b       # negated

And type-converting equality:

    '3' ~== 3     # true, type conversion

The `~==` operator expects a string as the left operand.

---

Note that:

- `3 === 3.0` is false because integers and floats are different types, and
  there is no type conversion.
- `3 ~== 3.0` is an error, because the left operand isn't a string.

You may want to use explicit `int()` and `float()` to convert numbers, and then
compare them.

---

Compare objects for identity with `is`:

    ysh$ var d = {}
    ysh$ var e = d

    ysh$ = d is e
    (Bool)   true

    ysh$ = d is {other: 'dict'}
    (Bool)   false

To negate `is`, use `is not` (like Python):

    ysh$ = d is not {other: 'dict'}
    (Bool)   true

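Python's `==` converts between `int` and `float`, which is exactly the conversion YSH's `===` avoids. The following is a rough Python model. `strict_eq` is a hypothetical helper that only covers scalars like these. Python's `is` behaves like the YSH `is` examples above:

```python
def strict_eq(a, b) -> bool:
    """Rough model of YSH `===` for scalars: equal only if the types match too."""
    return type(a) is type(b) and a == b

print(3 == 3.0)                # True: Python converts the int
print(strict_eq(3, 3.0))       # False, like `3 === 3.0`
print(strict_eq(3, int(3.0)))  # True after an explicit conversion

# Identity works like the `is` examples above.
d = {}
e = d
other = {'other': 'dict'}
print(d is e, d is other)      # True False
```
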
### ysh-in

The `in` operator tests if a key is in a dictionary:

    var d = {k: 42}
    if ('k' in d) {
      echo yes
    }  # => yes

Unlike Python, `in` doesn't work on `Str` and `List` instances.  This is because
those operations take linear time rather than constant time (O(n) rather than
O(1)).

TODO: Use `includes() / contains()` methods instead.

### ysh-compare

The comparison operators apply to integers or floats:

    4 < 4   # => false
    4 <= 4  # => true

    5.0 > 5.0   # => false
    5.0 >= 5.0  # => true

Example in context:

    if (x < 0) {
      echo 'x is negative'
    }

### ysh-logical

The logical operators take boolean operands, and are spelled like Python:

    not
    and  or

Note that they are distinct from `!  &&  ||`, which are part of the [command
language](chap-cmd-lang.html).

### ysh-arith

YSH supports most of the arithmetic operators from Python. Notably, `//` and `%`
differ from Python as [they round toward zero, not negative
infinity](https://www.oilshell.org/blog/2024/03/release-0.21.0.html#integers-dont-do-whatever-python-or-c-does).

Use `+ - *` for `Int` or `Float` addition, subtraction and multiplication. If
either operand is a `Float`, then the result will also be a `Float`.

Use `/` and `//` for `Float` division and `Int` division, respectively. `/`
will _always_ result in a `Float`, whereas `//` will _always_ result in an
`Int`.

    = 1 / 2   # => (Float) 0.5
    = 1 // 2  # => (Int) 0

Use `%` to compute the _remainder_ of integer division. The left operand must
be an `Int` and the right a _positive_ `Int`.

    = 1 % 2   # => (Int) 1
    = -4 % 2  # => (Int) 0

Use `**` for exponentiation. The left operand must be an `Int` and the right a
_positive_ `Int`.

All arithmetic operators may coerce either of their operands from strings to a
number, provided those strings are formatted as numbers.

    = 10 + '1'  # => (Int) 11

Operators like `+ - * /` will coerce strings to _either_ an `Int` or `Float`.
However, operators like `// ** %` and bit shifts will coerce strings _only_ to
an `Int`.

    = '1.5' + '2'  # => (Float) 3.5
    = '1.5' % '2'  # Type Error: Left operand is a Str

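Rounding toward zero changes results only when the operands have different signs. This Python model computes the truncating quotient and remainder described above and compares them with Python's floor-based `//` and `%`. The helpers are hypothetical and this is not YSH code. Following the text, the remainder helper rejects a right operand that isn't positive:

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero, as described for YSH `//`."""
    if b == 0:
        raise ZeroDivisionError('division by zero')
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the operand signs differ.
    return q if (a < 0) == (b < 0) else -q

def trunc_mod(a: int, b: int) -> int:
    """Remainder paired with trunc_div, so a == b * trunc_div(a, b) + trunc_mod(a, b)."""
    if b <= 0:
        raise ValueError('right operand must be a positive Int')
    return a - b * trunc_div(a, b)

# Python floors; the model truncates.
print(-7 // 2, trunc_div(-7, 2))  # -4 -3
print(-7 % 2, trunc_mod(-7, 2))   # 1 -1
print(trunc_mod(-4, 2))           # 0, matching the `-4 % 2` example
```

With a truncating quotient, the remainder takes the sign of the left operand. Python's remainder takes the sign of the right operand instead.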
### ysh-bitwise

Bitwise operators are like Python and C:

    ~        # unary complement

    &  |  ^  # binary and, or, xor

    >>  <<   # bit shift

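These operators are described as working like Python's, so Python can show what each one computes on small integers. This assumes YSH `Int` values behave like Python `int`s for these inputs. Python integers are unbounded, so results for very large values could differ:

```python
a = 0b1100  # 12
b = 0b1010  # 10

print(~a)      # -13: unary complement is -(a + 1)
print(a & b)   # 8, i.e. 0b1000
print(a | b)   # 14, i.e. 0b1110
print(a ^ b)   # 6, i.e. 0b0110
print(a << 2)  # 48: shifting left by 2 multiplies by 4
print(a >> 2)  # 3: shifting right by 2 divides by 4 for this positive value
```
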
### ysh-ternary

The ternary operator is borrowed from Python:

    var display = 'yes' if len(s) else 'empty'

### ysh-index

`Str` objects can be indexed by byte:

    ysh$ var s = 'cat'
    ysh$ = s[1]
    (Str)   "a"

    ysh$ = s[-1]  # index from the end
    (Str)   "t"

`List` objects can be indexed by position:

    ysh$ var mylist = [1, 2, 3]
    ysh$ = mylist[2]
    (Int)  3

`Dict` objects are indexed by string key:

    ysh$ var mydict = {'key': 42}
    ysh$ = mydict['key']
    (Int)  42

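Indexing by byte matters once a string contains non-ASCII text. In this Python model, a string is encoded as UTF-8 and sliced as `bytes`. `byte_at` is a hypothetical helper. The sketch makes no claim about how YSH displays a lone non-ASCII byte:

```python
def byte_at(s: str, i: int) -> bytes:
    """Model of indexing a string by byte; returns a 1-byte value.

    Negative indices count from the end, as in the examples above.
    """
    data = s.encode('utf-8')
    if not -len(data) <= i < len(data):
        raise IndexError('string index out of range')
    i %= len(data)  # normalize a negative index
    return data[i:i + 1]

print(byte_at('cat', 1), byte_at('cat', -1))          # b'a' b't'
print(len('\u03bc'), len('\u03bc'.encode('utf-8')))  # 1 2: one code point, two bytes
```
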
### ysh-attr

The expression `mydict.key` is short for `mydict['key']`.

(Like JavaScript, but unlike Python.)

### ysh-slice

Slicing gives you a subsequence of a `Str` or `List`, as in Python.

Negative indices are relative to the end.

String example:

    $ var s = 'spam eggs'
    $ pp line (s[1:-1])
    (Str)   "pam egg"

    $ echo "x $[s[2:]]"
    x am eggs

List example:

    $ var foods = ['ale', 'bean', 'corn']
    $ pp line (foods[-2:])
    (List)   ["bean","corn"]

    $ write -- @[foods[:2]]
    ale
    bean

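Slicing follows Python, so the examples above can be checked directly in Python:

```python
s = 'spam eggs'
print(s[1:-1])     # pam egg
print(s[2:])       # am eggs

foods = ['ale', 'bean', 'corn']
print(foods[-2:])  # ['bean', 'corn']
print(foods[:2])   # ['ale', 'bean']
```
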
### func-call

A function call expression looks like Python:

    ysh$ = f('s', 't', named=42)

A semicolon `;` can be used after positional args and before named args, but
isn't always required:

    ysh$ = f('s', 't'; named=42)

In these cases, the `;` is necessary:

    ysh$ = f(...args; ...kwargs)

    ysh$ = f(42, 43; ...kwargs)

### thin-arrow

The thin arrow is for mutating methods:

    var mylist = ['bar']
    call mylist->pop()

<!--
TODO
    var mydict = {name: 'foo'}
    call mydict->erase('name')
-->

### fat-arrow

The fat arrow is for transforming methods:

    if (s => startsWith('prefix')) {
      echo 'yes'
    }

If the method lookup on `s` fails, it looks for free functions.  This means it
can be used for "chaining" transformations:

    var x = myFunc() => list() => join()

### match-ops

YSH has four pattern matching operators: `~   !~   ~~   !~~`.

Does a string match an **eggex**?

    var filename = 'x42.py'
    if (filename ~ / d+ /) {
      echo 'number'
    }

Does a string match a POSIX regular expression (ERE syntax)?

    if (filename ~ '[[:digit:]]+') {
      echo 'number'
    }

Negate the result with the `!~` operator:

    if (filename !~ /space/ ) {
      echo 'no space'
    }

    if (filename !~ '[[:space:]]' ) {
      echo 'no space'
    }

Does a string match a **glob**?

    if (filename ~~ '*.py') {
      echo 'Python'
    }

    if (filename !~~ '*.py') {
      echo 'not Python'
    }

Take care not to confuse glob patterns and regular expressions.

- Related doc: [YSH Regex API](../ysh-regex-api.html)

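A Python analogy shows the difference, with `fnmatch.fnmatchcase()` for globs and `re.search()` for regular expressions. It assumes, consistent with the examples above, that a glob has to match the whole string while a regex may match any part of it. Python's modules only stand in for whatever YSH uses internally and could differ on unusual patterns:

```python
import fnmatch
import re

def glob_match(name: str, pattern: str) -> bool:
    """Glob semantics: the pattern has to match the whole string."""
    return fnmatch.fnmatchcase(name, pattern)

def regex_match(name: str, pattern: str) -> bool:
    """Regex semantics: the pattern may match any substring."""
    return re.search(pattern, name) is not None

print(glob_match('x42.py', '*.py'), glob_match('x42.pyc', '*.py'))       # True False
print(regex_match('x42.py', r'[.]py'), regex_match('x42.pyc', r'[.]py'))  # True True
print(regex_match('x42.pyc', r'[.]py$'))                                  # False
print(regex_match('x42.py', '[0-9]+'))  # True, like `filename ~ / d+ /`
```

The regex `[.]py` also matches `x42.pyc`, because it only has to find `.py` somewhere. Anchoring it as `[.]py$` brings it closer to the glob.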
## Eggex

### re-literal

An eggex literal looks like this:

    / expression ; flags ; translation preference /

The flags and translation preference are both optional.

For example:

    var pat = / d+ /  # => [[:digit:]]+

You can specify flags passed to libc `regcomp()`:

    var pat = / d+ ; reg_icase reg_newline /

You can specify a translation preference after a second semicolon:

    var pat = / d+ ; ; ERE /

Right now the translation preference does nothing.  It could be used to
translate eggex to PCRE or Python syntax.

- Related doc: [Egg Expressions](../eggex.html)

### re-primitive

There are two kinds of eggex primitives.

"Zero-width assertions" match a position rather than a character:

    %start           # translates to ^
    %end             # translates to $

Literal characters appear within **single** quotes:

    'oh *really*'    # translates to regex-escaped string

Double-quoted strings are **not** eggex primitives.  Instead, you can use
splicing of strings:

    var dq = "hi $name"
    var eggex = / @dq /

### class-literal

An eggex character class literal specifies a set.  It can have individual
characters and ranges:

    [ 'x' 'y' 'z' a-f A-F 0-9 ]  # 3 chars, 3 ranges

Omit quotes on ASCII characters:

    [ x y z ]  # avoid typing 'x' 'y' 'z'

Sets of characters can be written as strings:

    [ 'xyz' ]  # any of 3 chars, not a sequence of 3 chars

Backslash escapes are respected:

    [ \\ \' \" \0 ]
    [ \xFF \u{3bc} ]

(Note that we don't use `\yFF`, as in J8 strings.)

Splicing:

    [ @str_var ]

Negation always uses `!`:

    ![ a-f A-F 'xyz' @str_var ]

### named-class

Perl-like shortcuts for sets of characters:

    [ dot ]    # => .
    [ digit ]  # => [[:digit:]]
    [ space ]  # => [[:space:]]
    [ word ]   # => [[:alpha:][:digit:]_]

Abbreviations:

    [ d s w ]  # Same as [ digit space word ]

Valid POSIX classes:

    alnum   cntrl   lower   space
    alpha   digit   print   upper
    blank   graph   punct   xdigit

Negated:

    !digit   !space   !word
    !d   !s   !w
    !alnum  # etc.

### re-repeat

Eggex repetition looks like POSIX syntax:

    / 'a'? /      # zero or one
    / 'a'* /      # zero or more
    / 'a'+ /      # one or more

Counted repetitions:

    / 'a'{3} /    # exactly 3 repetitions
    / 'a'{2,4} /  # between 2 and 4 repetitions

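Python's `re` module spells these quantifiers the same way, so it can illustrate what each one accepts. The sketch uses `re.fullmatch()` so the whole test string must match. That is an illustration choice, since a regex search may also succeed on a substring:

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

print([accepts('a?', t) for t in ['', 'a', 'aa']])                   # [True, True, False]
print([accepts('a*', t) for t in ['', 'aaa']])                       # [True, True]
print([accepts('a+', t) for t in ['', 'a']])                         # [False, True]
print([accepts('a{3}', t) for t in ['aa', 'aaa', 'aaaa']])           # [False, True, False]
print([accepts('a{2,4}', t) for t in ['a', 'aa', 'aaaa', 'aaaaa']])  # [False, True, True, False]
```
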
### re-compound

Sequence expressions with a space:

    / word digit digit /   # Matches 3 characters in sequence
                           # Examples: a42, b51

(Compare `/ [ word digit ] /`, which is a set matching 1 character.)

Alternation with `|`:

    / word | digit /       # Matches 'a' OR '9', for example

Grouping with parentheses:

    / (word digit) | \\ /  # Matches a9 or \

### re-capture

To retrieve a substring of a string that matches an Eggex, use a "capture
group" like `<capture ...>`.

Here's an eggex with a **positional** capture:

    var pat = / 'hi ' <capture d+> /  # access with _group(1)
                                      # or Match => group(1)

Captures can be **named**:

    <capture d+ as month>       # access with _group('month')
                                # or Match => group('month')

Captures can also have a type **conversion func**:

    <capture d+ : int>          # _group(1) returns Int

    <capture d+ as month: int>  # _group('month') returns Int

Related docs and help topics:

- [YSH Regex API](../ysh-regex-api.html)
- [`_group()`](chap-builtin-func.html#_group)
- [`Match => group()`](chap-type-method.html#group)

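Python's `re` module has the same three ideas: positional groups, named groups, and converting the captured text. This analogy of `/ 'hi ' <capture d+ as month: int> /` is not how YSH implements captures. `[0-9]` stands in for `d`, and the explicit `int()` call plays the role of the `: int` conversion func:

```python
import re

# Python analog of / 'hi ' <capture d+ as month: int> /
pat = re.compile(r'hi (?P<month>[0-9]+)')
m = pat.search('say hi 12 times')

print(repr(m.group(1)))        # '12': by position, like _group(1)
print(repr(m.group('month')))  # '12': by name, like _group('month')
print(int(m.group('month')))   # 12: converted, like `as month: int`
```
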
### re-splice

To build an eggex out of smaller expressions, you can **splice** eggexes
together:

    var D = / [0-9][0-9] /
    var time = / @D ':' @D /  # [0-9][0-9]:[0-9][0-9]

If the variable begins with a capital letter, you can omit `@`:

    var time2 = / D ':' D /   # same pattern as time

You can also splice a string:

    var greeting = 'hi'
    var pat = / @greeting ' world' /  # hi world

Splicing is **not** string concatenation; it works on eggex subtrees.

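Consider what pasting pattern text would do in Python's `re` syntax, to see why working on subtrees matters. `|` binds more loosely than concatenation, so a spliced alternation needs grouping to keep its meaning. This illustrates the hazard only. It doesn't show the exact ERE that eggex produces:

```python
import re

letters = 'a|b'  # think of an eggex like / 'a' | 'b' /

naive = letters + 'c'             # 'a|bc' means "a" OR "bc"
grouped = '(?:' + letters + ')c'  # '(?:a|b)c' means ("a" OR "b") then "c"

print(re.fullmatch(naive, 'a') is not None)     # True, probably not what was meant
print(re.fullmatch(grouped, 'a') is not None)   # False
print(re.fullmatch(grouped, 'bc') is not None)  # True
```
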
### re-flags

Valid ERE flags, which are passed to libc's `regcomp()`:

- `reg_icase` aka `i` - ignore case
- `reg_newline` - changes 4 matching rules related to newlines, e.g. `.` no
  longer matches a newline

See `man regcomp`.

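Python's `re` flags make a loose analogy. `re.IGNORECASE` plays the role of `reg_icase`, and `re.MULTILINE` gives the `^` and `$` part of `reg_newline`. The analogy is imperfect. For example, Python's `[^x]` still matches a newline, whereas under `REG_NEWLINE` a non-matching list does not:

```python
import re

print(bool(re.search('abc', 'xABCx', re.IGNORECASE)))  # True, like reg_icase
print(bool(re.search('^b', 'a\nb')))                   # False: ^ only at the start
print(bool(re.search('^b', 'a\nb', re.MULTILINE)))     # True: ^ also after a newline
print(bool(re.search('a.b', 'a\nb')))                  # False: . skips newlines by default
print(bool(re.search('a[^x]b', 'a\nb')))               # True: unlike REG_NEWLINE
```
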
### re-multiline

Multi-line eggexes aren't yet implemented.  Splicing makes it less necessary:

    var Name  = / <capture [a-z]+ as name> /
    var Num   = / <capture d+ as num> /
    var Space = / <capture s+ as space> /

    # For variables named like CapWords, splicing @Name doesn't require @
    var lexer = / Name | Num | Space /

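For comparison, the same alternation of named captures can drive a small lexer in Python, where `Match.lastgroup` reports which named group matched. This is a Python analogy of the `lexer` eggex above, not YSH code. `[0-9]` and `\s` stand in for `d` and `s`, and `tokenize` is a hypothetical helper:

```python
import re

# Python analog of / Name | Num | Space / built from three named captures.
lexer = re.compile(r'(?P<name>[a-z]+)|(?P<num>[0-9]+)|(?P<space>\s+)')

def tokenize(text: str) -> list:
    """Split text into (kind, value) pairs; stops at the first unmatched character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = lexer.match(text, pos)
        if m is None:
            break  # a real lexer would report an error here
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()  # every alternative consumes at least one character
    return tokens

print(tokenize('abc 42'))  # [('name', 'abc'), ('space', ' '), ('num', '42')]
```
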