Oils documentation, version 0.22.0 (oilshell.org)
Oils Reference — Chapter YSH Expression Language
This chapter describes the YSH expression language, which includes Egg Expressions.
The = operator is used with assignment keywords:
var x = 42
setvar x = 43
const y = 'k'
setglobal z = 'g'
The augmented assignment operators are:
+= -= *= /= **= //= %=
&= |= ^= <<= >>=
They are used with setvar and setglobal. For example:
setvar x += 2
is the same as:
setvar x = x + 2
Likewise, these are the same:
setglobal a[i] -= 1
setglobal a[i] = a[i] - 1
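A short sketch of augmented assignment in a script. The variable names are only illustrative, and $[...] (expression substitution) comes from the word language rather than this chapter:

```ysh
var x = 10
setvar x += 2     # x is now 12
setvar x //= 5    # integer division: x is now 2
echo $x           # => 2

var a = [1, 2, 3]
setvar a[1] -= 1  # same as: setvar a[1] = a[1] - 1
echo $[a[1]]      # => 1
```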
YSH uses JavaScript-like spellings for these three "atoms":
null # type Null
true false # type Bool
Note: to signify "no value", you may sometimes use an empty string '' instead of null.
Examples of integer literals:
var decimal = 42
var big = 42_000
var hex = 0x0010_ffff
var octal = 0o755
var binary = 0b0001_0000
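To see the values these literals denote, you can print them. This sketch assumes $[...] expression substitution, which is documented outside this chapter. Underscores are only visual separators:

```ysh
echo $[42_000]        # => 42000
echo $[0x0010_ffff]   # => 1114111
echo $[0o755]         # => 493
echo $[0b0001_0000]   # => 16
```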
Examples of float literals:
var myfloat = 3.14
var f2 = -1.5e-100
Three kinds of unquoted backslash escapes are allowed in expression mode. They match what's available in quoted J8-style strings:
var backslash = \\
var quotes = \' ++ \" # same as u'\'' ++ '"'
var mu = \u{3bc} # same as u'\u{3bc}'
var nul = \y00 # same as b'\y00'
YSH has single and double-quoted strings borrowed from Bourne shell, and C-style strings borrowed from J8 Notation.
Double-quoted strings respect $ interpolation:
var dq = "hello $world and $(hostname)"
You can add a $ before the left quote to be explicit: $"x is $x" rather than "x is $x".
Single-quoted strings may be raw:
var s = r'line\n' # raw string means \n is literal, NOT a newline
Or J8 strings with backslash escapes:
var s = u'line\n \u{3bc}' # unicode string means \n is a newline
var s = b'line\n \u{3bc} \yff' # same thing, but also allows bytes
Both u'' and b'' strings evaluate to the single Str type. The difference is that b'' strings allow the \yff byte escape.
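Since len() counts bytes, comparing lengths is a quick way to see how the prefixes differ. A small sketch with illustrative variable names, assuming $[...] expression substitution:

```ysh
var raw = r'a\nb'       # 4 bytes: a, backslash, n, b
var j8 = u'a\nb'        # 3 bytes: \n is a single newline byte
var mu = u'\u{3bc}'     # 2 bytes: U+03BC encoded as UTF-8
var bytestr = b'\yff'   # 1 byte, which only b'' strings can express
echo $[len(raw)] $[len(j8)] $[len(mu)] $[len(bytestr)]   # => 4 3 2 1
```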
There's no way to express a single quote in raw strings. Use one of the other forms instead:
var sq = "single quote: ' "
var sq = u'single quote: \' '
Sometimes you can omit the r, e.g. where there are no backslashes and thus no ambiguity:
echo 'foo'
echo r'foo' # same thing
The u'' and b'' strings are called J8 strings because the syntax in YSH code matches JSON-like data.
var strU = u'mu = \u{3bc}' # J8 string with escapes
var strB = b'bytes \yff' # J8 string that can express byte strings
More examples:
var myRaw = r'[a-z]\n' # raw strings can be used for regexes (not
# eggexes)
Triple-quoted string literals have leading whitespace stripped on each line. They come in the same variants:
var dq = """
hello $world and $(hostname)
no leading whitespace
"""
var myRaw = r'''
raw string
no leading whitespace
'''
var strU = u'''
string that happens to be unicode \u{3bc}
no leading whitespace
'''
var strB = b'''
string that happens to be bytes \u{3bc} \yff
no leading whitespace
'''
Again, you can omit the r prefix if there's no backslash, because it's not ambiguous:
var myRaw = '''
raw string
no leading whitespace
'''
String templates use the same syntax as double-quoted strings:
var mytemplate = ^"name = $name, age = $age"
Lists have a Python-like syntax:
var mylist = ['one', 'two', [42, 43]]
And a shell-like syntax:
var list2 = :| one two |
The shell-like syntax accepts the same syntax as a simple command:
ls $mystr @ARGV *.py {foo,bar}@example.com
# Rather than executing ls, evaluate words into a List
var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |
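For example, brace expansion and variable substitution happen just as they would for a command. This sketch avoids globs, whose results would depend on the current directory, and uses the write builtin to print one element per line:

```ysh
var name = 'world'
var words = :| hello $name {a,b}.txt |
write -- @words
# hello
# world
# a.txt
# b.txt
```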
Dicts look like JavaScript.
var d = {
key1: 'value', # key can be unquoted if it looks like a var name
'key2': 42, # or quote it
['key2' ++ suffix]: 43, # bracketed expression
}
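A self-contained version of the bracketed-key form. Here suffix is an illustrative variable, since the snippet above leaves it undefined:

```ysh
var suffix = '_x'
var d = {
  plain: 1,               # unquoted key
  ['key' ++ suffix]: 2,   # key computed from an expression
}
echo $[d['key_x']]        # => 2
echo $[d['plain']]        # => 1
```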
Omitting a value means that the corresponding key takes the value of a var of the same name:
ysh$ var x = 42
ysh$ var y = 43
ysh$ var d = {x, y} # values omitted
ysh$ = d
(Dict) {x: 42, y: 43}
A range is a sequence of numbers that can be iterated over:
for i in (0 .. 3) {
echo $i
}
=> 0
=> 1
=> 2
As with slices, the last number isn't included. To iterate from 1 to n, you can use this idiom:
for i in (1 .. n+1) {
echo $i
}
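For instance, the idiom can sum 1 through n; n and total are illustrative names:

```ysh
var n = 3
var total = 0
for i in (1 .. n+1) {
  setvar total += i     # i takes the values 1, 2, 3
}
echo $total             # => 6
```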
In YSH expressions, we use ^() to create a Command object:
var myblock = ^(echo $PWD; ls *.txt)
It's more common for Command objects to be created with block arguments, which are not expressions:
cd /tmp {
echo $PWD
ls *.txt
}
An expression literal is an object that holds an unevaluated expression:
var myexpr = ^[1 + 2*3]
YSH operator precedence is identical to Python's operator precedence.
New operators:
++ has the same precedence as +
-> and => have the same precedence as . (attribute access)
++
The concatenation operator works on Str objects:
ysh$ var s = 'hello'
ysh$ var t = s ++ ' world'
ysh$ = t
(Str) "hello world"
and List objects:
ysh$ var L = ['one', 'two']
ysh$ var M = L ++ ['three', '4']
ysh$ = M
(List) ["one", "two", "three", "4"]
String interpolation can be nicer than ++:
var t2 = "${s} world" # same as t
Likewise, splicing lists can be nicer:
var M2 = :| @L three 4 | # same as M
YSH has strict equality:
a === b # Python-like, without type conversion
a !== b # negated
And type converting equality:
'3' ~== 3 # true, with type conversion
The ~== operator expects a string as the left operand.
Note that:
3 === 3.0 is false because integers and floats are different types, and there is no type conversion.
3 ~== 3.0 is an error, because the left operand isn't a string.
You may want to use explicit int() and float() to convert numbers, and then compare them.
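A sketch of both approaches for a string that holds a number, such as text read from a file (answer is an illustrative name):

```ysh
var answer = '42'
if (answer ~== 42) {
  echo 'equal with type conversion'
}
if (int(answer) === 42) {
  echo 'equal after explicit int()'
}
```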
Compare objects for identity with is:
ysh$ var d = {}
ysh$ var e = d
ysh$ = d is e
(Bool) true
ysh$ = d is {other: 'dict'}
(Bool) false
To negate is, use is not (like Python):
ysh$ = d is not {other: 'dict'}
(Bool) true
The in operator tests if a key is in a dictionary:
var d = {k: 42}
if ('k' in d) {
echo yes
} # => yes
Unlike Python, in doesn't work on Str and List instances. This is because those operations take linear time rather than constant time (O(n) rather than O(1)).
TODO: Use includes() / contains() methods instead.
The comparison operators apply to integers or floats:
4 < 4 # => false
4 <= 4 # => true
5.0 > 5.0 # => false
5.0 >= 5.0 # => true
Example in context:
if (x < 0) {
echo 'x is negative'
}
The logical operators take boolean operands, and are spelled like Python:
not
and or
Note that they are distinct from ! && ||, which are part of the command language.
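For example, the expression operators combine conditions inside if (...); x is an illustrative variable:

```ysh
var x = 5
if (x > 0 and not (x > 10)) {
  echo 'x is between 1 and 10'
}
if (x < 0 or x > 100) {
  echo 'out of range'   # not printed when x is 5
}
```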
YSH supports most of the arithmetic operators from Python. Notably, // and % differ from Python: they round toward zero, not toward negative infinity.
Use + - * for Int or Float addition, subtraction and multiplication. If any of the operands are Floats, then the output will also be a Float.
Use / and // for Float division and Int division, respectively. / will always result in a Float, while // will always result in an Int.
= 1 / 2 # => (Float) 0.5
= 1 // 2 # => (Int) 0
Use % to compute the remainder of integer division. The left operand must be an Int and the right a positive Int.
= 1 % 2 # => (Int) 1
= -4 % 2 # => (Int) 0
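With a negative left operand, rounding toward zero gives results that differ from Python. This sketch shows the values implied by that rule, with Python's results in the comments for contrast; a and b are illustrative names:

```ysh
var a = -7
var b = 2
echo $[a // b]   # => -3 (Python gives -4)
echo $[a % b]    # => -1 (Python gives 1)
# Quotient and remainder stay consistent: (a // b) * b + (a % b) equals a
echo $[(a // b) * b + (a % b)]   # => -7
```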
Use ** for exponentiation. The left operand must be an Int and the right a positive Int.
All arithmetic operators may coerce either of their operands from strings to a number, provided those strings are formatted as numbers.
= 10 + '1' # => (Int) 11
Operators like + - * / will coerce strings to either an Int or Float.
However, operators like // ** % and bit shifts will coerce strings only to an Int.
= '1.14' + '2' # => (Float) 3.14
= '1.14' % '2' # Type Error: Left operand is a Str
Bitwise operators are like Python and C:
~ # unary complement
& | ^ # binary and, or, xor
>> << # bit shift
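A few worked values, using a binary literal so the bits are visible (flags is an illustrative name):

```ysh
var flags = 0b0101         # 5
echo $[flags & 0b0100]     # => 4
echo $[flags | 0b0010]     # => 7
echo $[flags ^ 0b0001]     # => 4
echo $[1 << 4]             # => 16
echo $[~0]                 # => -1
```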
The ternary operator is borrowed from Python:
var display = 'yes' if len(s) else 'empty'
Str objects can be indexed by byte:
ysh$ var mystr = 'cat'
ysh$ = mystr[1]
(Str) 'a'
ysh$ = mystr[-1] # index from the end
(Str) 't'
List objects:
ysh$ var mylist = [1, 2, 3]
ysh$ = mylist[2]
(Int) 3
Dict objects are indexed by string key:
ysh$ var mydict = {'key': 42}
ysh$ = mydict['key']
(Int) 42
The expression mydict.key is short for mydict['key']. (Like JavaScript, but unlike Python.)
Slicing gives you a subsequence of a Str or List, like Python. Negative indices are relative to the end.
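A short sketch; as with ranges, the end index is excluded. It assumes Python-style omitted bounds such as s[-3:]:

```ysh
var mylist = ['a', 'b', 'c', 'd']
var middle = mylist[1:3]
write -- @middle     # prints b, then c
var s = 'hello'
echo $[s[1:4]]       # => ell
echo $[s[-3:]]       # => llo
```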
A function call expression looks like Python:
ysh$ = f('s', 't', named=42)
A semicolon ; can be used after positional args and before named args, but isn't always required:
ysh$ = f('s', 't'; named=42)
In these cases, the ; is necessary:
ysh$ = f(...args; ...kwargs)
ysh$ = f(42, 43; ...kwargs)
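To see where named args go, here is a hypothetical func greet with one positional param and one named param with a default. Named params are declared after the ; in the signature:

```ysh
# greet is a hypothetical helper, not a builtin
func greet(name; greeting='hello') {
  return ("$greeting $name")
}
echo $[greet('alice')]                  # => hello alice
echo $[greet('bob', greeting='hi')]     # => hi bob
echo $[greet('bob'; greeting='hi')]     # => hi bob
```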
The thin arrow is for mutating methods:
var mylist = ['bar']
call mylist->pop()
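Another mutating method is append(). The call keyword evaluates the expression and discards its result, so only the side effect on mylist remains:

```ysh
var mylist = ['bar']
call mylist->append('baz')   # mutates mylist in place
write -- @mylist             # prints bar, then baz
```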
The fat arrow is for transforming methods:
if (s => startsWith('prefix')) {
echo 'yes'
}
If the method lookup on s fails, it looks for free functions. This means it can be used for "chaining" transformations:
var x = myFunc() => list() => join()
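A minimal sketch of the free-function fallback, using a hypothetical func double. An Int has no method named double, so => falls back to the free function, passing the left side as its first argument:

```ysh
# double is a hypothetical helper, not a builtin
func double(x) {
  return (x * 2)
}
var y = 21 => double()
echo $y                            # => 42
var z = 5 => double() => double()  # chained: double(double(5))
echo $z                            # => 20
```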
YSH has four pattern matching operators: ~ !~ ~~ !~~.
Does string match an eggex?
var filename = 'x42.py'
if (filename ~ / d+ /) {
echo 'number'
}
Does a string match a POSIX regular expression (ERE syntax)?
if (filename ~ '[[:digit:]]+') {
echo 'number'
}
Negate the result with the !~
operator:
if (filename !~ /space/ ) {
echo 'no space'
}
if (filename !~ '[[:space:]]' ) {
echo 'no space'
}
Does a string match a glob?
if (filename ~~ '*.py') {
echo 'Python'
}
if (filename !~~ '*.py') {
echo 'not Python'
}
Take care not to confuse glob patterns and regular expressions.
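For example, a pattern that looks like a file extension behaves differently in each. In an ERE, . matches any character and the match may occur anywhere in the string, while a glob must match the whole string, and its dot is literal:

```ysh
var name = 'happy'
if (name ~ '.py') {
  echo 'ERE matches'          # 'ppy' is any char followed by py
}
if (name !~~ '*.py') {
  echo 'glob does not match'  # no literal .py suffix
}
```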
An eggex literal looks like this:
/ expression ; flags ; translation preference /
The flags and translation preference are both optional.
Examples:
var pat = / d+ / # => [[:digit:]]+
You can specify flags passed to libc regcomp():
var pat = / d+ ; reg_icase reg_newline /
You can specify a translation preference after a second semicolon:
var pat = / d+ ; ; ERE /
Right now the translation preference does nothing. It could be used to translate eggex to PCRE or Python syntax.
There are two kinds of eggex primitives.
"Zero-width assertions" match a position rather than a character:
%start # translates to ^
%end # translates to $
Literal characters appear within single quotes:
'oh *really*' # translates to regex-escaped string
Double-quoted strings are not eggex primitives. Instead, you can use splicing of strings:
var dq = "hi $name"
var eggex = / @dq /
An eggex character class literal specifies a set. It can have individual characters and ranges:
[ 'x' 'y' 'z' a-f A-F 0-9 ] # 3 chars, 3 ranges
Omit quotes on ASCII characters:
[ x y z ] # avoid typing 'x' 'y' 'z'
Sets of characters can be written as strings:
[ 'xyz' ] # any of 3 chars, not a sequence of 3 chars
Backslash escapes are respected:
[ \\ \' \" \0 ]
[ \xFF \u{3bc} ]
(Note that we don't use \yFF, as in J8 strings.)
Splicing:
[ @str_var ]
Negation always uses ! as a prefix:
![ a-f A-F 'xyz' @str_var ]
Perl-like shortcuts for sets of characters:
[ dot ] # => .
[ digit ] # => [[:digit:]]
[ space ] # => [[:space:]]
[ word ] # => [[:alpha:][:digit:]_]
Abbreviations:
[ d s w ] # Same as [ digit space word ]
Valid POSIX classes:
alnum cntrl lower space
alpha digit print upper
blank graph punct xdigit
Negated:
!digit !space !word
!d !s !w
!alnum # etc.
Eggex repetition looks like POSIX syntax:
/ 'a'? / # zero or one
/ 'a'* / # zero or more
/ 'a'+ / # one or more
Counted repetitions:
/ 'a'{3} / # exactly 3 repetitions
/ 'a'{2,4} / # between 2 and 4 repetitions
Sequence expressions with a space:
/ word digit digit / # Matches 3 characters in sequence
# Examples: a42, b51
(Compare / [ word digit ] /, which is a set matching 1 character.)
Alternation with |:
/ word | digit / # Matches 'a' OR '9', for example
Grouping with parentheses:
/ (word digit) | \\ / # Matches a9 or \
To retrieve a substring of a string that matches an Eggex, use a "capture group" like <capture ...>.
Here's an eggex with a positional capture:
var pat = / 'hi ' <capture d+> / # access with _group(1)
# or Match => group(1)
Captures can be named:
<capture d+ as month> # access with _group('month')
# or Match => group('month')
Captures can also have a type conversion func:
<capture d+ : int> # _group(1) returns Int
<capture d+ as month: int> # _group('month') returns Int
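Here is a complete match using a named capture with a conversion function; the input string and the capture name num are illustrative. Because of : int, the captured digits come back as an Int, so arithmetic works directly:

```ysh
var s = 'order 42 of 100'
var pat = / 'order ' <capture d+ as num: int> /
if (s ~ pat) {
  echo $[_group('num') + 1]   # => 43
}
```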
To build an eggex out of smaller expressions, you can splice eggexes together:
var D = / [0-9][0-9] /
var time = / @D ':' @D / # [0-9][0-9]:[0-9][0-9]
If the variable begins with a capital letter, you can omit @:
var ip = / D ':' D /
You can also splice a string:
var greeting = 'hi'
var pat = / @greeting ' world' / # hi world
Splicing is not string concatenation; it works on eggex subtrees.
Valid ERE flags, which are passed to libc's regcomp():
reg_icase (aka i) - ignore case
reg_newline - 4 matching changes related to newlines
See man regcomp.
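For example, with reg_icase a literal in the eggex matches regardless of case. A small sketch with an arbitrary input:

```ysh
var pat = / 'hello' ; reg_icase /
if ('Say HELLO' ~ pat) {
  echo 'matched, ignoring case'
}
```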
Multi-line eggexes aren't yet implemented. Splicing makes it less necessary:
var Name = / <capture [a-z]+ as name> /
var Num = / <capture d+ as num> /
var Space = / <capture s+ as space> /
# For variables named like CapWords, splicing @Name doesn't require @
var lexer = / Name | Num | Space /
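To see which alternative matched, inspect the named captures after a match. In this sketch the input is arbitrary; POSIX regex matching finds the leftmost match, which here comes from the Name alternative:

```ysh
var Name = / <capture [a-z]+ as name> /
var Num = / <capture d+ as num> /
var Space = / <capture s+ as space> /
var lexer = / Name | Num | Space /
if ('abc 42' ~ lexer) {
  echo $[_group('name')]   # => abc
}
```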