Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.
(July 2024)
This is a long, "unified/orthogonal" design for:
There's also a relation to jq, which will be covered elsewhere.
It's a layered design. That means we need some underlying mechanisms:
eval and positional args $1 $2 $3
ctx builtin
It will link to:
Let's introduce this with a text file:
$ seq 4 | xargs -n 2 | tee test.txt
1 2
3 4
xargs does splitting:
$ echo 'alice bob' | xargs -n 1 -- echo hi | tee test2.txt
hi alice
hi bob
Oils:
# should we use $_ for _word _line _row? $[_.age] instead of $[_row.age]
$ echo 'alice bob' | each-word { echo "hi $_" } | tee test2.txt
hi alice
hi bob
Normally this should be balanced
Now let's use awk:
$ cat test.txt | awk '{ print $2 " " $1 }'
2 1
4 3
In YSH:
$ cat test.txt | chop '$2 $1'
2 1
4 3
It's shorter! chop is an alias for split-by (space=true, template='$2 $1')
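Since chop and split-by are still proposals, here is a minimal Python sketch of the behavior described above. It assumes whitespace splitting and 1-based $N placeholders, as in awk. The name split_by_template is hypothetical and is not part of Oils.

```python
import re

def split_by_template(lines, template):
    """Split each line on whitespace and expand $1, $2, ... in the template."""
    out = []
    for line in lines:
        fields = line.split()

        def expand(match):
            # $N is 1-based, like awk. A missing column raises IndexError here;
            # the doc leaves that case open (see the if-split-by question).
            return fields[int(match.group(1)) - 1]

        out.append(re.sub(r"\$(\d+)", expand, template))
    return out

for row in split_by_template(["1 2", "3 4"], "$2 $1"):
    print(row)
```

With the two lines of test.txt as input, this prints the same swapped columns as the awk and chop examples.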
With a template, for static parsing:
$ cat test.txt | chop (^"$2 $1")
2 1
4 3
With a block:
$ cat test.txt | chop { mkdir -v -p $2/$1 }
mkdir: created directory '2/1'
mkdir: created directory '4/3'
With no argument, it prints a table:
$ cat test.txt | chop
#.tsv8 $1 $2
1 2
3 4
$ cat test.txt | chop (names = :|a b|)
#.tsv8 a b
1 2
3 4
Longer examples with split-by:
$ cat test.txt | split-by (space=true, template='$2 $1')
$ cat test.txt | split-by (space=true, template=^"$2 $1")
$ cat test.txt | split-by (space=true) { mkdir -v -p $2/$1 }
$ cat test.txt | split-by (space=true)
$ cat test.txt | split-by (space=true, names= :|a b|)
$ cat test.txt | split-by (space=true, names= :|a b|) {
    mkdir -v -p $a/$b
  }
With must-match:
$ var p = / <capture d+> s+ <capture d+> /
$ cat test.txt | must-match (p, template='$2 $1')
$ cat test.txt | must-match (p, template=^"$2 $1")
$ cat test.txt | must-match (p) { mkdir -v -p $2/$1 }
$ cat test.txt | must-match (p)
With names:
$ var p = / <capture d+ as a> s+ <capture d+ as b> /
$ cat test.txt | must-match (p, template='$b $a')
$ cat test.txt | must-match (p)
#.tsv8 a b
1 2
3 4
$ cat test.txt | must-match (p) {
    mkdir -v -p $a/$b
  }
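The named-capture variant can be modeled in Python with named groups. This is only a sketch under two assumptions the doc does not confirm: the pattern must match the whole line, and a non-matching line is an error (as the name must-match suggests). The function must_match is a hypothetical stand-in for the proposed builtin.

```python
import re

# Python analogue of the Eggex pattern with captures named a and b.
PATTERN = re.compile(r"(?P<a>\d+)\s+(?P<b>\d+)")

def must_match(lines, pattern):
    """Return one dict of named captures per line; fail on the first bad line."""
    rows = []
    for line_num, line in enumerate(lines, start=1):
        m = pattern.fullmatch(line.strip())
        if m is None:
            # Assumption: a line that does not match aborts the whole pipeline.
            raise ValueError(f"line {line_num} does not match: {line!r}")
        rows.append(m.groupdict())
    return rows

for row in must_match(["1 2", "3 4"], PATTERN):
    print(row["a"], row["b"])
```

Each row carries its captures by name, which is what lets the block refer to $a and $b instead of $1 and $2.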
Doing it in parallel:
$ cat test.txt | must-match --max-jobs 4 (p) {
    mkdir -v -p $a/$b
  }
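To illustrate what bounding the number of jobs means, here is a Python sketch that runs one action per row with at most max_jobs running at once. It uses a thread pool as a stand-in; the proposed builtin would presumably manage processes, as xargs -P does. The names make_dirs_parallel and max_jobs are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_dirs_parallel(root, rows, max_jobs=4):
    """Create root/a/b for every row, running at most max_jobs actions at once."""
    def make(row):
        path = os.path.join(root, row["a"], row["b"])
        os.makedirs(path, exist_ok=True)
        return path

    # The pool size is the analogue of --max-jobs.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # map() returns results in input order, even if jobs finish out of order.
        return list(pool.map(make, rows))

rows = [{"a": "1", "b": "2"}, {"a": "3", "b": "4"}]
with tempfile.TemporaryDirectory() as root:
    for path in make_dirs_parallel(root, rows, max_jobs=4):
        print("created", os.path.relpath(path, root))
```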
$ cat table.txt
size path
3 foo.txt
20 bar.jpg
$ R
> t=read.table('table.txt', header=T)
> t
  size    path
1    3 foo.txt
2   20 bar.jpg
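For readers who don't use R, the same idea (a header row names the columns of every later row) can be sketched in Python. The function read_table below is a hypothetical helper, not an Oils or R API. Unlike R, it does not infer column types, so every cell stays a string.

```python
def read_table(text):
    """Parse whitespace-separated text whose first line is a header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Pair each cell with its column name; cells are left as strings.
    return [dict(zip(header, line.split())) for line in lines[1:]]

TABLE = """\
size path
3 foo.txt
20 bar.jpg
"""

for row in read_table(TABLE):
    print(row["size"], row["path"])
```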
We already saw this, because we "compressed" awk and xargs together.
What's not in the streams / awk example above:
BEGIN END - that can be separate
when [$1 ~ /d+/] { }
Shell, Awk, and Make Should be Combined (2016)
What is a Data Frame? (2018)
Sketches of YSH Features (June 2023) - can we express things in YSH?
Language Compositionality Test: J8 Lines
read --split
What is a Data Frame?
jq in jq thread
Old wiki pages:
We're doing all of these.
table with the ctx builtin
read --split feedback
find . -printf '%s %P\n' - size and path
[{bytes: 123, path: "foo"}, {}, ...]
jq
blocks value.Block - ^() and { }
expressions value.Expr - ^[] and 'compute [] where []'
eval (b, vars={}, positional=[])
Buffered for loop
for x in (stdin)
"magic awk loop"
with chop { for <README.md *.py> { echo _line_num _line _filename $1 $2 } }
positional args $1 $2 $3
ctx builtin
value.Place
TODO:
split() like Python, not like shell IFS algorithm
string formatting ${bytes %.2f}
${bytes %.2f M} Megabytes
${bytes %.2f Mi} Mebibytes
${timestamp +'%Y-%m-%d'} and strftime
this is for floating point %e %f %g and printf and strftime
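The TODO items above lean on behavior Python already has, so a short Python sketch can show the intended results. The split() lines show Python's real behavior. The helper format_bytes and its output format (number, space, unit suffix) are assumptions: the doc does not say how ${bytes %.2f M} would render.

```python
from datetime import datetime

# Python's split() with no argument collapses runs of whitespace and ignores
# leading and trailing whitespace; an explicit separator keeps empty fields.
print("a  b ".split())      # ['a', 'b']
print("a  b ".split(" "))   # ['a', '', 'b', '']

def format_bytes(n, unit="M"):
    """Format a byte count with two decimals; M is 10^6 bytes, Mi is 2^20 bytes."""
    divisors = {"M": 1000 ** 2, "Mi": 1024 ** 2}
    return f"{n / divisors[unit]:.2f} {unit}"

print(format_bytes(1_500_000))        # 1.50 M
print(format_bytes(1_048_576, "Mi"))  # 1.00 Mi

# strftime-style date formatting, as in ${timestamp +'%Y-%m-%d'}
print(datetime(2024, 7, 1).strftime("%Y-%m-%d"))  # 2024-07-01
```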
This means we consider all these conversions
The design might seem very general, but we did make some hard choices.
push vs. pull
buffered vs. unbuffered, everything
List vs iterators
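To make the terms concrete, here is a small Python sketch of the same row-splitting step in pull style (a generator the consumer iterates) and push style (the producer calls a callback). It illustrates the trade-off only and does not say which choice Oils made; pull_rows and push_rows are hypothetical names.

```python
def pull_rows(lines):
    """Pull: the consumer drives iteration and asks for one row at a time."""
    for line in lines:
        yield line.split()

def push_rows(lines, on_row):
    """Push: the producer drives iteration and calls the consumer per row."""
    for line in lines:
        on_row(line.split())

lines = ["1 2", "3 4"]

# Iterator: rows are produced lazily, one at a time.
for row in pull_rows(lines):
    print("pulled", row)

# List: the whole stream is buffered in memory before any row is used.
buffered = list(pull_rows(lines))

pushed = []
push_rows(lines, pushed.append)
print(buffered == pushed)
```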
THESE ARE ALL THE SAME ALGORITHM. They just have different names.
Should we also have if-split-by, in case there aren't enough columns?
They all take:
value.Expr
for the block arg, this applies:
-j 4
--max-jobs 4
--max-jobs $(cached-nproc)
--max-jobs $[_nproc - 1]
So we have this:
echo begin
var d = {}
cat -- @files | split-by (ifs=IFS) {
  echo $2 $1
  call d->accum($1, $2)
}
echo end
But then how do we have conditionals?
Filter foo {                 # does this define a proc? Or a data structure?
  split-by (ifs=IFS)         # is this possible? We register the proc itself?
  config split-by (ifs=IFS)  # register it
  BEGIN {
    var d = {}
  }
  END {
    echo $[d.sum]
  }
  when [$1 ~ /d+/] {
    setvar d.sum += $1
  }
}
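The control flow of that Filter sketch is the classic awk shape: set up state, run a guarded action per line, then report. A minimal Python rendering follows, with three assumptions that go beyond the doc: the sum starts at 0 rather than from an empty dict, the pattern must match the whole first field so the integer conversion is safe, and lines are split on whitespace. The name run_filter is hypothetical.

```python
import re

def run_filter(lines):
    """awk-style BEGIN / per-line condition / END, written as plain Python."""
    # BEGIN: initialize state once, before any input is read.
    d = {"sum": 0}

    for line in lines:
        fields = line.split()
        # when [$1 ~ /d+/]: run the action only if the first field is all digits.
        if fields and re.fullmatch(r"\d+", fields[0]):
            d["sum"] += int(fields[0])

    # END: runs once, after the last line.
    return d["sum"]

print(run_filter(["1 2", "3 4", "x 5"]))  # 4
```

The open question in the doc is where this state lives and how the three parts are registered, not what the loop does.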
table to construct
Actions:
table cat
table align / table tabify
table header (cols)
table slice (1, -1) or (-1, -2) etc.
Subcommands
cols
types
attr units
Partial Parsing / Lazy Parsing - TSV8 is designed for this
# we only decode the columns that are necessary
cat myfile.tsv8 | table --by-col (&out, cols = :|bytes path|)
sort-tsv8 or join-tsv8 with novel algorithms
This is sort of "expanding the scope" of the project, when we want to reduce scope.
But YSH has both tree-shaped JSON, and table-shaped TSV8, and jq is a nice bridge between them.
Streams of Trees (jq)
empty
this
this[]
=>
select()
a & b # more than one
Four types of Data Languages:
Four types of query languages:
Considering columns and then rows:
dplyr: