Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.

Streams, Tables and Processes - awk, R, xargs

(July 2024)

This is a long, "unified/orthogonal" design for:

There's also a relation to:

It's a layered design. That means we need some underlying mechanisms:

It will link to:

Table of Contents
Background / References
Concrete Use Cases
Intro
How much code is it?
Thanks
Tools
Concepts
Underlying Mechanisms in Oils / Primitives
Process Pool or Event Loop Primitive?
Matrices - Orthogonal design in these dimensions
Concrete Decisions - Matrix cut off
String World
Awk Issues
Table World
table to construct
Will writing it in YSH be slow?
Applications
Extra: Tree World?
Pie in the Sky
Appendix
Notes on Naming

Background / References

Old wiki pages:

We're doing all of these.

Concrete Use Cases

Intro

How much code is it?

Thanks

Tools

Concepts

Underlying Mechanisms in Oils / Primitives

TODO:

Process Pool or Event Loop Primitive?

Matrices - Orthogonal design in these dimensions

This means we consider all of these conversions:

Concrete Decisions - Matrix cut off

The design might seem very general, but we did make some hard choices.

String World

THESE ARE ALL THE SAME ALGORITHM. They just have different names.

Should we also have if-split-by, in case there aren't enough columns?

They all take:

for the block arg, this applies:

-j 4
--max-jobs 4

--max-jobs $(cached-nproc)
--max-jobs $[_nproc - 1]
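Here is a minimal Python sketch of what `--max-jobs` could mean for the block argument, in the spirit of `xargs -P`: at most N invocations run at once, and results come back in input order. The names `run_parallel`, `default_max_jobs`, and `peak_concurrency` are hypothetical. Each job here is a thread calling a Python function rather than a process, and "CPUs minus one, but at least one" is only one reading of `$[_nproc - 1]`.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def default_max_jobs():
    # One reading of --max-jobs $[_nproc - 1]: leave one CPU free, but allow at least 1 job.
    return max(1, (os.cpu_count() or 1) - 1)

def run_parallel(items, block, max_jobs=None):
    """Apply block to each item with at most max_jobs calls in flight.

    Results are returned in input order, as a sequential loop would produce them.
    """
    if max_jobs is None:
        max_jobs = default_max_jobs()
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(block, items))

def peak_concurrency(n_items, max_jobs):
    """Demo: run short sleeping jobs and report the most that ran at the same time."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def job(x):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # simulate work so that jobs overlap
        with lock:
            state["running"] -= 1
        return x

    run_parallel(range(n_items), job, max_jobs=max_jobs)
    return state["peak"]
```

With `max_jobs=1` this degenerates to a plain sequential loop, which is a convenient way to debug a block before running it in parallel.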

Awk Issues

So we have this:

echo begin
var d = {}
cat -- @files | split-by (ifs=IFS) {
  echo $2 $1
  call d->accum($1, $2)
}
echo end
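To pin down the semantics, here is a Python sketch of `split-by` under these assumptions: `ifs=None` splits on runs of whitespace (like awk's default), any other `ifs` is a single literal separator (unlike shell IFS, which is a set of characters), `fields[0]` is the whole line as in awk's `$0`, and `d->accum($1, $2)` appends `$2` to a list keyed by `$1`. `split_by` and `demo` are hypothetical names.

```python
from collections import defaultdict

def split_by(lines, block, ifs=None):
    """Call block(fields) once per line.

    fields[0] is the whole line (awk's $0); fields[1], fields[2], ... are $1, $2, ...
    """
    for line in lines:
        line = line.rstrip("\n")
        block([line] + line.split(ifs))

def demo(lines):
    out = ["begin"]                    # echo begin
    d = defaultdict(list)              # var d = {}

    def block(f):
        out.append(f"{f[2]} {f[1]}")   # echo $2 $1
        d[f[1]].append(f[2])           # assumed meaning of d->accum($1, $2)

    split_by(lines, block)
    out.append("end")                  # echo end
    return out, dict(d)
```

A line with fewer than two fields makes `f[2]` raise `IndexError`, which is exactly the "not enough columns" case that an `if-split-by` variant might handle.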

But then how do we express conditionals?

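Filter foo {  # does this define a proc?  Or a data structure?

  split-by (ifs=IFS)  # is this possible?  We register the proc itself?

  config split-by (ifs=IFS)  # register it

  BEGIN {
    var d = {sum: 0}  # initialize the accumulator so that += works
  }
  END {
    echo $[d.sum]  # plain 'echo d.sum' would print the literal string
  }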

  when [$1 ~ /d+/] {
    setvar d.sum += $1
  }

}
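One way to answer the "proc or data structure?" question is to treat `Filter` as data: a BEGIN hook, an END hook, and an ordered list of `when` rules, all driven by a single loop. This Python sketch assumes awk's rule semantics (every matching rule runs for each line, not just the first), anchors the digit pattern with `re.fullmatch` so that `int()` is safe (a plain regex search would also accept strings like `12abc`), and returns the sum from END instead of printing it. The `Filter` class and helper names are hypothetical.

```python
import re

class Filter:
    """Hypothetical awk-style filter: a BEGIN hook, an END hook, and `when` rules."""

    def __init__(self, begin, end, rules, ifs=None):
        self.begin = begin  # called once with a fresh state dict
        self.end = end      # called once at the end; its return value is the result
        self.rules = rules  # list of (predicate, action) pairs, checked in order
        self.ifs = ifs      # None means split on runs of whitespace

    def run(self, lines):
        state = {}
        self.begin(state)
        for line in lines:
            line = line.rstrip("\n")
            fields = [line] + line.split(self.ifs)  # fields[1] is $1
            for pred, action in self.rules:
                if pred(fields):                    # like awk, all matching rules run
                    action(state, fields)
        return self.end(state)

DIGITS = re.compile(r"\d+")  # corresponds to the /d+/ pattern

def begin(d):
    d["sum"] = 0

def end(d):
    return d["sum"]

def first_field_is_number(f):
    return len(f) > 1 and DIGITS.fullmatch(f[1]) is not None

def add_first_field(d, f):
    d["sum"] += int(f[1])

foo = Filter(begin, end, [(first_field_is_number, add_first_field)])
```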

Table World

table to construct

Actions:

table cat
table align / table tabify
table header (cols)
table slice (1, -1)   or (-1, -2) etc.
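As one reading of `table align` / `table tabify`, here is a Python sketch that pads each cell so every column is as wide as its widest cell. The function names, the two-space column separator, and padding short rows with empty cells are all assumptions, not part of the design.

```python
def parse_tsv(text):
    """Split TSV text into rows of cells (no quoting or escaping handled)."""
    return [line.split("\t") for line in text.splitlines()]

def align(rows, sep="  "):
    """Pad every cell so each column is as wide as its widest cell."""
    if not rows:
        return []
    ncols = max(len(r) for r in rows)
    # Treat missing trailing cells as empty strings.
    padded = [list(r) + [""] * (ncols - len(r)) for r in rows]
    widths = [max(len(r[i]) for r in padded) for i in range(ncols)]
    # Left-justify each cell, join with the separator, and drop trailing padding.
    return [sep.join(c.ljust(w) for c, w in zip(r, widths)).rstrip() for r in padded]
```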

Subcommands

cols
types
attr units

Partial Parsing / Lazy Parsing - TSV8 is designed for this

# we only decode the columns that are necessary
cat myfile.tsv8 | table --by-col (&out, cols = :|bytes path|)
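The payoff of lazy parsing is that cells in unwanted columns are never decoded. This Python sketch treats the input as plain TSV with a single header row, splits each line only as far as the last requested column, and decodes only the requested cells. `decode_cell` is a stand-in for real TSV8 cell decoding: here, a cell that starts with a double quote is assumed to be a JSON-style string literal. `select_columns` and `decode_cell` are hypothetical names.

```python
import json

def decode_cell(raw):
    # Assumption: quoted cells are JSON-style string literals; others are taken verbatim.
    return json.loads(raw) if raw.startswith('"') else raw

def select_columns(lines, cols, decode=decode_cell):
    """Yield one dict per row containing only the requested columns."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    idx = [header.index(c) for c in cols]  # raises ValueError for an unknown column
    last = max(idx)
    for line in it:
        # Split only as far as the last wanted column; the tail stays unsplit.
        cells = line.rstrip("\n").split("\t", last + 1)
        # Only the requested cells are ever decoded.
        yield {c: decode(cells[i]) for c, i in zip(cols, idx)}
```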

Will writing it in YSH be slow?

Applications

Extra: Tree World?

This is sort of "expanding the scope" of the project, when we want to reduce scope.

But YSH has both tree-shaped JSON, and table-shaped TSV8, and jq is a nice bridge between them.

Streams of Trees (jq)

empty
this
this[]
=>
select()
a & b  # more than one
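These primitives compose naturally as generators, since each jq filter maps one input to a stream of zero or more outputs. The Python sketch below reads `this` as jq's `.`, `this[]` as `.[]`, `=>` as jq's pipe `|`, and `a & b` as jq's comma operator; that mapping, and all function names including the demo helper `field`, are assumptions.

```python
def empty(x):
    """jq `empty`: produce no outputs."""
    return iter(())

def this(x):
    """jq `.`: produce the input unchanged."""
    yield x

def each(x):
    """jq `.[]`: produce each element of a list, or each value of a dict."""
    yield from (x.values() if isinstance(x, dict) else x)

def pipe(f, g):
    """jq `f | g`: feed every output of f into g."""
    def h(x):
        for y in f(x):
            yield from g(y)
    return h

def select(pred):
    """jq `select(cond)`: pass the input through only if pred(input) is true."""
    def h(x):
        if pred(x):
            yield x
    return h

def both(f, g):
    """jq `f, g`: all outputs of f, then all outputs of g, on the same input."""
    def h(x):
        yield from f(x)
        yield from g(x)
    return h

def field(name):
    """jq `.name`: produce one value from a dict (hypothetical demo helper)."""
    def h(x):
        yield x[name]
    return h
```

Because every filter returns a lazy iterator, `empty` and `select` need no special cases: a stream of zero outputs simply ends the pipeline for that input.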

Pie in the Sky

Four types of Data Languages:

Four types of query languages:

Appendix

Notes on Naming

Considering columns and then rows:

dplyr:

Generated on Thu, 11 Jul 2024 04:07:15 +0000