trees: Sketch of Storage / Networking Architecture
==================================================

As usual, we try not to invent anything big or new, but instead focus on
composing and rationalizing existing software and protocols:

- Many good implementations of POSIX file systems (Linux ext4, ZFS, etc.)
- git, a distributed version control system
  - in particular the packfile format
  - the ssh send/receive pattern
- Static WWW file servers like Apache and nginx
- tar files, gzip files

## Use Cases

1. Building CI containers faster with wedges
   - native deps: re2c, bloaty, uftrace, ...
   - Python deps, e.g. MyPy
   - R deps, e.g. dplyr
   - wedge source is a .treeptr tarball
   - wedge derived is a .treeptr file
2. CI serving `.wwz` files. We need fast random access.
3. Running benchmarks on multiple machines
   - `oils-for-unix` tarball from EVERY commit, sync'd to different CI tasks
4. Comparisons across distros, OSes, and hardware
   - building the same packages on Debian, Ubuntu, Alpine
     - and FreeBSD
   - x86 / x86-64 / ARM
5. Web .log files can be .treeptr files

## Silo: Large Trees Managed Outside Git

You can `git pull` and `git push` without paying for these large objects, e.g.
container images.

To start, trees use regular compression with `gzip`. Later, Silo will
introspect trees and take **hints** for **differential** compression.

Related:

- git annex
- git LFS

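One cheap way to get a feel for hint-based differential compression is zlib's preset dictionary: the client names a blob it already has, and the new blob is compressed against it. This is only a stand-in for real delta encoding (git packfiles use their own delta format), and the function names below are made up for illustration.

```python
import hashlib
import zlib


def compress_with_hint(new_blob: bytes, hint_blob: bytes) -> bytes:
    """Compress new_blob, using hint_blob as a preset zlib dictionary."""
    c = zlib.compressobj(level=9, zdict=hint_blob)
    return c.compress(new_blob) + c.flush()


def decompress_with_hint(payload: bytes, hint_blob: bytes) -> bytes:
    """Inverse of compress_with_hint; the receiver must hold the same hint_blob."""
    d = zlib.decompressobj(zdict=hint_blob)
    return d.decompress(payload) + d.flush()


# Stand-in data: 2 KiB of hard-to-compress bytes, then a lightly edited copy.
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
new = old[:1000] + b"PATCH" + old[1000:]

plain = zlib.compress(new, 9)          # no hint
delta = compress_with_hint(new, old)   # "I have old, send me new"
print(len(new), len(plain), len(delta))
```

zlib only looks back 32 KiB, so this trick does not scale to whole tarballs. It just shows why a hint from the client can shrink the transfer.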
### Data

    https://oilshell.org/
      deps.silo/
        objects/             # everything is a blob at first
          00/                # checksums calculated with git hash-object
            123456.gz        # may be a .tar file, but silo doesn't know
        pack/                # like git, it can have deltas, and be repacked
          foo.pack
          foo.idx
        derived/             # DERIVED trees, e.g. different deltas,
                             # different compression, SquashFS, ...

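The checksum that `git hash-object` prints for a blob is the SHA-1 of a short header followed by the content, so it can be reproduced without git. The sketch below also maps an ID to a path, assuming the `00/123456.gz` layout above is a git-style fan-out on the first two hex characters. That reading of the listing, and both function names, are assumptions.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the ID that `git hash-object` prints for a blob with this content.

    Assumes git's default SHA-1 object format.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def object_path(blob_id: str) -> str:
    """Map an ID to a path inside a silo (assumed 2-character fan-out)."""
    return "objects/%s/%s.gz" % (blob_id[:2], blob_id[2:])


blob_id = git_blob_id(b"hello\n")
print(blob_id, object_path(blob_id))
```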
### Commands

    silo verify    # blobs should have valid checksums

Existing tools:

    rsync          # back up the entire thing
    rclone         # ditto, but works with cloud storage

    ssh rm "$@"    # a list of vrefs to delete can be calculated by 'medo reachable'
    scp            # create a new silo from 'medo reachable' manifest

    du --si -s     # Total size of the Silo

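A minimal sketch of what `silo verify` might do. It assumes every file under `objects/` ends in `.gz`, and that the checksum covers the uncompressed bytes. Neither is stated above, and `verify_silo` is a hypothetical name.

```python
import gzip
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Same ID as `git hash-object` (SHA-1 object format)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def verify_silo(silo_dir: str) -> list:
    """Return relative paths of blobs whose content doesn't match their name."""
    bad = []
    objects_dir = os.path.join(silo_dir, "objects")
    for prefix in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, prefix)
        for name in sorted(os.listdir(subdir)):
            # objects/ce/0136....gz  ->  expected ID "ce0136..."
            expected = prefix + name[: -len(".gz")]
            with gzip.open(os.path.join(subdir, name), "rb") as f:
                actual = git_blob_id(f.read())  # a real tool would stream this
            if actual != expected:
                bad.append(os.path.join("objects", prefix, name))
    return bad
```

An empty result means every blob matched. Anything in the list is a candidate for re-fetching from another Silo.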
## Medo (meadow): Named and Versioned Subtrees in `git`

To start, this will untar and uncompress blobs from a Silo. We can also:

- Materialize a git `tree`, e.g. in a packfile
- Mount a git `tree` directly with FUSE. I think the pack `.idx` does binary
  search, which makes this possible.
  - TODO: write prototype with pygit2 wrapping libgit2
  - [FUSE bindings seem in question](https://stackoverflow.com/questions/52925566/which-module-is-the-actual-interface-to-fuse-from-python-3)

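On the binary search point: a version-2 pack `.idx` stores a 256-entry fan-out table keyed by the first byte of the object ID, followed by the sorted IDs. The sketch below is an in-memory model of that lookup, not a parser for the on-disk format, and `build_index` / `lookup` are illustrative names.

```python
import bisect
import hashlib


def build_index(object_ids):
    """Return (fanout, sorted_ids) for a collection of distinct 20-byte IDs.

    fanout[b] is the number of IDs whose first byte is <= b.
    """
    sorted_ids = sorted(object_ids)
    fanout = [0] * 256
    for oid in sorted_ids:
        fanout[oid[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout, sorted_ids


def lookup(fanout, sorted_ids, oid):
    """Return the position of oid in sorted_ids, or -1 if it is absent."""
    first = oid[0]
    lo = fanout[first - 1] if first > 0 else 0   # start of this first-byte bucket
    hi = fanout[first]                           # end of the bucket (exclusive)
    i = bisect.bisect_left(sorted_ids, oid, lo, hi)
    return i if i < hi and sorted_ids[i] == oid else -1


ids = [hashlib.sha1(bytes([i])).digest() for i in range(200)]
fanout, sorted_ids = build_index(ids)
print(lookup(fanout, sorted_ids, ids[0]))
```

In a real `.idx` the returned position would then index a parallel table of packfile offsets, which is what a FUSE read would need.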
### Data

    ~/git/oilshell/oil/
      deps/                         # The split into 3 medos is arbitrary; they're
                                    # generally mounted in different places, and
                                    # used by different tools

        source.medo/                # Relocatable data
          SILO.json                 # Can point to multiple Silos
          Python-3.10.4.treeptr     # with checksum and provenance (original URL)

        derived.medo/               # derived values, some are wedges with absolute paths
          SILO.json                 # Can point to multiple Silos
          debian/
            bullseye/
              Python-3.10.4.treeptr
          ubuntu/
            20.04/
              Python-3.10.4.treeptr # derived data has provenance:
                                    # base layer, mounts of input / code, env / shell command
            22.04/
              Python-3.10.4.treeptr

        opaque.medo/                # Opaque values that can use more provenance.
          SILO.json
          images/                   # 'docker save' format. Make sure it can be imported.
            debian/
              bullseye/
                slim.treeptr

          layers/
            debian/
              bullseye/
                mypy-deps.treeptr   # packages needed to build it

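To make the relationship between `SILO.json` and a `.treeptr` concrete, here is a sketch of resolving a pointer to candidate URLs, roughly what `medo url-for` is described as doing. The JSON schemas are not specified in this document, so the field names `silos`, `checksum`, and `provenance` are hypothetical, as is the provenance URL.

```python
import json

# Hypothetical file contents; this document doesn't define these schemas.
SILO_JSON = '{"silos": ["https://oilshell.org/deps.silo"]}'
TREEPTR_JSON = """
{
  "checksum": "ce013625030ba8dba906f756967f9e9ca394464a",
  "provenance": {"url": "https://example.com/Python-3.10.4.tar.xz"}
}
"""


def urls_for(silo_json: str, treeptr_json: str) -> list:
    """Return one candidate blob URL per Silo listed in SILO.json."""
    silos = json.loads(silo_json)["silos"]
    checksum = json.loads(treeptr_json)["checksum"]
    # Assumed layout: objects/<first 2 hex chars>/<rest>.gz
    rel = "objects/%s/%s.gz" % (checksum[:2], checksum[2:])
    return ["%s/%s" % (base.rstrip("/"), rel) for base in silos]


print(urls_for(SILO_JSON, TREEPTR_JSON))
```

Keeping the Silo location out of the `.treeptr` is what would make the pointer relocatable: only `SILO.json` changes when the data moves.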
### Commands

    # Get files to build. This does uncompress/untar.
    medo expand deps/source.medo/Python-3.10.4.treeptr _tmp/source/

    # Or sync files that are already built. If they already exist, verify
    # checksums.
    medo expand deps/derived.medo/debian/bullseye/ /wedge/oilshell.org/deps

    # Combine SILO.json and the JSON in the .treeptr
    medo url-for deps/source.medo/Python-3.10.4.treeptr

    # Verify checksums.
    medo verify deps.medo/ /wedge/oilshell.org/deps

    # Makes a tarball and .treeptr that you can scp/rsync
    medo add /wedge/oilshell.org/bash-4.4/ deps.medo/ubuntu/18.04/bash-4.4.treeptr

    medo reachable deps.medo/  # first step of garbage collection

    medo mount  # much later: FUSE mount

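Garbage collection then looks like mark and sweep: `medo reachable` marks, and the Silo side sweeps whatever is left. A rough sketch, assuming each `.treeptr` is JSON with a `checksum` field (a hypothetical name) and the `objects/<2 chars>/<rest>.gz` layout:

```python
import json
import os


def reachable(medo_dir: str) -> set:
    """Collect the checksums named by every .treeptr file under medo_dir."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(medo_dir):
        for name in filenames:
            if name.endswith(".treeptr"):
                with open(os.path.join(dirpath, name)) as f:
                    found.add(json.load(f)["checksum"])  # hypothetical field
    return found


def garbage(silo_dir: str, live: set) -> list:
    """List blob paths in the Silo that no .treeptr refers to."""
    dead = []
    objects_dir = os.path.join(silo_dir, "objects")
    for prefix in sorted(os.listdir(objects_dir)):
        for name in sorted(os.listdir(os.path.join(objects_dir, prefix))):
            checksum = prefix + name[: -len(".gz")]
            if checksum not in live:
                dead.append(os.path.join("objects", prefix, name))
    return dead
```

The output of `garbage()` is the kind of list that could be handed to `ssh rm "$@"`. A Silo shared by several medos would need the union of their reachable sets first.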
## `/wedge`: A binary-centric "semi-distro" that works with OCI containers, and without

A package exports one or more binaries, and is a `treeptr` value:

- metadata is stored in a `.medo` directory
- data is stored in a Silo

The package typically lives in a subdirectory of `/wedge`. This is because it
is built with `configure --prefix=/wedge/...`.

What can you do with it?

- A wedge can be mounted, e.g. `--mount type=bind,...`
- It can be copied into an image: `COPY ...`
  - for quick deployment to cloud services, like GitHub Actions or fly.io
- It has provenance, like other treeptr values. The provenance is either:
  - the original URL, for source data
  - the code, data, and environment used to build it

Related:

- GNU Stow (symlinks)
- GoboLinux
- Distri (exchange dirs with FUSE)
- Nix/Bazel: a wedge is a "purely functional" value
- Docker: wedges are meant to be created in containers, and mounted in
  containers

### Data

    /wedge/                         # an absolute path, for configure --prefix=/wedge/...
      oils-for-unix.org/            # scoped to domain
        pkg/                        # arbitrary structure, for dev dependencies
          Python-3.10.4.treeptr     # metadata
          Python-3.10.4/
            python                  # Executable, which needs a 'python3' symlink

## Design Notes

### Data and Metadata Formats

Text:

- JSON for .treeptr, MEDO.json, SILO.json
- lockfile / "world" / manifest - what does this look like?

Data:

- `git`
  - blob
  - tree for FS metadata
  - no commit objects!
  - packfile for multiple objects
- Archiving: `.tar`
  - OCI layers use `.tar`
- Compression: `.gz`, `bzip2`, etc.
- Encryption (well, LUKS does the whole system)

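Since trees without commit objects are the proposed carrier for FS metadata, it helps to see how small the format is. A git tree object is a sorted list of `<mode> <name>\0<20-byte ID>` records, hashed with a `tree <size>\0` header. The sketch below computes a tree ID from scratch. It assumes the SHA-1 object format, and `git_tree_id` is an illustrative name.

```python
import hashlib


def git_tree_id(entries) -> str:
    """Compute the ID of a git tree object.

    entries: iterable of (mode, name, hex_object_id) tuples. Modes are as
    stored by git, e.g. "100644" for a file and "40000" for a directory
    (`git cat-file -p` displays the latter as 040000).
    """
    def sort_key(entry):
        mode, name, _ = entry
        # git orders directory entries as if their names ended with '/'
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_id)
    header = b"tree %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


print(git_tree_id([]))  # the well-known ID of the empty tree
```

A tree only names its children by ID, so it can be stored and hashed even when the blobs it points to are absent locally.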
### knot: Incremental, Parallel, Coarse-Grained, Containerized Builds with Ninja

It's a wrapper like `ninja_lib.py`. Importantly, everything you build should
be versioned, immutable, and cached, so it doesn't use timestamps!

Distributed builds, too? Multiple workers can pull and publish intermediate
values to the same Silo.

Key ideas:

- The knot worker pulls tasks and is pointed at source.medo and derived.medo
  directories.
  - All of this metadata is in git. The git repo is sync'd on worker
    initialization, and continually updated.
  - TODO: if 2 workers grab the same task, it should be OK. One of their git
    commits will fail?
- The worker does a lazy 'medo sync'
  - The worker keeps a local cache of the Silo, according to the parts of the
    Medo it needs
  - It can give HINTS for differential compression, saying "I have
    Python-3.10.4, send me a delta for Python-3.10.5"
  - If all metadata is local, it can be even smarter

(Name: it's geometry like "wedge", and hopefully cuts a "Gordian knot.")

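Without timestamps, a worker needs some other way to decide whether a derived value already exists. One option is to hash the same things the derived `.treeptr` provenance records (base layer, input mounts, environment, shell command) and use that as the cache key. This is a sketch of that idea. The `cache_key` function and the shape of its arguments are assumptions, not part of any existing tool.

```python
import hashlib
import json


def cache_key(base_layer: str, inputs: dict, env: dict, command: list) -> str:
    """Hash everything that can affect a build output; no timestamps involved."""
    record = {
        "base_layer": base_layer,
        "inputs": inputs,    # mount point -> checksum of the mounted tree
        "env": env,
        "command": command,
    }
    # sort_keys makes the encoding independent of dict insertion order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key = cache_key(
    "debian/bullseye/slim",
    {"/src": "ce013625030ba8dba906f756967f9e9ca394464a"},
    {"LANG": "C"},
    ["./configure", "--prefix=/wedge/oils-for-unix.org/pkg/Python-3.10.4"],
)
print(key)
```

Two workers that grab the same task compute the same key, so the second publish is just a redundant write of an identical value, provided the build itself is deterministic.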
## TODO

### Research

- shrub vs. blob?
  - a shrub is a subtree, unlike a git `tree` object which is like an inode
  - is all of the metadata like paths and sizes stored client side? Then the
    client can give repacking hints for differential compression, rather than
    the server doing anything smart.
- medo explode? You change the reference client-side
- or silo explode? It can redirect from blob to shrub
- TODO: look at git tree format, and whether an entire subtree/shrub of
  metadata can be stored client-side. We want ONLY trees, and blobs should be
  DANGLING.
  - Use pack format, or maybe a text format.

```
~/git/oilshell/oil$ git cat-file -p master^{tree}
040000 tree 37689433372bc7f1db7109fe1749bff351cba5b0    .builds
040000 tree 5d6b8fdbeb144b771e10841b7286df42bfce4c52    .circleci
100644 blob 6385fd579efef14978900830e5fd74bbac907011    .cirrus.yml
100644 blob 343af37bf39d45b147bda8a85e8712b0292ddfea    .clang-format
040000 tree 03400f57a8475d0cc696557833088d718adb2493    .github
```

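If the "maybe a text format" option wins, the listing above is already close to one. A small sketch of parsing it and separating the tree entries (metadata to keep client-side) from the blob entries (IDs that may dangle). `TreeEntry` and `parse_tree_listing` are illustrative names.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str
    kind: str        # 'tree' or 'blob' in the listing above
    object_id: str
    name: str


def parse_tree_listing(text: str) -> list:
    """Parse the output of `git cat-file -p <tree>` into TreeEntry values."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first three runs of whitespace only, so that names
        # containing spaces stay intact.
        mode, kind, object_id, name = line.split(None, 3)
        entries.append(TreeEntry(mode, kind, object_id, name))
    return entries


LISTING = """\
040000 tree 37689433372bc7f1db7109fe1749bff351cba5b0    .builds
040000 tree 5d6b8fdbeb144b771e10841b7286df42bfce4c52    .circleci
100644 blob 6385fd579efef14978900830e5fd74bbac907011    .cirrus.yml
100644 blob 343af37bf39d45b147bda8a85e8712b0292ddfea    .clang-format
040000 tree 03400f57a8475d0cc696557833088d718adb2493    .github
"""

entries = parse_tree_listing(LISTING)
subtrees = [e.name for e in entries if e.kind == "tree"]
dangling_blobs = [e.object_id for e in entries if e.kind == "blob"]
print(subtrees, dangling_blobs)
```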
### More

- Analog for low level `runc`, `crun`
- Analog for high level `docker run`, `podman run`
- The equivalent of inotify() on a silo / medo.
  - could be a REST API on `https://app.oilshell.org/soil.medo/events/` for tarballs
  - it tells you what Silo to fetch from
- Source browser for https://www.oilshell.org/deps.silo

## Ideas / Slogans

- "Distributed OS without RPCs". We use the paradigms of state
  synchronization, dependency graphs (partial orders), and probably low-level
  "events".
- Silo is the **data plane**; Medo is the **control plane**
  - Hay config files will also be a control plane
- Silo is a **mechanism**; Medo is for **policy**
- `/wedge` is a **middle ground** between Docker and Nix/Bazel
  - Nix / Bazel are purely functional, but require rewriting upstream build
    systems in their own language (to fully make use of them)
    - Concretely: I don't want to rewrite the R build system for the tidyverse.
      I want to use the Debian packaging that already works, and that core R
      developers maintain.
  - `/wedge` is purely functional in the sense that wedges are literally
    **values**. But like Docker, you can use shell commands that mutate layers
    to create them. You can run entire language package managers and build
    systems via shell.
  - Wedges compose with, and compose better than, Docker layers.