Yaks
====

A minimal, TYPED language that compiles to the mycpp runtime.

- It uses NIL8 for its syntax
  - It's a Lisp-ish IR, but the semantics are imperative. (WebAssembly also
    has a Lispy text format, but is imperative.)
- It can be written by hand, and easily read by humans.
  - `(func f [] (var [x Int] 42))` etc.
  - See `oilshell/yaks` experiment in TypeScript.
- It can also be generated in a LOSSLESS way from other languages, like Python
  - So that we can generate precise type errors that point back to Python
    source code
    - File, Line, Column
  - We will probably split files into spans/Tokens like
    `doctools/micro-syntax.md`. And then the tree can refer to this linear
    list.
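
The "tree refers to a linear list" idea can be sketched roughly like this. It's a toy S-expression reader, NOT the real NIL8 grammar (no strings, comments, or error handling), and `Token`, `tokenize`, and `read` are hypothetical names. Leaves of the tree are just indices into the token list, so any node can be mapped back to a line and column.

```python
import re
from typing import List, NamedTuple, Tuple, Union


class Token(NamedTuple):
    line: int  # 1-based line number
    col: int   # 0-based column
    text: str


# A tree node is either a token index (leaf) or a list of child nodes.
Node = Union[int, List['Node']]

# Brackets are single tokens; anything else is a run of non-space characters.
TOKEN_RE = re.compile(r'[()\[\]]|[^\s()\[\]]+')

CLOSERS = {'(': ')', '[': ']'}


def tokenize(src: str) -> List[Token]:
    """Split source into a linear list of tokens with positions."""
    tokens = []
    for line_num, line in enumerate(src.splitlines(), start=1):
        for m in TOKEN_RE.finditer(line):
            tokens.append(Token(line_num, m.start(), m.group()))
    return tokens


def read(tokens: List[Token], pos: int = 0) -> Tuple[Node, int]:
    """Parse one node at tokens[pos]; return (node, next position)."""
    tok = tokens[pos]
    if tok.text in CLOSERS:
        closer = CLOSERS[tok.text]
        children: List[Node] = []
        pos += 1
        while tokens[pos].text != closer:
            child, pos = read(tokens, pos)
            children.append(child)
        return children, pos + 1  # skip the closing bracket
    return pos, pos + 1  # leaf: refer to the token by index
```

A type error about the literal `42` would look up its leaf index in `tokens` to report the line and column.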

## Goals

Medium-term goals:

- Get rid of dependence on (old version of) MyPy!
  - this is causing build problems, e.g. on Fedora
- Type check and run bin/{osh,ysh} in less than **one second**.
  - Similar to my `deno check && deno run` experience (which was on a tiny
    codebase; maybe it wasn't that fast.)

Non-Goal: change the source language of Oils.

- For the immediate future, it will still be in Python 2 with type comments.
  - Contributors will be able to see a familiar language.
  - We just got the yapf formatter working, etc.

### Why Compile to C++?

To be fast. Language processors should be written in AOT-compiled languages:

- <https://news.ycombinator.com/item?id=35045520>

`pea/TEST.sh` shows that

- parsing all of Oils serially (with Python 3 `ast.parse()`) takes 410 ms.
- Pickling also takes some time.
- Unpickling takes time, too.
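
One way to take this kind of measurement is sketched below. This is not what `pea/TEST.sh` actually runs. `time_parse_and_pickle` is a hypothetical helper, and the timings will depend on the machine and the input.

```python
import ast
import pickle
import time
from typing import Dict, Tuple


def time_parse_and_pickle(sources: Dict[str, str]) -> Tuple[float, float, float]:
    """Return seconds spent in (parse, pickle, unpickle) over all sources.

    `sources` maps a filename to its source text.
    """
    t0 = time.perf_counter()
    trees = [ast.parse(code, filename=name) for name, code in sources.items()]
    t1 = time.perf_counter()
    blob = pickle.dumps(trees)  # serialize all ASTs in one blob
    t2 = time.perf_counter()
    pickle.loads(blob)
    t3 = time.perf_counter()
    return t1 - t0, t2 - t1, t3 - t2
```

Calling it with a dict built from the Oils source files would give the three numbers side by side.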

### More Issues with mycpp I want to address

- speed of Python dev build
  - type check and run Python in 1 second
- speed of C++ build
  - type check, generate multiple C++ MODULES, compile the whole program in
    say 5-10 seconds. I think it's more like 30 seconds now.
- rooting: can we optimize it, and not generate 480 KB of StackRoots or
  whatever?
  - this requires analyzing the call graph
- MODULES and dependencies -- may require re-organizing the Oils codebase
  itself.
  - cpp/preamble.h is a hack.
  - This includes things like "system" deps like ENOENT and fork(), etc.
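
For the rooting item, one possible call graph analysis is sketched below. It assumes (my assumption, not something mycpp is stated to do) that a function only needs stack roots if it can transitively reach something that allocates. The names `may_collect`, `calls`, and `allocators` are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def may_collect(calls: Dict[str, Set[str]], allocators: Set[str]) -> Set[str]:
    """Return the functions that can reach an allocating function via calls.

    `calls` maps each caller to the set of functions it calls directly.
    """
    # Invert the call graph: callee -> set of direct callers.
    callers: Dict[str, Set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first walk backwards from the allocating functions.
    result = set(allocators)
    queue = deque(allocators)
    while queue:
        func = queue.popleft()
        for caller in callers.get(func, ()):
            if caller not in result:
                result.add(caller)
                queue.append(caller)
    return result
```

Under that assumption, functions outside the returned set could skip rooting.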

## End Goal

- `pea/pea_main.py` (written in Python 3!) exports Yaks for the ~40-50K lines
  of Python 2 code in ~300-400 ms.
- I think ASDL also needs to export Yaks.
  - so we can build a symbol table
- `pea/pea.yaks` is compiled to C++. We use Ninja for that.
  - How long will a non-incremental build take? I think this should be like
    - 500 lines for the reader/front end,
    - 1000-3000 lines for the type checker?
    - 1000-3000 lines for the C++ code generator?
    - Plus 4000 lines of the mycpp runtime
  - So I hope this will take just 1 second to compile and LINK. Unit tests
    against mycpp seem to take about 1 second, so it could be comparable.

Then

- `pea/pea_main.py | yaks check` type checks Oils in 1 second, and then you run
  `bin/osh`.
- `pea/pea_main.py | yaks cpp` generates C++ in one second
  - then it should take 5-10 seconds to compile the output in Ninja.
- Then you run `_bin/cxx-dbg/osh`

## Implementation Strategy - It Requires Bootstrapping

Let's minimize the stuff we need to write twice.

- Write the first pass "yaks0" in Python 2 with re2c and ASDL.
  - uses the NIL8 parser
  - custom transformer from nil8.asdl -> yaks.asdl
  - then print C++ directly with no type checking

- Then I think you can write "myyaks" ?
  - It's a single mycpp pass that outputs Yaks, with all the type information?
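
The "print C++ directly with no type checking" step of yaks0 might look something like this sketch. It's in Python 3 for convenience, and it uses nested lists in place of the ASDL types. The `CPP_TYPES` mapping, the flat `[name Type ...]` parameter layout, and the hard-coded `void` return type are all assumptions, since the notes don't pin those down.

```python
from typing import List, Union

Node = Union[str, List['Node']]

# Assumed mapping from Yaks type names to C++ types.
CPP_TYPES = {'Int': 'int', 'Bool': 'bool'}


def emit_stmt(node: Node) -> str:
    """Print one statement as C++, trusting the input (no type checking)."""
    head = node[0]
    if head == 'var':     # (var [x Int] 42)
        (name, typ), value = node[1], node[2]
        return f'{CPP_TYPES[typ]} {name} = {value};'
    if head == 'return':  # (return 42)
        return f'return {node[1]};'
    raise ValueError(f'unknown statement: {head}')


def emit_func(node: Node) -> str:
    """Print (func name [p1 T1 p2 T2 ...] stmt...) as a C++ function."""
    _, name, params, *body = node
    pairs = zip(params[0::2], params[1::2])
    sig = ', '.join(f'{CPP_TYPES[t]} {p}' for p, t in pairs)
    lines = [f'void {name}({sig}) {{']  # return type is assumed to be void
    lines += ['  ' + emit_stmt(s) for s in body]
    lines.append('}')
    return '\n'.join(lines)
```

For `(func f [] (var [x Int] 42))`, this prints a `void f()` whose body is `int x = 42;`.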

OLDER

- then rewrite all of that in Yaks itself???
  - lexer - does it invoke re2c directly? No Python
  - parser - with infix rule
  - transform
  - print

- then translate Python 2 to Yaks, and type check that
  - first use MyPy? does that make sense? I think so.
    - you preserve all the hacks
  - REDUCE MYCPP TO A SINGLE PASS TO YAKS
    - all passes go over yaks, not over MyPy
    - const pass, forward decl pass, etc.

- then run it on itself!
- then run it on Oils

- then add your own type checker
  - go from `pea/pea_main.py` with the Python 2 AST to Yaks directly, without
    MyPy

So you could have:

- `yaks/{yaks_main.py,yaks.asdl}` -> generates mycpp C++ from `test.yaks`
  - copy mycpp/examples
  - {lex,read,transform,cppgen}.py
- `yaks/yaks_main.yaks` -> generates mycpp C++ from `oils-for-unix.yaks`, which
  is the CST dump of Python 2 !!!
  - {lex,read,transform,check,cppgen}.yaks -- add type checker here
- `pea/py2_to_yaks.py`?
  - I guess this dumps yaks directly? Or is there another step?
  - So we can use that type checker. We're only writing a single type checker?
  - Yaks will have to be expanded with `class with` and all that.

OK that's a little weird. Why not just stick with all Python? Because:

- `pea.yaks` has to be FAST!
- Python has some weaknesses, like you need this extra ASDL

Although I suppose you could consider typed Python 3 or something?

That has proper declarations. But you would still need ASDL.

## Notes

### NIL8 Features

- Might want multiline strings that are indented
  - for printing blocks of C++

### Line Wrapping

How do we make the output wrapped to 80 columns? We need some kind of limit.

- ASDL has an ad hoc function I borrowed from CPython
  - it doesn't work all the time
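
One simple approach is sketched here: print a node on one line if it fits, otherwise put each child on its own line. This is NOT the ASDL function mentioned above, just a greedy heuristic. `flat` and `pretty` are hypothetical names, and trailing close parens aren't counted against the width.

```python
from typing import List, Union

Node = Union[str, List['Node']]


def flat(node: Node) -> str:
    """Print a node on a single line."""
    if isinstance(node, str):
        return node
    return '(' + ' '.join(flat(c) for c in node) + ')'


def pretty(node: Node, indent: int = 0, width: int = 80) -> str:
    """Print a node, breaking it across lines when it would exceed `width`."""
    one_line = flat(node)
    if isinstance(node, str) or not node or indent + len(one_line) <= width:
        return one_line
    # Too wide: keep the head on the first line, then one child per line.
    head, *rest = node
    lines = ['(' + flat(head)]
    for child in rest:
        lines.append(' ' * (indent + 2) + pretty(child, indent + 2, width))
    return '\n'.join(lines) + ')'
```

It recomputes `flat` at every level, so it's quadratic in the worst case. That's probably fine for small trees, but a real printer would want to cache widths.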

## First Words

Operators for names:

    (obj.field)
    (obj..method)  # not -> because it conflicts with C syntax?
                   # We might want values and methods
    (namespace::func)

Arithmetic:

    (+ 1 2) etc.

    # + - / * %
    # ^ & |
    # unary ~

    a,42 is a[42]

    (== 3 4)
    (< 3 4)
    (<= 3 4)
    (> 3 4)
    (>= 3 4)

Boolean:

    (not true)
    (and true true)
    (or false false)

Only 1 top level keyword:

    # Everything lives in a module.
    (module osh
      ...
    )
    (module ysh
      ...
    )


How do we do import and export? Do we have a require/provide kind of thing?

Does this replace Ninja? Info is duplicated in C++ and Ninja.

Module-level keywords:

    (global [s Str] "123")

    (func main [x Int] ...)

    (class Foo
      (construct Foo [x Int] ...)
      (method main [x Str] ...)

      (member x Int)
      # Or you could do this I suppose

      [x Int]
      [y Int]
    )

    # Do we need something special for context managers?
    # There could be a macro that desugars this

    (class ctx_Foo
      (construct ctx_Foo)
      (method __enter__  # no-op
        ...
      )
      (method __exit__
      )
    )

Within functions:

    (var [x Str] "123")

    # function call
    (print "hi")

    # or maybe
    (call print "hi")

    (if true (print "true") (print "false"))

    (while true
      (print "hi")
    )

    (foreach ...)  # not like for in C, that could be separate

    (switch x
      (case [1 2 3]
        (call print "num")
        (call print "hi")
      )
      (case [4 5 6]
        (call print "num")
        (call print "hi")
      )
    )

    (break)
    (continue)
    (return 42)

Special within functions:

    (tagswitch (call obj..tag)  # method call I guess
      (case command_e.Simple
        (call print "simple")
      )
      (case command_e::Pipeline  # maybe use ::
        (call print "pipe")
      )
    )

    (with (call ctx_Foo a b)
      (call print "hi")
    )

Not used in mycpp, but could be in other C++ code generators:

    (& i)   # address of
    (* pi)  # pointer dereference


### `PY_foo` for mycpp ambiguities

I think there can be a separate pass that resolves this?

Export

    t(x)           # Python source
    (PY_call t x)  # exported Yaks

Then this gets turned into EITHER:

    (call t x)
    (call [Alloc t] x)

I think it depends on the symbol table.
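
A rough sketch of such a pass, using nested lists for the tree. It assumes the symbol table boils down to a set of known class names, which is a simplification on my part. `resolve_py_call` and `class_names` are hypothetical, and `[Alloc t]` is represented as an ordinary list.

```python
from typing import List, Set, Union

Node = Union[str, List['Node']]


def resolve_py_call(node: Node, class_names: Set[str]) -> Node:
    """Rewrite every (PY_call t ...) as a plain call or an allocation."""
    if isinstance(node, str):
        return node
    # Resolve children first, so nested PY_call nodes are handled too.
    node = [resolve_py_call(child, class_names) for child in node]
    if node and node[0] == 'PY_call':
        callee, args = node[1], node[2:]
        if isinstance(callee, str) and callee in class_names:
            return ['call', ['Alloc', callee]] + args  # constructor call
        return ['call', callee] + args                 # ordinary function call
    return node
```

So the same `(PY_call t x)` comes out differently depending on whether `t` is in `class_names`.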

Likewise if you have

    x.y()

That has to be

    (call (PY_attr x y))

Turn it into either

    (call x->y)  # obj->member
    (call x::y)  # namespace::func


More

- `PY_cast` -> `reinterpret_cast` or `static_cast`


### Other Issues

- C++ nested scope vs. flat function scope