1 | Yaks
|
2 | ====
|
3 |
|
4 | A minimal, TYPED language that compiles to the mycpp runtime.
|
5 |
|
6 | - It uses NIL8 for its syntax
|
7 | - It's a Lisp-ish IR, but the semantics are imperative. (WebAssembly also
|
8 | has a Lispy text format, but is imperative.)
|
9 | - It can be written by hand, and easily read by humans.
|
10 | - `(func f [] (var [x Int] 42))` etc.
|
11 | - See `oilshell/yaks` experiment in TypeScript.
|
12 | - It can also be generated in a LOSSLESS way from other languages, like Python
|
13 | - So that we can generate precise type errors that point back to Python source code
|
14 | - File, Line, Column
|
15 | - We will probably split files into spans/Tokens like
|
16 | `doctools/micro-syntax.md`. And then the tree can refer to this linear
|
17 | list.
|
18 |
|
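The span idea above can be sketched as follows. This is only an illustration in Python 3, not the NIL8 parser: the `Token` type, the `TOKEN_RE` pattern, and the `tokenize` / `read` functions are all hypothetical names, and the tokenizer ignores string literals containing spaces and comments. The tree stores only integer indices into the linear token list, so any node can be mapped back to a line and column.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    line: int  # 1-based line number
    col: int   # 1-based column number


# Assumed token shapes: a single bracket, or a run of non-space,
# non-bracket characters. Real NIL8 lexing is richer than this.
TOKEN_RE = re.compile(r"[()\[\]]|[^\s()\[\]]+")

CLOSERS = {"(": ")", "[": "]"}


def tokenize(src: str) -> List[Token]:
    """Split source text into a linear list of tokens with positions."""
    tokens = []
    for line_num, line in enumerate(src.splitlines(), start=1):
        for m in TOKEN_RE.finditer(line):
            tokens.append(Token(m.group(), line_num, m.start() + 1))
    return tokens


def read(tokens: List[Token], pos: int = 0):
    """Return (tree, next_pos).

    An atom is the index of its token. A list is a pair
    (index of the opening bracket, [children]).
    """
    tok = tokens[pos]
    if tok.text in CLOSERS:
        closer = CLOSERS[tok.text]
        children = []
        i = pos + 1
        while i < len(tokens) and tokens[i].text != closer:
            child, i = read(tokens, i)
            children.append(child)
        if i >= len(tokens):
            raise SyntaxError(
                "unclosed %r at %d:%d" % (tok.text, tok.line, tok.col))
        return (pos, children), i + 1
    if tok.text in CLOSERS.values():
        raise SyntaxError(
            "unexpected %r at %d:%d" % (tok.text, tok.line, tok.col))
    return pos, pos + 1
```

An error message for a node can then look up `tokens[index]` to report its line and column, and a file name can be attached per token list.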
19 | ## Goals
|
20 |
|
21 | Medium-term goals:
|
22 |
|
23 | - Get rid of the dependence on (an old version of) MyPy!
|
24 | - this is causing build problems, e.g. on Fedora
|
25 | - Type check and run bin/{osh,ysh} in less than **one second**.
|
26 | - Similar to my `deno check && deno run` experience (which was on a tiny
|
27 | codebase; maybe it wasn't that fast.)
|
28 |
|
29 | Non-Goal: change the source language of Oils.
|
30 |
|
31 | - For the immediate future, it will still be in Python 2 with type comments.
|
32 | - Contributors will be able to see a familiar language.
|
33 | - We just got the yapf formatter working, etc.
|
34 |
|
35 | ### Why Compile to C++?
|
36 |
|
37 | To be fast. Language processors should be written in AOT-compiled languages:
|
38 |
|
39 | - <https://news.ycombinator.com/item?id=35045520>
|
40 |
|
41 | `pea/TEST.sh` shows that
|
42 |
|
43 | - parsing all of Oils serially (with Python 3 `ast.parse()`) takes 410 ms.
|
44 | - Pickling also takes some time.
|
45 | - Unpickling takes time
|
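A measurement like the parsing number above could be reproduced with a few lines of Python 3. This is a sketch, not what `pea/TEST.sh` actually runs: `time_parse` is a hypothetical helper, and it times only `ast.parse()` on source strings already in memory, not file I/O or pickling.

```python
import ast
import time
from typing import Iterable, Tuple


def time_parse(sources: Iterable[str]) -> Tuple[int, float]:
    """Parse each source string serially; return (count, elapsed ms)."""
    num_parsed = 0
    start = time.perf_counter()
    for src in sources:
        ast.parse(src)
        num_parsed += 1
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return num_parsed, elapsed_ms


if __name__ == "__main__":
    n, ms = time_parse(["x = 1\n", "def f(a):\n    return a + 1\n"])
    print("parsed %d sources in %.2f ms" % (n, ms))
```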
46 |
|
47 | ### More Issues with mycpp I want to address
|
48 |
|
49 | - speed of Python dev build
|
50 | - type check and run Python in 1 second
|
51 | - speed of C++ build
|
52 | - type check, generate multiple C++ MODULES, compile the whole program in
|
53 | say 5-10 seconds. I think it's more like 30 seconds now.
|
54 | - rooting: can we optimize it, and not generate 480 KB of StackRoots or whatever
|
55 | - this requires analyzing the call graph
|
56 | - MODULES and dependencies -- may require re-organizing the Oils codebase
|
57 | itself.
|
58 | - cpp/preamble.h is a hack.
|
59 | - This includes things like "system" deps like ENOENT and fork(), etc.
|
60 |
|
61 | ## End Goal
|
62 |
|
63 | - `pea/pea_main.py` (written in Python 3!) exports Yaks for the ~40-50K lines
|
64 | of Python 2 code in ~300-400 ms.
|
65 | - I think ASDL also needs to export Yaks.
|
66 | - so we can build a symbol table
|
67 | - `pea/pea.yaks` is compiled to C++. We use Ninja for that.
|
68 | - How long will a non-incremental build take? I think this should be like
|
69 | - 500 lines for the reader/front end,
|
70 | - 1000-3000 lines for the type checker?
|
71 | - 1000-3000 lines for the C++ code generator?
|
72 | - Plus 4000 lines of the mycpp runtime
|
73 | - So I hope this will take just 1 second to compile and LINK. Unit tests
|
74 | against mycpp seem to take about 1 second, so it could be comparable.
|
75 |
|
76 | Then
|
77 |
|
78 | - `pea/pea_main.py | yaks check` type checks Oils in 1 second, and then you run
|
79 | `bin/osh`.
|
80 | - `pea/pea_main.py | yaks cpp` generates C++ in one second
|
81 | - then it should take 5-10 seconds to compile the output in Ninja.
|
82 | - Then you run `_bin/cxx-dbg/osh`
|
83 |
|
84 | ## Implementation Strategy - It Requires Bootstrapping
|
85 |
|
86 | Let's minimize the stuff we need to write twice.
|
87 |
|
88 | - Write the first pass "yaks0" in Python 2 with re2c and ASDL.
|
89 | - uses the NIL8 parser
|
90 | - custom transformer from nil8.asdl -> yaks.asdl
|
91 | - then print C++ directly with no type checking
|
92 |
|
93 | - Then I think you can write "myyaks" ?
|
94 | - It's a single mycpp pass that outputs Yaks, with all the type information?
|
95 |
|
96 | OLDER
|
97 |
|
98 | - then rewrite all of that in Yaks itself???
|
99 | - lexer - does it invoke re2c directly? No Python
|
100 | - parser - with infix rule
|
101 | - transform
|
102 | - print
|
103 |
|
104 | - then translate Python 2 to Yaks, and type check that
|
105 | - first use MyPy? does that make sense? I think so.
|
106 | - you preserve all the hacks
|
107 | - REDUCE MYCPP TO A SINGLE PASS TO YAKS
|
108 | - all passes go over yaks, not over MyPy
|
109 | - const pass, forward decl pass, etc.
|
110 |
|
111 | - then run it on itself!
|
112 | - then run it on Oils
|
113 |
|
114 | - then add your own type checker
|
115 | - go from `pea/pea_main.py` with the Python 2 AST to Yaks directly, without MyPy
|
116 |
|
117 | So you could have:
|
118 |
|
119 | - `yaks/{yaks_main.py,yaks.asdl}` -> generates mycpp C++ from `test.yaks`
|
120 | - copy mycpp/examples
|
121 | - {lex,read,transform,cppgen}.py
|
122 | - `yaks/yaks_main.yaks` -> generates mycpp C++ from `oils-for-unix.yaks`, which
|
123 | is the CST dump of Python 2 !!!
|
124 | - {lex,read,transform,check,cppgen}.yaks -- add type checker here
|
125 | - `pea/py2_to_yaks.py`?
|
126 | - I guess this dumps yaks directly? Or is there another step?
|
127 | - So we can use that type checker. We're only writing a single type checker?
|
128 | - Yaks will have to be expanded with `class with` and all that.
|
129 |
|
130 | OK that's a little weird. Why not just stick with all Python? Because:
|
131 |
|
132 | - `pea.yaks` has to be FAST!
|
133 | - Python has some weaknesses, like you need this extra ASDL
|
134 |
|
135 | Although I suppose you could consider typed Python 3 or something?
|
136 |
|
137 | That has proper declarations. But you would still need ASDL.
|
138 |
|
139 | ## Notes
|
140 |
|
141 | ### NIL8 Features
|
142 |
|
143 | - Might want multiline strings that are indented
|
144 | - for printing blocks of C++
|
145 |
|
146 | ### Line Wrapping
|
147 |
|
148 | How do we make the output wrapped to 80 columns? We need some kind of limit
|
149 |
|
150 | - ASDL has an ad hoc function I borrowed from CPython
|
151 | - it doesn't work all the time
|
152 |
|
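One simple way to impose a column limit is sketched below in Python 3. It is not the ASDL function mentioned above; `flat` and `pretty` are hypothetical names, and trees are modeled as nested Python lists of strings, with no distinction between `()` and `[]`. A list is printed on one line if it fits, and otherwise each argument goes on its own indented line. The limit is best effort: a single long atom, or the closing parens appended to the last line, can still exceed it.

```python
from typing import List, Union

Node = Union[str, List["Node"]]


def flat(node: Node) -> str:
    """Print a tree on a single line."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(flat(child) for child in node) + ")"


def pretty(node: Node, indent: int = 0, max_col: int = 80) -> str:
    """Print a tree, breaking lists that don't fit in max_col columns.

    indent is the column where this node starts.
    """
    one_line = flat(node)
    if isinstance(node, str) or not node or indent + len(one_line) <= max_col:
        return one_line
    # Too wide: keep the head on the first line, one child per line after it.
    lines = ["(" + pretty(node[0], indent + 1, max_col)]
    for child in node[1:]:
        lines.append(" " * (indent + 2) + pretty(child, indent + 2, max_col))
    return "\n".join(lines) + ")"
```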
153 | ## First Words
|
154 |
|
155 | Operators for names:
|
156 |
|
157 | (obj.field)
|
158 | (obj..method) # not -> because it conflicts with C syntax?
|
159 | # We might want values and methods
|
160 | (namespace::func)
|
161 |
|
162 | Arithmetic:
|
163 |
|
164 | (+ 1 2) etc.
|
165 |
|
166 | # + - / * %
|
167 | # ^ & |
|
168 | # unary ~
|
169 |
|
170 | a,42 is a[42]
|
171 |
|
172 | (== 3 4)
|
173 | (< 3 4)
|
174 | (<= 3 4)
|
175 | (> 3 4)
|
176 | (>= 3 4)
|
177 |
|
178 | Boolean:
|
179 |
|
180 | (not true)
|
181 | (and true true)
|
182 | (or false false)
|
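The operators above map fairly directly onto C++ infix expressions. The following Python 3 sketch shows one possible translation, under several assumptions: `expr_to_cpp` is a hypothetical name, expressions are nested Python lists of strings, every binary operator takes exactly two arguments, and an atom like `a,42` is read as the index expression `a[42]`. Binary results are always parenthesized, so no precedence table is needed.

```python
from typing import List, Union

Node = Union[str, List["Node"]]

# Operators that are spelled the same way in C++.
BINARY = {"+", "-", "*", "/", "%", "^", "&", "|",
          "==", "<", "<=", ">", ">="}
# Operators that need a different spelling in C++.
RENAMED = {"and": "&&", "or": "||"}
UNARY = {"not": "!", "~": "~"}


def expr_to_cpp(node: Node) -> str:
    """Translate a prefix expression to a C++ infix expression string."""
    if isinstance(node, str):
        # Assumed reading of the notes: a,42 means a[42].
        if "," in node and not node.startswith('"'):
            base, index = node.split(",", 1)
            return "%s[%s]" % (base, index)
        return node
    if not node or not isinstance(node[0], str):
        raise ValueError("expected (op arg ...), got %r" % (node,))
    op = node[0]
    args = [expr_to_cpp(arg) for arg in node[1:]]
    if op in UNARY and len(args) == 1:
        return UNARY[op] + args[0]
    if (op in BINARY or op in RENAMED) and len(args) == 2:
        cpp_op = RENAMED.get(op, op)
        return "(%s %s %s)" % (args[0], cpp_op, args[1])
    raise ValueError("unsupported expression: %r" % (node,))
```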
183 |
|
184 | Only 1 top level keyword:
|
185 |
|
186 | # Everything lives in a module.
|
187 | (module osh
|
188 | ...
|
189 | )
|
190 | (module ysh
|
191 | ...
|
192 | )
|
193 |
|
194 |
|
195 | How do we do import and export? Do we have a require/provide kind of thing?
|
196 |
|
197 | Does this replace Ninja? Info is duplicated in C++ and Ninja.
|
198 |
|
199 | Module-level keywords:
|
200 |
|
201 | (global [s Str] "123")
|
202 |
|
203 | (func main [x Int] ...)
|
204 |
|
205 | (class Foo
|
206 | (construct Foo [x Int] ...)
|
207 | (method main [x Str] ...)
|
208 |
|
209 | (member x Int)
|
210 | # Or you could do this I suppose
|
211 |
|
212 | [x Int]
|
213 | [y Int]
|
214 | )
|
215 |
|
216 | # Do we need something special for
|
217 | # There could be a macro that desugars this
|
218 |
|
219 | (class ctx_Foo
|
220 | (construct ctx_Foo)
|
221 | (method __enter__ # no-op
|
222 | ...
|
223 | )
|
224 | (method __exit__
|
225 | )
|
226 |
|
227 | Within functions:
|
228 |
|
229 | (var [x Str] "123")
|
230 |
|
231 | # function call
|
232 | (print "hi")
|
233 |
|
234 | # or maybe
|
235 | (call print "hi")
|
236 |
|
237 | (if true (print "true") (print "false"))
|
238 |
|
239 | (while true
|
240 | (print "hi")
|
241 | )
|
242 |
|
243 | (foreach # not like for in C, that could be separate
|
244 |
|
245 | (switch x
|
246 | (case [1 2 3]
|
247 | (call print "num")
|
248 | (call print "hi")
|
249 | )
|
250 | (case [4 5 6]
|
251 | (call print "num")
|
252 | (call print "hi")
|
253 | )
|
254 | )
|
255 |
|
256 | (break)
|
257 | (continue)
|
258 | (return 42)
|
259 |
|
260 | Special within functions:
|
261 |
|
262 | (tagswitch (call obj..tag) # method call I guess
|
263 | (case command_e.Simple
|
264 | (call print "simple")
|
265 | )
|
266 | (case command_e::Pipeline # maybe use ::
|
267 | (call print "pipe")
|
268 | )
|
269 | )
|
270 |
|
271 | (with (call ctx_Foo a b)
|
272 | (call print "hi")
|
273 | )
|
274 |
|
275 | Not used in mycpp, but could be in other C++ code generators:
|
276 |
|
277 | (& i) # address of
|
278 | (* pi) # pointer dereference
|
279 |
|
280 |
|
281 | ### `PY_foo` for mycpp ambiguities
|
282 |
|
283 | I think there can be a separate pass that resolves this?
|
284 |
|
285 | Export
|
286 |
|
287 | t(x)
|
288 | (PY_call t x)
|
289 |
|
290 | Then this gets turned into EITHER:
|
291 |
|
292 | (call t x)
|
293 | (call [Alloc t] x)
|
294 |
|
295 | I think it depends on the symbol table.
|
296 |
|
297 | Likewise if you have
|
298 |
|
299 | x.y()
|
300 |
|
301 | That has to be
|
302 |
|
303 | (call (PY_attr x y))
|
304 |
|
305 | Turn it into either
|
306 |
|
307 | (call x->y) # obj->member
|
308 | (call x::y) # namespace::func
|
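Both `PY_` rewrites in this section could live in one small pass over the tree that consults a symbol table. The Python 3 sketch below is only an illustration: `resolve_py` is a hypothetical name, the symbol table is assumed to be a dict mapping a name to `"class"` or `"module"`, and trees are nested Python lists, so `[Alloc t]` is modeled as an ordinary list. Unknown names fall back to a plain call and to `->`.

```python
from typing import Dict, List, Union

Node = Union[str, List["Node"]]


def resolve_py(node: Node, symbols: Dict[str, str]) -> Node:
    """Replace PY_call and PY_attr nodes using the symbol table."""
    if isinstance(node, str):
        return node
    # Resolve children first, so nested PY_ nodes are handled bottom-up.
    node = [resolve_py(child, symbols) for child in node]
    if not node:
        return node
    head = node[0]
    if head == "PY_call" and len(node) >= 2:
        callee = node[1]
        if isinstance(callee, str) and symbols.get(callee) == "class":
            # t(x) where t is a class: allocate an instance.
            return ["call", ["Alloc", callee]] + node[2:]
        return ["call", callee] + node[2:]
    if head == "PY_attr" and len(node) == 3:
        obj, name = node[1], node[2]
        if isinstance(obj, str) and isinstance(name, str):
            # x.y where x is a module: namespace::func, else obj->member.
            sep = "::" if symbols.get(obj) == "module" else "->"
            return obj + sep + name
    return node
```

For example, with `symbols = {"t": "class"}`, the tree for `(PY_call t x)` becomes the tree for `(call [Alloc t] x)`, while an unknown callee stays a plain `(call t x)`.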
309 |
|
310 |
|
311 | More
|
312 |
|
313 | - `PY_cast` -> `reinterpret_cast` or `static_cast`
|
314 |
|
315 |
|
316 | ### Other Issues
|
317 |
|
318 | - C++ nested scope vs. flat function scope
|
319 |
|
320 |
|