Yaks
====

A minimal, TYPED language that compiles to the mycpp runtime.

- It uses NIL8 for its syntax
  - It's a Lisp-ish IR, but the semantics are imperative.  (WebAssembly also
    has a Lispy text format, but is imperative.)
- It can be written by hand, and easily read by humans.
  - `(func f [] (var [x Int] 42))` etc.
  - See the `oilshell/yaks` experiment in TypeScript.
- It can also be generated in a LOSSLESS way from other languages, like Python
  - So that we can generate precise type errors that point back to Python source code
    - File, Line, Column
  - We will probably split files into spans/Tokens like
    `doctools/micro-syntax.md`.  And then the tree can refer to this linear
    list.

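Here is a rough Python 3 sketch of the spans/Tokens idea. It is only an illustration, not what `doctools/micro-syntax.md` or Yaks actually does: the `Token` and `Node` types, the regex, and `error_location()` are made up for the example. The tokens cover every character of the file (so nothing is lost), and a tree node stores an index into that linear list instead of its own position.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    line: int  # 1-based line number
    col: int   # 0-based column
    text: str


class Node(NamedTuple):
    kind: str
    tok_index: int  # index into the token list, not a copy of the text


# Whitespace runs, words, or any other single character, so nothing is dropped
SPAN_RE = re.compile(r'\s+|\w+|.', re.DOTALL)


def tokenize(src: str) -> List[Token]:
    """Split a file into a linear list of tokens that covers every character."""
    tokens = []
    line, col = 1, 0
    for m in SPAN_RE.finditer(src):
        text = m.group()
        tokens.append(Token(line, col, text))
        newlines = text.count('\n')
        if newlines:
            line += newlines
            col = len(text) - text.rfind('\n') - 1
        else:
            col += len(text)
    return tokens


def error_location(filename: str, tokens: List[Token], node: Node) -> str:
    """Format File:Line:Column for a node, using 1-based columns."""
    tok = tokens[node.tok_index]
    return '%s:%d:%d' % (filename, tok.line, tok.col + 1)


SRC = 'x = 1\ny = x + "s"\n'
tokens = tokenize(SRC)

# Pretend the type checker rejected the '+' expression on line 2
plus_index = next(i for i, t in enumerate(tokens) if t.text == '+')
node = Node('BinaryOp', plus_index)
print(error_location('demo.py', tokens, node))
```

Joining the token texts gives back the original file, which is what makes the export lossless in this sketch.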
## Goals

Medium-term goals:

- Get rid of dependence on (old version of) MyPy!
  - this is causing build problems, e.g. on Fedora
- Type check and run `bin/{osh,ysh}` in less than **one second**.
  - Similar to my `deno check && deno run` experience (which was on a tiny
    codebase; maybe it wasn't that fast.)

Non-Goal: change the source language of Oils.

- For the immediate future, it will still be in Python 2 with type comments.
  - Contributors will be able to see a familiar language.
  - We just got the yapf formatter working, etc.

### Why Compile to C++?

To be fast.  Language processors should be written in AOT-compiled languages:

- <https://news.ycombinator.com/item?id=35045520>

`pea/TEST.sh` shows that

- parsing all of Oils serially (with Python 3 `ast.parse()`) takes 410 ms.
- Pickling also takes some time.
- Unpickling takes time

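For reference, a measurement like that can be sketched in a few lines of Python 3. This is not the code in `pea/TEST.sh`; `time_parse()` and `SAMPLE_SOURCES` are made-up names, and the two tiny strings stand in for the real Oils source files, so the numbers it prints say nothing about the 410 ms figure.

```python
import ast
import pickle
import time


def time_parse(sources):
    """Parse each source string serially; return (trees, elapsed seconds)."""
    start = time.perf_counter()
    trees = [ast.parse(src) for src in sources]
    return trees, time.perf_counter() - start


# Hypothetical stand-in for the Oils source files
SAMPLE_SOURCES = ['x = 42\n', 'def f(a, b):\n    return a + b\n']

trees, elapsed = time_parse(SAMPLE_SOURCES)
print('parsed %d modules in %.3f ms' % (len(trees), elapsed * 1000))

# Pickling and unpickling the ASTs is a separate cost, timed the same way
start = time.perf_counter()
blob = pickle.dumps(trees)
restored = pickle.loads(blob)
pickle_elapsed = time.perf_counter() - start
print('pickle round trip: %d bytes in %.3f ms' % (len(blob), pickle_elapsed * 1000))
```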
### More Issues with mycpp I want to address

- speed of Python dev build
  - type check and run Python in 1 second
- speed of C++ build
  - type check, generate multiple C++ MODULES, compile the whole program in
    say 5-10 seconds.  I think it's more like 30 seconds now.
- rooting: can we optimize it, and not generate 480 KB of StackRoots or whatever
  - this requires analyzing the call graph
- MODULES and dependencies -- may require re-organizing the Oils codebase
  itself.
  - cpp/preamble.h is a hack.
  - This includes things like "system" deps like ENOENT and fork(), etc.

## End Goal

- `pea/pea_main.py` (written in Python 3!) exports Yaks for the ~40-50K lines
  of Python 2 code in ~300-400 ms.
  - I think ASDL also needs to export Yaks.
    - so we can build a symbol table
- `pea/pea.yaks` is compiled to C++.  We use Ninja for that.
  - How long will a non-incremental build take?  I think this should be like
    - 500 lines for the reader/front end,
    - 1000-3000 lines for the type checker?
    - 1000-3000 lines for the C++ code generator?
    - Plus 4000 lines of the mycpp runtime
  - So I hope this will take just 1 second to compile and LINK.  Unit tests
    against mycpp seem to take about 1 second, so it could be comparable.

Then

- `pea/pea_main.py | yaks check` type checks Oils in 1 second, and then you run
  `bin/osh`.
- `pea/pea_main.py | yaks cpp` generates C++ in one second
  - then it should take 5-10 seconds to compile the output in Ninja.
  - Then you run `_bin/cxx-dbg/osh`

## Implementation Strategy - It Requires Bootstrapping

Let's minimize the stuff we need to write twice.

- Write the first pass "yaks0" in Python 2 with re2c and ASDL.
  - uses the NIL8 parser
  - custom transformer from nil8.asdl -> yaks.asdl
  - then print C++ directly with no type checking

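To make the shape of that first pass concrete, here is a toy read-then-print pipeline for the one-line example `(func f [] (var [x Int] 42))`. It rests on several assumptions: it is Python 3 rather than Python 2, the regex reader is a stand-in and not the real NIL8 parser, there is no ASDL schema in between, and the `void` return type and the `Int` -> `int` mapping are guesses for the sake of the example.

```python
import re

# Simplified tokenizer: parens, brackets, and atoms.  NOT the real NIL8 lexer.
TOKEN_RE = re.compile(r'[()\[\]]|[^\s()\[\]]+')


def read(text):
    """Parse one expression into nested Python lists (strings for atoms)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ('(', '['):
            close = ')' if tok == '(' else ']'
            items = []
            while tokens[pos] != close:
                items.append(parse())
            pos += 1  # skip the closing delimiter
            return items
        return tok

    return parse()


# Hypothetical mapping from Yaks type names to C++ types
CPP_TYPES = {'Int': 'int'}


def gen_stmt(node):
    """Translate (var [name Type] literal) to a C++ declaration."""
    keyword, (name, type_name), value = node
    assert keyword == 'var', keyword
    return '%s %s = %s;' % (CPP_TYPES[type_name], name, value)


def gen_func(node):
    """Translate (func name [] body...) to C++, with no type checking."""
    keyword, name, params = node[0], node[1], node[2]
    assert keyword == 'func' and params == [], node
    body = ''.join('  %s\n' % gen_stmt(s) for s in node[3:])
    return 'void %s() {\n%s}\n' % (name, body)


tree = read('(func f [] (var [x Int] 42))')
print(gen_func(tree))
```

The reader throws away the difference between `()` and `[]`, which a real transformer to `yaks.asdl` would presumably want to keep.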
- Then I think you can write "myyaks"?
  - It's a single mycpp pass that outputs Yaks, with all the type information?

OLDER

- then rewrite all of that in Yaks itself???
  - lexer - does it invoke re2c directly?  No Python
  - parser - with infix rule
  - transform
  - print

- then translate Python 2 to Yaks, and type check that
  - first use MyPy?  does that make sense?  I think so.
  - you preserve all the hacks
  - REDUCE MYCPP TO A SINGLE PASS TO YAKS
    - all passes go over yaks, not over MyPy
    - const pass, forward decl pass, etc.

- then run it on itself!
  - then run it on Oils

- then add your own type checker
  - go from `pea/pea_main.py` with the Python 2 AST to Yaks directly, without MyPy

So you could have:

- `yaks/{yaks_main.py,yaks.asdl}` -> generates mycpp C++ from `test.yaks`
  - copy mycpp/examples
  - {lex,read,transform,cppgen}.py
- `yaks/yaks_main.yaks` -> generates mycpp C++ from `oils-for-unix.yaks`, which
  is the CST dump of Python 2 !!!
  - {lex,read,transform,check,cppgen}.yaks -- add type checker here
- `pea/py2_to_yaks.py`?
  - I guess this dumps yaks directly?  Or is there another step?
  - So we can use that type checker.  We're only writing a single type checker?
    - Yaks will have to be expanded with `class with` and all that.

OK that's a little weird.  Why not just stick with all Python?  Because:

- `pea.yaks` has to be FAST!
- Python has some weaknesses, like you need this extra ASDL

Although I suppose you could consider typed Python 3 or something?

That has proper declarations.  But you would still need ASDL.

## Notes

### NIL8 Features

- Might want multiline strings that are indented
  - for printing blocks of C++

### Line Wrapping

How do we make the output wrapped to 80 columns?  We need some kind of limit.

- ASDL has an ad hoc function I borrowed from CPython
  - it doesn't work all the time

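One cheap option, sketched below, is the standard library's `textwrap`. This is not the ASDL function mentioned above, and `wrap_cpp_line()` and the sample line are made up. It only breaks at whitespace, so it has the same kind of weakness: it knows nothing about C++ tokens and could split inside a string literal that contains spaces.

```python
import textwrap

MAX_COLS = 80  # the column limit we want for generated code


def wrap_cpp_line(line, indent='    '):
    """Break one long generated line at whitespace, indenting continuations."""
    return textwrap.wrap(line, width=MAX_COLS, subsequent_indent=indent,
                         break_long_words=False, break_on_hyphens=False)


# Hypothetical generated call with many arguments
long_line = 'auto* result = Alloc<Foo>(%s);' % ', '.join(
    'argument_%d' % i for i in range(12))

wrapped = wrap_cpp_line(long_line)
print('\n'.join(wrapped))
```

A token with no spaces that is longer than 80 columns still overflows, because `break_long_words=False` refuses to cut it.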
## First Words

Operators for names:

    (obj.field)
    (obj..method)  # not -> because it conflicts with C syntax?
                   # We might want values and methods
    (namespace::func)

Arithmetic:

    (+ 1 2) etc.

    # + - / * %
    # ^ & |
    # unary ~

    a,42 is a[42]

    (== 3 4)
    (< 3 4)
    (<= 3 4)
    (> 3 4)
    (>= 3 4)

Boolean:

    (not true)
    (and true true)
    (or false false)

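Since the semantics are imperative, these prefix forms should map almost one-to-one onto C++ infix expressions. A possible sketch in Python 3, where the expression is already parsed into nested lists: the `to_cpp()` name, the operator tables, and the choice of `&&`, `||`, and `!` for `and`, `or`, and `not` are assumptions, not something the design has settled.

```python
# Assumed mapping from Yaks operator names to C++ operators
BINARY = {'and': '&&', 'or': '||'}
for op in ['+', '-', '/', '*', '%', '^', '&', '|', '==', '<', '<=', '>', '>=']:
    BINARY[op] = op
UNARY = {'not': '!', '~': '~'}


def to_cpp(expr):
    """Render a prefix expression (nested lists of strings) as C++ infix."""
    if isinstance(expr, str):
        return expr  # literal or name, e.g. '42' or 'true'
    op, args = expr[0], expr[1:]
    if op in UNARY and len(args) == 1:
        return '%s%s' % (UNARY[op], to_cpp(args[0]))
    if op in BINARY and len(args) == 2:
        # Always parenthesize, so C++ precedence never changes the meaning
        return '(%s %s %s)' % (to_cpp(args[0]), BINARY[op], to_cpp(args[1]))
    raise ValueError('unknown operator or arity: %r' % (expr,))


print(to_cpp(['+', '1', '2']))
print(to_cpp(['and', ['not', 'true'], ['<=', '3', '4']]))
```

Parenthesizing every binary expression is ugly but safe; a nicer printer would need a precedence table.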
Only 1 top level keyword:

    # Everything lives in a module.
    (module osh
      ...
    )
    (module ysh
      ...
    )


How do we do import and export?  Do we have a require/provide kind of thing?

Does this replace Ninja?  Info is duplicated in C++ and Ninja.

Module-level keywords:

    (global [s Str] "123")

    (func main [x Int] ...)

    (class Foo
      (construct Foo [x Int] ...)
      (method main [x Str] ...)

      (member x Int)
      # Or you could do this I suppose

      [x Int]
      [y Int]
    )

    # Do we need something special for
    # There could be a macro that desugars this

    (class ctx_Foo
      (construct ctx_Foo)
      (method __enter__  # no-op
        ...
      )
      (method __exit__
      )
    )

Within functions:

    (var [x Str] "123")

    # function call
    (print "hi")

    # or maybe
    (call print "hi")

    (if true (print "true") (print "false"))

    (while true
      (print "hi")
    )

    (foreach  # not like for in C, that could be separate

    (switch x
      (case [1 2 3]
        (call print "num")
        (call print "hi")
      )
      (case [4 5 6]
        (call print "num")
        (call print "hi")
      )
    )

    (break)
    (continue)
    (return 42)

Special within functions:

    (tagswitch (call obj..tag)  # method call I guess
      (case command_e.Simple
        (call print "simple")
      )
      (case command_e::Pipeline  # maybe use ::
        (call print "pipe")
      )
    )

    (with (call ctx_Foo a b)
      (call print "hi")
    )

Not used in mycpp, but could be in other C++ code generators:

    (& i)   # address of
    (* pi)  # pointer dereference


### `PY_foo` for mycpp ambiguities

I think there can be a separate pass that resolves this?

Export a Python call like the first line as the `PY_call` form on the second:

    t(x)
    (PY_call t x)

Then this gets turned into EITHER:

    (call t x)
    (call [Alloc t] x)

I think it depends on the symbol table.

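A sketch of what that resolving pass might look like for `PY_call`, in Python 3 over nested lists. The symbol table here is just a dict from a name to `'class'` or `'func'`, which is an assumption for illustration; `resolve_call()` and the entries in `SYMBOLS` are made up.

```python
def resolve_call(node, symbols):
    """Rewrite (PY_call t x ...) based on what `t` names in the symbol table.

    `symbols` is a hypothetical dict mapping a name to 'class' or 'func'.
    """
    head, callee, args = node[0], node[1], node[2:]
    assert head == 'PY_call', head
    kind = symbols[callee]
    if kind == 'class':
        # Python spells construction as a call; emit the [Alloc t] form
        return ['call', ['Alloc', callee]] + args
    if kind == 'func':
        return ['call', callee] + args
    raise ValueError('cannot call %r (a %s)' % (callee, kind))


SYMBOLS = {'Token': 'class', 'len': 'func'}  # made-up entries

print(resolve_call(['PY_call', 'Token', 'x'], SYMBOLS))
print(resolve_call(['PY_call', 'len', 'x'], SYMBOLS))
```

A name that is missing from the table raises `KeyError` here; a real pass would want to report that with File, Line, Column.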
Likewise if you have

    x.y()

That has to be

    (call (PY_attr x y))

Turn it into either

    (call x->y)  # obj->member
    (call x::y)  # namespace::func


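The `PY_attr` case could be handled by a small tree walk, sketched here in Python 3. It assumes the symbol table can say whether the left-hand name is a module (namespace) or an object; the `'module'` and `'object'` tags and both function names are hypothetical.

```python
def resolve_attr(node, symbols):
    """Rewrite (PY_attr x y) to the atom 'x->y' or 'x::y'.

    `symbols` is a hypothetical dict mapping a name to 'module' or 'object'.
    """
    head, left, right = node
    assert head == 'PY_attr', head
    sep = '::' if symbols[left] == 'module' else '->'
    return left + sep + right


def resolve(node, symbols):
    """Walk nested lists and resolve every PY_attr node found."""
    if isinstance(node, str):
        return node
    if node and node[0] == 'PY_attr':
        return resolve_attr(node, symbols)
    return [resolve(child, symbols) for child in node]


tree = ['call', ['PY_attr', 'x', 'y']]
print(resolve(tree, {'x': 'object'}))
print(resolve(tree, {'x': 'module'}))
```

This only handles a plain name on the left; chains like `a.b.c()` would need the type of `a.b`, which is where a real type checker comes in.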
More

- `PY_cast` -> `reinterpret_cast` or `static_cast`


### Other Issues

- C++ nested scope vs. flat function scope
