| 1 | Yaks
 | 
| 2 | ====
 | 
| 3 | 
 | 
| 4 | A minimal, TYPED language that compiles to the mycpp runtime.
 | 
| 5 | 
 | 
| 6 | - It uses NIL8 for its syntax
 | 
| 7 |    - It's a Lisp-ish IR, but the semantics are imperative.  (WebAssembly also
 | 
| 8 |      has a Lispy text format, but is imperative.)
 | 
| 9 | - It can be written by hand, and easily read by humans.
 | 
| 10 |   - `(func f [] (var [x Int] 42))` etc.
 | 
| 11 |   - See `oilshell/yaks` experiment in TypeScript.
 | 
| 12 | - It can also be generated in a LOSSLESS way from other languages, like Python
 | 
| 13 |   - So that we can generate precise type errors that point back to Python source code
 | 
| 14 |   - File, Line, Column 
 | 
| 15 |   - We will probably split files into spans/Tokens like
 | 
| 16 |     `doctools/micro-syntax.md`.  And then the tree can refer to this linear
 | 
| 17 |     list.
 | 
| 18 | 
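The "tree refers to a linear list" idea can be sketched as follows. This is only an illustration: the `Span` layout, the helper names, and the use of character offsets are all assumptions, not the format in `doctools/micro-syntax.md`.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """One entry in the linear token list (hypothetical layout)."""
    path: str
    start: int  # offset where the token starts
    end: int    # offset just past the token


def line_starts(text: str) -> List[int]:
    """Offsets at which each line of `text` begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == '\n':
            starts.append(i + 1)
    return starts


def location(span: Span, starts: List[int]) -> Tuple[str, int, int]:
    """Map a span to (file, 1-based line, 1-based column)."""
    line = bisect_right(starts, span.start)
    col = span.start - starts[line - 1] + 1
    return span.path, line, col


source = "x = 1\ny = x + 'a'\n"
spans = [Span("demo.py", 0, 1), Span("demo.py", 14, 17)]  # `x` and `'a'`
# A tree node would store only an index into `spans`, e.g. span_id=1.
print(location(spans[1], line_starts(source)))
```

The point of the design is that tree nodes stay small (one integer), while File, Line, Column are computed only when an error is reported.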
 | 
| 19 | ## Goals
 | 
| 20 | 
 | 
| 21 | Medium-term goals:
 | 
| 22 | 
 | 
| 23 | - Get rid of the dependence on (an old version of) MyPy!
 | 
| 24 |   - this is causing build problems, e.g. on Fedora
 | 
| 25 | - Type check and run bin/{osh,ysh} in less than **one second**.
 | 
| 26 |   - Similar to my `deno check && deno run` experience (which was on a tiny
 | 
| 27 |     codebase; maybe it wasn't that fast.)
 | 
| 28 | 
 | 
| 29 | Non-Goal: change the source language of Oils.
 | 
| 30 | 
 | 
| 31 | - For the immediate future, it will still be in Python 2 with type comments.
 | 
| 32 |   - Contributors will be able to see a familiar language.
 | 
| 33 |   - We just got the yapf formatter working, etc.
 | 
| 34 | 
 | 
| 35 | ### Why Compile to C++?
 | 
| 36 | 
 | 
| 37 | To be fast.  Language processors should be written in AOT-compiled languages:
 | 
| 38 | 
 | 
| 39 | - <https://news.ycombinator.com/item?id=35045520>
 | 
| 40 | 
 | 
| 41 | `pea/TEST.sh` shows that:
 | 
| 42 | 
 | 
| 43 | - parsing all of Oils serially (with Python 3 ast.parse()) takes 410 ms.
 | 
| 44 | - Pickling also takes some time.
 | 
| 45 | - Unpickling takes time
 | 
| 46 | 
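A rough way to see the three costs separately is to time them on one source string. This is a sketch, not what `pea/TEST.sh` does; the function name and the demo input are made up, and the numbers will differ from the 410 ms figure above.

```python
import ast
import pickle
import time


def time_parse_and_pickle(source: str) -> dict:
    """Time ast.parse(), pickle.dumps() and pickle.loads() for one string."""
    timings = {}

    t0 = time.perf_counter()
    tree = ast.parse(source)
    timings["parse"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    blob = pickle.dumps(tree)
    timings["pickle"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    tree2 = pickle.loads(blob)
    timings["unpickle"] = time.perf_counter() - t0

    # The round trip should preserve the tree.
    assert ast.dump(tree) == ast.dump(tree2)
    return timings


demo = "def f(x):\n    return x + 1\n" * 200
for phase, seconds in time_parse_and_pickle(demo).items():
    print(f"{phase}: {seconds * 1000:.2f} ms")
```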
 | 
| 47 | ### More Issues with mycpp I want to address
 | 
| 48 | 
 | 
| 49 | - speed of Python dev build
 | 
| 50 |   - type check and run Python in 1 second
 | 
| 51 | - speed of C++ build
 | 
| 52 |   - type check, generate multiple C++ MODULES, compile the whole program in
 | 
| 53 |     say 5-10 seconds.  I think it's more like 30 seconds now.
 | 
| 54 | - rooting: can we optimize it, and not generate 480 KB of StackRoots or whatever?
 | 
| 55 |   - this requires analyzing the call graph
 | 
| 56 | - MODULES and dependencies -- may require re-organizing the Oils codebase
 | 
| 57 |   itself.
 | 
| 58 |   - cpp/preamble.h is a hack.
 | 
| 59 |   - This includes things like "system" deps like ENOENT and fork(), etc.
 | 
| 60 | 
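One possible reading of "analyzing the call graph" for rooting: a function only needs stack roots if it can, directly or through its callees, reach something that allocates. The sketch below computes that set. The criterion itself, the function name `may_collect`, and the toy graph are assumptions for illustration, not how mycpp works today.

```python
from typing import Dict, Set


def may_collect(calls: Dict[str, Set[str]], allocators: Set[str]) -> Set[str]:
    """Functions that can reach an allocating function via the call graph."""
    # Invert the graph: callee -> set of callers.
    callers: Dict[str, Set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Walk backwards from the allocators.
    result = set(allocators)
    work = list(allocators)
    while work:
        func = work.pop()
        for caller in callers.get(func, ()):
            if caller not in result:
                result.add(caller)
                work.append(caller)
    return result


calls = {
    "main": {"parse", "strlen"},
    "parse": {"new_node"},
    "strlen": set(),
    "new_node": set(),
}
print(sorted(may_collect(calls, {"new_node"})))
```

Under this assumption, `strlen` never reaches an allocation, so it would not need any generated roots.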
 | 
| 61 | ## End Goal
 | 
| 62 | 
 | 
| 63 | - `pea/pea_main.py` (written in Python 3!) exports Yaks for the ~40-50K lines
 | 
| 64 |   of Python 2 code in ~300-400 ms.
 | 
| 65 | - I think ASDL also needs to export Yaks.
 | 
| 66 |   - so we can build a symbol table
 | 
| 67 | - `pea/pea.yaks` is compiled to C++.  We use Ninja for that.
 | 
| 68 |   - How long will a non-incremental build take?  I think this should be like
 | 
| 69 |     - 500 lines for the reader/front end, 
 | 
| 70 |     - 1000-3000 lines for the type checker?
 | 
| 71 |     - 1000-3000 lines for the C++ code generator?
 | 
| 72 |   - Plus 4000 lines of the mycpp runtime
 | 
| 73 |   - So I hope this will take just 1 second to compile and LINK.  Unit tests
 | 
| 74 |     against mycpp seem to take about 1 second, so it could be comparable.
 | 
| 75 | 
 | 
| 76 | Then
 | 
| 77 | 
 | 
| 78 | - `pea/pea_main.py | yaks check` type checks Oils in 1 second, and then you run
 | 
| 79 |   `bin/osh`.
 | 
| 80 | - `pea/pea_main.py | yaks cpp` generates C++ in one second
 | 
| 81 |   - then it should take 5-10 seconds to compile the output in Ninja.
 | 
| 82 | - Then you run `_bin/cxx-dbg/osh`
 | 
| 83 | 
 | 
| 84 | ## Implementation Strategy - It Requires Bootstrapping
 | 
| 85 | 
 | 
| 86 | Let's minimize the stuff we need to write twice.
 | 
| 87 | 
 | 
| 88 | - Write the first pass "yaks0" in Python 2 with re2c and ASDL.
 | 
| 89 |   - uses the NIL8 parser
 | 
| 90 |   - custom transformer from nil8.asdl -> yaks.asdl
 | 
| 91 |   - then print C++ directly with no type checking
 | 
| 92 | 
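To make the first step of "yaks0" concrete, here is a tiny reader for text like `(func f [] (var [x Int] 42))`. It is a sketch only: it is not the NIL8 grammar (no comments, no escapes, no infix operators), and the names `tokenize` and `read` as well as the list/tuple representation are assumptions.

```python
import re

CLOSER = {"(": ")", "[": "]"}


def tokenize(text):
    """Split into brackets, double-quoted strings, and bare atoms."""
    return re.findall(r'[()\[\]]|"[^"]*"|[^\s()\[\]"]+', text)


def read(tokens, pos=0):
    """Read one node starting at tokens[pos]; return (node, next_pos).

    (...) becomes a Python list, [...] becomes a tuple, atoms stay strings.
    """
    tok = tokens[pos]
    if tok in CLOSER:
        items = []
        pos += 1
        while tokens[pos] != CLOSER[tok]:
            item, pos = read(tokens, pos)
            items.append(item)
        node = items if tok == "(" else tuple(items)
        return node, pos + 1  # skip the closing bracket
    if tok in (")", "]"):
        raise SyntaxError("unexpected %r" % tok)
    return tok, pos + 1


node, _ = read(tokenize("(func f [] (var [x Int] 42))"))
print(node)
```

A transformer to `yaks.asdl` would then walk this generic tree and build typed nodes; that step is left out here.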
 | 
| 93 | - Then I think you can write "myyaks" ?
 | 
| 94 |   - It's a single mycpp pass that outputs Yaks, with all the type information?
 | 
| 95 | 
 | 
| 96 | OLDER
 | 
| 97 | 
 | 
| 98 | - then rewrite all of that in Yaks itself???
 | 
| 99 |   - lexer - does it invoke re2c directly?  No Python
 | 
| 100 |   - parser - with infix rule
 | 
| 101 |   - transform
 | 
| 102 |   - print
 | 
| 103 | 
 | 
| 104 | - then translate Python 2 to Yaks, and type check that
 | 
| 105 |   - first use MyPy?  does that make sense?  I think so.
 | 
| 106 |     - you preserve all the hacks
 | 
| 107 |     - REDUCE MYCPP TO A SINGLE PASS TO YAKS
 | 
| 108 |     - all passes go over yaks, not over MyPy
 | 
| 109 |       - const pass, forward decl pass, etc.
 | 
| 110 | 
 | 
| 111 | - then run it on itself!
 | 
| 112 | - then run it on Oils
 | 
| 113 | 
 | 
| 114 | - then add your own type checker
 | 
| 115 |   - go from `pea/pea_main.py` with the Python 2 AST to Yaks directly, without MyPy
 | 
| 116 | 
 | 
| 117 | So you could have:
 | 
| 118 | 
 | 
| 119 | - `yaks/{yaks_main.py,yaks.asdl}` -> generates mycpp C++ from `test.yaks`
 | 
| 120 |   - copy mycpp/examples
 | 
| 121 |   - {lex,read,transform,cppgen}.py
 | 
| 122 | - `yaks/yaks_main.yaks` -> generates mycpp C++ from `oils-for-unix.yaks`, which
 | 
| 123 |   is the CST dump of Python 2 !!!  
 | 
| 124 |   - {lex,read,transform,check,cppgen}.yaks -- add type checker here
 | 
| 125 | - `pea/py2_to_yaks.py`?
 | 
| 126 |   - I guess this dumps yaks directly?  Or is there another step?
 | 
| 127 |   - So we can use that type checker.  We're only writing a single type checker?
 | 
| 128 |   - Yaks will have to be expanded with `class with` and all that.
 | 
| 129 | 
 | 
| 130 | OK that's a little weird.  Why not just stick with all Python?  Because:
 | 
| 131 | 
 | 
| 132 | - `pea.yaks` has to be FAST!
 | 
| 133 | - Python has some weaknesses, like you need this extra ASDL
 | 
| 134 | 
 | 
| 135 | Although I suppose you could consider typed Python 3 or something?
 | 
| 136 | 
 | 
| 137 | That has proper declarations.  But you would still need ASDL.
 | 
| 138 | 
 | 
| 139 | ## Notes
 | 
| 140 | 
 | 
| 141 | ### NIL8 Features
 | 
| 142 | 
 | 
| 143 | - Might want multiline strings that are indented
 | 
| 144 |   - for printing blocks of C++
 | 
| 145 | 
 | 
| 146 | ### Line Wrapping
 | 
| 147 | 
 | 
| 148 | How do we make the output wrapped to 80 columns?  We need some kind of limit.
 | 
| 149 | 
 | 
| 150 | - ASDL has an ad hoc function I borrowed from CPython
 | 
| 151 |   - it doesn't work all the time
 | 
| 152 | 
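A simple strategy for the width limit, sketched on nested lists of strings: print a node on one line if it fits, otherwise put each child on its own indented line. This is not the ASDL function mentioned above; `flat` and `pretty` are hypothetical names, and the sketch ignores trailing `)` characters and over-long atoms when checking the width.

```python
def flat(node):
    """Render a node on a single line."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(flat(c) for c in node) + ")"


def pretty(node, indent=0, width=80):
    """Render a node, breaking after the head when it does not fit."""
    one_line = flat(node)
    if isinstance(node, str) or not node or indent + len(one_line) <= width:
        return one_line
    # Too wide: head on the first line, one child per line after it.
    pad = " " * (indent + 2)
    lines = ["(" + flat(node[0])]
    for child in node[1:]:
        lines.append(pad + pretty(child, indent + 2, width))
    return "\n".join(lines) + ")"


tree = ["while", "true", ["print", '"hi"']]
print(pretty(tree, width=20))
```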
 | 
| 153 | ## First Words
 | 
| 154 | 
 | 
| 155 | Operators for names:
 | 
| 156 | 
 | 
| 157 |     (obj.field)
 | 
| 158 |     (obj..method)      # not -> because it conflicts with C syntax?  
 | 
| 159 |                        # We might want values and methods
 | 
| 160 |     (namespace::func)
 | 
| 161 | 
 | 
| 162 | Arithmetic:
 | 
| 163 | 
 | 
| 164 |     (+ 1 2) etc.
 | 
| 165 | 
 | 
| 166 |     # + - / * %
 | 
| 167 |     # ^ & |
 | 
| 168 |     # unary ~
 | 
| 169 | 
 | 
| 170 |     a,42   is a[42]
 | 
| 171 | 
 | 
| 172 |     (== 3 4)
 | 
| 173 |     (<  3 4)
 | 
| 174 |     (<= 3 4)
 | 
| 175 |     (>  3 4)
 | 
| 176 |     (>= 3 4)
 | 
| 177 | 
 | 
| 178 | Boolean:
 | 
| 179 | 
 | 
| 180 |     (not true)
 | 
| 181 |     (and true true)
 | 
| 182 |     (or false false)
 | 
| 183 | 
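As a sanity check on the operators listed so far, here is a sketch that turns an already-parsed prefix expression into a C++ infix string. The mapping of `and`/`or`/`not` to `&&`/`||`/`!`, the function name, and the "always parenthesize" policy are assumptions.

```python
BINARY = {"+", "-", "*", "/", "%", "^", "&", "|", "==", "<", "<=", ">", ">="}
LOGICAL = {"and": "&&", "or": "||"}


def expr_to_cpp(node):
    """Translate a nested-list prefix expression to a C++ expression string."""
    if isinstance(node, str):
        return node  # atom: literal or name
    op, args = node[0], node[1:]
    if op in ("not", "~") and len(args) == 1:
        prefix = "!" if op == "not" else "~"
        return prefix + expr_to_cpp(args[0])
    cpp_op = LOGICAL.get(op, op)
    if (op not in BINARY and op not in LOGICAL) or len(args) != 2:
        raise ValueError("unsupported expression: %r" % (node,))
    # Always parenthesize, so C++ precedence never matters.
    return "(%s %s %s)" % (expr_to_cpp(args[0]), cpp_op, expr_to_cpp(args[1]))


print(expr_to_cpp(["<=", ["+", "1", "2"], "4"]))
print(expr_to_cpp(["not", ["and", "true", "false"]]))
```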
 | 
| 184 | Only 1 top level keyword:
 | 
| 185 | 
 | 
| 186 |     # Everything lives in a module.
 | 
| 187 |     (module osh
 | 
| 188 |       ...
 | 
| 189 |     )
 | 
| 190 |     (module ysh
 | 
| 191 |       ...
 | 
| 192 |     )
 | 
| 193 | 
 | 
| 194 | 
 | 
| 195 | How do we do import and export?  Do we have a require/provide kind of thing?
 | 
| 196 | 
 | 
| 197 | Does this replace Ninja?  Info is duplicated in C++ and Ninja.
 | 
| 198 | 
 | 
| 199 | Module-level keywords:
 | 
| 200 | 
 | 
| 201 |     (global [s Str] "123")
 | 
| 202 | 
 | 
| 203 |     (func main [x Int] ...)
 | 
| 204 | 
 | 
| 205 |     (class Foo
 | 
| 206 |       (construct Foo [x Int] ...)  
 | 
| 207 |       (method main [x Str] ...)
 | 
| 208 | 
 | 
| 209 |       (member x Int)
 | 
| 210 |       # Or you could do this I suppose
 | 
| 211 | 
 | 
| 212 |       [x Int]
 | 
| 213 |       [y Int]
 | 
| 214 |     )
 | 
| 215 | 
 | 
| 216 |     # Do we need something special for
 | 
| 217 |     # There could be a macro that desugars this
 | 
| 218 | 
 | 
| 219 |     (class ctx_Foo
 | 
| 220 |       (construct ctx_Foo)
 | 
| 221 |       (method __enter__  # no-op
 | 
| 222 |         ...
 | 
| 223 |         )
 | 
| 224 |       (method __exit__ ...)
 | 
| 225 |     )
 | 
| 226 | 
 | 
| 227 | Within functions:
 | 
| 228 | 
 | 
| 229 |     (var [x Str] "123")
 | 
| 230 | 
 | 
| 231 |     # function call
 | 
| 232 |     (print "hi")
 | 
| 233 | 
 | 
| 234 |     # or maybe
 | 
| 235 |     (call print "hi")
 | 
| 236 | 
 | 
| 237 |     (if true (print "true") (print "false"))
 | 
| 238 | 
 | 
| 239 |     (while true
 | 
| 240 |       (print "hi")
 | 
| 241 |     )
 | 
| 242 | 
 | 
| 243 |     (foreach  # not like for in C, that could be separate
 | 
| 244 | 
 | 
| 245 |     (switch x
 | 
| 246 |       (case [1 2 3]
 | 
| 247 |         (call print "num")
 | 
| 248 |         (call print "hi")
 | 
| 249 |       )
 | 
| 250 |       (case [4 5 6]
 | 
| 251 |         (call print "num")
 | 
| 252 |         (call print "hi")
 | 
| 253 |       )
 | 
| 254 |     )
 | 
| 255 | 
 | 
| 256 |     (break)
 | 
| 257 |     (continue)
 | 
| 258 |     (return 42)
 | 
| 259 | 
 | 
| 260 | Special within functions:
 | 
| 261 | 
 | 
| 262 |     (tagswitch (call obj..tag)  # method call I guess
 | 
| 263 |       (case command_e.Simple
 | 
| 264 |         (call print "simple")
 | 
| 265 |       )
 | 
| 266 |       (case command_e::Pipeline  # maybe use ::
 | 
| 267 |         (call print "pipe")
 | 
| 268 |       )
 | 
| 269 |     )
 | 
| 270 | 
 | 
| 271 |     (with (call ctx_Foo a b) 
 | 
| 272 |       (call print "hi")
 | 
| 273 |     )
 | 
| 274 | 
 | 
| 275 | Not used in mycpp, but could be in other C++ code generators:
 | 
| 276 | 
 | 
| 277 |     (& i)  # address of
 | 
| 278 |     (* pi)  # pointer dereference
 | 
| 279 | 
 | 
| 280 | 
 | 
| 281 | ### `PY_foo` for mycpp ambiguities
 | 
| 282 | 
 | 
| 283 | I think there can be a separate pass that resolves this?
 | 
| 284 | 
 | 
| 285 | Export
 | 
| 286 | 
 | 
| 287 |     t(x)
 | 
| 288 |     (PY_call t x)
 | 
| 289 | 
 | 
| 290 | Then this gets turned into EITHER:
 | 
| 291 | 
 | 
| 292 |     (call t x)
 | 
| 293 |     (call [Alloc t] x)
 | 
| 294 | 
 | 
| 295 | I think it depends on the symbol table.
 | 
| 296 | 
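A sketch of that separate pass, working on nested lists. The symbol table here is just a dict from name to `"class"` or `"func"`; that shape, and the function name, are assumptions. Both `(...)` and `[...]` are plain Python lists in this sketch.

```python
def resolve_py_call(node, symbols):
    """Rewrite (PY_call name args...) into a plain call or an allocation."""
    if isinstance(node, str):
        return node
    node = [resolve_py_call(child, symbols) for child in node]
    if len(node) >= 2 and node[0] == "PY_call" and isinstance(node[1], str):
        callee, args = node[1], node[2:]
        kind = symbols.get(callee)
        if kind == "class":
            return ["call", ["Alloc", callee]] + args  # constructor call
        if kind == "func":
            return ["call", callee] + args             # ordinary function
        raise KeyError("unknown symbol: %s" % callee)
    return node


print(resolve_py_call(["PY_call", "t", "x"], {"t": "class"}))
print(resolve_py_call(["PY_call", "t", "x"], {"t": "func"}))
```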
 | 
| 297 | Likewise if you have
 | 
| 298 | 
 | 
| 299 |     x.y()
 | 
| 300 | 
 | 
| 301 | That has to be
 | 
| 302 | 
 | 
| 303 |     (call (PY_attr x y))
 | 
| 304 | 
 | 
| 305 | Turn it into either
 | 
| 306 | 
 | 
| 307 |     (call x->y)  # obj->member
 | 
| 308 |     (call x::y)  # namespace::func
 | 
| 309 | 
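The attribute case could be handled by the same kind of pass. In this sketch the symbol table maps a name to `"module"` when it is a namespace; treating everything else as an object pointer is an assumption, as is the function name.

```python
def resolve_py_attr(node, symbols):
    """Rewrite (PY_attr left name) into left::name or left->name."""
    if isinstance(node, str):
        return node
    node = [resolve_py_attr(child, symbols) for child in node]
    if len(node) == 3 and node[0] == "PY_attr" and isinstance(node[1], str):
        _, left, name = node
        # Namespaces use ::, anything else is assumed to be an object pointer.
        sep = "::" if symbols.get(left) == "module" else "->"
        return left + sep + name
    return node


print(resolve_py_attr(["call", ["PY_attr", "x", "y"]], {"x": "module"}))
print(resolve_py_attr(["call", ["PY_attr", "x", "y"]], {"x": "var"}))
```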
 | 
| 310 | 
 | 
| 311 | More
 | 
| 312 | 
 | 
| 313 | - `PY_cast` -> `reinterpret_cast` or `static_cast`
 | 
| 314 | 
 | 
| 315 | 
 | 
| 316 | ### Other Issues
 | 
| 317 | 
 | 
| 318 | - C++ nested scope vs. flat function scope
 | 
| 319 | 
 | 
| 320 | 
 |