mycpp
=====

This is a Python-to-C++ translator based on MyPy.  It only
handles the small subset of Python that we use in Oils.

It's inspired by both mypyc and Shed Skin.  These posts give background:

- [Brief Descriptions of a Python to C++ Translator](https://www.oilshell.org/blog/2022/05/mycpp.html)
- [Oil Is Being Implemented "Middle Out"](https://www.oilshell.org/blog/2022/03/middle-out.html)

As of March 2024, the translation to C++ is **done**.  So it's no longer
experimental!

However, it's still pretty **hacky**.  This doc exists mainly to explain the
hacks.  (We may want to rewrite mycpp as "yaks", although it's low priority
right now.)

---

Source for this doc: [mycpp/README.md]($oils-src).  The code is all in
[mycpp/]($oils-src).


<div id="toc">
</div>

## Instructions

### Translating and Compiling `oils-cpp`

Running `mycpp` is best done on a Debian / Ubuntu-ish machine.  Follow the
instructions at <https://github.com/oilshell/oil/wiki/Contributing> to create
the "dev build" first, which is DISTINCT from the C++ build.  Make sure you can
run:

    oil$ build/py.sh all

This will give you a working shell:

    oil$ bin/osh -c 'echo hi'  # running interpreted Python
    hi

To run mycpp, we will build Python 3.10, clone MyPy, and install MyPy's
dependencies.  First install packages:

    # We need libssl-dev, libffi-dev, zlib1g-dev to bootstrap Python
    oil$ build/deps.sh install-ubuntu-packages

Then fetch data, like the Python 3.10 tarball and MyPy repo:

    oil$ build/deps.sh fetch

Then build from source:

    oil$ build/deps.sh install-wedges

To build oil-native, use:

    oil$ ./NINJA-config.sh
    oil$ ninja              # translate and compile, may take 30 seconds

    oil$ _bin/cxx-asan/osh -c 'echo hi'  # running compiled C++ !
    hi

To run the tests and benchmarks:

    oil$ mycpp/TEST.sh test-translator
    ... 200+ tasks run ...

If you have problems, post a message on `#oil-dev` at
`https://oilshell.zulipchat.com`.  Not many people have contributed to `mycpp`,
so I can use your feedback!

Related:

- [Oil Native Quick Start](https://github.com/oilshell/oil/wiki/Oil-Native-Quick-Start)
  on the wiki.
- [Oil Dev Cheat Sheet](https://github.com/oilshell/oil/wiki/Oil-Native-Quick-Start)

## Notes on the Algorithm / Architecture

There are four passes over the MyPy AST.

(1) `const_pass.py`: Collect string constants.

Turn the constant in `myfunc("foo")` into a top-level `GLOBAL_STR(str1,
"foo")`.

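The idea behind this pass can be sketched as a table that gives each distinct literal one global name. This is a hypothetical illustration, not `const_pass.py` itself: the class `ConstCollector`, the deduplication of repeated literals, and the exact output text are assumptions, and the literals are assumed to need no escaping.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of a "collect string constants" pass: each distinct
// literal gets one global name (str1, str2, ...), reused on repeats.
class ConstCollector {
 public:
  // Returns the global name for `literal`, allocating a new one if needed.
  std::string NameFor(const std::string& literal) {
    auto it = names_.find(literal);
    if (it != names_.end()) {
      return it->second;  // repeated literal: reuse the existing name
    }
    std::string name = "str" + std::to_string(names_.size() + 1);
    names_[literal] = name;
    order_.push_back(literal);
    return name;
  }

  // One GLOBAL_STR(...) line per literal, in first-seen order.
  // Assumes the literals contain nothing that needs escaping.
  std::vector<std::string> Declarations() const {
    std::vector<std::string> out;
    for (const std::string& lit : order_) {
      out.push_back("GLOBAL_STR(" + names_.at(lit) + ", \"" + lit + "\")");
    }
    return out;
  }

 private:
  std::map<std::string, std::string> names_;  // literal -> global name
  std::vector<std::string> order_;            // literals in first-seen order
};
```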

(2) Three passes in `cppgen_pass.py`.

(a) Forward Declaration Pass.

    class Foo;
    class Bar;

This pass also determines which methods should be declared `virtual` in their
declarations.  The `virtual` keyword is written in the next pass.

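One plausible rule for the `virtual` decision is "a base-class method is virtual if some subclass defines a method with the same name." The sketch applies that rule to a hypothetical class table; `ClassInfo`, `ClassTable`, and `FindVirtuals()` are illustrative names, and mycpp's real rule and data structures may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical class table: for each class, its base and the names of the
// methods it defines.
struct ClassInfo {
  std::string base;               // "" if the class has no base
  std::set<std::string> methods;  // methods defined directly in this class
};

using ClassTable = std::map<std::string, ClassInfo>;
using MethodRef = std::pair<std::string, std::string>;  // (class, method)

// Assumed rule: a base-class method is virtual if any subclass, direct or
// indirect, defines a method with the same name.
std::set<MethodRef> FindVirtuals(const ClassTable& classes) {
  std::set<MethodRef> result;
  for (const auto& entry : classes) {
    const ClassInfo& info = entry.second;
    for (const std::string& method : info.methods) {
      // Walk up the inheritance chain looking for the same method name.
      std::string ancestor = info.base;
      while (!ancestor.empty()) {
        const ClassInfo& a = classes.at(ancestor);
        if (a.methods.count(method) > 0) {
          result.emplace(ancestor, method);
        }
        ancestor = a.base;
      }
    }
  }
  return result;
}
```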

(b) Declaration Pass.

    class Foo {
      void method();
    };
    class Bar {
      void method();
    };

More work in this pass:

- Collect member variables and write them at the end of the definition
- Collect locals for "hoisting".  Written in the next pass.

(c) Definition Pass.

    void Foo::method() {
      ...
    }

    void Bar::method() {
      ...
    }

Note: I really wish we were not using visitors, but that's inherited from MyPy.

## mycpp Idioms / "Creative Hacks"

Oils is written in typed Python 2.  It will run under a stock Python 2
interpreter, and it will typecheck with stock MyPy.

However, there are a few language features that don't map cleanly from typed
Python to C++:

- switch statements (unfortunately we don't have the Python 3 match statement)
- C++ destructors - the RAII pattern
- casting - MyPy has one kind of cast; C++ has `static_cast` and
  `reinterpret_cast`.  (We don't use C-style casting.)

This section describes the idioms we use.  There are some hacks in
[mycpp/cppgen_pass.py]($oils-src) to handle these cases, and also Python
runtime equivalents in `mycpp/mylib.py`.

### `with {,tag,str_}switch` → Switch statement

We have three constructs that translate to a C++ switch statement.  They use a
Python context manager `with Xswitch(obj) ...` as a little hack.

Here are examples like the ones in [mycpp/examples/test_switch.py]($oils-src).
(`ninja mycpp-logs-equal` translates, compiles, and tests all the examples.)

Simple switch:

    myint = 99
    with switch(myint) as case:
        if case(42, 43):
            print('forties')
        else:
            print('other')

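In C++ this maps onto an ordinary `switch` statement, with one `case` label per argument to `case()`. The function below is a hand-written approximation of the translated code, not actual mycpp output; `Classify()` is a hypothetical name, and it returns a string instead of printing so the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Hand-written approximation of the translated switch: each argument to
// case() becomes a C++ case label, and the else: branch becomes default:.
std::string Classify(int myint) {
  switch (myint) {
    case 42:
    case 43:
      return "forties";
    default:
      return "other";
  }
}
```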

Switch on **object type**, which goes well with ASDL sum types:

    val = value.Str('foo')  # type: value_t
    with tagswitch(val) as case:
        if case(value_e.Str, value_e.Int):
            print('string or int')
        else:
            print('other')

We usually need to apply the `UP_val` pattern here, described in the next
section.

Switch on **string**, which generates a fast **two-level dispatch** -- first on
length, and then with `str_equals_c()`:

    s = 'foo'
    with str_switch(s) as case:
        if case("foo"):
            print('FOO')
        else:
            print('other')

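The two-level dispatch can be sketched like this: the outer `switch` on the length rules out most candidates without touching any bytes, and only strings of the right length are compared. This is an illustration under assumptions: `std::memcmp` stands in for `str_equals_c()`, and `Dispatch()` and its extra cases (`"bar"`, `"quux"`) are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Two-level string dispatch: switch on the length first, then compare bytes
// only against the candidates that have that length.
int Dispatch(const std::string& s) {
  switch (s.size()) {
    case 3:
      if (std::memcmp(s.data(), "foo", 3) == 0) {
        return 1;
      }
      if (std::memcmp(s.data(), "bar", 3) == 0) {
        return 2;
      }
      break;
    case 4:
      if (std::memcmp(s.data(), "quux", 4) == 0) {
        return 3;
      }
      break;
  }
  return 0;  // no case matched: the else: branch
}
```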

### `val` → `UP_val` → `val` Downcasting pattern

Summary: variable names like `UP_*` are **special** in our Python code.

Consider the downcasts marked BAD:

    val = value.Str('foo')  # type: value_t

    with tagswitch(val) as case:
        if case(value_e.Str):
            val = cast(value.Str, val)  # BAD: conflicts with first declaration
            print('s = %s' % val.s)

        elif case(value_e.Int):
            val = cast(value.Int, val)  # BAD: conflicts with both
            print('i = %d' % val.i)

        else:
            print('other')

MyPy allows this, but it translates to invalid C++ code.  C++ can't have a
variable named `val` in the same scope with 2 related types `value_t` and
`value::Str`.

So we use this idiom instead, which takes advantage of **local vars in case
blocks** in C++:

    val = value.Str('foo')  # type: value_t

    UP_val = val  # temporary variable that will be cast

    with tagswitch(val) as case:
        if case(value_e.Str):
            val = cast(value.Str, UP_val)  # this works
            print('s = %s' % val.s)

        elif case(value_e.Int):
            val = cast(value.Int, UP_val)  # also works
            print('i = %d' % val.i)

        else:
            print('other')

This translates to something like:

    value_t* val = Alloc<value::Str>(str42);
    value_t* UP_val = val;

    switch (val->tag()) {
        case value_e::Str: {
            // DIFFERENT local var
            value::Str* val = static_cast<value::Str*>(UP_val);
            print(StrFormat(str43, val->s));
        }
            break;
        case value_e::Int: {
            // ANOTHER DIFFERENT local var
            value::Int* val = static_cast<value::Int*>(UP_val);
            print(StrFormat(str44, val->i));
        }
            break;
        default:
            print(str45);
    }

This works because there's no problem having **different** variables with the
same name within each `case { }` block.

Again, the names `UP_*` are **special**.  If the name doesn't start with `UP_`,
the inner blocks will look like:

        case value_e::Str: {
            val = static_cast<value::Str*>(val);  // BAD: val reused
            print(StrFormat(str43, val->s));
        }

And they will fail to compile.  Assigning the cast result back to `val` leaves
its static type as `value_t*`, and the superclass `value_t` doesn't have a
field `s`.  Only the subclass `value::Str` has it.

(Note that Python has a single flat scope per function, while C++ has nested
scopes.)

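Here is a self-contained version of the pattern that compiles as ordinary C++. The types are minimal stand-ins for ASDL-generated code (hypothetical: the real classes have more members and live in the garbage-collected runtime), and `Describe()` returns a `std::string` instead of printing.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal stand-ins for ASDL-generated types (hypothetical).
enum class value_e { Str, Int };

class value_t {
 public:
  explicit value_t(value_e tag) : tag_(tag) {}
  virtual ~value_t() = default;
  value_e tag() const { return tag_; }

 private:
  value_e tag_;
};

namespace value {

class Str : public value_t {
 public:
  explicit Str(std::string s) : value_t(value_e::Str), s(std::move(s)) {}
  std::string s;
};

class Int : public value_t {
 public:
  explicit Int(int i) : value_t(value_e::Int), i(i) {}
  int i;
};

}  // namespace value

// The UP_val pattern: each case block declares a NEW local named val, which
// shadows the parameter and has the subclass type.
std::string Describe(value_t* val) {
  value_t* UP_val = val;
  switch (val->tag()) {
    case value_e::Str: {
      value::Str* val = static_cast<value::Str*>(UP_val);
      return "s = " + val->s;
    }
    case value_e::Int: {
      value::Int* val = static_cast<value::Int*>(UP_val);
      return "i = " + std::to_string(val->i);
    }
  }
  return "other";
}
```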

### Python context manager → C++ constructor and destructor (RAII)

This Python code:

    with ctx_Foo(42):
      f()

translates to this C++ code:

    {
      ctx_Foo tmp(42);
      f();

      // destructor ~ctx_Foo implicitly called
    }

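A sketch of what such a class might look like, assuming `ctx_Foo` pushes a value on entry and pops it on exit (the global `gStack`, `f()`, and `Run()` are hypothetical). Because the pop lives in the destructor, it runs on normal exit from the block and also when an exception unwinds through it.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical state that the context manager changes temporarily.
std::vector<int> gStack;

// __enter__ becomes the constructor, __exit__ becomes the destructor.
class ctx_Foo {
 public:
  explicit ctx_Foo(int n) { gStack.push_back(n); }
  ~ctx_Foo() { gStack.pop_back(); }

  // A copy would pop twice, so copying is disabled.
  ctx_Foo(const ctx_Foo&) = delete;
  ctx_Foo& operator=(const ctx_Foo&) = delete;
};

void f() {
  // Code inside the block sees the pushed value.
  if (gStack.empty() || gStack.back() != 42) {
    throw std::logic_error("ctx_Foo not active");
  }
}

void Run(bool fail) {
  ctx_Foo tmp(42);
  f();
  if (fail) {
    throw std::runtime_error("error inside the block");
  }
}  // ~ctx_Foo runs here, or during stack unwinding if an exception is thrown
```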

## MyPy "Shimming" Technique

We have an interesting way of "writing Python and C++ at the same time":

1. First, all Python code must pass the MyPy type checker, and run with a stock
   Python 2 interpreter.
   - This is the source of truth: the source of our semantics.
1. We translate most `.py` files to C++, **except** some files, in particular
   [mycpp/mylib.py]($oils-src) and files starting with `py` like
   `core/{pyos,pyutil}.py`.
1. In C++, we can substitute custom implementations with the properties we
   want, like `Dict<K, V>` being ordered, `BigInt` being distinct from C `int`,
   `BufWriter` being efficient, etc.

The MyPy type system is very powerful!  It lets us do all this.

### NewDict() for ordered dicts

Dicts in Python 2 aren't ordered, but we make them ordered at **runtime** by
using `mylib.NewDict()`, which returns `collections_.OrderedDict`.

The **static type** is still `Dict[K, V]`, but we change the "spec" to be an
ordered dict.

In C++, `Dict<K, V>` is implemented as an ordered dict.  (Note: we don't
implement preserving order on deletion, which seems OK.)

- TODO: `iteritems()` could go away

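An insertion-ordered dict can be built from a vector of entries plus a hash index from key to position. The template below is a minimal sketch of that general technique, not the actual `Dict<K, V>` in the mycpp runtime; `OrderedDict` and its method names are hypothetical, and like the note above, it doesn't handle deletion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Insertion-ordered dict: entries are stored in insertion order, and a hash
// map records each key's position. Deletion is not supported.
template <typename K, typename V>
class OrderedDict {
 public:
  void set(const K& key, const V& val) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      entries_[it->second].second = val;  // existing key keeps its position
    } else {
      index_[key] = entries_.size();
      entries_.emplace_back(key, val);
    }
  }

  const V& get(const K& key) const { return entries_[index_.at(key)].second; }

  // Keys in insertion order, like iterating over the dict.
  std::vector<K> keys() const {
    std::vector<K> out;
    for (const auto& entry : entries_) {
      out.push_back(entry.first);
    }
    return out;
  }

 private:
  std::vector<std::pair<K, V>> entries_;
  std::unordered_map<K, std::size_t> index_;
};
```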

### StackArray[T]

TODO: describe this when it works.

### BigInt

- In Python, it's simply defined as a class with an integer, in
  [mycpp/mops.py]($oils-src).
- In C++, it's currently `typedef int64_t BigInt`, but we want to make it a big
  integer.

### ByteAt(), ByteEquals(), ...

Hand optimizations that avoid allocating 1-byte strings.  They're used in the
IFS algorithm, `LooksLikeGlob()`, and `GlobUnescape()`.

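The idea is to compare byte values (`int`s) rather than 1-character strings. The functions below only mimic that idea under assumed signatures (`ByteAt(s, i)` returning 0 to 255, `ByteEquals(byte, ch)` taking a 1-byte C string); they aren't mycpp's declarations, and `HasGlobChar()` is a hypothetical, much simplified cousin of `LooksLikeGlob()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical signature: the byte at index i as an int in 0..255, so no
// 1-character string is created.
int ByteAt(const std::string& s, int i) {
  return static_cast<unsigned char>(s[i]);
}

// Hypothetical signature: compare a byte against a 1-byte C string like "*".
bool ByteEquals(int byte, const char* ch) {
  return ch[0] != '\0' && ch[1] == '\0' &&
         byte == static_cast<unsigned char>(ch[0]);
}

// Much simplified stand-in in the spirit of LooksLikeGlob(): does the string
// contain any glob metacharacter?
bool HasGlobChar(const std::string& s) {
  for (int i = 0; i < static_cast<int>(s.size()); ++i) {
    int b = ByteAt(s, i);
    if (ByteEquals(b, "*") || ByteEquals(b, "?") || ByteEquals(b, "[")) {
      return true;
    }
  }
  return false;
}
```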

### File / LineReader / BufWriter

TODO: describe how this works.

Can it be more type safe?  I think we can cast `File` to both `LineReader` and
`BufWriter`.

Or can we invert the relationship, so `File` derives from **both** LineReader
and BufWriter?

### Fast JSON - avoid intermediate allocations

- `pyj8.WriteString()` is shimmed so we don't create encoded J8 string objects,
  only to throw them away and write to `mylib.BufWriter`.  Instead, we append
  encoded strings **directly** to the `BufWriter`.
- Likewise, we have `BufWriter::write_spaces` to avoid temporary allocations
  when writing indents.
  - This could be generalized to `BufWriter::write_repeated(' ', 42)`.
- We may also want `BufWriter::write_slice()`.

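The sketch below shows both tricks on a toy writer: escaped characters go straight into the output buffer, and indentation is appended with a single call. The `BufWriter` class here, including `write_char()` and `getvalue()`, is a hypothetical stand-in, and `WriteString()` handles only three escapes rather than full J8 encoding.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for mylib.BufWriter: appends into one growing buffer.
class BufWriter {
 public:
  void write(const std::string& s) { buf_ += s; }
  void write_char(char c) { buf_ += c; }
  // Appends n spaces directly, with no temporary string of spaces.
  void write_spaces(int n) {
    if (n > 0) {
      buf_.append(static_cast<std::string::size_type>(n), ' ');
    }
  }
  const std::string& getvalue() const { return buf_; }

 private:
  std::string buf_;
};

// Writes a quoted string with a few escapes DIRECTLY into the writer, instead
// of building an encoded copy and then writing that.
void WriteString(const std::string& s, BufWriter* out) {
  out->write_char('"');
  for (char c : s) {
    switch (c) {
      case '"':
      case '\\':
        out->write_char('\\');
        out->write_char(c);
        break;
      case '\n':
        out->write_char('\\');
        out->write_char('n');
        break;
      default:
        out->write_char(c);
    }
  }
  out->write_char('"');
}
```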

## Limitations Requiring Source Rewrites

mycpp itself may cause limitations on expressiveness, or the C++ language may
not be able to express what we want.

- C++ doesn't have `try / except / else`, or `finally`
  - Use the `with ctx_Foo` pattern instead.
- `if mylist` tests if the pointer is non-NULL; use `if len(mylist)` for a
  non-empty test
- Functions can have at most one keyword / optional argument.
  - We generate two methods: `f(x)` which calls `f(x, y)` with the default
    value of `y`
  - If there are two or more optional arguments:
    - For classes, you can use the "builder pattern", i.e. add an
      `Init_MyMember()` method
    - If the arguments are booleans, translate them to a single bitfield
      argument
- C++ has nested scope and Python has flat function scope.  This can cause name
  collisions.
  - Could enforce this if it becomes a problem

Also see `mycpp/examples/invalid_*` for Python code that fails to translate.

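Both workarounds can be sketched in C++. The overload pair shows the two-method translation of one default argument; the flags enum shows several boolean options packed into one argument. Everything here (`f()`, `Format()`, `kStrip`, `kUpper`, the default value `10`) is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Python:  def f(x, y=10): return x + y
// One optional argument becomes two overloads; the short one passes the
// default value along. (f and the default 10 are made up for illustration.)
int f(int x, int y) { return x + y; }
int f(int x) { return f(x, 10); }

// Several boolean options packed into one bitfield argument.
// (Format, kStrip, and kUpper are hypothetical names.)
enum FormatFlags : unsigned {
  kNone = 0,
  kStrip = 1u << 0,
  kUpper = 1u << 1,
};

std::string Format(std::string s, unsigned flags) {
  if (flags & kStrip) {
    std::string::size_type b = s.find_first_not_of(' ');
    std::string::size_type e = s.find_last_not_of(' ');
    s = (b == std::string::npos) ? std::string() : s.substr(b, e - b + 1);
  }
  if (flags & kUpper) {
    for (char& c : s) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
  }
  return s;
}
```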

## WARNING: Assumptions Not Checked

### Global Constants Can't Be Mutated

We translate top-level constants to statically initialized C data structures
(zero startup cost):

    gStr = 'foo'
    gList = [1, 2]  # type: List[int]
    gDict = {'bar': 42}  # type: Dict[str, int]

Even though `List` and `Dict` are mutable in general, you should **NOT** mutate
these global instances!  The C++ code will break at runtime.

### Gotcha about Returning Variants (Subclasses) of a Type

MyPy will accept this code:

```
if cond:
  sig = proc_sig.Open  # type: proc_sig_t
                       # bad because mycpp HOISTS this
else:
  sig = proc_sig.Closed.CreateNull()
  sig.words = words    # assignment fails
return sig
```

It will translate to C++, but fail to compile: mycpp hoists the declaration of
`sig` to the top of the function with type `proc_sig_t*`, and `proc_sig_t`
has no `words` field.  Instead, rewrite it like this:

```
sig = None  # type: proc_sig_t
if cond:
  sig = proc_sig.Open
else:
  closed = proc_sig.Closed.CreateNull()
  closed.words = words  # OK: closed has the subclass type
  sig = closed
return sig
```

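Roughly, the hoisted C++ for the rewritten version looks like the function below: both locals are declared up front, and `closed` keeps the subclass type, so the field assignment compiles. The types are hypothetical stand-ins (the real ASDL code is generated and garbage-collected), `gOpen` stands in for `proc_sig.Open`, and `MakeSig()` is an illustrative name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the ASDL types.
class proc_sig_t {
 public:
  virtual ~proc_sig_t() = default;
};

namespace proc_sig {
class Closed : public proc_sig_t {
 public:
  static Closed* CreateNull() { return new Closed(); }
  std::vector<std::string> words;
};
}  // namespace proc_sig

proc_sig_t* gOpen = new proc_sig_t();  // stands in for proc_sig.Open

// Locals are hoisted to the top of the function, so `sig` always has the base
// type proc_sig_t*: writing sig->words would not compile, but closed->words
// does. (Memory is leaked here; the real runtime is garbage collected.)
proc_sig_t* MakeSig(bool cond, const std::vector<std::string>& words) {
  proc_sig_t* sig = nullptr;
  proc_sig::Closed* closed = nullptr;

  if (cond) {
    sig = gOpen;
  } else {
    closed = proc_sig::Closed::CreateNull();
    closed->words = words;
    sig = closed;
  }
  return sig;
}
```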

### Exceptions Can't Leave Destructors / Python `__exit__`

Context managers like `with ctx_Foo():` translate to C++ constructors and
destructors.

In C++, an exception can't "leave" a destructor.  Since C++11, destructors are
`noexcept` by default, so an exception that escapes one calls
`std::terminate()` at runtime.

You can throw and CATCH an exception WITHIN a destructor, but you can't let it
propagate outside.

This means you must be careful when coding the `__exit__` method.  For example,
in `vm::ctx_Redirect`, we had a bug due to `IOError` being thrown and not
caught when restoring/popping redirects.

To fix the bug, we rewrote the code to use an out param
`List[IOError_OSError]`.

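A sketch of the out-param approach in C++: the destructor catches the error itself and appends it to a list owned by the caller, who checks the list after the block. `RestoreRedirect()`, `RunWithRedirect()`, and this `ctx_Redirect` are hypothetical simplifications; `std::runtime_error` and `std::string` stand in for the real error types.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation that can fail while restoring state.
void RestoreRedirect(bool fail) {
  if (fail) {
    throw std::runtime_error("bad file descriptor");
  }
}

// The destructor catches the error and appends it to a caller-owned list, so
// no exception ever escapes the destructor.
class ctx_Redirect {
 public:
  ctx_Redirect(bool fail, std::vector<std::string>* err_out)
      : fail_(fail), err_out_(err_out) {}
  ~ctx_Redirect() {
    try {
      RestoreRedirect(fail_);
    } catch (const std::runtime_error& e) {
      err_out_->push_back(e.what());  // report instead of propagating
    }
  }
  ctx_Redirect(const ctx_Redirect&) = delete;
  ctx_Redirect& operator=(const ctx_Redirect&) = delete;

 private:
  bool fail_;
  std::vector<std::string>* err_out_;
};

// The caller checks the out param after the block ends.
bool RunWithRedirect(bool fail) {
  std::vector<std::string> errors;
  {
    ctx_Redirect ctx(fail, &errors);
    // ... run the command ...
  }  // destructor runs here
  return errors.empty();
}
```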

Related:

- <https://akrzemi1.wordpress.com/2011/09/21/destructors-that-throw/>

## More Translation Notes

### Hacky Heuristics

- `callable(arg)` to either:
  - function call `f(arg)`
  - instantiation `Alloc<T>(arg)`
- `name.attr` to either:
  - `obj->member`
  - `module::Func`
- `cast(MyType, obj)` to either:
  - `static_cast<MyType*>(obj)`
  - `reinterpret_cast<MyType*>(obj)`

### Hacky Hard-Coded Names

These are signs of coupling between mycpp and Oils, which ideally shouldn't
exist.

- `mycpp_main.py`
  - `ModulesToCompile()` -- some files have to be ordered first, like the ASDL
    runtime.
    - TODO: Pea can respect parameter order?  So we do that outside the project?
    - Another ordering constraint comes from **inheritance**.  The forward
      declaration is NOT sufficient in that case, because a base class must be
      fully defined before a class can derive from it.
- `cppgen_pass.py`
  - `_GetCastKind()` has some hard-coded names
  - `AsdlType::Create()` is special cased to `::`, not `->`
  - Default arguments e.g. `scope_e::Local` need a repeated `using`.

Issue on mycpp improvements: <https://github.com/oilshell/oil/issues/568>

### Major Features

- Python `int` and `bool` → C++ `int` and `bool`
  - `None` → `nullptr`
- Statically Typed Python Collections
  - `str` → `Str*`
  - `List[T]` → `List<T>*`
  - `Dict[K, V]` → `Dict<K, V>*`
  - tuples → `Tuple2<A, B>`, `Tuple3<A, B, C>`, etc.
- Collection literals turn into initializer lists
  - And there is a C++ type inference issue which requires an explicit
    `std::initializer_list<int>{1, 2, 3}`, not just `{1, 2, 3}`
- Python's polymorphic iteration → `StrIter`, `ListIter<T>`, `DictIter<K, V>`
  - `d.iteritems()` is rewritten as `mylib.iteritems()` → `DictIter`
    - TODO: can we be smarter about this?
  - `reversed(mylist)` → `ReverseListIter`
- Python's `in` operator:
  - `s in mystr` → `str_contains(mystr, s)`
  - `x in mylist` → `list_contains(mylist, x)`
- Classes and inheritance
  - `__init__` method becomes a constructor.  Note: initializer lists aren't
    used.
  - Detect `virtual` methods
  - TODO: could we detect `abstract` methods? (`NotImplementedError`)
- Python generators `Iterator[T]` → eager `List<T>` accumulators
- Python Exceptions → C++ exceptions
- Python Modules → C++ namespace (we assume a 2-level hierarchy)
  - TODO: mycpp needs real modules, because our `oils_for_unix.mycpp.cc`
    translation unit is getting big.
  - And `cpp/preamble.h` is a hack to work around the lack of modules.

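The inference issue shows up when a braced list is passed through a forwarding function. Assuming `Alloc<T>()` forwards its arguments with a variadic template (the simplified `Alloc` and `List` below are stand-ins that use plain `new`, not the GC runtime), a bare `{1, 2, 3}` can't be deduced for the parameter pack, while an explicit `std::initializer_list<int>` can.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

// Simplified stand-in for Alloc<T>(): forwards its arguments to T's
// constructor. (Assumed shape; this version uses plain new.)
template <typename T, typename... Args>
T* Alloc(Args&&... args) {
  return new T(std::forward<Args>(args)...);
}

// Simplified stand-in for List<T>, constructed from an initializer list.
template <typename T>
class List {
 public:
  explicit List(std::initializer_list<T> init) : items_(init) {}
  int len() const { return static_cast<int>(items_.size()); }

 private:
  std::vector<T> items_;
};

List<int>* MakeList() {
  // Alloc<List<int>>({1, 2, 3}) would not compile: a braced list has no
  // type, so it can't be deduced for the Args&&... parameter pack.
  return Alloc<List<int>>(std::initializer_list<int>{1, 2, 3});
}
```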

### Minor Translations

- `s1 == s2` → `str_equals(s1, s2)`
- `'x' * 3` → `str_repeat(globalStr, 3)`
- `[None] * 3` → `list_repeat(nullptr, 3)`
- Omitted:
  - If the LHS of an assignment is `_`, then the statement is omitted
    - This is for `_ = log`, which shuts up Python lint warnings for 'unused
      import'
  - Code under `if __name__ == '__main__'`

### Optimizations

- Returning Tuples by value.  To reduce GC pressure, we return
  `Tuple2<A, B>` instead of `Tuple2<A, B>*`, and likewise for `Tuple3` and `Tuple4`.

### Rooting Policy

The translated code roots local variables in every function:

    StackRoots _r({&var1, &var2});

We have two kinds of hand-written code:

1. Methods like `Str::strip()` in `mycpp/`
2. OS bindings like `stat()` in `cpp/`

Neither of them needs any rooting!  This is because we use **manual collection
points** in the interpreter, and these functions don't call any functions that
can collect.  They are "leaves" in the call tree.

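A minimal sketch of a `StackRoots`-style guard, assuming it records the addresses of local pointer variables in a global root list for the collector to scan, and removes them when the function returns. The global `gRoots` and this class are simplified, hypothetical stand-ins for the real runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical global root list that a collector would scan.
std::vector<void**> gRoots;

// Registers the addresses of local pointer variables for the lifetime of the
// guard, so a collection while the function runs can find them.
class StackRoots {
 public:
  StackRoots(std::initializer_list<void*> roots) : n_(roots.size()) {
    for (void* r : roots) {
      gRoots.push_back(static_cast<void**>(r));
    }
  }
  ~StackRoots() { gRoots.resize(gRoots.size() - n_); }  // pop this frame's roots

  StackRoots(const StackRoots&) = delete;
  StackRoots& operator=(const StackRoots&) = delete;

 private:
  std::size_t n_;
};
```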

## The mycpp Runtime

The mycpp translator targets a runtime that's written from scratch.  It
implements garbage-collected data structures like:

- Typed records
  - Python classes
  - ASDL product and sum types
- `Str` (immutable, as in Python)
- `List<T>`
- `Dict<K, V>`
- `Tuple2<A, B>`, `Tuple3<A, B, C>`, ...

It also has functions based on CPython's:

- `mycpp/gc_builtins.{h,cc}` corresponds roughly to Python's `__builtin__`
  module, e.g. `int()` and `str()`
- `mycpp/gc_mylib.{h,cc}` corresponds to `mylib.py`
  - `mylib.BufWriter` is a bit like `cStringIO.StringIO`

### Differences from CPython

- Integers are either C `int` or `mops.BigInt`, not Python's arbitrary size
  integers
- `NUL` bytes are allowed in arguments to syscalls like `open()`, unlike in
  CPython
- `s.strip()` is defined in terms of ASCII whitespace, which does not include,
  say, `\v`.
  - This is done to be consistent with JSON and J8 Notation.

## C++ Notes

### Gotchas

- C++ classes can have 2 member variables of the same name: one from the base
  class and one from the derived class.
- Failing to declare methods `virtual` can result in the wrong one being called
  at runtime

### Minor Features Used

In addition to classes, templates, exceptions, etc. mentioned above, we use:

- `static_cast` and `reinterpret_cast`
- `enum class` for ASDL
- Function overloading
  - For equality and hashing?
- `offsetof` for introspection of field positions for garbage collection
- `std::initializer_list` for `StackRoots()`
  - Should we get rid of this?

### Not Used

- I/O Streams, RTTI, etc.
- `const`
- Smart pointers