| 1 | 2022-04-19
 | 
| 2 | 
 | 
| 3 | 
 | 
| 4 | c. Proposal Summary/Scope of Work
 | 
| 5 | Provide a short summary of the work being proposed (maximum of 500 words)
 | 
| 6 | 
 | 
| 7 | The Unix shell is a central user interface and glue language in all kinds of
 | 
| 8 | scientific computing, particularly in bioinformatics.
 | 
| 9 | 
 | 
| 10 | Oil is a new open source Unix shell. It's meant to be our upgrade path from GNU
 | 
| 11 | bash, the most popular shell in the world. Home page: https://www.oilshell.org/
 | 
| 12 | 
 | 
| 13 | It runs your existing scripts, and allows you to upgrade them to the new Oil
 | 
| 14 | language, which is designed to be familiar to Python and JavaScript users.
 | 
| 15 | 
 | 
| 16 | In the last few years, we've released a correct but slow implementation dozens
 | 
| 17 | of times, and gotten regular feedback from users.
 | 
| 18 | 
 | 
| 19 | Now we need a COMPILER ENGINEER to finish semi-automatically translating the
 | 
| 20 | code to fast C++. This plot is a quick way to see our progress:
 | 
| 21 | https://www.oilshell.org/blog/2022/03/spec-test-history.png
 | 
| 22 | 
 | 
| 23 | Blog post with this plot: https://www.oilshell.org/blog/2022/03/middle-out.html
 | 
| 24 | 
 | 
| 25 | Roughly speaking, we'll have a competitive replacement for / upgrade to bash
 | 
| 26 | when the red line meets the blue line!
 | 
| 27 | 
 | 
| 28 | So the work is already more than half done, and I would consider it low risk /
 | 
| 29 | high reward. Addressing the speed issue will allow us to aggressively add new
 | 
| 30 | features and polish the documentation.
 | 
| 31 | 
 | 
| 32 | Our FAQ has over 178K views, having been featured in many places like Hacker
 | 
| 33 | News: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html
 | 
| 34 | 
 | 
| 35 | -----
 | 
| 36 | 
 | 
| 37 | I've drafted the job requirements here:
 | 
| 38 | https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job
 | 
| 39 | 
 | 
| 40 | I will use my professional network (having worked at Google and EA) to find the
 | 
| 41 | compiler engineer, who will be skilled in compilers, C++ and Python.
 | 
| 42 | 
 | 
| 43 | Python creator Guido van Rossum knows about Oil:
 | 
| 44 | 
 | 
| 45 | https://twitter.com/gvanrossum/status/995862193609551872
 | 
| 46 | 
 | 
| 47 | "Amazing. A bash implementation in Python, by my ex-coworker (at Google) Andy
 | 
| 48 | Chu"
 | 
| 49 | 
 | 
| 50 | A few years ago he introduced me to two compiler engineers working at Dropbox,
 | 
| 51 | who may be good candidates for the job. However, they are gainfully employed
 | 
| 52 | and would need to be compensated.
 | 
| 53 | 
 | 
| 54 | 
 | 
| 55 | 
 | 
| 56 | 
 | 
| 57 | 
 | 
| 58 | 
 | 
| 59 | d. Value to Biomedical Users
 | 
| 60 | Describe the expected value of the proposed work to the biomedical research community (maximum of 250 words)
 | 
| 61 | 
 | 
| 62 | If batch computation on Unix systems is a bottleneck in your lab's "scientific
 | 
| 63 | discovery loop", then a better Unix shell will make you more productive! You
 | 
| 64 | can run more experiments with fewer staff.
 | 
| 65 | 
 | 
| 66 | Oil treats Unix shell like a real programming language, rather than a mystery
 | 
| 67 | handed off from one researcher to the next.
 | 
| 68 | 
 | 
| 69 | Moreover, the software that underlies published experiments is heterogeneous: a
 | 
| 70 | mix of programs written in different languages, at different times, by
 | 
| 71 | different people.
 | 
| 72 | 
 | 
| 73 | The Unix shell glues it all together and provides an interactive interface.
 | 
| 74 | It's also a powerful interface for using remote computers.
 | 
| 75 | 
 | 
| 76 | But shell is showing its age and has been neglected by industry and academia.
 | 
| 77 | It has fundamental flaws like a lack of robust error handling, which lead to
 | 
| 78 | productivity loss, expensive training, and even erroneous scientific results.
 | 
| 79 | 
 | 
| 80 | Oil fixes these problems, and adds much-needed features that will be familiar
 | 
| 81 | to Python, JavaScript, and R users.
 | 
| 82 | 
 | 
| 83 | Four Features That Justify a New Unix Shell:
 | 
| 84 | http://www.oilshell.org/blog/2020/10/osh-features.html
 | 
| 85 | 
 | 
| 86 | A Tour of the Oil Language:
 | 
| 87 | https://www.oilshell.org/release/latest/doc/oil-language-tour.html
 | 
| 88 | 
 | 
| 89 | ----
 | 
| 90 | 
 | 
| 91 | Similar sentiments from a third party at https://datacarpentry.org/2015-11-04-ACUNS/shell-intro/
 | 
| 92 | 
 | 
| 93 | - For most bioinformatics tools, you have to use the shell. There is no
 | 
| 94 |   graphical interface. If you want to work in metagenomics or genomics you're
 | 
| 95 |   going to need to use the shell.
 | 
| 96 | - The shell gives you power ... When you need to do things tens to hundreds of
 | 
| 97 |   times, knowing how to use the shell is transformative.
 | 
| 98 | - To use remote computers or cloud computing, you need to use the shell.
 | 
| 99 | 
 | 
| 100 | 
 | 
| 101 | 
 | 
| 102 | f. Landscape Analysis
 | 
| 103 | 
 | 
| 104 | Briefly describe the other software tools (either proprietary or open source)
 | 
| 105 | that the audience for this proposal primarily uses. How do the software
 | 
| 106 | project(s) in this proposal compare to these other tools in terms of user base
 | 
| 107 | size, usage, and maturity? How do existing tools and the project(s) in this
 | 
| 108 | proposal interact? (maximum of 250 words)
 | 
| 109 | 
 | 
| 110 | 
 | 
| 111 | I made a list of alternative shells:
 | 
| 112 | https://github.com/oilshell/oil/wiki/Alternative-Shells
 | 
| 113 | 
 | 
| 114 | Of these, Oil is the ONLY shell that is compatible with bash.  This effort took years,
 | 
| 115 | and the work is largely DONE, and documented extensively on the blog.  It runs
 | 
| 116 | thousands of lines of unmodified bash scripts.
 | 
| 117 | 
 | 
| 118 | Compatibility is important because users (including scientific users) don't
 | 
| 119 | have time to rewrite working shell scripts in a different language.  It's
 | 
| 120 | expensive, just as it's expensive to rewrite C code in another language.
 | 
| 121 | 
 | 
| 122 | But it's easy to run existing code under a new shell, and desirable if it
 | 
| 123 | provides better error handling, debugging, and new features.
 | 
| 124 | 
 | 
| 125 | ------
 | 
| 126 | 
 | 
| 127 | Scientific workflow languages like CWL, WDL, and Snakemake are increasingly
 | 
| 128 | popular [1].  However, they generally wrap Unix shell rather than replace it.
 | 
| 129 | So shell is complementary to these higher level tools.
 | 
| 130 | 
 | 
| 131 | There are also many such languages, and each one may be especially suited for a
 | 
| 132 | particular HPC problem domain.
 | 
| 133 | 
 | 
| 134 | In contrast, Unix shell is ubiquitous in all scientific computing domains, in
 | 
| 135 | both academia and industry.  For example, here are some organizations that are
 | 
| 136 | teaching shell (found through Google):
 | 
| 137 | 
 | 
| 138 | https://curriculumfellows.hms.harvard.edu/classes/introduction-command-line-interface-shell-bash-unix-linux
 | 
| 139 | 
 | 
| 140 | http://chemlabs.princeton.edu/researchcomputing/wp-content/uploads/sites/21/2018/09/hpc-getting-started-chem-workshop.pdf
 | 
| 141 | 
 | 
| 142 | https://bioinformatics.uconn.edu/unix-basics/#
 | 
| 143 | 
 | 
| 144 | https://www.melbournebioinformatics.org.au/tutorials/tutorials/unix/unix/
 | 
| 145 | 
 | 
| 146 | http://williamslab.bscb.cornell.edu/?page_id=235
 | 
| 147 | 
 | 
| 148 | Shell is also widely used in machine learning.  It has the same flavor of
 | 
| 149 | gluing together disparate data sets and tools that you find in the natural
 | 
| 150 | sciences.
 | 
| 151 | 
 | 
| 152 | -----
 | 
| 153 | 
 | 
| 154 | [1] "A review of bioinformatic pipeline frameworks" https://academic.oup.com/bib/article/18/3/530/2562749
 | 
| 155 | 
 |