2022-04-19

c. Proposal Summary/Scope of Work
Provide a short summary of the work being proposed (maximum of 500 words)

The Unix shell is a central user interface and glue language in all kinds of
scientific computing, particularly in bioinformatics.

Oil is a new open source Unix shell. It's designed to be the upgrade path from
GNU bash, the most popular shell in the world. Home page: https://www.oilshell.org/

It runs your existing scripts and lets you upgrade them to the new Oil
language, which is designed to be familiar to Python and JavaScript users.

Over the last few years, we've released a correct but slow implementation
dozens of times and received regular feedback from users.

Now we need a COMPILER ENGINEER to finish semi-automatically translating the
code to fast C++. This plot shows our progress at a glance:
https://www.oilshell.org/blog/2022/03/spec-test-history.png

Blog post with this plot: https://www.oilshell.org/blog/2022/03/middle-out.html

Roughly speaking, we'll have a competitive replacement for (and upgrade to)
bash when the red line meets the blue line!

So the work is already more than half done, and I consider it low risk and
high reward. Addressing the speed issue will allow us to aggressively add new
features and polish the documentation.

Our FAQ has over 178K views and has been featured in many places, including
Hacker News: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html

-----

I've drafted the job requirements here:
https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job

I will use my professional network (I've worked at Google and EA) to find the
compiler engineer, who should be skilled in compilers, C++, and Python.

Python creator Guido van Rossum knows about Oil:

https://twitter.com/gvanrossum/status/995862193609551872

"Amazing. A bash implementation in Python, by my ex-coworker (at Google) Andy
Chu"

A few years ago, he introduced me to two compiler engineers working at Dropbox
who may be good candidates for the job. However, they are already well
employed and would need to be compensated accordingly.


d. Value to Biomedical Users
Describe the expected value of the proposed work to the biomedical research community (maximum of 250 words)

If batch computation on Unix systems is a bottleneck in your lab's "scientific
discovery loop", then a better Unix shell will make you more productive! You
can run more experiments with fewer staff.

Oil treats the Unix shell as a real programming language, rather than a
mystery handed off from one researcher to the next.

Moreover, the software that underlies published experiments is heterogeneous: a
mix of programs written in different languages, at different times, by
different people.

The Unix shell glues it all together and provides an interactive interface.
It's also a powerful interface for using remote computers.

But shell is showing its age and has been neglected by industry and academia.
It has fundamental flaws, like a lack of robust error handling, that lead to
lost productivity, expensive training, and even erroneous scientific results.

Oil fixes these problems, and adds much-needed features that will be familiar
to Python, JavaScript, and R users.

Four Features That Justify a New Unix Shell:
http://www.oilshell.org/blog/2020/10/osh-features.html

A Tour of the Oil Language:
https://www.oilshell.org/release/latest/doc/oil-language-tour.html

----

Similar sentiments from a third party at https://datacarpentry.org/2015-11-04-ACUNS/shell-intro/

- For most bioinformatics tools, you have to use the shell. There is no
  graphical interface. If you want to work in metagenomics or genomics you're
  going to need to use the shell.
- The shell gives you power ... When you need to do things tens to hundreds of
  times, knowing how to use the shell is transformative.
- To use remote computers or cloud computing, you need to use the shell.


f. Landscape Analysis

Briefly describe the other software tools (either proprietary or open source)
that the audience for this proposal primarily uses. How do the software
project(s) in this proposal compare to these other tools in terms of user base
size, usage, and maturity? How do existing tools and the project(s) in this
proposal interact? (maximum of 250 words)

I made a list of alternative shells:
https://github.com/oilshell/oil/wiki/Alternative-Shells

Oil is the ONLY new shell that is compatible with bash. This effort took
years, but the work is largely DONE and documented extensively on the blog. It
runs thousands of lines of unmodified bash scripts.

Compatibility is important because users (including scientific users) don't
have time to rewrite working shell scripts in a different language. Rewriting
is expensive, just as it's expensive to rewrite C code in another language.

But it's easy to run existing code under a new shell, and desirable if it
provides better error handling, debugging, and new features.

------

Scientific workflow languages like CWL, WDL, and Snakemake are increasingly
popular [1]. However, they generally wrap the Unix shell rather than replace
it, so shell is complementary to these higher-level tools.

There are also many such languages, and each one may be especially suited to a
particular HPC problem domain.

In contrast, Unix shell is ubiquitous in all scientific computing domains, in
both academia and industry. For example, here are some organizations that are
teaching shell (found through Google):

https://curriculumfellows.hms.harvard.edu/classes/introduction-command-line-interface-shell-bash-unix-linux

http://chemlabs.princeton.edu/researchcomputing/wp-content/uploads/sites/21/2018/09/hpc-getting-started-chem-workshop.pdf

https://bioinformatics.uconn.edu/unix-basics/#

https://www.melbournebioinformatics.org.au/tutorials/tutorials/unix/unix/

http://williamslab.bscb.cornell.edu/?page_id=235

Shell is also widely used in machine learning, which has the same flavor of
gluing together disparate data sets and tools that you find in the natural
sciences.

-----

[1] "A review of bioinformatic pipeline frameworks" https://academic.oup.com/bib/article/18/3/530/2562749