---
default_highlighter: oils-sh
---

Variable Declaration, Mutation, and Scope
=========================================

This doc addresses these questions:

- How do variables behave in YSH?
- What are some practical guidelines for using them?

<div id="toc">
</div>

## YSH Design Goals

YSH is a graceful upgrade to shell, and the behavior of variables follows from
that philosophy.

- OSH implements shell-compatible behavior.
- YSH enhances shell with **new features** like expressions over typed data,
  which will be familiar to Python and JavaScript programmers.
- It's a **stricter** language.
  - Procs (shell functions) are self-contained and modular.  They're
    understandable by reading their signature.
  - We removed [dynamic scope]($xref:dynamic-scope).  This mechanism isn't
    familiar to most programmers, and may cause accidental mutation (bugs).
  - YSH has variable **declarations** like JavaScript, which can prevent
    trivial bugs.
- Even though YSH is stricter, it should still be convenient to use
  interactively.

## Keywords Are More Consistent and Powerful Than Builtins

YSH has 5 keywords that affect shell variables.  Unlike shell builtins, they're
statically-parsed, and take dynamically-typed **expressions** on the right.

### Declare With `var` and `const`

It looks like JavaScript:

    var name = 'Bob'
    const age = (20 + 1) * 2

    echo "$name is $age years old"  # Bob is 42 years old

Note that `const` is enforced by a dynamic check.  It's meant to be used at the
top level only, not within `proc` or `func`.

    const age = 'other'  # Will fail because `readonly` bit is set

### Mutate With `setvar` and `setglobal`

    proc p {
      var name = 'Bob'       # declare
      setvar name = 'Alice'  # mutate

      setglobal g = 42       # create or mutate a global variable
    }

### "Return" By Mutating a `Place` (advanced)

A `Place` is a more principled mechanism that "replaces" shell's dynamic scope.
To use it:

1. Create a place with the `&` prefix operator.
1. Pass the place around as you would any other value.
1. Assign to the place with its `setValue(x)` method.

Example:

    proc p (s; out) {  # place is a typed param
      # mutate the place
      call out->setValue("prefix-$s")
    }

    var x
    p ('foo', &x)  # pass a place
    echo x=$x  # => x=prefix-foo

- *Style guideline*: In some situations, it's better to "return" a value on
  stdout, and use `$(myproc)` to retrieve it.

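The stdout style from the guideline can be sketched in plain shell. This is an illustration rather than YSH-specific code, and the function name `add_prefix` is hypothetical:

```shell
# "Return" a value by printing it; the caller captures it with $(...)
add_prefix() {
  printf 'prefix-%s\n' "$1"
}

x=$(add_prefix foo)   # command substitution strips the trailing newline
echo "x=$x"           # x=prefix-foo
```

Command substitution runs the callee in a subshell, so the callee can't mutate the caller's variables. That gives isolation at the cost of an extra process.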
### Comparison to Shell

Shell and [bash]($xref) have grown many mechanisms for "declaring" and mutating
variables:

- "bare" assignments like `x=foo`
- **builtins** like `declare`, `local`, and `readonly`
- The `-n` "nameref" flag

Examples:

    readonly name=World        # no spaces allowed around =
    declare foo="Hello $name"
    foo=$((42 + a[2]))
    declare -n ref=foo         # $foo can be written through $ref

These constructs are all discouraged in YSH code.

## Keywords Behave Differently at the Top Level (Like JavaScript)

The "top-level" of the interpreter is used in two situations:

1. When using YSH **interactively**.
2. As the **global** scope of a batch program.

Experienced YSH users may notice that `var` and `setvar` behave differently in
the top-level scope vs. `proc` scope.  This is caused by the tension between
the interactive shell and the strictness of YSH.

In particular, the `source` builtin is dynamic, so YSH can't know all the names
defined at the top level.

For reference, JavaScript's modern `let` keyword has similar behavior.

### Usage Guidelines

Before going into detail on keyword behavior, here are some practical
guidelines:

- **Interactive** sessions: Use shell's `x=y`, or YSH `setvar`.  You can think
  of `setvar` like Python's assignment operator: it creates or mutates a
  variable.
  - **Short scripts** (~20 lines) can also use this style.
- **Long programs**: Refactor them into composable "functions", i.e. `proc`.
  - First wrap the **whole program** into `proc main { }`.
  - The top level should only have `const` declarations.  (You can use `var`,
    but it has special rules, explained below.)
  - The body of `proc` and `func` should have variables declared with `var`.
  - Inside these code blocks, use `setvar` to mutate **local** variables, and
    `setglobal` to mutate **globals**.

That's all you need to remember.  The following sections explain the rationale
for these guidelines.

### The Top-Level Scope Has Only Dynamic Checks

The lack of static checks affects the recommended usage for both interactive
sessions and batch scripts.

#### Interactive Use: `setvar` only

As mentioned, you only need the `setvar` keyword in an interactive shell:

    ysh$ setvar x = 42   # create variable 'x'
    ysh$ setvar x = 43   # mutate it

Details on top-level behavior:

- `var` behaves like `setvar`: It creates or mutates a variable.  In other
  words, a `var` definition can be **redefined** at the top-level.
- A `const` can also redefine a `var`.
- A `var` can't redefine a `const` because there's a **dynamic** check that
  disallows mutation (like shell's `readonly`).

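For readers who know shell's `readonly`, here is a minimal sketch of what a dynamic check looks like. It's plain shell, not YSH, and the variable `outcome` is only there for illustration. The script parses fine, and the assignment is rejected only when it runs:

```shell
readonly name=World

# The bad assignment is attempted in a subshell, so the runtime error
# doesn't abort the rest of this script.
if (name=Other) 2>/dev/null; then
  outcome=mutated
else
  outcome=rejected
fi

echo "name=$name outcome=$outcome"   # name=World outcome=rejected
```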
#### Batch Use: `const` only

It's simpler to use only constants at the top level.

    const USER = 'bob'
    const HOST = 'example.com'

    proc p {
      ssh $USER@$HOST ls -l
    }

This is so you don't have to worry about a `var` being redefined by a statement
like `source mylib.sh`.  A `const` can't be redefined because it can't be
mutated.

It may be useful to put mutable globals in a constant dictionary, as it will
prevent them from being redefined:

    const G = {mystate: 0}

    proc p {
      setglobal G.mystate = 1
    }

### `proc` and `func` Scope Have Static Checks

These YSH code units have additional **static checks** (parse errors):

- Every variable must be declared once and only once with `var`.  A duplicate
  declaration is a parse error.
- `setvar` of an undeclared variable is a parse error.

## Procs Don't Use "Dynamic Scope"

Procs are designed to be encapsulated and composable like processes.  But the
[dynamic scope]($xref:dynamic-scope) rule that Bourne shell functions use
breaks encapsulation.

Dynamic scope means that a function can **read and mutate** the locals of its
caller, its caller's caller, and so forth.  Example:

    g() {
      echo "f_var is $f_var"  # g can see f's local variables
    }

    f() {
      local f_var=42
      g
    }

    f

YSH code should use `proc` instead.  Inside a proc call, the `dynamic_scope`
option is implicitly disabled (equivalent to `shopt --unset dynamic_scope`).

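The shell example shows a *read* through dynamic scope.  The mutation case, which is the one that causes accidental bugs, can be sketched the same way.  This is plain shell, and the names `inner`, `outer`, `counter`, and `seen_by_outer` are hypothetical:

```shell
inner() {
  # No 'local' here: the assignment walks up the call stack and
  # finds the 'counter' that belongs to outer.
  counter=$((counter + 1))
}

outer() {
  local counter=41
  inner
  seen_by_outer=$counter   # copy to a global so we can inspect it later
}

outer
echo "outer saw counter=$seen_by_outer"       # outer saw counter=42
echo "global counter is ${counter:-unset}"    # global counter is unset
```

Nothing in the signature of `inner` hints that it changes its caller's state.  That's the encapsulation problem that procs avoid.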
### Reading Variables

This means that adding the `proc` keyword to the definition of `g` changes its
behavior:

    proc g() {
      echo "f_var is $f_var"  # Undefined!
    }

This affects all kinds of variable references:

    proc p {
      echo $foo         # look up foo in command mode
      var y = foo + 42  # look up foo in expression mode
    }

As in Python and JavaScript, a local `foo` can *shadow* a global `foo`.  Using
`CAPS` for globals is a common style that avoids confusion.  Remember that
globals should usually be constants in YSH.

### Shell Language Constructs That Write Variables

In shell, these language constructs assign to variables using dynamic
scope.  In YSH, they only mutate the **local** scope:

- `x=val`
  - And variants `x+=val`, `a[i]=val`, `a[i]+=val`
- `export x=val` and `readonly x=val`
- `${x=default}`
- `mycmd {x}>out` (stores a file descriptor in `$x`)
- `(( x = 42 + y ))`

### Builtins That Write Variables

These builtins are also "isolated" inside procs, using local scope:

- [read](ref/chap-builtin-cmd.html#read) (`$REPLY`)
- [readarray](ref/chap-builtin-cmd.html#readarray) aka `mapfile`
- [getopts](ref/chap-builtin-cmd.html#getopts) (`$OPTIND`, `$OPTARG`, etc.)
- [printf](ref/chap-builtin-cmd.html#printf) -v
- [unset](ref/chap-osh-assign.html#unset)

YSH Builtins:

- [compadjust](ref/chap-builtin-cmd.html#compadjust)
- [try](ref/chap-builtin-cmd.html#try) and `_error`

<!-- TODO: should YSH builtins always behave the same way?  Isn't that a little
faster? I think read --all is not consistent.  -->

### Reminder: Proc Scope is Flat

All local variables in shell functions and procs live in the same scope.  This
includes variables declared in conditional blocks (`if` and `case`) and loops
(`for` and `while`).

    proc p {
      for i in 1 2 3 {
        echo $i
      }
      echo $i  # i is still 3
    }

This includes first-class YSH blocks:

    proc p {
      var x = 42
      cd /tmp {
        var x = 0  # ERROR: x is already declared
      }
    }

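Shell functions have the same flat scope, which may be a useful reference point.  Here is a minimal sketch in plain shell, with a hypothetical function `scan`.  A `local` declared inside an `if` inside a `for` is still visible after both blocks end:

```shell
scan() {
  local i
  for i in 1 2 3; do
    if [ "$i" -eq 2 ]; then
      local found=$i   # declared two blocks deep
    fi
  done
  # Both variables are still in scope here
  echo "i=$i found=$found"
}

scan   # i=3 found=2
```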
## More Details

### Examples of Place Mutation

The expression to the left of `=` is called a **place**.  These are basically
Python or JavaScript expressions, except that you add the `setvar` or
`setglobal` keyword.

    setvar x[1] = 2                 # array element
    setvar d['key'] = 3             # dict element
    setvar d.key = 3                # syntactic sugar for the above
    setvar x, y = y, x              # swap

### Bare Assignment

[Hay](hay.html) allows `const` declarations without the keyword:

    hay define Package

    Package cpython {
      version = '3.12'  # like const version = ...
    }

### Temp Bindings

Temp bindings precede a simple command:

    PYTHONPATH=. mycmd

They create a new namespace on the stack where each cell has the `export` flag
set (`declare -x`).

In YSH, the lack of dynamic scope means that they can't be read inside a
`proc`.  So they're only useful for setting environment variables, and can be
replaced with:

    env PYTHONPATH=. mycmd
    env PYTHONPATH=. $0 myproc  # using the ARGV dispatch pattern

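When the command is an external program, the two spellings should be interchangeable.  Here is a small sketch in plain shell, with `GREETING` as a hypothetical variable and `sh -c` standing in for `mycmd`:

```shell
unset GREETING

# Temp binding: the child process sees the variable in its environment
with_prefix=$(GREETING=hi sh -c 'echo "${GREETING:-unset}"')

# Same effect via the 'env' program
with_env=$(env GREETING=hi sh -c 'echo "${GREETING:-unset}"')

# Neither form leaves a binding behind in the calling shell
after=${GREETING:-unset}

echo "prefix=$with_prefix env=$with_env after=$after"
# prefix=hi env=hi after=unset
```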
## Appendix A: More on Shell vs. YSH

This section may help experienced shell users understand YSH.

Shell:

    g=G                         # global variable
    readonly c=C                # global constant

    myfunc() {
      local x=X                 # local variable
      readonly y=Y              # local constant

      x=mutated                 # mutate local
      g=mutated                 # mutate global
      newglobal=G               # create new global

      caller_var=mutated        # dynamic scope (YSH doesn't have this)
    }

YSH:

    var g = 'G'                 # global variable (discouraged)
    const c = 'C'               # global constant

    proc myproc {
      var x = 'L'               # local variable

      setvar x = 'mutated'      # mutate local
      setglobal g = 'mutated'   # mutate global
      setglobal newglobal = 'G' # create new global
    }

## Appendix B: Problems With Top-Level Scope In Other Languages

- Julia 1.5 (August 2020): [The return of "soft scope" in the
  REPL](https://julialang.org/blog/2020/08/julia-1.5-highlights/#the_return_of_soft_scope_in_the_repl).
  - In contrast to Julia, YSH behaves the same in batch mode vs. interactive
    mode, and doesn't print warnings.  However, it behaves differently at the
    top level.  For this reason, we recommend using only `setvar` in
    interactive shells, and only `const` in the global scope of programs.
- Racket: [The Top Level is Hopeless](https://gist.github.com/samth/3083053)
  - From [A Principled Approach to REPL Interpreters](https://2020.splashcon.org/details/splash-2020-Onward-papers/5/A-principled-approach-to-REPL-interpreters)
    (Onward 2020).  Thanks to Michael Greenberg (of Smoosh) for this reference.
    - The behavior of `var` at the top level was partly inspired by this
      paper.  It's consistent with bash's `declare`, and similar to JavaScript's
      `let`.

## Related Documents

- [Interpreter State](interpreter-state.html)
  - The shell has a stack of namespaces.
  - Each namespace contains {variable name -> cell} bindings.
  - Cells have a tagged value (string, array, etc.) and 3 flags (readonly,
    export, nameref).
- [Guide to Procs and Funcs](proc-func.html)