| 1 | ---
 | 
| 2 | default_highlighter: oils-sh
 | 
| 3 | ---
 | 
| 4 | 
 | 
| 5 | Variable Declaration, Mutation, and Scope
 | 
| 6 | =========================================
 | 
| 7 | 
 | 
| 8 | This doc addresses these questions:
 | 
| 9 | 
 | 
| 10 | - How do variables behave in YSH?
 | 
| 11 | - What are some practical guidelines for using them?
 | 
| 12 | 
 | 
| 13 | <div id="toc">
 | 
| 14 | </div>
 | 
| 15 | 
 | 
| 16 | ## YSH Design Goals
 | 
| 17 | 
 | 
| 18 | YSH is a graceful upgrade to shell, and the behavior of variables follows from
 | 
| 19 | that philosophy.
 | 
| 20 | 
 | 
| 21 | - OSH implements shell-compatible behavior.
 | 
| 22 | - YSH enhances shell with **new features** like expressions over typed data,
 | 
| 23 |   which will be familiar to Python and JavaScript programmers.
 | 
| 24 | - It's a **stricter** language.
 | 
| 25 |   - Procs (shell functions) are self-contained and modular.  They're
 | 
| 26 |     understandable by reading their signature.
 | 
| 27 |   - We removed [dynamic scope]($xref:dynamic-scope).  This mechanism isn't
 | 
| 28 |     familiar to most programmers, and may cause accidental mutation (bugs).
 | 
| 29 |   - YSH has variable **declarations** like JavaScript, which can prevent
 | 
| 30 |     trivial bugs.
 | 
| 31 | - Even though YSH is stricter, it should still be convenient to use
 | 
| 32 |   interactively.
 | 
| 33 | 
 | 
| 34 | ## Keywords Are More Consistent and Powerful Than Builtins
 | 
| 35 | 
 | 
| 36 | YSH has 5 keywords that affect shell variables.  Unlike shell builtins, they're
 | 
| 37 | statically-parsed, and take dynamically-typed **expressions** on the right.
 | 
| 38 | 
 | 
| 39 | ### Declare With `var` and `const`
 | 
| 40 | 
 | 
| 41 | It looks like JavaScript:
 | 
| 42 | 
 | 
| 43 |     var name = 'Bob'
 | 
| 44 |     const age = (20 + 1) * 2
 | 
| 45 | 
 | 
| 46 |     echo "$name is $age years old"  # Bob is 42 years old
 | 
| 47 | 
 | 
| 48 | Note that `const` is enforced by a dynamic check.  It's meant to be used at the
 | 
| 49 | top level only, not within `proc` or `func`.
 | 
| 50 | 
 | 
| 51 |     const age = 'other'  # Fails at runtime because the `readonly` bit is set
 | 
| 52 | 
 | 
| 53 | ### Mutate With `setvar` and `setglobal`
 | 
| 54 | 
 | 
| 55 |     proc p {
 | 
| 56 |       var name = 'Bob'       # declare
 | 
| 57 |       setvar name = 'Alice'  # mutate
 | 
| 58 | 
 | 
| 59 |       setglobal g = 42       # create or mutate a global variable
 | 
| 60 |     }
 | 
| 61 | 
 | 
| 62 | ### "Return" By Mutating a `Place` (advanced)
 | 
| 63 | 
 | 
| 64 | A `Place` is a more principled mechanism that "replaces" shell's dynamic scope.
 | 
| 65 | To use it:
 | 
| 66 | 
 | 
| 67 | 1. Create a place with the `&` prefix operator.
 | 
| 68 | 1. Pass the place around as you would any other value.
 | 
| 69 | 1. Assign to the place with its `setValue(x)` method.
 | 
| 70 | 
 | 
| 71 | Example:
 | 
| 72 | 
 | 
| 73 |     proc p (s; out) {  # place is a typed param
 | 
| 74 |       # mutate the place
 | 
| 75 |       call out->setValue("prefix-$s")
 | 
| 76 |     }
 | 
| 77 | 
 | 
| 78 |     var x
 | 
| 79 |     p foo (&x)  # pass a word arg, then a place as a typed arg
 | 
| 80 |     echo x=$x  # => x=prefix-foo
 | 
| 81 | 
 | 
| 82 | - *Style guideline*: In some situations, it's better to "return" a value on
 | 
| 83 |   stdout, and use `$(myproc)` to retrieve it.
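
The stdout style mentioned above can be sketched in shell-compatible syntax. This is a minimal illustration, not a prescribed pattern; the helper name `add_prefix` is hypothetical, and in YSH code the function would normally be written as a `proc`.

```sh
# Hypothetical helper: "returns" its result by printing it to stdout.
add_prefix() {
  echo "prefix-$1"
}

# The caller captures the output with command substitution.
x=$(add_prefix foo)
echo "x=$x"
```

Nothing outside the helper is mutated, so no `Place` is needed. The trade-off is that the helper runs in a subshell and can only hand back a string.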
 | 
| 84 | 
 | 
| 85 | ### Comparison to Shell
 | 
| 86 | 
 | 
| 87 | Shell and [bash]($xref) have grown many mechanisms for "declaring" and mutating
 | 
| 88 | variables:
 | 
| 89 | 
 | 
| 90 | - "bare" assignments like `x=foo`
 | 
| 91 | - **builtins** like `declare`, `local`, and `readonly`
 | 
| 92 | - The `-n` "nameref" flag
 | 
| 93 | 
 | 
| 94 | Examples:
 | 
| 95 | 
 | 
| 96 |     readonly name=World        # no spaces allowed around =
 | 
| 97 |     declare foo="Hello $name"
 | 
| 98 |     foo=$((42 + a[2]))
 | 
| 99 |     declare -n ref=foo         # $foo can be written through $ref
 | 
| 100 | 
 | 
| 101 | These constructs are all discouraged in YSH code.
 | 
| 102 | 
 | 
| 103 | ## Keywords Behave Differently at the Top Level (Like JavaScript)
 | 
| 104 | 
 | 
| 105 | The "top-level" of the interpreter is used in two situations:
 | 
| 106 | 
 | 
| 107 | 1. When using YSH **interactively**.
 | 
| 108 | 2. As the **global** scope of a batch program.
 | 
| 109 | 
 | 
| 110 | Experienced YSH users may notice that `var` and `setvar` behave differently in
 | 
| 111 | the top-level scope vs. `proc` scope.  This is caused by the tension between
 | 
| 112 | the interactive shell and the strictness of YSH.
 | 
| 113 | 
 | 
| 114 | In particular, the `source` builtin is dynamic, so YSH can't know all the names
 | 
| 115 | defined at the top level.
 | 
| 116 | 
 | 
| 117 | For reference, JavaScript's modern `let` keyword has similar behavior.
 | 
| 118 | 
 | 
| 119 | ### Usage Guidelines
 | 
| 120 | 
 | 
| 121 | Before going into detail on keyword behavior, here are some practical
 | 
| 122 | guidelines:
 | 
| 123 | 
 | 
| 124 | - **Interactive** sessions: Use shell's `x=y`, or YSH `setvar`.  You can think
 | 
| 125 |   of `setvar` like Python's assignment operator: it creates or mutates a
 | 
| 126 |   variable.
 | 
| 127 |   - **Short scripts** (~20 lines) can also use this style.
 | 
| 128 | - **Long programs**: Refactor them into composable "functions", i.e. `proc`.
 | 
| 129 |   - First wrap the **whole program** into `proc main { }`.
 | 
| 130 |   - The top level should only have `const` declarations.  (You can use `var`,
 | 
| 131 |     but it has special rules, explained below.)
 | 
| 132 |   - The body of `proc` and `func` should have variables declared with `var`.
 | 
| 133 |   - Inside these code blocks, use `setvar` to mutate **local** variables, and
 | 
| 134 |     `setglobal` to mutate **globals**.
 | 
| 135 | 
 | 
| 136 | That's all you need to remember.  The following sections explain the rationale
 | 
| 137 | for these guidelines.
 | 
| 138 | 
 | 
| 139 | ### The Top-Level Scope Has Only Dynamic Checks
 | 
| 140 | 
 | 
| 141 | The lack of static checks affects the recommended usage for both interactive
 | 
| 142 | sessions and batch scripts.
 | 
| 143 | 
 | 
| 144 | #### Interactive Use: `setvar` only
 | 
| 145 | 
 | 
| 146 | As mentioned, you only need the `setvar` keyword in an interactive shell:
 | 
| 147 | 
 | 
| 148 |     ysh$ setvar x = 42   # create variable 'x'
 | 
| 149 |     ysh$ setvar x = 43   # mutate it
 | 
| 150 | 
 | 
| 151 | Details on top-level behavior:
 | 
| 152 | 
 | 
| 153 | - `var` behaves like `setvar`: It creates or mutates a variable.  In other
 | 
| 154 |   words, a `var` definition can be **redefined** at the top-level.
 | 
| 155 | - A `const` can also redefine a `var`.
 | 
| 156 | - A `var` can't redefine a `const` because there's a **dynamic** check that
 | 
| 157 |   disallows mutation (like shell's `readonly`).
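
For comparison, shell's `readonly` is also checked only at run time. The sketch below assumes a POSIX-style shell such as bash; the variable name `LIMIT` is made up for illustration.

```sh
readonly LIMIT=10

# Try the reassignment in a subshell so a failure can't stop this script.
if ( LIMIT=20 ) 2>/dev/null; then
  echo "reassignment succeeded"
else
  echo "reassignment rejected at run time"
fi

echo "LIMIT is still $LIMIT"
```

The shell accepts the script as written and only rejects the assignment when it executes, which is what "dynamic check" means here.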
 | 
| 158 | 
 | 
| 159 | #### Batch Use: `const` only
 | 
| 160 | 
 | 
| 161 | It's simpler to use only constants at the top level.
 | 
| 162 | 
 | 
| 163 |     const USER = 'bob'
 | 
| 164 |     const HOST = 'example.com'
 | 
| 165 | 
 | 
| 166 |     proc p {
 | 
| 167 |       ssh $USER@$HOST ls -l
 | 
| 168 |     }
 | 
| 169 | 
 | 
| 170 | This is so you don't have to worry about a `var` being redefined by a statement
 | 
| 171 | like `source mylib.sh`.  A `const` can't be redefined because it can't be
 | 
| 172 | mutated.
 | 
| 173 | 
 | 
| 174 | It may be useful to put mutable globals in a constant dictionary, as it will
 | 
| 175 | prevent them from being redefined:
 | 
| 176 | 
 | 
| 177 |     const G = { mystate: 0 }
 | 
| 178 | 
 | 
| 179 |     proc p {
 | 
| 180 |       setglobal G.mystate = 1
 | 
| 181 |     }
 | 
| 182 | 
 | 
| 183 | ### `proc` and `func` Scope Have Static Checks
 | 
| 184 | 
 | 
| 185 | These YSH code units have additional **static checks** (parse errors):
 | 
| 186 | 
 | 
| 187 | - Every variable must be declared once and only once with `var`.  A duplicate
 | 
| 188 |   declaration is a parse error.
 | 
| 189 | - `setvar` of an undeclared variable is a parse error.
 | 
| 190 | 
 | 
| 191 | ## Procs Don't Use "Dynamic Scope"
 | 
| 192 | 
 | 
| 193 | Procs are designed to be encapsulated and composable like processes.  But the
 | 
| 194 | [dynamic scope]($xref:dynamic-scope) rule that Bourne shell functions use
 | 
| 195 | breaks encapsulation.
 | 
| 196 |   
 | 
| 197 | Dynamic scope means that a function can **read and mutate** the locals of its
 | 
| 198 | caller, its caller's caller, and so forth.  Example:
 | 
| 199 | 
 | 
| 200 |     g() {
 | 
| 201 |       echo "f_var is $f_var"  # g can see f's local variables
 | 
| 202 |     }
 | 
| 203 | 
 | 
| 204 |     f() {
 | 
| 205 |       local f_var=42; g  # declare the local, then call g
 | 
| 206 |     }
 | 
| 207 | 
 | 
| 208 |     f
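
The example above is meant to show a read. Mutation works the same way; here is a minimal sketch, assuming bash or another shell that supports `local`. The names `callee` and `caller` are hypothetical.

```sh
callee() {
  # No 'local' here, so the name resolves to the caller's variable.
  count=$((count + 1))
}

caller() {
  local count=0
  callee
  callee
  echo "count is $count"
}

caller
echo "global count is ${count-unset}"
```

`callee` never declares `count`, yet it changes the caller's local twice and leaves the global scope untouched. This is the kind of accidental coupling that procs avoid.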
 | 
| 209 | 
 | 
| 210 | YSH code should use `proc` instead.  Inside a proc call, the `dynamic_scope`
 | 
| 211 | option is implicitly disabled (equivalent to `shopt --unset dynamic_scope`).
 | 
| 212 | 
 | 
| 213 | ### Reading Variables
 | 
| 214 | 
 | 
| 215 | This means that adding the `proc` keyword to the definition of `g` changes its
 | 
| 216 | behavior:
 | 
| 217 | 
 | 
| 218 |     proc g() {
 | 
| 219 |       echo "f_var is $f_var"  # Undefined!
 | 
| 220 |     }
 | 
| 221 | 
 | 
| 222 | This affects all kinds of variable references:
 | 
| 223 | 
 | 
| 224 |     proc p {
 | 
| 225 |       echo $foo         # look up foo in command mode
 | 
| 226 |       var y = foo + 42  # look up foo in expression mode
 | 
| 227 |     }
 | 
| 228 | 
 | 
| 229 | As in Python and JavaScript, a local `foo` can *shadow* a global `foo`.  Using
 | 
| 230 | `CAPS` for globals is a common style that avoids confusion.  Remember that
 | 
| 231 | globals should usually be constants in YSH.
 | 
| 232 | 
 | 
| 233 | ### Shell Language Constructs That Write Variables
 | 
| 234 | 
 | 
| 235 | In shell, these language constructs assign to variables using dynamic
 | 
| 236 | scope.  In YSH, they only mutate the **local** scope:
 | 
| 237 | 
 | 
| 238 | - `x=val`
 | 
| 239 |   - And variants `x+=val`, `a[i]=val`, `a[i]+=val`
 | 
| 240 | - `export x=val` and `readonly x=val`
 | 
| 241 | - `${x=default}`
 | 
| 242 | - `mycmd {x}>out` (stores a file descriptor in `$x`)
 | 
| 243 | - `(( x = 42 + y ))`
 | 
| 244 | 
 | 
| 245 | ### Builtins That Write Variables
 | 
| 246 | 
 | 
| 247 | These builtins are also "isolated" inside procs, using local scope:
 | 
| 248 | 
 | 
| 249 | - [read](ref/chap-builtin-cmd.html#read) (`$REPLY`)
 | 
| 250 | - [readarray](ref/chap-builtin-cmd.html#readarray) aka `mapfile`
 | 
| 251 | - [getopts](ref/chap-builtin-cmd.html#getopts) (`$OPTIND`, `$OPTARG`, etc.)
 | 
| 252 | - [printf](ref/chap-builtin-cmd.html#printf) -v
 | 
| 253 | - [unset](ref/chap-osh-assign.html#unset)
 | 
| 254 | 
 | 
| 255 | YSH Builtins:
 | 
| 256 | 
 | 
| 257 | - [compadjust](ref/chap-builtin-cmd.html#compadjust)
 | 
| 258 | - [try](ref/chap-builtin-cmd.html#try) and `_error`
 | 
| 259 | 
 | 
| 260 | <!-- TODO: should YSH builtins always behave the same way?  Isn't that a little
 | 
| 261 | faster? I think read --all is not consistent.  -->
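
To see why this isolation matters, here is a sketch of what `read` does in a plain shell function under dynamic scope. It assumes bash or a similar shell with `local`; the function names are hypothetical.

```sh
read_line() {
  # 'read' assigns to whichever 'line' is visible through dynamic scope.
  read -r line <<EOF
from read
EOF
}

caller() {
  local line='initial'
  read_line
  echo "line is $line"
}

caller
echo "global line is ${line-unset}"
```

In a shell function the builtin overwrites the caller's local. Inside a proc, according to the rule above, the write would stay in the proc's own local scope instead.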
 | 
| 262 | 
 | 
| 263 | ### Reminder: Proc Scope is Flat
 | 
| 264 | 
 | 
| 265 | All local variables in shell functions and procs live in the same scope.  This
 | 
| 266 | includes variables declared in conditional blocks (`if` and `case`) and loops
 | 
| 267 | (`for` and `while`).
 | 
| 268 | 
 | 
| 269 |     proc p {
 | 
| 270 |       for i in 1 2 3 {
 | 
| 271 |         echo $i
 | 
| 272 |       }
 | 
| 273 |       echo $i  # i is still 3
 | 
| 274 |     }
 | 
| 275 | 
 | 
| 276 | This includes first-class YSH blocks:
 | 
| 277 | 
 | 
| 278 |     proc p {
 | 
| 279 |       var x = 42
 | 
| 280 |       cd /tmp {
 | 
| 281 |         var x = 0  # ERROR: x is already declared
 | 
| 282 |       }
 | 
| 283 |     }
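
The same flatness can be observed in an ordinary shell function. This sketch assumes bash or another shell with `local`; the function name `flat` and the variable `last` are made up for illustration.

```sh
flat() {
  local i
  for i in 1 2 3; do
    local last=$i   # declared inside the loop body
  done
  # Both names are still visible here: the loop did not open a new scope.
  echo "i=$i last=$last"
}

flat
echo "outside: ${last-unset}"
```

The only scope boundary is the function itself, so `last` outlives the loop but is gone once `flat` returns.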
 | 
| 284 | 
 | 
| 285 | ## More Details
 | 
| 286 | 
 | 
| 287 | ### Examples of Place Mutation
 | 
| 288 | 
 | 
| 289 | The expression to the left of `=` is called a **place**.  These are basically
 | 
| 290 | Python or JavaScript expressions, except that you add the `setvar` or
 | 
| 291 | `setglobal` keyword.
 | 
| 292 | 
 | 
| 293 |     setvar x[1] = 2                 # array element
 | 
| 294 |     setvar d['key'] = 3             # dict element
 | 
| 295 |     setvar d.key = 3                # syntactic sugar for the above
 | 
| 296 |     setvar x, y = y, x              # swap
 | 
| 297 | 
 | 
| 298 | ### Bare Assignment
 | 
| 299 | 
 | 
| 300 | [Hay](hay.html) allows `const` declarations without the keyword:
 | 
| 301 | 
 | 
| 302 |     hay define Package
 | 
| 303 | 
 | 
| 304 |     Package cpython {
 | 
| 305 |       version = '3.12'  # like const version = ...
 | 
| 306 |     }
 | 
| 307 | 
 | 
| 308 | ### Temp Bindings
 | 
| 309 | 
 | 
| 310 | Temp bindings precede a simple command:
 | 
| 311 | 
 | 
| 312 |     PYTHONPATH=. mycmd
 | 
| 313 | 
 | 
| 314 | They create a new namespace on the stack where each cell has the `export` flag
 | 
| 315 | set (`declare -x`).
 | 
| 316 | 
 | 
| 317 | In YSH, the lack of dynamic scope means that they can't be read inside a
 | 
| 318 | `proc`.  So they're only useful for setting environment variables, and can be
 | 
| 319 | replaced with:
 | 
| 320 | 
 | 
| 321 |     env PYTHONPATH=. mycmd
 | 
| 322 |     env PYTHONPATH=. $0 myproc  # using the ARGV dispatch pattern
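
A small sketch of that equivalence for child processes, assuming a POSIX-like system where `sh` and `env` are available. The variable name `GREETING` is hypothetical.

```sh
# A temp binding applies only to the single command it precedes.
GREETING=hi sh -c 'echo "temp binding: $GREETING"'

# The env utility sets the variable the same way for a child process.
env GREETING=hi sh -c 'echo "env: $GREETING"'

# Neither form leaves GREETING set in the current shell.
echo "afterwards: ${GREETING-unset}"
```

Both forms place the variable in the child's environment only, which is why `env` can stand in for a temp binding when the goal is to set an environment variable.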
 | 
| 323 | 
 | 
| 324 | ## Appendix A: More on Shell vs. YSH
 | 
| 325 | 
 | 
| 326 | This section may help experienced shell users understand YSH.
 | 
| 327 | 
 | 
| 328 | Shell:
 | 
| 329 | 
 | 
| 330 |     g=G                         # global variable
 | 
| 331 |     readonly c=C                # global constant
 | 
| 332 | 
 | 
| 333 |     myfunc() {
 | 
| 334 |       local x=X                 # local variable
 | 
| 335 |       readonly y=Y              # local constant
 | 
| 336 | 
 | 
| 337 |       x=mutated                 # mutate local
 | 
| 338 |       g=mutated                 # mutate global
 | 
| 339 |       newglobal=G               # create new global
 | 
| 340 | 
 | 
| 341 |       caller_var=mutated        # dynamic scope (YSH doesn't have this)
 | 
| 342 |     }
 | 
| 343 | 
 | 
| 344 | YSH:
 | 
| 345 | 
 | 
| 346 |     var g = 'G'                 # global variable (discouraged)
 | 
| 347 |     const c = 'C'               # global constant
 | 
| 348 | 
 | 
| 349 |     proc myproc {
 | 
| 350 |       var x = 'L'               # local variable
 | 
| 351 | 
 | 
| 352 |       setvar x = 'mutated'      # mutate local
 | 
| 353 |       setglobal g = 'mutated'   # mutate global
 | 
| 354 |       setglobal newglobal = 'G' # create new global
 | 
| 355 |     }
 | 
| 356 | 
 | 
| 357 | ## Appendix B: Problems With Top-Level Scope In Other Languages
 | 
| 358 | 
 | 
| 359 | - Julia 1.5 (August 2020): [The return of "soft scope" in the
 | 
| 360 |   REPL](https://julialang.org/blog/2020/08/julia-1.5-highlights/#the_return_of_soft_scope_in_the_repl).
 | 
| 361 |   - In contrast to Julia, YSH behaves the same in batch mode vs. interactive
 | 
| 362 |     mode, and doesn't print warnings.  However, it behaves differently at the
 | 
| 363 |     top level.  For this reason, we recommend using only `setvar` in
 | 
| 364 |     interactive shells, and only `const` in the global scope of programs.
 | 
| 365 | - Racket: [The Top Level is Hopeless](https://gist.github.com/samth/3083053)
 | 
| 366 |   - From [A Principled Approach to REPL Interpreters](https://2020.splashcon.org/details/splash-2020-Onward-papers/5/A-principled-approach-to-REPL-interpreters)
 | 
| 367 |     (Onward 2020).  Thanks to Michael Greenberg (of Smoosh) for this reference.
 | 
| 368 |   - The behavior of `var` at the top level was partly inspired by this
 | 
| 369 |     paper.  It's consistent with bash's `declare`, and similar to JavaScript's
 | 
| 370 |     `let`.
 | 
| 371 | 
 | 
| 372 | ## Related Documents
 | 
| 373 | 
 | 
| 374 | - [Interpreter State](interpreter-state.html)
 | 
| 375 |   - The shell has a stack of namespaces.
 | 
| 376 |   - Each namespace contains {variable name -> cell} bindings.
 | 
| 377 |   - Cells have a tagged value (string, array, etc.) and 3 flags (readonly,
 | 
| 378 |     export, nameref).
 | 
| 379 | - [Guide to Procs and Funcs](proc-func.html)
 | 
| 380 | 
 |