5.3.10 Side Effects

It is time to broach the vexed subject of side-effects.

By side-effects in this context we mean essentially changing some value in memory in such a way that if code which had previously examined its value were to re-examine it, it would find that value changed.

Side-effects were not a major issue when the C programming language was designed. Computers were slow, memories were small (often less than 64K of RAM), and consequently programs were small and simple.

Today it is common for commodity desktop computers to have gigabytes of memory and multiple cores executing instructions in parallel out of that memory. Hundreds of millions of lines of code may be executing in-memory at the same time. On high-end number-crunching computers there may be tens of thousands of cores.

In this regime side-effects are a major issue.

From a hardware design point of view, every side-effect is now in fact a broadcast operation: The results of that memory write may need to be made visible to anything from four to eight cores on a small machine to tens of thousands of cores on a supercomputer. That is an inherently slow and expensive operation. The more side-effects the program creates, the harder it will be to attain good execution speed.

From a software design point of view, in such a context every side-effect is a bug waiting to happen. Side-effects are fertile breeding grounds for a wide variety of bugs ranging from race conditions to stale local copies.

In the contemporary context there are thus major advantages to software development approaches which avoid needless use of side-effects.

Some languages make pervasive use of side-effects. In a C program often every other line of code will update a pre-existing record in memory and thus cause a side-effect. Such languages are often called imperative.

Other languages, such as Haskell, completely ban side-effects. Simon Peyton Jones calls this “wearing the hair shirt”. Writing code completely without side-effects involves a number of severe difficulties, but it also brings with it a number of great advantages. Such languages are often called pure-functional.

Mythryl belongs to the middle ground of mostly-functional languages. Mythryl does allow side-effects, but typical Mythryl programs use them sparingly. The Mythryl compiler is tuned with the expectation that side-effects will be rare.

Mythryl programs avoid side-effects by doing a lot of copying. Where a C program would update a record field in memory, a Mythryl program will typically just make a new, updated copy of the record, leaving the original copy untouched. Mythryl makes this very efficient; Mythryl can create records in a fraction of the time needed by C. (Is C faster than Mythryl? It depends what you measure!)
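
As a minimal sketch of this copy-instead-of-update style, consider the following. The record and its name and chairs fields are invented purely for illustration, and only record syntax used elsewhere in these tutorials is assumed:

    #!/usr/bin/mythryl

    original = { name => "kitchen", chairs => 4 };

    # Build a fresh record which reuses the old name field.
    # The record bound to original is left untouched.
    updated = { name => original.name, chairs => 6 };

    printf "%s has %d chairs\n" original.name original.chairs;
    printf "%s has %d chairs\n" updated.name  updated.chairs;

Any code still holding original continues to see four chairs; only code handed updated sees six.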

None of the Mythryl programs presented so far in these tutorials use side-effects.

In fact, we have not yet presented any Mythryl language constructs which permit the creation of side-effects.

Mythryl permits side-effects, but it places strong safeguards upon their use.

For example, all C record fields are read-write, permanently eligible to be modified in place. (This creates hair-raising problems for C compiler writers attempting to optimize code.) But Mythryl record fields are read-only, permanently protected from modification, accidental or deliberate.

In Mythryl, in essence, only reference cells (and mutable vectors) may be modified. All other values are read-only once created.

This enormously simplifies the compiler writer’s job when implementing optimizations.

More importantly, it makes Mythryl code easier to understand. There is never any question as to whether some function ten million lines away in another thread running on another core is about to update some value being used; with the exception of reference cells (and mutable vectors), such updates are forbidden. This makes large Mythryl programs enormously easier to read and maintain than large C programs.

Mythryl reference cells are used much like C pointers, from a practical point of view:

    #!/usr/bin/mythryl

    pointer = REF 0;

    printf "%d\n" *pointer;

    pointer := 1;

    printf "%d\n" *pointer;

    pointer := 2;

    printf "%d\n" *pointer;

Here one thinks of the REF reference-creating operator much the way one thinks of the C & unary address-taking operator, and of the *pointer dereferencing operator almost exactly the way one thinks of the corresponding C operator.

When run, the above code produces:

    0
    1
    2

At first blush that may look a lot like this code:

    #!/usr/bin/mythryl

    variable = 0;

    printf "%d\n" variable;

    variable = 1;

    printf "%d\n" variable;

    variable = 2;

    printf "%d\n" variable;

When run, the latter produces exactly the same output as the former.

The critical difference is that in the latter case the = “assignments” are only assigning convenient names to values. It happens that the same name is being used several times, but nothing is actually being overwritten in any interesting sense. No code running in another thread can ever observe variable changing, and thus no timing bugs are possible as a result of the latter code executing.

In the former case, however, the REF constructor allocates an actual shared cell in memory, and the := operator actually overwrites the contents of this cell. We can store pointers to this cell in tuples and records and pass it around to other functions, which can then observe the changed value:

    #!/usr/bin/mythryl

    cell = REF 0;

    r0 = { name => "0", cell };
    r1 = { name => "1", cell };

    printf "*r0.cell == %d\n" *r0.cell;
    printf "*r1.cell == %d\n" *r1.cell;

    r0.cell := 1;

    printf "*r0.cell == %d\n" *r0.cell;
    printf "*r1.cell == %d\n" *r1.cell;

    r1.cell := 2;

    printf "*r0.cell == %d\n" *r0.cell;
    printf "*r1.cell == %d\n" *r1.cell;

Running this produces:

    linux$ ./my-script
    *r0.cell == 0
    *r1.cell == 0
    *r0.cell == 1
    *r1.cell == 1
    *r0.cell == 2
    *r1.cell == 2
    linux$

Notice that we are reading and writing the same cell through both the r0 and r1 records. This sort of thing can only be done using REF and :=.
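
The difference between rebinding a name with = and overwriting a cell with := can be made visible with a pair of functions. The following is only a sketch: the names get_variable and get_cell are hypothetical, and it assumes that a function sees the name bindings in scope at the point where it is defined, as is usual in this family of languages:

    #!/usr/bin/mythryl

    variable = 0;
    cell     = REF 0;

    fun get_variable () = variable;     # Uses the binding of variable in scope here.
    fun get_cell     () = *cell;        # Reads the shared cell each time it is called.

    variable = 1;                       # A new binding; nothing is overwritten.
    cell    := 1;                       # Overwrites the contents of the cell in place.

    printf "get_variable == %d\n" (get_variable ());
    printf "get_cell     == %d\n" (get_cell ());

Under that assumption get_variable still reports the value it was defined with, while get_cell reports the updated contents of the cell.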

In general REF and := should be viewed like goto in C — fundamentally regrettable and vaguely malevolent, but very occasionally exactly the right solution.

For example, REF and := are indispensable when cyclic structures must be created. In our pico-mud example in the previous section, it would be natural to have the Door records point to both of the Room objects they connect as well as having Room objects point to all the Doors entering and leaving them, but we were unable to do that because we had no way of forming cycles in a datastructure.

Here is an updated version which does use cyclic datastructures:

    #!/usr/bin/mythryl

    Room = ROOM { name: String, description: String, doors: Ref(List(Door)) }
    also
    Door = DOOR { name: String, description: String, from: Room, to: Room };

    fun print_room( self as ROOM { name, description, doors } ) = {
        printf "%s room: You see %s\n" name description;
        foreach *doors {.
            my door as DOOR { from, ... } = #d;
            if (from == self)  print_door door;  fi;            # Avoid going into an infinite loop!
        };
    }
    also
    fun print_door( DOOR { name, description, to, ... } ) = {
        printf "%s door: You see %s\n" name description;
        print_room to;
    };

    entryway = ROOM { name => "entryway", description => "a big entryway.", doors => REF [] };
    kitchen  = ROOM { name => "kitchen",  description => "a tidy kitchen.", doors => REF [] };

    door = DOOR { name => "kitchen", description => "a white door.", from => entryway, to => kitchen };

    my ROOM { doors => entryway_doors, ... } = entryway;   entryway_doors := [ door ];
    my ROOM { doors => kitchen_doors,  ... } = kitchen;    kitchen_doors  := [ door ];

    print_room  entryway;

Here we have changed the doors field to hold a reference to a list of doors — which reference we can thus update. This allows us to create both rooms first (with empty door lists), then create the door, with pointers to both rooms, and finally update the room door lists to include the door.

The above example also introduces the as pattern-match syntax:

    self as ROOM { name, description, doors }

which allows us to assign the name self to the entire room record even as we also assign names to its individual name, description and doors fields.

When run, the above prints out:

    linux$ ./my-script
    entryway room: You see a big entryway.
    kitchen door: You see a white door.
    kitchen room: You see a tidy kitchen.
    linux$