Pattern Matching

5.3.5 Pattern Matching

In many languages, exchanging the values of two variables is a simple but ugly chore that requires a temporary variable. In C, for example, we might write something like

    {   int i = 12;
        int j = 13;
        ...
        /* Need to swap i with j now: */
        {   int temp = i;
            i = j;
            j = temp;
        }
        ...
    }

A few languages have a special-case hack which lets you do this in one line. Mythryl has something which looks like one of those hacks:

    linux$ my

    eval:  i = 12;
    eval:  j = 13;

    eval:  my (j,i) = (i,j);

    eval:  i;
    13

    eval:  j;
    12

In Mythryl, however, this is not a special-case hack but a specific application of a pervasive facility known as pattern matching.
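
Because the tuple on the right-hand side is built from the old values before the pattern on the left binds anything, the same notation is not limited to two variables. A small sketch (the variable names a, b and c are chosen only for illustration) rotating three values in one line:

    eval:  a = 1;
    eval:  b = 2;
    eval:  c = 3;

    eval:  my (a,b,c) = (b,c,a);

    eval:  a;
    2

    eval:  b;
    3

    eval:  c;
    1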

Consider the typical Perl subroutine prologue, which looks something like this:

    sub mumble {
        local( $arg1, $arg2, $arg3 ) = @_;
        ...
    }

The local statement above unpacks the argument array @_ into the three variables $arg1, $arg2 and $arg3, doing three assignments in one line. This notation is admirably economical!

Mythryl allows similar parallel assignments in a very general and flexible way.

Here is an example of unpacking a three-slot tuple into three variables:

    linux$ my
    eval:  x = (1,2,3);

    eval:  my (a,b,c) = x;

    eval:  a;
    1

    eval:  b;
    2

    eval:  c;
    3

Here is the corresponding example of unpacking a three-field record into three variables:

    eval:  x = { name => "John Doe", height => 2.0, weight => 100.0 };

    eval:  my { name => a, height => b, weight => c } = x;

    eval:  a;
    "John Doe"

    eval:  b;
    2.0

    eval:  c;
    100.0

In practice, one frequently unpacks the record into variables with the same names as the record fields:

    eval:  x = { name => "John Doe", height => 2.0, weight => 100.0 };

    eval:  my { name => name, height => height, weight => weight } = x;

    eval:  name;
    "John Doe"

    eval:  height;
    2.0

    eval:  weight;
    100.0

In cases like this, the redundancy of an expression like

    my { name => name, height => height, weight => weight } = x;

can quickly become annoying, so Mythryl allows one to simply drop the variable name in such cases; if it is not given, Mythryl assumes it is the same as the field name:

    eval:  x = { name => "John Doe", height => 2.0, weight => 100.0 };

    eval:  my { name, height, weight } = x;

    eval:  name;
    "John Doe"

    eval:  height;
    2.0

    eval:  weight;
    100.0

This code idiom is used pervasively in Mythryl code.

Similar comments apply in the reverse direction, when constructing records. It is very common to accumulate values one by one in local variables and then construct a record when all values are in hand:

    eval:  a = "John Doe";
    eval:  b = 2.0;
    eval:  c = 100.0;

    eval:  x = { name => a, height => b, weight => c };

    { height=2.0, name="John Doe", weight=100.0 }

Once again, it is very common to accumulate the values in variables with the same names as the field-names in the record:

    eval:  name = "John Doe";
    eval:  height = 2.0;
    eval:  weight = 100.0;

    eval:  x = { name => name, height => height, weight => weight };
    { height=2.0, name="John Doe", weight=100.0 }

Here too, writing every identifier twice during construction of the record quickly becomes tedious, so Mythryl allows dropping the variable name in such cases:

    eval:  name = "John Doe";
    eval:  height = 2.0;
    eval:  weight = 100.0;

    eval:  x = { name, height, weight };
    { height=2.0, name="John Doe", weight=100.0 }

This is another idiom used pervasively in Mythryl code.
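
The two shorthands combine naturally. As a minimal sketch reusing the record above (the new weight value and the name y are made up for illustration), one can unpack a record, rebind one of the variables, and build an updated record from the result:

    eval:  x = { name => "John Doe", height => 2.0, weight => 100.0 };

    eval:  my { name, height, weight } = x;
    eval:  weight = 90.0;

    eval:  y = { name, height, weight };

Here y should carry the same name and height as x but a weight of 90.0, while x itself is unchanged.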

Let us return to the topic of pattern matching.

One frequently wishes to extract only a subset of the values in a tuple. In this case one uses underbar wildcards in the slots which are not of interest:

    eval:  x = (1,2,3,4,5,6);
    eval:  my (a,_,c,_,d,_) = x;

    eval:  a;
    1

    eval:  c;
    3

    eval:  d;
    5

This is yet another pervasive idiom.

To make things more interesting, Mythryl also allows tuples and records to be nested arbitrarily in patterns:

    eval:  x = ( 1, (2,3), { name=>"John Doe", height => 2.0, weight => 100.0 } );

    eval:  my (_, (_, a), { name,  ... } ) = x;

    eval:  a;
    3

    eval:  name;
    "John Doe"

Here we have a tuple containing another tuple plus a record. We have extracted one value each from the nested tuple and the record, using a ... ellipsis to represent the record fields in which we have no interest and underbar wildcards to represent the tuple slots in which we have no interest.

Pattern matching pops up in Mythryl in all sorts of spots in which you might not at first expect it. For example, the rules in case statements allow pattern-matching:

    #!/usr/bin/mythryl

    x = (1, (2,3));

    case x
    (1, (b, c)) => printf "one-tuple carrying %d %d\n" b c;
    (a, (b, c)) => printf "%d-tuple carrying %d %d\n" a b c;
    esac;

When run, this produces

    linux$ ./my-script
    one-tuple carrying 2 3
    linux$

When reading such case statements it is important to remember that the rules are logically tried top to bottom, and the first rule whose pattern matches is the one selected. (In practice, the compiler uses sophisticated optimization techniques to speed execution.)

As the patterns used in such rules become more complex, it becomes ever more reassuring that the compiler issues diagnostics for rules which are redundant (can never match) and rulesets which are incomplete (some possible inputs would match no rule).
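
As a small sketch of both points, in the style of the script above (the tuple values and messages are invented for illustration): the rules below are tried in order, and the final rule, which uses only a variable and a wildcard, catches every input the earlier rules miss, so the ruleset is complete.

    #!/usr/bin/mythryl

    x = (2, (5,6));

    case x
    (1, (b, c)) => printf "one-tuple carrying %d %d\n" b c;
    (2, (b, c)) => printf "two-tuple carrying %d %d\n" b c;
    (a, _     ) => printf "%d-tuple, contents ignored\n" a;
    esac;

Here the first rule fails because 1 does not match 2, so the second rule should be selected and print "two-tuple carrying 5 6". Moving the last rule to the top would make it match every input, leaving the other two rules redundant.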

The case statement pattern matching facility can be used in some interesting and initially non-obvious ways. Suppose for example that one has two Boolean variables and needs to execute different code for each of the four possible combinations of their values. One could nest multiple if statements, but this is cleaner:

    #!/usr/bin/mythryl

    a = TRUE;
    b = FALSE;

    case (a,b)
    (TRUE, TRUE ) => print "TRUE / TRUE  case\n";
    (TRUE, FALSE) => print "TRUE / FALSE case\n";
    (FALSE,TRUE ) => print "FALSE/ TRUE  case\n";
    (FALSE,FALSE) => print "FALSE/ FALSE case\n";
    esac;

When run, this produces

    linux$ ./my-script
    TRUE / FALSE case
    linux$

In this particular case the benefit is small, but as the number of states to be enumerated grows larger, so does the improvement in code readability and maintainability relative to using a rat's nest of if statements.
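
Wildcards keep such tables short when several combinations share the same code. A sketch with three Booleans (the variable c and the messages are illustrative additions), in which four rules cover all eight combinations:

    #!/usr/bin/mythryl

    a = TRUE;
    b = FALSE;
    c = TRUE;

    case (a,b,c)
    (TRUE, TRUE, TRUE ) => print "all TRUE\n";
    (FALSE,FALSE,FALSE) => print "all FALSE\n";
    (TRUE, _,    _    ) => print "a TRUE,  b or c FALSE\n";
    (FALSE,_,    _    ) => print "a FALSE, b or c TRUE\n";
    esac;

With the values shown, the first two rules fail and the third should be selected. The order matters here: the two wildcard rules are only correct because the all-TRUE and all-FALSE rules are tried before them.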
