Mythryl Types: Phantom Types

5.4.19 Mythryl Types: Phantom Types

Material in this tutorial is adapted from Phantom Types and Subtyping by Fluet and Pucalla, 2006

Studies have documented productivity differences of as much as fifty to one between programmers doing the same work side by side. Great programmers achieve these sorts of results not by working harder than mediocre programmers, but by working smarter. Why do laboriously by hand something which the computer can do more reliably and more quickly?

Mythryl provides ample scope to work smarter instead of harder, for those so inclined. One of the great under-appreciated facilities it offers for doing so is its Hindley-Milner typechecker.

This typechecker is based on the unify operation from logic programming made famous by Prolog; it is in essence a poor man’s theorem prover. Consequently when writing Mythryl type declarations you have at your fingertips much of the power of the power of pure Prolog.

By using this power inventively, you can program the Mythryl compiler to catch problems automatically at compile time which you might otherwise wind up having to track down at run time.

In real-world terms, this can mean the difference between being home sound asleep at three AM, or working late in a caffeine stupor against a deadline trying to track down "one last bug". That is part of the difference between working smarter and working harder.

The full power of Hindley-Milner type checking has become apparent only slowly over time; we are still discovering new ways of applying it to solve real-world programming problems. You might be the first to discover yet another!

Many such techniques are based on phantom types.

In the C world, types and values correspond in a simple way: Usually there is a type every value, and values are created for every type.

Hindley-Milner typechecking opens up a new world of possibilities.

For example, it turns out to be possible to implement (emulate) the entire C type system via appropriate Mythryl type declarations. The Mythryl C library interface autogeneration utility c-glue / c-glue-maker, which is the Mythryl port of Matthias Blume’s SML/NJ NLFFI package, does this: You may see part of the code in src/lib/c-glue-lib/c.api.

When doing such advanced Mythryl type hacking, types are often defined with no intent of ever creating any directly corresponding data values. Because of the lack of corresponding data values, such types are often called phantom types.

When we define phantom types we are ceasing to think in terms of runtime code and instead starting to regard the Mythryl type language as a way of programming the Mythryl typechecker to perform tasks of interest to us at compile time. We are programming on a different plane.

Suppose we are implementing a weakly typed interpreter vaguely like Perl or Python. We have a Value type that the interpreter manipulates, which supports integer and boolean values, and operations like print, increment and not which may be performed upon those values.

Our code might well look something like this:

    #!/usr/bin/mythryl

    Value = INT( Int ) | BOOL( Bool) ; 

    fun make_int_value  (i): Value = INT(  i );
    fun make_bool_value (b): Value = BOOL( b );

    fun print (v: Value)
        =
        case v
        INT( i) => printf "%d" i;
        BOOL(b) => printf "%B" b;
        esac;
       
    fun increment (v: Value): Value
        =
        case v
        INT( i) => INT( i + 1 );
        BOOL(b) => raise exception DIE "Cannot increment a Boolean value";
        esac;

    fun not (v: Value): Value
        =
        case v
        INT( i) => raise exception DIE "Cannot 'not' an Int value";
        BOOL(b) => BOOL( bool::not b );
        esac;

    v  = make_int_value( 12 );
    v' = not( v );

Here we have one function each for creating boolean and integer flavors of value, a function which can print both boolean and integer values, a function which can increment integer values, and a function which can not boolean values.

The above code will compile just fine, but at runtime the final line will produce a "Cannot ’not’ an Int value"; runtime error.

This is sub-optimal. As a matter of design praxis, we would prefer to catch such errors at compile time rather than at run time if at all practical. (Suppose the program ran for sixty hours before issuing the runtime error and exiting!)

We do not want to just make our integer and boolean values completely different types, because we want functions like print above to operate indifferently upon either. Yet with them both folded into the same type, the typechecker has no way of flagging the above sort of coding errors.

Phantom types offer a solution:

    #!/usr/bin/mythryl

    Value(X) = INT( Int ) | BOOL( Bool) ; 

    fun make_int_value  (i): Value(Int)  = INT(  i );
    fun make_bool_value (b): Value(Bool) = BOOL( b );

    fun print (v: Value(X))
        =
        case v
        INT( i) => printf "%d" i;
        BOOL(b) => printf "%B" b;
        esac;
       
    fun increment (v: Value(Int)): Value(Int)
        =
        case v
        INT( i) => INT( i + 1 );
        BOOL(b) => raise exception DIE "Cannot increment a Boolean value";
        esac;

    fun not (v: Value(Bool)): Value(Bool)
        =
        case v
        INT( i) => raise exception DIE "Cannot 'not' an Int value";
        BOOL(b) => BOOL( bool::not b );
        esac;

    v  = make_int_value( 12 );
    v' = not( v );

The above code now gives a type error when it compiles.

The crucial difference is that Value now takes a phantom type as argument.

Mythryl does not support subtypes in a true mathematical sense, but the Value phantom type parameter lets us effectively define sub-types Value(Int) and Value(Bool) of our base type Value(X).

This lets us distinguish the return types of make_int_value and make_bool_value by declaring them as respectively Value(Int) and Value(Bool).

By defining print to take a type of Value(X), we allow it to be given arguments of types both Value(Int) and Value(Bool).

But by defining increment and not to accept respectively values of type Value(Int) and Value(Bool), we prime the type-checker to flag any attempt to call either with an inappropriate value.

Note that there is never any data value component corresponding to the phantom type parameter; we added the phantom type without changing the runtime sumtype in any way.

Note also that we could use any two types whatever as the phantom types in the above example, so long as they were typechecker distinct — so long as they did not unify.

For example, the following program is exactly equivalent, despite the replacement of Int and Bool by Vector and String as our phantom witness types:

    #!/usr/bin/mythryl

    Value(X) = INT( Int ) | BOOL( Bool ); 

    fun make_int_value  (i): Value(Vector) = INT(  i );
    fun make_bool_value (b): Value(String) = BOOL( b );

    fun print (v: Value(X))
        =
        case v
        INT( i) => printf "%d" i;
        BOOL(b) => printf "%B" b;
        esac;
       
    fun increment (v: Value(Vector)): Value(Vector)
        =
        case v
        INT( i) => INT( i + 1 );
        BOOL(b) => raise exception DIE "Cannot increment a Boolean value";
        esac;

    fun not (v: Value(String)): Value(String)
        =
        case v
        INT( i) => raise exception DIE "Cannot 'not' an Int value";
        BOOL(b) => BOOL( bool::not b );
        esac;

    v  = make_int_value( 12 );
    v' = not( v );

This emphasizes the irrelevance of phantom types to the runtime behavior of the program; they are purely compiletime book-keeping.

To help settle the idea, here is a similar example with another setting, this time one involving a TCP/IP network socket library.

Here we assume a fictional underlying net package which does all the work irrelevant to our example:

    Socket = UNT( one_word_unt:Unt );

    fun make_udp_socket( address: String ): Socket = {
        net::make_udp_socket address;
    };
    fun make_tcp_socket( address: String ): Socket = {
        net::make_tcp_socket address;
    };

    fun udp_send( socket: Socket,  string: String) = {
        net::udp_send( socket, string );
    }
    fun tcp_send( socket: Socket,  string: String) = {
        net::tcp_send( socket, string );
    }

    fun close_socket( socket: Socket ) = {
        net::close_socket( socket );
    }

Once again, the issue here is that we have two types, upd and tcp sockets, which are neither completely distinct nor identical. We can only call udp_send on udp sockets and only call tcp_send on tcp sockets, but we may call close_socket on either. Given the above implementation, unfortunately, doing a send on the wrong type socket will be detected only at runtime, not at compile type.

Once again, we can solve the problem by adding phantom types to record the needed subtyping information. This time we declare some fresh sumtypes to use as the phantom types, just to improve readability:

    Udp = UDP;          # Used only as phantom type.
    Tcp = TCP;          # Used only as phantom type.

    Socket(X) = UNT( one_word_unt:Unt );

    fun make_udp_socket( address: String ): Socket(UDP) = {
        net::make_udp_socket address;
    };
    fun make_tcp_socket( address: String ): Socket(TCP) = {
        net::make_tcp_socket address;
    };

    fun udp_send( socket: Socket(UDP),  string: String) = {
        net::udp_send( socket, string );
    }
    fun tcp_send( socket: Socket(TCP),  string: String) = {
        net::tcp_send( socket, string );
    }

    fun close_socket( socket: Socket(X) ) = {
        net::close_socket( socket );
    }

Now any attempt to do a send on the wrong type of socket will draw a compile error, but we may still call close_socket on either type of socket.

Suppose now that we wanted to encode a two-level type hierarchy using phantom types: We have a value type which subdivides into floating point and integer, where the integer type in turn subdivides into 32-bit and 64-bit integers. We have a print_value operation which may be applied to any of them, an exp operation which applies only to floats, an increment operation which applies to both integer types, and a negate operation which (for some reason) applies only to 32-bit integers.

We can implement this via phantom types by using two phantom type type variables in our Value definition instead of just one as above:

    #!/usr/bin/mythryl

    My_Int     = MY_INT;
    My_Int1   = MY_INT1;
    My_Int2   = MY_INT2;
    My_Float   = MY_FLOAT;

    Value(X,Y) = INT1(one_word_int::Int) 
               | INT2(two_word_int::Int)
               | FLOAT(float::Float)
               ;

    fun make_int1 (i: one_word_int::Int):           Value(My_Int, My_Int1)
        =
        INT1(i);

    fun make_int2 (i: two_word_int::Int):           Value(My_Int, My_Int2)
        =
        INT2(i);

    fun make_float (f: float::Float):         Value(My_Float, Y)
        =
        FLOAT(f);

    fun print_value( v: Value(X,Y) )
        =
        case  v
        INT1(i) =>   print (one_word_int::to_string i);
        INT2(i) =>   print (two_word_int::to_string i);
        FLOAT(f) =>   print (float::to_string f);
        esac;

    fun increment( v: Value(My_Int,Y) ):      Value(My_Int,Y)
        =
        case  v
        INT1(i) =>   INT1( i + 1 );
        INT2(i) =>   INT2( i + 1 );
        _        =>   raise exception DIE "increment: impossible case";
        esac;

    fun negate( v: Value(My_Int,My_Int1) ):   Value(My_Int,My_Int1)
        =
        case  v
        INT1(i) =>   INT1( one_word_int::neg i );
        _        =>   raise exception DIE "negate: impossible case";
        esac;

    fun exp( v: Value(My_Float,Y) ):          Value(My_Float,Y)
        =
        case  v
        FLOAT(f) =>   FLOAT( float::math::exp(f) );
        _        =>   raise exception DIE "exp: impossible case";
        esac;

Quite general hierarchical subtyping relationships may be encoded and checked using phantom types. In general, for each additional level of hierarchy we will add one additional phantom type variable. For an in-depth discussion see Phantom Types and Subtyping by Fluet and Pucalla, 2006, from which much of the material in this tutorial was adapted.

Comments and suggestions to: bugs@mythryl.org