In the introductory material I state
Mythryl is not just a bag of features like most programming languages;
It has a design with provably good properties.
Many readers have been baffled by this statement. This is understandable enough; like the first biologists examining a platypus and pronouncing it a fake, the typical contemporary programmer has never before encountered an engineered progamming language and is inclined to doubt that such a thing truly exists, that one is technically possible, or indeed what it might even mean to engineer a programming language. Could designing a programming language possibly involve anything beyond sketching a set of features in English and telling compilers writers to go forth and implement? If so, what?
My goal in this section is to show that it is not only meaningful but in fact both possible and worthwhile to truly engineer a programming language.
Every engineering discipline was once an art done by seat of the pants intuition.
The earliest bridges were likely just trees felled across streams. If the log looked strong enough to bear the load, good enough. If not, somebody got wet. Big deal.
Over time bridges got bigger and more ambitious and the cost of failure correspondingly larger. Everyone has seen the film of Galloping Gertie, the Tacoma Narrows bridge, being shaken apart by a wind-excited resonant vibration. The longest suspension bridges today have central spans of up to two kilometers; nobody would dream of building them based on nothing more than "looks strong enough to me".
We’ve all seen films of early airplanes disintegrating on their first take-off attempt. This was a direct consequence of seat of the pants design in the absence of any established engineering framework.
The true contribution of the Wright brothers was not that they built the first working airplane, but rather than they laid the foundations of modern aeronautical engineering through years of research and development. With the appropriate engineering tools in hand, building the aircraft itself was a relatively simple exercise. The Wright Flyer was the first controllable, workable airplane because the Wright brothers did their homework while everyone else was just throwing sticks, cloth and wire together and hoping. Sometimes hoping just isn’t enough.
Large commercial aircraft today weigh hundreds of tons and carry hundreds of passengers; nobody would dream of building one without first conducting thorough engineering analysis to ensure that the airframe will withstand the stresses placed upon it. Airplanes no longer fall out of the sky due to simple inadequacy of airframe design.
Airplanes do however fall out of the sky due to inadequacy of flight software design. Software today is still an art rather than an engineering discipline. It ships when it looks "good enough". Which means it often is not good enough — and people die.
Modern bridges stand up, and modern airplanes stay in the sky, because we now have a good understanding of the load bearing capacity of materials like steel and aluminum, of their typical failure modes, and of how to compute the load bearing capacity of engineered structures based upon that understanding.
If we are to reach the point where airliners full of passengers no longer fall out of the sky due to software faults, we need to have a similarly thorough understanding of software systems.
Modern software depends first and foremost on the compiler. What steel and concrete are to bridge design, and what aluminum and carbon composites are to airframe design, compilers are to software design. If we do not understand the load bearing limits of steel or aluminum we have no hope of building consistently reliable brdiges or airframes. So long as we do not understand what our compilers are doing, we have no hope of building consistently reliable software systems, and people will continue to die every year due to simple, preventable software faults in everything from radiological control software to flight control software to nuclear reactor control software to car control software.
Our minimal need is to know what meaning a compiler assigns to a given program. So long as we have no way of agreeing on the meaning of our programs, as software engineers we have lost the battle before the first shot is fired. Only when we know the precise semantics assigned to a given program by our compiler can we begin to develop methodologies to validate required properties of our software systems.
I do not speak here of proving a program "correct". There is no engineering analysis which concludes with "and thus the system is correct". What we can do is prove particular properties. We can prove that a given program will not attempt to read values from outside its address space. We can prove that a given program will always eventually return to a given abstract state. We can prove that a given program will always respond within one hundred milliseconds. We can prove that a given program will never enter a diverging oscillation. We can prove that a given program will never read from a file descriptor before opening it and will always eventually close that file descriptor. We can prove that certain outputs will always stand in given relationships to corresponding inputs. Given time, tools and effort, we can eventually prove enough properties of a flight control program to give us reasonable confidence in trusting hundreds of lives to it.
Traditional programming language "design" does not address the question of the meaning of the language. In an engineering sense, traditional programming languages are not designed at all. A list of reasonable-sounding features is outlined in English, and the compiler writer then turned loose to try and produce something vaguely corresponding to the text.
The first great advance on this state of affairs came with Algol 60, which for the first time defined clearly and precisely the supported syntax of the language. It was then possible for language designers and compiler writers to agree on which programs the compiler should accept and which it should reject, and to develop tools such as YACC which automate significant parts of the compiler construction task, dramatically reducing the software fault frequency in that part of the compiler. But we still had no engineering-grade way of agreeing on what the programs accepted should actually be expected to do when executed.
The second great advance on this state of affairs came with the 1990 release of The Definition of Standard ML, which specified formally and precisely not only the syntax but also the semantics of a complete usable programming language. Specifying the syntax required a hundred phrase structure rules spread over ten pages. Specifying the semantics required two hundred rules spread over another thirty pages. The entire book ran to barely one hundred pages including introduction, exposition, core material, appendices and index.
As with the Wright brother’s first airplane, the real accomplishment was not the artifact itself, but rather the engineering methodology and analysis underlying it. Languages like Java and C++ never had any real engineering analysis, and it shows. For example, the typechecking problem is for both of those languages undecidable, which is mathematical jargon for saying that the type system is so broken that it is mathematically impossible to produce an entirely correct compiler for either of them. This is not a property one likes in a programming language, and it is not one intended by the designers of either language; it is a simple consequence of the fact that the designed of neither language had available to them an engineering methodology up to the task of testing for and eliminating such problems. Like the designers of the earliest airplanes, they were forced to simply glue stuff together and pray for it to somehow work.
The actual engineering analysis conducted for SML is only hinted at in the Defintion. To gain any real appreciation for it, one must read the companion volume Commentary on Standard ML.
Examples of engineering goals set and met by the designs of SML include:
In general it is excruciatingly easy for the typechecking problem to become undecidable because one is always stretching the type system to accept as many valid expressions as possible.
Any practical type system must err on the side of safety, of rejecting any program which is not provably typesafe, and will consequently wind up throwing out some babies with the bathwater, rejecting programs which are in fact correct because the type system was not sophisticated enough to realize their correctness. One is always trying to minimize the number of such spuriously rejected by being just a little more accomodating, and in the process creeping ever closer to the precipice of undecidability. The job of the programming language type system designer is to teeter on the very brink of that precipice without ever actually falling over it.
The design process for SML involved explicitly verifying these properties by informal and formal proofs, repeatedly modifying the design as necessary until these properties could be proved. This intensive analysis and revision process yielded a number of direct and indirect benefits, some obvious, some less so:
SML was the first general-purpose realistic programming language to enjoy rigorous engineering-grade design analysis of this sort comparable to what we routinely do for a proposed bridge or airframe. SML/NJ is the reference implementation of SML, constructed with the active assistance of the SML language designers. Mythryl inherits this theoretical foundation and this codebase, and adapts it to production use in the open source tradition.
Further reading
The definitive work on the SML language is
The Definition of Standard ML (Revised)
Robin Milner, Mads Tofte, Robert Harper, David MacQueen
MIT Press 1997 ISBN 0-262-63181–4
The definitive work on the SML language design analysis process is
Commentary on Standard ML
Robin Milner, Mads Tofte
MIT Press 1991 ISBN 0-262-63137-7
You will find the former very slow going without the latter to illuminate it!
If you are new to this style of operational semantics, you may find useful background introductory material in:
Types and Programming Languages
Benjamin C. Pierce
MIT Press 2002 ISBN 0-262-16209-1
If the above leaves you hungering for more, you might try
Advanced Topics in Types and Programming Languages
Benjamin C Pierce (editor)
MIT Press 2005 ISBN 0-262-16228-8
Some more recent works on ML semantics:
Understanding and Evolving the ML Module System
Derek Dreyer 2005 262p (thesis)
http://reports-archive.adm.cs.cmu.edu/anon/usr/anon/home/ftp/usr0/anon/1996/CMU-CS-96-108.ps
A Type System for higher-order modules
Dreyer, Crary + Harper 2004 65p
http://www.cs.cmu.edu/ dreyer/papers/thoms/toplas.pdf
Singleton Kinds and Singleton Types
Christopher Stone 2000 174p (thesis)
http://reports-archive.adm.cs.cmu.edu/anon/usr/anon/home/ftp/usr0/ftp/2000/CMU-CS-00-153.ps