Toward a General Processor for Programming Languages


Introduction

The principle by which progress has traditionally been made in programming technology is that of the replacement of the special by the general. Programmers discover from time to time that they have been repeatedly rewriting substantially the same routine for one job after another; when the underlying identity of these several versions becomes apparent, a single general routine, needing nothing more than suitable parameterization to adapt it to each particular application, is produced to replace all of them. This procedure, although widely successful in the past, has so far failed to deal with the great current problem of programming technology, that of compiler production.

For some years now the greatest single obstacle to more widespread use of the computer has been the insufficiency in variety and power of the higher level languages needed, especially by nonprofessional programmers. This shortage, in turn, is largely due to the great expense of providing a compiler for each desired language, and the scarcity of the highly experienced software specialists who build such processors.

Many attempts have been made to break through this barrier by applying the principle of generalization that has so often worked before, thereby producing either a universal language or a general processor that would translate an indefinitely growing number of languages (included under this head are all systems for producing compilers quickly and cheaply). A few dozen such projects that reached the point of being written up in the literature are listed in a recent survey [1]; how many other attempts have been made can only be conjectured, but experience suggests that there must have been several for every one that got far enough to yield a paper. It is striking, then, that all these efforts, costing some hundreds of highly skilled man-years, must regretfully be judged to have failed in their main purpose. Much valuable experience must have been acquired by those participating in these projects, and no reproaches are being directed toward any of them; nevertheless, none of the resulting languages or processors is in widespread use today, and the conventional monolingual compiler is still the standard means of language implementation.

If so much talented effort has failed to produce the desired result, some fundamental errors must underlie our thinking. This paper is an attempt to identify these errors and to suggest how our energies might be redirected along more fruitful lines. No attempt is made in this paper to give a full description of the prototype system that was constructed to test the ideas presented here. For such a description, see [3] and [16], particularly the latter.

Language or Processor?
The cited survey of attempts to break through the software barrier distinguishes, we noted, two principal strategies for doing so: the universal language and the universal processor. As its title will already have disclosed, this study takes the preferability of the latter as axiomatic. Our quarrel with the universal language is not over the particulars of any of the languages proposed for that role, but with the concept itself. With new computer applications being found daily, with the computer-using population growing at a still accelerating pace, and with that population taking on an increasingly lay character, the notion of a universal language is indefensible today. Few would now contest this; we have reached something close to unanimity on the proposition that a universal language is certainly not imminent and possibly not even desirable, and we are at least reconciled to the prospect of a multiplicity of languages.

In view of this consensus, the question that needs to be answered is why most of us are still building processors suitable only for the universal language—i.e., conventional monolingual compilers. If the universal language is nowhere in sight, then for a programming system, as for other creatures competing to survive, ability to change is the first law of life. While we cannot foresee the future in detail, we can be sure that new problems will present themselves and that we will be dissatisfied tomorrow with today's highest achievements. If this is so, then openendedness is the very first quality to be sought in the design of a processor. In clinging to the compiler (unless otherwise qualified, "compiler" will mean the conventional monolingual kind) as our standard means of implementation, however, we show that this requirement is far from our minds. The functional extension or modification of a compiler-implemented language generally requires that changes be made in the compiler itself; the difficulties of doing this are sufficiently notorious to need no discussion here; see, e.g., [4, p. 395]. One reason for our continued dependence on the compiler is undoubtedly the alleged shortcomings of its rivals (these will be discussed later); another, certainly, is our natural reluctance to face in advance, and to plan for, the time when our handiwork will need to be changed. Where such reluctance is responsible for the implied claim to finality that the writing of a compiler represents, the designers have missed the point. Programming systems are changed not because they are bad—the bad ones simply perish—but when they are good enough to make someone find it worthwhile to improve them.

Some language designers have in principle recognized the importance of flexibility, but do not seem to have understood how fundamental a consideration it is, and how early it must figure in their planning if it is really to be attained. The evidence of this misunderstanding is their attempt to add flexibility, like a pinch of salt, to a compiler. Their first thought and best efforts go into an attempt to make extendibility unnecessary; the usual procedure has been to start with some algebraic-expression compiler—ALGOL in one of its various forms has been a favorite—and add to it whatever it seems to lack as an all-purpose programming language. The additions commonly include input-output and string-defining and string-manipulating operations, and, as a catchall for whatever may be thought of later, an escape hatch through which the programmer can drop into machine language. Systems of this description, we think, have not fulfilled the hopes once held for them, and foremost among the lessons to be drawn is that flexibility cannot be thrown into the pot as an afterthought, but must be planned for from the beginning.

The reasons why this common strategy is bad are worth some discussion, since it appears to have come into practice without close examination. The notation that is wanted for doing numerical computation, and the technique of compiling from statements in that notation, are among the best understood and agreed-upon topics in programming, and seldom need that part of a processor that deals with algebraic expressions be changed. The notation for and the treatment of nonnumeric objects are much less settled—strings, tables, pushdown lists, and so on, can be specified and handled in any number of ways and extended in any number of directions; there is no universal agreement even as to the necessity of any one of these objects. Clearly, then, those who are concerned with flexibility and open-endedness should design their systems with an eye to the main problem area, noncomputational programming, where most changes and additions are to be expected. Whatever structure they settle on for their processor, they will always be able to give it the ability to handle algebraic expressions in the form of a packaged algebraic-expression compiler. (Such packaged algebraic compilers have been fitted to assemblers [2] and macro expanders [3] almost as if they were mere closed subroutines.)

To do the reverse, as is usual—to start with an algebraic compiler and break into it in order to make what seem at the moment to be all the necessary additions—is to violate the integrity of the one procedural package that is naturally closed and unified, and to produce thereby a composite that will be at least as hard to modify when the next generation of changes is to be made. To employ a homely illustration, one doesn't lengthen a piece of rope by cutting it in two and splicing a new piece between its halves.

Those who have had to modify or extend compilers have, in fact, recently turned with increasing frequency to the macro instruction technique, finding that it enables them to do the job without the virtual reconstruction of the processor that would otherwise be necessary. While this idea of using macros to extend a compiler language has occurred to many programmers [4-7], it does not appear to have suggested to any of them what seems to be the natural conclusion: since such modification is the fate of all good languages, they might with advantage be implemented by macro processors from the outset. We will return to this point at considerable length in later sections.

The Generalized Processor and the Three Dimensions of Programming Languages
The idea of the generalized processor or translator has grown from the recognition that all programming language translators are very much alike in structure and function, no matter how dissimilar the languages they accept as input. All of them begin with a scanning phase in which the source language statements are promptly identified as to type, decomposed, and reduced to entries in tables and lists. In the process all distinguishing marks of the original notation are stripped away, with nothing but minimal representations of the source program's information content remaining. From this early point onward, all translators proceed to do much the same thing in much the same way. Noticing that the various translators or programming language processors differed chiefly in the scanning phase, many thoughtful programmers have proposed or attempted the construction of a general programming language translator that would subsume the others as special cases. Such a general translator would need one feature not found in any of the monolingual translators it was to replace: a "metalanguage" or notation whereby its users could describe to it the various source languages with which it was to deal.

Apart from the economies to be expected from the replacement of a variety of processors by a single generalized processor, there is the promise in this concept of a better apportionment of effort in the construction of programming systems. Conventionally, the greatest expenditure of time and effort by those involved in such work is on the specification of the source language. Since this is the face the system will show to the world, and the only part of it most users will ever see, the heavy investment in making it attractive is easily understood. The fine structure of the processor is something that few people will look at, and it is too often just what one would expect from hurried, harassed people working in the knowledge that their product, provided it works, is effectively exempt from detailed criticism. The time and care spent in polishing the source language, although disproportionately great, will not have succeeded in pleasing all users; the scrimping and corner-cutting in the construction of the processor, on the other hand, will certainly penalize all users of the system, and will be a particular cause of trouble to those who will have to maintain and revise it.

The picture of programming-system construction that emerges is strangely like that presented by cheap mass-produced products: too much attention to styling, not enough to engineering. (Much of the talk about programming language standardization is rather depressing for the evidence it gives of similarly misdirected energies; it is as if the automobile maker should proudly announce a standardization campaign for his product that would leave engine, transmission, and chassis as handmade, non-interchangeable parts, but would ensure uniformity in body color and driver's seat position.) A workable generalized processor would do much to right this inverted scale of values. The specification of notation would be left in the hands of users, where a matter as subjective and (logically speaking) trivial belongs; the attention of the system designer could be turned to the processor, which is usually much in need of it. Indeed, one of the greatest dividends of a generalized processor, after the notational flexibility and the simplification of computer room operations it made possible, should be the decidedly superior maintainability and revisability of the processor itself. Important improvements could be expected in these departments from a system whose designers were not only spared concern over such an irrelevancy as the source language it was to process, but also allowed a budget reflecting its equivalence in utility to several conventional ones.

Since the reader not already persuaded of the desirability of the general processor will probably not be much affected by further arguments for it, we assume this point granted and pass on to the question of how such a system may be realized. It is necessary in explaining this to review some fundamentals.

We are considering a processor that is to be capable of having introduced to it an indefinite number and variety of programming languages, all of which it must be able to translate from then on. What is required in the design of such a translator is made clear if we think of a programming language as having three dimensions or aspects:

(1) the functional—the dimension in which a language is a body of operators, a repertoire of commands,

(2) the notational—the dimension in which it is a set of permissible character strings that are used to call upon the functions (usually one per function),

(3) the modal—the dimension comprising all those elements of the language that are not to be translated into object coding, but are instead calls on procedures to be executed at translation time in order (for example) to alter the translator's mode of behavior.

It will be clear that a processor that is to be introduced to new languages must offer software designers some convenient way of describing their languages in each of these three dimensions. Techniques for accomplishing this are discussed at some length in the following pages, but let us indicate in just a few lines the general direction in which we look for answers to these three problems.
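
To make these three dimensions concrete, the following sketch (in Python, used here purely as a present-day illustration; every name in it is invented for the purpose and none is drawn from the prototype system) shows one possible shape for the description of a language that a general processor might accept:

    # Illustrative only: one possible shape for a three-dimensional language
    # description. All names here are hypothetical.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class LanguageDescription:
        # (1) Functional: each operator name maps to the macro body (here, lines
        #     of a notional assembly language) that realizes it.
        macros: Dict[str, List[str]] = field(default_factory=dict)

        # (2) Notational: transformations mapping surface statements onto the
        #     canonical form "operator, operand1, ..., operandn".
        separators: List[str] = field(default_factory=lambda: [" "])
        noise_words: List[str] = field(default_factory=list)
        cue_words: Dict[str, str] = field(default_factory=dict)   # cue word -> dummy

        # (3) Modal: elements executed at translation time to adjust the
        #     translator's behavior rather than being compiled into object code.
        translation_time_ops: Dict[str, Callable[..., None]] = field(default_factory=dict)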

It may be observed immediately that a mechanism for describing a language in the first of the three dimensions, the functional, has existed for many years in the shape of the macro instruction feature. The macro has not been generally thought of as a vehicle for the implementation of higher level languages because it has not been offered in combination with suitable mechanisms for handling the other two dimensions, and without such support it is heavily incapacitated. When macro instruction capability is merely superimposed on an assembly program in the hope that application programmers will use it to bootstrap their way up from assembly language, it is seldom very successful; the use of private macros as occasional auxiliaries in assembly language programming is trivial and has never yielded the rewards that macros promise. This disappointing experience is largely due to the insufficiently appreciated fact that a programmer defining a body of macro instructions is, at least in part, defining a programming language—and most programmers are neither interested in defining their own language, nor capable of it. The macro instruction is not properly just another item in the machine language coder's bag of tricks; it is a way—the best way, we will contend—for the higher level language designer to implement his language. It is in this role and only in it that macros are recommended.

The notational dimension has been the subject of much research in the past few years: it is with this aspect of programming languages that the various syntax-directed, syntax-driven, and syntax-oriented processors have been concerned. It might have been expected, therefore, that this dimension, like the functional, would be suitably provided for by existing techniques. In fact, however, not one existing higher level programming language can be specified in its entirety by any compact formalism so far described. It seems that the reason why the metalinguistic formalisms so far proposed have been of so little practical usefulness is that they have placed too high a premium on satisfying criteria that are important in the disciplines of mathematics and linguistics, but are of merely marginal concern to the practical software designer. We have found that an empirical approach to the metalanguage problem, in which our efforts are turned toward the development of simple, explicit devices for describing real programming languages in their full, sometimes baroque, complexity, yields a technique for describing notation that is immediately applicable. (For more information on these devices, see [16]; for a language described by their means, see the Appendix to this paper and [17].)

The modal dimension can in large part be dealt with by the provision of a stock of pseudo operations that enable the user to call for any of the standard (and a few unusual) translation time procedures. But this built-in stock, while a useful time saver, does not suffice; a new language may contain modal elements of a wholly new type, and the language designer must be given the ability to specify them conveniently. What is called for is nothing less than the ability to demand the execution at translation time of any arbitrary procedure, and this is in fact provided in the model under discussion. The means whereby it is provided, and the manner in which it is used, are discussed at some length in a later section.

These methods, collectively amounting to a metalanguage for the definition of programming languages in their full tridimensionality, will now be dealt with in some detail, the space devoted to each being determined by our estimate of the likelihood of misunderstanding rather than of their relative importance.

Logical Place of the Macro Processor Among Programming Systems
The macro processor is often thought of as a forerunner or unsuccessful competitor of the compiler. This widespread impression is based on the historical relationship between the two kinds of processor, which involves many factors extraneous to technical merit—an aspect of the matter that will be investigated in the next section. Their purely logical relationship is simple: the compiler is a special case of the macro processor. The compiler's fundamental mode of operation is that of the macro processor as well: given as input an executable statement in the language it recognizes, it produces as output the machine language coding necessary to realize the operation called for. It differs in that it does not allow the definition of new operators within source programs, as does the macro processor.

The key notion in the distinction we are making is that of new operator. There is a trivial sense in which new operators may be defined within almost all compilers, which commonly permit both the formation of subroutines and reversion to symbolic machine language. Neither of these facilities represents flexibility or open-endedness in the sense offered by the macro processor. Compiler language subroutines are mere compounds of existing compiler functions, and afford the user no power not designed into the compiler by its authors. Reversion to symbolic machine language does indeed give the user, in principle, the power to create any function he may desire; in practice, it is a very costly device to resort to (the price is itemized in the next section), and it is for many users inaccessible at any price—they do not know machine language, or are forbidden to use it. A genuinely new function, available to users on the same terms as those constituting the processor's original set, is realizable in a compiler only by tinkering with the innards of the processor itself.

For the good macro processor, the concept of operator definition is so thoroughly generalized and facilitated that a wholly new class of practical capabilities is at its user's disposal. With no knowledge of the processor's internal workings, any user may add a new function or modify or delete an existing one. New functions may be defined in any mixture of existing macros and the processor's ground language, switching from one to the other as desired without explicit linkage or gear-shifting statements. If the user needing a wholly new function is incapable of defining it for himself, he may have to turn for help to another programmer who knows machine language or has whatever competence he lacks, but he need not find someone who (a) knows the internal workings of the processor and (b) is ready to undertake the task of modifying it. The practical difference is immense.
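
A minimal sketch may suggest how such operator definition looks in practice. The fragment below (in Python, offered only as an illustration; the operation codes, macro names, and internal representation are all invented and reproduce no particular processor) defines new operators as mixtures of existing macros and ground-language instructions and expands them recursively:

    MACROS = {}                                    # operator name -> (parameters, body lines)

    def define(name, params, body):
        # Adding, replacing, or deleting an operator requires no knowledge of
        # the processor's internal workings.
        MACROS[name] = (params, body)

    def expand(op, args):
        # Macro calls and ground-language lines may be mixed freely in a body;
        # expansion proceeds recursively until only ground instructions remain.
        if op in MACROS:
            params, body = MACROS[op]
            binding = dict(zip(params, args))
            out = []
            for line in body:
                inner_op, *inner_args = line.split()
                out.extend(expand(inner_op, [binding.get(a, a) for a in inner_args]))
            return out
        return [" ".join([op] + list(args))]       # anything undefined is ground language

    # A new operator built from ground instructions alone:
    define("STORESUM", ["A", "B", "C"], ["LOAD A", "ADD B", "STORE C"])
    # A higher-level operator built partly out of the macro just defined:
    define("DOUBLESTORE", ["A", "C"], ["STORESUM A A C"])

    print(expand("DOUBLESTORE", ["ALPHA", "GAMMA"]))
    # ['LOAD ALPHA', 'ADD ALPHA', 'STORE GAMMA']

Either definition could be replaced or discarded at any time, again without reference to the processor's internal workings.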

To better appreciate what this generalization of the operator-defining feature means, note that the power to declare new operators extends downward as well as upward, so that the programmer can not only combine existing functions into higher level ones, but also smash what had been an atomic functional unit into a composite of smaller ones, ad infinitum.

If by "machine" we understand the extended entity "machine plus assembly program," and take assembly language as machine language, then this feature enables the programmer to redesign the machine he uses by defining its symbolic operation codes to be macros, thus causing them to have consequences at compiling time, execution time, or both, that are quite different from those promised by the machine's programming manual. Such power has important applications in debugging, for example. One common and troublesome kind of bug is the wild transfer, with its propensity for covering up its own trail and leaving memory in chaotic condition. A macro processor user plagued with this problem can, within his own program, define all transfer-type instructions as macros. These would expand into coding that would, at execution time, compute the effective address of each transfer type instruction in the program before permitting its execution. If such an address value should turn out to be legitimate according to criteria the programmer would have supplied, execution of the transfer would be permitted; if not, it would be ignored, and transfer would be made instead to an error routine also specifiable by the programmer. Another obvious application for this power to turn one's object program into a partial interpreter is that of selective memory protection. By redefining as macros all instructions destructive of memory, for example, a programmer can make any segment of memory "read-only" for the purposes of his program. Such software-implemented memory partitioning and protection is far more flexible than any scheme built into hardware--these and many similar resources are available to the macro processor user, it should be noted, as afterthoughts. They can be improvised and added, as found desirable, and discarded when no longer needed. The keynote throughout is flexibility, open-endedness, and adaptability at the hands of the ordinary user.

The importance of this fundamental advantage of the macro processor is sometimes depreciated by those whose bent of mind is more toward the manipulation of abstractions than observation of practice. Such critics often point to the conceptual correspondence between the compiler's subroutine facility and the macro processor's macro-defining facility, and consider that they have shown the two types of processor to be equivalent. They have not, of course; if equivalence could be established between systems A and B simply by showing that either can in principle do anything the other can, we would all be programming in absolute machine language, which is capable, in principle, of specifying to the computer anything it can be made to do at all. (The reductio can be pushed farther yet, for anything that can be done by computer can, still in principle, be done without one.) Absolute machine language is not widely used, however, and for reasons so obvious they bear mentioning from time to time: there are finite amounts of time and manpower available for the writing of any computer program, and what cannot be done within those limits is, practically speaking, impossible. In this practical sense there are irreducible differences between the two kinds of programming systems; the macro processor makes possible useful feats otherwise beyond reach.

Programming folklore, however, has it that these advantages, even if as real and valuable as we have claimed, carry with them two penalties of prohibitive severity: machine oriented notation and object program inefficiency. Most existing macro processors are indeed deficient in these respects, but what matters for present purposes (suggesting policy for future software) is that neither of these is a necessary shortcoming. Both object program efficiency and notational sophistication are secured by auxiliary routines that are blind to the distinction between compiler and macro processor, and can as easily be incorporated into the one as the other.

The charge is frequently brought against the macro processor that, treating each macro instruction as a distinct subprogram having no communication with those preceding and following it, it is especially liable to produce redundant coding. In actuality, it is identical to the compiler in this (as in practically every) respect. Compilers, too, bite off chunks of any source program given them, and proceed to compile instructions without looking behind or ahead; few consider the entire source program before emitting any instructions. If the resultant inefficiencies are less notable in their object programs than in those produced by most macro processors, it is mainly because compiler designers have been more highly motivated to efface them with special optimizing routines. (Programmers familiar with the AUTOCODER family of processors will be aware that macro processors, too, can generate coding that is both data-sensitive and context-sensitive.)

In the matter of source language sophistication, also, the macro processor is as amenable as the compiler to the incorporation of the elaborate scans that permit programmers to use application oriented notation. Again, this potential is not commonly realized in macro processors, probably because this type of processor arose as an adjunct to assembly programs and remains associated in most programmers' minds with machine oriented programming. (It will be shown later that any degree of notational sophistication desired can be realized in a macro processor.) For further insight into these topics, and into the place of the macro processor among programming systems generally, we must turn to a study of the role of the macro processor in the history of programming system development.

Historical Place of the Macro Processor
It may fairly be asked why, if the macro processor is so clearly superior to the compiler, it is not in wider use. The answer lies not in the logic but in the history of the matter. The macro processor might well have become the standard type of programming language translator years ago, had it not been for the overwhelming (and well-deserved) success of the FORTRAN compiler shortly after its release by IBM in early 1957—and the widespread misattribution of that success. There are two respects in which FORTRAN is atypical among programming systems: it deals exclusively with numerical computation, the one computer application for which a universally accepted notation pre-existed; and it was forced by circumstances to put extreme emphasis on the production of quick-running object programs. FORTRAN's structure and trade-off of qualities are largely due to these peculiar circumstances, and were never intended by its creators to serve as models for later translators. Such has been FORTRAN's influence, however, that many systems intended for very different applications, and developed in very different circumstances, have come as near as they can to recreating FORTRAN, warts and all.

In the hope that a wider appreciation of the reasons for FORTRAN's special balance of qualities would make for less blind emulation of it, the writer some time ago published an informal description of the atmosphere prevailing in programming circles during its gestation period [8]. This account has since been authoritatively confirmed by Backus and Heising [9], who write:

The current interests of people who write compilers and devise programming languages seem to center on finding orderly and efficient methods of transforming elaborate source programs into legitimate object programs. This paper describes some of the considerations that led the group developing FORTRAN during 1954 to 1957 to put extreme emphasis not on orderly and quick translation techniques, nor on elaborate source language features, but rather on the efficiency of the object programs themselves. There is some discussion of problems raised by such emphasis and their impact on the language.

Most of the FORTRAN language specifications were frozen almost ten years ago. It was therefore designed in a different atmosphere within the computing industry than exists today. At that time, most programmers wrote symbolic machine instructions exclusively (some even used absolute octal or decimal machine instructions). Almost to a man, they firmly believed that any mechanical coding method would fail to apply that versatile ingenuity which each programmer felt he possessed and constantly needed in his work. Therefore, it was agreed, compilers could only turn out code which would be intolerably less efficient than human coding (intolerable, that is, unless that inefficiency could be buried under larger, but desirable, inefficiencies such as the programmed floating point arithmetic usually required then). There were some systems around at the time which tended to confirm this popular view.

This then was the professional climate in which the design of the FORTRAN language was begun and completed ... the [FORTRAN] group's attention was focussed on what seemed then to be the challenge: could an automatic coding system take over a lot of tasks from the programmer and still produce object programs of competitive efficiency? ... Confronted with this skepticism ... the group had one primary fear. After working long and hard to produce a good translator program, an important application might promptly turn up which would confirm the views of the skeptics: this application would be of the sort FORTRAN was designed to handle and, even though well programmed in FORTRAN, its object program would run at half the speed of a hand-coded version. It was felt that such an occurrence, or several of them, would almost completely block acceptance of the system. (Of course, few would feel that way today.)

It seems fairly clear from this that the FORTRAN authors were not themselves convinced of the absolute priority of object program optimization but merely accepted it as a popular conviction too strong to be shaken. This impression is confirmed by the following passage from the classic paper that introduced FORTRAN to the world in 1957 [10]:

The generality and complexity of some of the techniques employed [by FORTRAN] to achieve efficient output programs may often be superfluous in many common applications. However, the use of such techniques should enable the FORTRAN system to produce efficient programs for important problems which involve complex and unusual procedures.

We can safely conclude, then, that the overwhelming preoccupation with object program efficiency seen in FORTRAN was regarded as economically unjustified by its authors as long ago as the mid-1950's. (Since then, of course, as Backus and Heising note, the irrationality of this older view has become increasingly apparent; computing power has dropped in price by at least a factor of ten, while programmers' salaries and other indirect costs have risen sharply. These developments have made the relative importance of machine time even smaller than it was in the 1950's, and object program efficiency a correspondingly slighter consideration. For more extended argument on this point, see [8].)

The relevance of these historical considerations to the present investigation into the relationship between the compiler and the macro processor is quite direct. The degree of object program optimization the FORTRAN group felt required to achieve is more easily realized in a compiler than in a macro processor; indeed, with the processor-building tools and techniques of ten years ago, perhaps possible only in a compiler. If the designers of a processor can themselves specify all the statements that it will have to handle, they can dispense with the far greater generality of treatment required in a processor whose repertoire of operations may be extended indefinitely by its users. If the notion of leaving FORTRAN open-ended ever occurred to its authors (it does not appear that it did), it would soon have been abandoned as the language constraints imposed by the optimization requirements made themselves felt. (Backus and Heising report that these requirements forced the FORTRAN group to further restrict even their fixed compiler language.)

By happy accident, the forfeiture of extendibility (or open-endedness) was not fatal to FORTRAN. It escaped because its intended application, numerical computation, was both well-defined and ready-provided with a universally accepted notation. Its authors could be reasonably certain, therefore, that they had implemented a complete operation repertoire and a satisfactory notation. Even so, a need for extendibility was soon felt, and was supplied in the shape of an ability to combine at load time FORTRAN-compiled object programs with others assembled from symbolic machine language programs. This falling back on assembly language for procedures not expressible in its own language is the typical compiler response, sooner or later made by all of them, to the discovery that the compiler language is not entirely adequate. Insofar as the programmer resorts to this expedient, of course, he forfeits all the vaunted compiler-language advantages: application-oriented notation, freedom from coding level errors, machine independence, and source-language debugging.

In brief, then:

(1) The compiler is widely regarded as the standard form of processor because of FORTRAN's success;

(2) FORTRAN had to be a compiler only because of the near universal obsession with object program efficiency that prevailed at the time it was designed;

(3) That obsession has since lost much of whatever justification it may then have had, which the authors of FORTRAN did not regard as very impressive even then;

(4) FORTRAN, for all its advantages of well-defined application area and accepted notation, soon found that it needed the extendibility it had sacrificed to satisfy that obsession;

(5) Being a compiler, FORTRAN could provide that extendibility only in the form of a facility for linking its own object programs with others written in assembly language;

(6) The use of this expedient entails, so far as it goes, the loss of the advantages that compilers exist to provide.

Since the FORTRAN compiler was made widely available in 1957, it has become—deservedly—the workhorse of the scientific programming world, and—quite improperly—a model on which other programming systems, similar in purpose or not, are often patterned. It is clear why FORTRAN was designed as it was, and no blame attaches to its authors for the shortcomings its chosen form entails. In order to get a high level programming language accepted, they had to satisfy the conventional wisdom of their day; given this necessity, the rest follows. It is hard to see how else they might have designed it even if they had been given clear foreknowledge of the last ten years' experience. But what was reasonable and necessary then does not excuse us today, and blame does attach to us if, ignoring the economic and technological facts of our own day, we unthinkingly adopt FORTRAN's priorities and structure. Most particularly would it be ill-advised to let FORTRAN's success lead us into the trap of trying to write the final compiler for the perfect language.

The Notational Dimension
It was remarked earlier in this study that the notational dimension has received a great deal of attention in recent years, almost all research into formal languages, grammatical models, and syntax guided compiling being devoted to this aspect of language. Such concentration of effort on one alone of language's three aspects is made quite explicit in the more formal papers in this tradition, which commonly begin by defining a language to be some particular set of strings drawn from a specified alphabet, and a grammar as a mechanism for generating or recognizing that set. There is no objection to the creation and manipulation of such abstractions by formalists (so long as they remember, as they occasionally fail to [14, pp. 46-50, 96-98], that they are abstractions), but it must be borne in mind that even a completely satisfactory handling of this one aspect of language, if it were achieved (see footnote 1), would not constitute a complete metalanguage in our tridimensional sense.

The various metalanguage processors that have been discussed in the literature are, almost without exception, founded on the language model known as Phrase-Structure Grammar (PSG); the most recent survey of such work [11], while admitting the inadequacy of the PSG model for the complete description of any actual programming language, speaks of no alternative. The inadequacy of the PSG for our purposes seems to be radical in nature, and not due merely to lack of sufficient development: it was conceived originally as a model for natural rather than programming languages, and its moving spirit is scientific rather than engineering. With these origins, the PSG should hardly be expected to serve as a practical method of describing programming languages.

The analogy between natural and programming languages is captivating but shallow (the most important of the differences will be discussed later), so that tools designed for dealing with the one will almost necessarily be ill-adapted for the other. The scientific rather than engineering origins of the PSG concept further explain its failure in the role into which programmers keep trying to force it. Science differs from engineering in that (among other things) its motivation must be self-generated; it cannot progress without a hypothesis to test. And it is no fatal defect in a scientific hypothesis to be patently false, since a hypothesis is a heuristic device and not a premise of action. Thus, the hypothesis "the PSG is a full model of natural language" is known to be false, but is far from useless on that account. It does describe some fraction of natural language large and rich enough to be of interest, and is itself a calculus of some intrinsic interest; these two properties are enough to guarantee (and justify) its continued interest for mathematical linguists. There is much to be learned from pushing a strictly false hypothesis as far as can be, and determining exactly the points at which it fails: only thus can the basis for a better hypothesis be formed, and science advance. Engineering practice, on the other hand, does not permit the entertainment of ideas known to be false, its objective being the solution of practical problems. In these terms, the PSG is clearly a scientific hypothesis to which we may look for valuable insights, and equally clearly an unpromising candidate for the role of working metalanguage. The linguists to whom the PSG is due have told us as much; Chomsky [12, p. 5], for example, writes:

Precisely constructed models for linguistic structure can play an important role, both negative and positive, in the process of discovery itself. By pushing a precise but inadequate formulation to an unacceptable conclusion, we can often expose the exact source of this inadequacy and, consequently, gain a deeper understanding of the linguistic data.

The system of metalinguistic devices we would suggest for the description of a language in its notational dimension is consequently very different from the PSG. Its intended domain is the programming language exclusively; if it has any bearing on the investigation of natural language, we shall be deeply (and delightedly) astonished. The conceptual basis of this notation-describing part of a total metalanguage is the recognition and exploitation of a canonical form for programming languages. Because all such languages must eventually be transformed into actual machine language insofar as they are to have any effect, the canonical form is that of a generalized machine language statement:

operator, operand1, operand2, ..., operandn

It is the existence of a canonical form that chiefly distinguishes programming from natural languages, and makes possible a substantially adequate metalanguage for the former.

In this scheme for a generalized processor, then, an arbitrary statement form is described by specifying the transformations that map it into canonical form, and the notation-describing part of the metalanguage consists of a repertoire of such transformations. A prototype collection of such transformations has been described in some detail elsewhere [3, 16]; just enough is introduced here to convey something of its flavor. Consider the almost trivially simple statement form, STORE INTO C THE LOGICAL SUM OF A AND B; reduced to canonical form, it would read: STORE, C, A, B. The needed transformations are (1) a punctuation-changing one that would (in this case) define the blank to be a separator, and (2) a noise-defining one that would define the symbols INTO, THE, LOGICAL, SUM, OF, and AND to be noise—i.e., symbols that the scan is to ignore wherever in this statement it may encounter them.

Other transformations that may be desired would include (3) a parameter-identifying one that enables the scan to identify the parameters that will replace the dummies A, B, and C by context. Without this last transformation, these parameters would be identified by exception, being the only symbols remaining when the noise words were omitted from consideration, and they would be required to occur in an order corresponding to that of the dummies they were to replace. Using the identification-by-context transformation, any elements of the statement that invariably accompanied a particular parameter might be explicitly made "cue words" for it, and the parameter in question identified by means of this contextual feature no matter where in the statement it occurred. In the example given, for instance, it might well be desirable to use this transformation to stipulate that the parameter to replace dummy C will be that occurring next after the cue words IN or INTO, or any one of several synonymous ones. If this transformation were used, along with (1) and (2) already mentioned, then any of a large number of variants on the form originally specified could be correctly mapped into canonical form (and hence compiled into correct coding); this number would include, among many others:

(a) STORE LOGICALLY INTO GAMMA THE SUM OF ALPHA AND BETA.

(b) INTO GAMMA STORE ALPHA AND BETA LOGICALLY.

(c) LOGICALLY STORE THE SUM OF ALPHA AND BETA INTO GAMMA.

(d) STORE INTO GAMMA THE LOGICAL SUM OF ALPHA AND BETA.

From each of the above examples a generalized scan provided with the transformational specifications described would extract STORE as the operator, and ALPHA, BETA, and GAMMA as operands corresponding to A, B, and C, respectively. This flexibility is a natural bonus of the metalinguistic approach adopted, in which the objects described are not complete rigid statements but only their constituent elements: operator names, punctuation strings, cue words, and so on. The programmer thus has at his disposal not a body of stereotyped statement forms that must be reproduced character-for-character under pain of compilation failure, but a vocabulary and a grammar for combining its elements, just as in natural language. Among the benefits springing from this flexibility is the power to tailor source notation to the immediate application. If the programmer is debugging, with the listing undergoing daily change, he may use a minimal notation—the canonical form itself, if he likes. Doing so cuts compilation time and allows logical checkout of the program at minimum cost. When the program is running, the programmer may then enrich the notation as far as he pleases, for the sake of those, including himself, who will have to read and understand the listing in the future (especially if any nonprofessional programmers are among them).
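
The scan that performs this reduction need not be elaborate. The following sketch (in Python, as an informal illustration only; the actual devices are those described in [16], not this code) applies transformations (1) through (3), namely the blank as separator, the noise list, and the cue-word rule, and reduces each of the variants (a) through (d) to the same canonical form:

    OPERATORS = {"STORE"}                 # operator names known to the processor
    NOISE = {"THE", "LOGICAL", "LOGICALLY", "SUM", "OF", "AND"}   # noise symbols
    CUE_WORDS = {"IN", "INTO"}            # the symbol following a cue word replaces dummy C

    def to_canonical(statement):
        tokens = statement.rstrip(".").split()        # (1) the blank is the separator
        operator, destination, others = None, None, []
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok in OPERATORS:
                operator = tok                        # the operator, wherever it occurs
                i += 1
            elif tok in CUE_WORDS and i + 1 < len(tokens):
                destination = tokens[i + 1]           # (3) parameter identified by context
                i += 2
            elif tok in NOISE:
                i += 1                                # (2) noise symbols are ignored
            else:
                others.append(tok)                    # remaining symbols identified by
                i += 1                                #     exception, in order of occurrence
        return [operator, destination] + others       # canonical form: STORE, C, A, B

    for s in ("STORE LOGICALLY INTO GAMMA THE SUM OF ALPHA AND BETA.",
              "INTO GAMMA STORE ALPHA AND BETA LOGICALLY.",
              "LOGICALLY STORE THE SUM OF ALPHA AND BETA INTO GAMMA.",
              "STORE INTO GAMMA THE LOGICAL SUM OF ALPHA AND BETA."):
        print(to_canonical(s))            # each prints ['STORE', 'GAMMA', 'ALPHA', 'BETA']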

It should also be noted that the notational transformations, of which only a few have been mentioned here, can be used so as to apply to a single statement form or to any group of them, up to the entire language. Most languages will exhibit a substantial degree of notational uniformity—punctuation in particular tends to be uniform within a notation—and insofar as a notation is uniform, the transformations that define it can be specified within any one of the macros of the language, accompanied by a pseudo operation that declares their domain to be global—i.e., language-wide. Those notational features not common to all statement forms are specified within the macros associated with the statements that do exhibit them, without the "global" declaration; this usage limits their range to just those statement forms within whose macros they are given. The flexibility afforded by this notation-describing scheme is such as not only to encompass practically all the well-known programming languages, but also to hold out the promise of realizing a considerable natural language programming facility. (For extended discussion and explanation of this claim, see [15].)

Machine Independence vs. Object Program Efficiency
It is widely believed that machine independence and object program efficiency are essentially antithetical goals between which we can only hope to strike a reasonable balance. Actually, the two are necessarily in conflict only within the context of a certain approach to machine independence; given another approach to that goal, the conflict evaporates. The conventional approach to machine independence is to divide the total translation process into two distinct temporal phases: an earlier, completely machine independent one that translates programs from source language into some intermediate representation, and a later one, oriented to some particular machine, that carries the program from intermediate to machine language. Such a scheme is capable of ensuring machine independence for the larger part of a translator (which is all that can be expected under any plan), but it clearly conflicts with the other desirable goal of giving the programmer complete access to the target machine. Everything said by the source language programmer must, under the conventional plan, be filtered through the intermediate language before it reaches machine language, and this intermediate language, once specified by its original designers, must remain essentially invariant if the whole system is not to collapse. This interposing of what is effectively a standardized hypothetical machine between the user and his actual target machine precludes the full exploitation of the actual hardware that is the basis of highly efficient object language coding. (It remains possible, of course, to make the purely logical kind of optimization typified by deletion of redundant instructions, moving of computations to less frequently executed coding paths, and so on; it is such really important economies as proper use of the actual machine's I/O facilities that may be ruled out under the usual scheme.)

There is, however, an altogether different approach to machine independence which, while at least equally successful in facilitating the processor's transference to any new machine, is perfectly compatible with any desired degree of object coding efficiency. If the conventional approach may be termed that of temporal segregation of the machine-dependent part, the alternative to which attention is being called may be termed spatial segregation of the machine-dependent part, and its adoption is in fact a corollary of the overall structure advocated here on other grounds. Schematically, the two forms may be represented as in Figure 1.

The generalized processor whose structure has been outlined in the foregoing pages follows the policy of spatial rather than temporal segregation of its machine dependent part, with great advantage. Since the first thing a language designer does in using such a processor is to install in it a body of macro instructions that are defined directly in the symbolic assembly language of the target machine, the resultant translator is machine-dependent, in part, from the start. This machine dependent part, be it noted, is just as rigorously segregated from the machine-independent part as is its counterpart in the conventional temporal segregation scheme—the macros are quite distinct from the processor that handles them, so there is no penalty in this respect in adopting the spatial segregation policy. There is, however, one very important advantage in adopting it: the user, in writing the required machine language macros, can use all his knowledge of the target machine to ensure that the coding generated will use it efficiently. There is no intermediate language standing between the source language designer and the machine; he specifies for it the exact instructions it will be dealing with at execution time. In this way the spatial segregation scheme, while making just as sharp a separation between the machine-dependent and machine-independent parts as the conventional scheme, also makes possible any desired degree of object program quality.

Paradoxically, in fact, a general processor so organized will, other things equal, turn out better object coding than will a conventional compiler that has been hand-coded expressly to compile from the source language in question. Though this assertion flies in the face of our intuitive certainty that generality must be paid for by less perfect efficiency at any particular task, a moment's thought is sufficient to show why it is nonetheless true. In creating a translator for a higher level language, whether it be done conventionally or as proposed here, the designer will from time to time see opportunities for optimizing the coding his translator will produce. How many of these opportunities he takes and how many he ignores will depend on his conscientiousness, his ability, and the time at his disposal. Lack of one or more of these, as anyone experienced in such work can confirm, often prevents designers from taking advantage of such opportunities. In implementing his language by means of a general processor, the designer still has to notice opportunities for optimization as they arise—no system will do that for him—but once he has noticed them, he has at his disposal built-in tools that make the realization of such potential savings much easier. The general processor he is dealing with includes, as we have in part seen, modal operations that permit him at any point to name translation time flags and switches, assign them values, and perform tests on them that determine just what coding gets compiled, and in what way. The presence of a number of features of this kind makes optimization much easier to perform, and guarantees that for an equal amount of effort the general processor can be made to compile better coding than a conventional monolingual processor.
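
As an illustration of the kind of translation-time facility meant (all names below are invented for the purpose, and no claim is made that this is the prototype's actual repertoire), a macro body might test a switch set earlier by a modal pseudo operation and compile quite different coding accordingly:

    SWITCHES = {}                           # translation-time flags and their values

    def pseudo_set(name, value):
        # A modal pseudo operation: executed at translation time, compiles nothing.
        SWITCHES[name] = value

    def expand_clear(block):
        # Compile a CLEAR of a storage block, choosing the coding by a switch.
        if SWITCHES.get("TARGET_HAS_BLOCK_MOVE"):
            return [f"MOVEZERO {block}"]                    # one fast instruction
        return ["LOADZERO R1",                              # otherwise, a general loop
                f"LOOP: STORE R1, {block}(INDEX)",
                "BRANCHCOUNT LOOP"]

    pseudo_set("TARGET_HAS_BLOCK_MOVE", True)
    print(expand_clear("BUFFER"))           # ['MOVEZERO BUFFER']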

Have We Thrown Out the Baby?
An objection that has been brought against the scheme described in these pages is that, in requiring the language designer to define the primitive operations of his language directly in machine language, it in effect requires him to do most of the work that would be necessary in writing a conventional compiler for the language. This objection is, as a rule, made only by those whose compiler-writing experience lies far in the past.

The specifying of the coding that is to be compiled for each operator or statement-type in the desired language is a very small part of the total task of writing a conventional compiler; the really time-consuming part is the creation of all the mechanisms that ensure that such coding is handled in the appropriate way. Consider the FORTRAN language, for example: Backus has said that 704 FORTRAN statements generated between four and twenty machine instructions each [10, p. 197]. A DO statement, to take a concrete case, typically generates a mere half-dozen lines of coding, and any programmer who has ever set up a machine-language loop can specify them. What is very troublesome about the DO statement is the causing of the compiler to do the proper thing with each of those half-dozen simple instructions. It has to be told that if the user neglects to specify the increment by which the index variable is to vary, a value of 1 is to be understood; that the last two or three of the instructions generated by a DO are not to be compiled into the object program immediately, but are to be held aside until an input statement bearing a label specified in the DO statement has appeared and been processed; and that if DO's are nested, with several groups of such deferred instructions awaiting the appearance of the same label, they are to be compiled into the object program (when that label finally appears) in the inverse of the order in which their generating DO's appeared. These are typical of the real problems in implementing a language—and these are the problems for which the generalized processor offers the language designer simple, explicit tools.
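
The bookkeeping involved is nevertheless simple enough to sketch. The fragment below (Python again, with invented instruction names; only the defaulting and deferral rules just described are taken from the text) holds back the trailing instructions of each DO until the labeled statement appears, and closes nested DO's innermost first:

    deferred = {}                                    # label -> stack of held coding groups
    object_code = []

    def compile_do(index, first, last, label, step=None):
        step = 1 if step is None else step           # an omitted increment defaults to 1
        object_code.append(f"SET {index}, {first}")  # head of the loop, emitted at once
        object_code.append(f"LOOP_{label}_{index}:")
        tail = [f"ADD {index}, {step}",              # held back until the label appears
                f"COMPARE {index}, {last}",
                f"BRANCHLE LOOP_{label}_{index}"]
        deferred.setdefault(label, []).append(tail)

    def compile_labeled_statement(label, code_line):
        object_code.append(code_line)
        for tail in reversed(deferred.pop(label, [])):   # innermost DO is closed first
            object_code.extend(tail)

    compile_do("I", 1, 10, label="20")
    compile_do("J", 1, 5, label="20")                # a nested DO ending on the same label
    compile_labeled_statement("20", "ADD TOTAL, TABLE(I, J)")
    print("\n".join(object_code))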

REFERENCES

1. Burkhardt, W. H. "Universal programming languages and processors: a brief survey and new concepts." Proc. AFIPS 1965 FJCC, pp. 1-21.

2. Stiegler, A. D. "An algebraic compiler for the FORTRAN assembly program." Comm. ACM 5 (Nov. 1962), 545.

3. Halpern, M. I. "XPOP: a metalanguage without metaphysics." Proc. AFIPS 1964 FJCC, pp. 57-68.

4. Bobrow, D. G., and Weizenbaum, J. "List processing and extension of language facility by embedding." IEEE Trans. EC-13, 4 (Aug. 1964), 395-400.

5. McIlroy, M. Douglas. "Macro instruction extensions of compiler languages." Comm. ACM 3 (April 1960), 214-220.

6. Bennett, Richard K., and Neumann, H. David. "Extension of existing compilers by sophisticated use of macros." Comm. ACM 7 (Sept. 1964), 541-542.

7. Fitzwater, D. R. "A storage allocation and reference structure." Comm. ACM 7 (Sept. 1964), 542-545.

8. Halpern, M. I. "Computers and false economics." Datamation (April 1964), 26-28.

9. Backus, J. W., and Heising, W. P. "FORTRAN." IEEE Trans. EC-13, 4 (Aug. 1964), 382-385.

10. —————— et al. "The FORTRAN automatic coding system." Proc. AFIPS 1957 Western Joint Comput. Conf., pp. 188-198.

11. Floyd, R. W. "The syntax of programming languages—a survey." IEEE Trans. EC-13, 4 (Aug. 1964), 346-353.

12. Chomsky, N. Syntactic Structures. Mouton & Co., The Hague, 1957.

13. —————— "Formal properties of grammars." In Handbook of Mathematical Psychology, R. Duncan Luce, Robert R. Bush, and Eugene Galanter, Eds., John Wiley & Sons, Inc., New York and London, 1963, Vol. II, pp. 323-418.

14. Steel, T. B., Jr. (Ed.). Formal Language Description Languages for Computer Programming. North-Holland Publishing Co., Amsterdam, 1966.

15. Halpern, M. I. "Foundations of the case for natural-language programming." Proc. AFIPS 1966 FJCC, pp. 639-649; revised version in IEEE Spectrum 4, 3 (March 1967), 140-149.

16. —————— A manual of the XPOP programming system. Available from the author.

17. Stark, Roger. ALTEXT language manual. Lockheed Palo Alto Research Lab., Palo Alto, Calif.


Article Footnotes

1 There is some reason to believe that no one of the three linguistic dimensions we postulate can be wholly understood in abstraction from the other two, since they are dimensions of the same entity, not separable pieces or layers. Chomsky, following Bar-Hillel and others [13, p. 378], points out the difficulty of explaining the role of the word "respectively," as used in (b) below, by any Phrase-Structure Grammar approach:

(a) Tom and Dick went to the store and the movies.
(b) Tom and Dick went to the store and the movies, respectively.

The key to the difficulty, we suggest, is that "respectively" in (b) is essentially a modal, not a notational element; like a pseudo op in an assembly language, it is not so much input to the translation process as it is an adjustment of the translator, forcing it to operate temporarily in a special mode.