Every program should be a new challenge. Almost by definition, any boring, mechanical chore can be done by the computer itself. Yet in practice we often find ourselves writing similar code again and again. One approach to automating the dull parts is to describe the interesting parts in an appropriate language and supply an interpreter to fill in the details. Actually, as we'll see, there's no sharp line between 'ordinary' programming and using an embedded language. Interpreters and compilers have a reputation for hairiness, but while they make a deep subject there's no reason they can't be part of every programmer's repertoire. Much of their traditional complexity merely reflects the complications of old-fashioned imperative languages, like Pascal and Basic, and of the machine languages that are the target of translation; some of the remaining complexity comes from efforts to push performance. Neither problem need bedevil application-specific or embedded languages: you as the designer have control over your language's structure, and it's rarely worthwhile to tune its implementation to the degree of, say, a commercial C compiler. You may object that a powerful language has to be complex. Not so: power comes not from collecting every feature of imaginable use, but from abstractions that fit together and accurately model the problem domain. Interpreters want to be simple beasts that just react to the complexity of their input. The book is more about an approach to programming than about particular kinds of languages. The philosophy may be exaggerated as "Software design is language design." It recognizes no sharp distinctions between languages and libraries, between libraries and programs. It denies Gries's dictum, "Program *into* a language, not *in* it", substituting "Program in the *right* language; if it doesn't exist, create it." 
Needless to say, the book is vaporware at the moment.

Gries was right to refuse to take any programming language's concepts as a universally appropriate base for thinking about design; he just didn't go beyond that to consider domain-specific languages (and very-high-level languages, sort of). That is, if you directly follow Gries's advice, you first think about the problem in whatever terms seem appropriate to its domain (good), then translate your solution into the terms of your programming language (bad). That's bad because the result is more readable, safer, and more maintainable if the machine does the grunt work of the translation. This usually can be arranged.

Re 'safe': e.g., even though a compiler is a complex program, it's generally easier to write correct programs using a compiler than directly in machine code. Also, a bug in a language processor *tends* to show up immediately, while in a big laborious hand-coding job it's easy to get just one case wrong. You know what I mean... This argument here needs more work. I think in some cases it's safer, in some cases less. Which is which?

Re 'this usually can be arranged': when it can't, the possible reasons are impossibility (the source language cannot reasonably be made precise, or the translation is uncomputable or computationally intractable) or impracticality (the translator itself would be excessively complex). It's usually possible when the natural terms for a computational solution are different enough from those of the target language (thus we have something to compute, and it's computable); and it's practical when the problem to be solved is large enough (because language implementation is a one-time cost). Education and better tools should increase the number of cases for which it's practical.
The metalinguistic approach has something to say at many levels, from near the bottom (factoring of common code, programming with data structures -- also called table-driven programming) to the top (domain languages like Mathematica).

(A framework that treats such a broad range of stuff as the same had better be able to supply *some* distinctions... Well, for a start, here are some topics amenable to precise treatment: type systems, value structures, expressiveness, safety, soundness. And maybe: simplicity, uniformity, manipulability, flavors of syntax. Probably not, but worth thinking about: suggestiveness.)

The inventor's paradox goes beyond my first explanation, that sometimes the restrictions in the problem statement are the most complicated part of it: the more general solution can be easier to find because it *factors* the problem space. The hard part is, you have to understand the space very well to find that decomposition.

This idea of factoring seems to be a common thread here... (it was implicit in my mention of it being safer to use a compiler than program in machine language, too.) Factoring reduces a system to the direct product of independent parts. Abstraction focuses on one part, ignoring the others. Encapsulation enforces an abstraction barrier. Can you think of a means of encapsulation that doesn't imply a means of abstraction?

Further reading:
  Jon Bentley, _Programming Pearls_: "Data Structures Algorithms".
  Bentley, _More Programming Pearls_: "Little Languages", "A Survey of Surveys".
  SICP, EoPL, On Lisp

Programming languages are central to the programming enterprise because:

1. The essence of program design is finding the right abstractions.
2. Any given language won't always let you express the right abstractions directly.
2a. Programs written directly in terms of the 'right' abstractions are better: easier to write and read, and more maintainable.
3. It's however possible to *create* such a language.
4. This reasoning isn't airtight...
Part 3 has at least two subparts:

1. For one particular program.
2. For programs in a particular domain.

These may not be as distinct as they at first appear (since programs are made of subprograms).

That was the argument from expressiveness. There are other angles to look at this from: for one, you can say that any interface that's meant to allow specification of complex behavior should be considered linguistically. For another, related view, there's the idea of 'open architecture' programs -- that says some programs *should* "allow specification of complex behavior" rather than trying to build everything in. This needs some more precise characterization of "complex behavior", for our purposes -- e.g., playing checkers is moderately complex, but since it's just one thing we gain nothing by viewing the problem, at its top level, as linguistic.

Actually, there are at least three kinds of program: the monolithic app, the open-architecture app, and the tool. The point in favor of tools is the duplication of effort (in both creation and use) in having extension languages for each app, as well as the OS's scripting language. This argument presupposes that we have an all-purpose means of combination and abstraction. (E.g., in Unix there's the file as stream of bytes, and the shell script.)

Interface design is like language design. Therefore, interface designers should know principles of language design. The difference is often one of indirection -- direct manipulation vs. specifying a process.

Let's get more specific about the relations between interpretation, compilation, partial evaluation, and optimization.

Here are 4 types of syntax -- I'm sure there are more: (statement, expression, function, relation)-oriented. A truly relation-oriented syntax is hard to do in text, since there are multiple 'inputs' and 'outputs' -- Prolog is sort of like an expression-oriented syntax with named connections (though the arguments can be arbitrary structures, not just variables...
hm..).

Expression syntax is usually preferable to statement syntax, because a statement-oriented language generally still has to have expressions, including expressions with side-effects, leading to syntactic nonuniformity; while an expression language can handle statements without any fuss. A statement language doesn't *have* to have expressions -- assembly language doesn't. (Constant expressions don't count...)

Expressions allow data to be produced and used without needing a name. That was Fortran's major advance in expressiveness over fancy assemblers and floating-point interpreters. However, it didn't supply that advance uniformly (there's that word again!). Um, what was my point? I guess that the same advantage accrues in the same way for the parts left out by statement-oriented syntax (show this). Of course, Fortran wasn't designed from any high-class language-theorist point of view. In fact, the motivation for expressions was to support more-or-less familiar math formulas (math notation and Fortran *are* different, even if we programmers tend to forget it). Maybe I should take a look at MathCad.

A statement language can sometimes be extended to an expression language by straightforward generalization.

A pitfall of uniform syntax is less margin for error. Static typechecking can help shore that up, as can lint-pickers.

I think the Bourne shell has expression-oriented syntax (with the result being the exit status). It seems to be uniform that way. So it should make a good example for non-Lispers. Look in Kernighan & Pike? Though it's *also* a function-composition language, because of pipes! This could be interesting. Icon is another example of an expression language with a heavy reliance on side effects.

Uniformity and standard interfaces are really, really important. Can you distill some specific principles about them -- when to treat things the same way, when not to, warning signs that you're missing something, etc.?
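One small data point on uniformity, as a sketch only: a toy expression-oriented language in Python, where assignment, sequencing, conditionals, and loops all yield values, so no separate statement syntax is needed. The tuple encoding and the form names (num, var, add, lt, set, seq, if, while) are invented for this note; they are not taken from any of the languages mentioned above.

```python
# A toy expression-oriented language, written as nested tuples.
# Every form yields a value, so "statements" need no syntax of their own.
# All form names here are hypothetical, chosen just for this sketch.

def evaluate(expr, env):
    """Evaluate expr in env (a dict of variable bindings); return its value."""
    op = expr[0]
    if op == "num":                      # ("num", 3)
        return expr[1]
    if op == "var":                      # ("var", "x")
        return env[expr[1]]
    if op == "add":                      # ("add", e1, e2)
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if op == "lt":                       # ("lt", e1, e2)
        return evaluate(expr[1], env) < evaluate(expr[2], env)
    if op == "set":                      # assignment yields the value assigned
        value = evaluate(expr[2], env)
        env[expr[1]] = value
        return value
    if op == "seq":                      # sequencing yields the last value
        result = None
        for sub in expr[1:]:
            result = evaluate(sub, env)
        return result
    if op == "if":                       # ("if", test, then, else)
        branch = expr[2] if evaluate(expr[1], env) else expr[3]
        return evaluate(branch, env)
    if op == "while":                    # yields the last body value, or None
        result = None
        while evaluate(expr[1], env):
            result = evaluate(expr[2], env)
        return result
    raise ValueError(f"unknown form: {op!r}")


# Sum the integers 1..4; the whole program is one expression.
program = ("seq",
    ("set", "i", ("num", 0)),
    ("set", "total", ("num", 0)),
    ("while", ("lt", ("var", "i"), ("num", 4)),
        ("seq",
            ("set", "i", ("add", ("var", "i"), ("num", 1))),
            ("set", "total", ("add", ("var", "total"), ("var", "i"))))),
    ("var", "total"))

result = evaluate(program, {})

# An assignment nested inside arithmetic: legal here, with a side effect.
nested = evaluate(("add", ("set", "x", ("num", 2)), ("var", "x")), {})
```

The nested case is also the pitfall in miniature: nothing in the syntax marks the side effect, which is the "less margin for error" cost of uniformity.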
Of course, there's a lot of lore from the software-engineering viewpoint on this. I guess the difference is, the problem is more abstract in linguistic design -- but that also means you can bring more experience to bear on it. Languages have broader application than 'application' programs, and so more people bang on them.

Forth is 'unreadable' for 3 reasons: unfamiliarity, lack of parentheses, and its function-composition semantics. A limited form of local variable may be the general solution to the third problem, for all function languages. (But: with heavy use of locals, it becomes like a statement language, doesn't it?)

The unreadability of an unfamiliar notation can be exacerbated by ignorance of the 'right' way to format programs for that notation. Lisp and Forth are good examples. It may be a good idea to indent Forth from the *right*! (Okay, so there's no one 'right' format, but there are definitely wrong ones.) (And why are ML and imperative languages indented differently?) Also, of course, your own programs in the unfamiliar notation are going to be even less readable than good U.N. programs, which can give you an even worse impression. That's not necessarily so bad -- if it doesn't bring significant advantages, it's not worth significant retraining. In my case, I'm interested in languages for their own sake. Finally, there's a tendency to blame the language instead of the programmer for good or bad code, when the language is unusual. This is bigotry.

With several examples now, I think there must be a general theory of function-composition languages. Use combinatory logic?

Back to the unreadability of function composition compared to variables: what are the issues common to Forth & FP, then? FP doesn't have the unknown-arity problem, but it's still less readable than, say, Lisp. (See the quicksort function, e.g.) With variables (and without assignment) you don't have to follow a value through a set of transformations; you just refer back to its binding.
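A rough sketch of that contrast, in Python rather than Forth or FP: the same mean function written once by pure composition and once with a named variable. The helpers compose, construct, and spread are made up for this note (construct imitates FP's construction form, which applies several functions to one argument); none of them is a standard library function.

```python
from functools import reduce


def compose(*fs):
    """compose(f, g, h)(x) == f(g(h(x)))"""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)


def construct(*fs):
    """Apply every function to the same argument; return the results as a tuple."""
    return lambda x: tuple(f(x) for f in fs)


def spread(f):
    """Turn a function of several arguments into a function of one tuple."""
    return lambda args: f(*args)


def divide(a, b):
    return a / b


# Function-composition style: the input has no name, so using it twice
# takes an explicit combinator (construct) to duplicate and route it.
mean_pointfree = compose(spread(divide), construct(sum, len))


# Variable style: the input is bound once and simply referred to twice.
def mean_named(xs):
    return sum(xs) / len(xs)


data = [1, 2, 3, 6]
pointfree_result = mean_pointfree(data)
named_result = mean_named(data)
```

To read mean_pointfree you have to trace the value through construct and spread; in mean_named you just look back at the binding of xs.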
Also, it's easier to make some kinds of changes to code with variables, for about the same reason. (A variable is kind of an automatic way of generating the right combinator.) On the other hand, it seems to be easier to do *automatic* transformations of programs in function-composition form.

Investigate referential transparency in Forth & FP. It's not quite like in the lambda calculus because there aren't any variables. Duh... The difference is, I think, that in a combinator language an expression must always denote the same *function*, while in a lambda-calculus language it must always denote the same *value*. Gee, that was simple. Well, wait a sec -- that'd mean Forth is referentially transparent unless you only look at the stack effect. Maybe r.t. isn't that meaningful a concept.

I suspect integers and reals are usually different static types for efficiency rather than safety. Counterexamples? Well, automatic truncation or rounding of floats to ints can cause unexpected results, as I found in my VB binary-search routine. But I wasn't even aware of the rule in that case.

What does 'structured' programming amount to in a language without GOTOs, like Scheme or Prolog? Put another way, what's unstructured? Part of the answer might come from adapting the structured-design method to, e.g., Scheme.

VB sux! They don't even do the obvious extensions of ideas in Basic to the visual controls -- e.g., compound controls would correspond to record types as control arrays (roughly) correspond to data arrays. (Can you use frame controls that way, at all?) Maybe I should write an article, "Why VB Is Not My Favorite Programming Language". Fish in a barrel, but I gotta start somewhere.

Let's look more closely at Bentley's PIC example (in "Little Languages"). It would be just a library in Scheme, I'm sure. (How about in C++?) On the other hand, Ideal might not.

The Make program illustrates some syntactic pitfalls. Devise a better syntax. Alternatives to Make?
It can most naturally be viewed as a constraint-network language, or else memoized demand-driven dataflow. (Those views suggest generalizations of and alternatives to Make, don't they?) The obvious procedural approach to the problem (a batch file) sucks bigtime. What about OO? Logic? What if you have suitable restrictions on what you're making? Look at Matthias Blume's SML compilation manager.

Come up with a system to describe complex pipelined machines? You could think of it as a very low-level parallel language. How low a level does it have to be at?

Can you design a language that's mostly upward-compatible with Basic, but a lot nicer? Should you? How about a translator from VB?

Is TTL's implicit array mapping unsound? Pitfalls appear to fall in just 2 categories: 1. when you don't know (at design time) the rank of the arrays involved; and 2. when you do. Case 2 holds no surprises because you'll always design the arrays to match. Case 1 is also all right because the structures must match 'all the way'. (I was worried about coincidental matching of dimensions leading to a deeper mapping than was desired.) Now wait a minute, you do have a kind of discontinuity there, when you think of the rank of A decreasing while the rank of B increases: which one gets mapped over 'all the way' changes at the crossover point. Thus you may have strange behavior if you don't establish at design time that rank A <= rank B (or vice versa). This could also be a pitfall in a situation where you have a semi-invariant WHERE expression (e.g., x WHERE A, with only x varying) and you naively expect the meaning of the whole to be the 'same' for each subexpression x. Needs further analysis.

I think the Scheme shell/db should keep Soft Scheme in mind -- you should be able to infer types in any reasonable script. Mine scsh for ideas.

Give an example showing how easy it is to pervert a language with a plausible extension. Spring it on the reader without warning, to rub it in.
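Coming back to Make for a moment: the "memoized demand-driven dataflow" reading can be sketched in a few lines of Python. This is only an illustration of that view, not how Make itself works: it keeps values in memory instead of files, ignores timestamps and staleness, and does no cycle detection. The names make_builder, rules, and the toy targets are all hypothetical.

```python
def make_builder(rules):
    """rules maps target -> (list of dependency targets, recipe function).

    The recipe receives the built values of its dependencies, in order.
    """
    cache = {}   # memo table: target -> built value
    log = []     # the order in which recipes actually ran

    def build(target):
        if target in cache:              # memoized: never rebuild a target
            return cache[target]
        deps, recipe = rules[target]
        inputs = [build(dep) for dep in deps]   # demand-driven: pull inputs
        value = recipe(*inputs)
        cache[target] = value
        log.append(target)
        return value

    return build, log


# A toy dependency graph; "a.o" is needed by both "lib" and "prog".
rules = {
    "a.c":  ([], lambda: "source a"),
    "b.c":  ([], lambda: "source b"),
    "a.o":  (["a.c"], lambda src: f"obj({src})"),
    "b.o":  (["b.c"], lambda src: f"obj({src})"),
    "lib":  (["a.o", "b.o"], lambda a, b: f"lib({a}, {b})"),
    "prog": (["lib", "a.o"], lambda lib, a: f"link({lib}, {a})"),
}

build, log = make_builder(rules)
result = build("prog")
```

Only the targets reachable from the one you ask for get built, and each is built once even when demanded twice. The generalizations hinted at above would then be changes to the memo policy (e.g., when a cached value counts as stale) rather than to the rule notation.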
There's a common conception that the invention of high-level languages gave us a one-time boost in productivity, and that was it -- further major improvements have to come from fancy environments or AI or design practices or formal methods or something. (Not to sneer at any of those...) Brooks sez in "No Silver Bullet" that most of the problem today is intrinsic. Can you really make a clean separation between essence and accident in software design? I'm not convinced he's wrong; maybe I'm wasting my time...

I had a list of string-processing languages to compare. To it, let's add a functional language with list comprehensions, and a visual railroad-track-grammars language.

TTL: consider how a tabulation language would handle the CSR application. Also, TTL is reminiscent of the relational algebra; is there any corresponding calculus? [Answer: yes.] And finally, what would the original assignment, those 20 pages of reports, have looked like written in bare C? How much have we really gained? It's embarrassing that I haven't really looked at that.

Earlier I said GUI programming in regular ole imperative languages suffers because control structures get eviscerated by event loops. There's another, similar problem in some systems, like Windows in C: data structures can also get eviscerated because the type system is too weak -- hence void *'s and handles. Is that perception accurate? I'm hardly an expert.

The best systems are written in themselves, because they undergo more evolution as they're designed. The designer/implementor comes face-to-face every day with the system's shortcomings. In the case of an embeddable language, however, there are two senses in which it can be written in itself: the parts can be written either in the implemented language or in the implementing language (as long as they don't often use private parts of the implementation, sticking to facilities available to clients).
The first stress-tests the language itself, and the second its embeddability. It might make sense to consider these factors in an implementation strategy: if you're after the highest-quality product, develop using the aspect that's least understood, and contrariwise if you're after a solid delivery date.

Just as Windows exists because of DOS because of CP/M because of... perhaps standard programming *styles* still owe too much to a small-machine mindset, and that's why linguistic ideas aren't exploited more. No, on second thought interpretive techniques tend to *save* space, though of course not time. Even so, handcrafting of code was more important back then.

Name for new pedagogical language? How about Exel -- extensible extension language. Or Evel (I like the name better; needs an acronym). Better yet, SUX -- Simple, Uniform, Extensible. Or itsux if I can come up with an acronym or other excuse. :-)

Don't forget the atlast/Until controversy! :) And whether to send in a notice to the Language List maintainer.

Ask for more thoughts on economics of open-architecture systems like Autocad? But it's probably in the Autodesk File. Yes, read that first, *then* approach Walker.

The book could use an example of generating more efficient assembly code than any general-purpose compiler could, for a particular problem (which is too big to hand-code). Any ideas?

Modules that try to do the convenient thing based on some external state can be evil. Examples: state-smart words in Forth, ls | p in Unix. Same thing.

"The Unix way is based on an understanding of the evolutionary dynamics of complex systems." Discuss. Well, for one thing I'd say too many Unixers are too willing to just hack up a workaround for a problem rather than really fix it, leading to all those horrid long-term large-scale consequences. Nevertheless, I think there's a lot of truth to the thesis: a useful complex system evolves from modest beginnings, exploits parts that are independently useful, etc.
Maybe review "The Architecture of Complexity" in Simon.

Someone argued that Lisp's easy-to-parse syntax isn't important because you only write the compiler once, and parsers are no big deal anyway. However, making things easy for the compiler isn't our main goal, which is easy manipulation of programs in general. Uniform syntax for code and data lets us put off the decision whether to implement some subsystem by code or data or some mixture of the two; more important still are convenient, powerful macros. This whole argument is just a special case of the idea that automation benefits from simple standard interfaces. By working within a restricted framework, we gain flexibility in combining parts. ``Freedom is slavery.'' Hey, ``Ignorance is strength'' makes another good aphorism. I don't see where ``War is peace'' comes in, though.

Simple standard interfaces ease automation just because every idiosyncrasy must be dealt with. Unless you're lucky, it must be dealt with over and over in different guises, yielding a complexity explosion as systems get larger. (Examples galore.) (``Syntactic sugar causes cancer of the semicolon.'')

> When Mike Cowlishaw chatted with me for Dr. Dobb's (will appear
> 1Q96) he pointed out that the high-level theoretical models of languages
> frequently have funny holes in them, whereas in his opportunistic syntax
> for Rexx, this sort of thing was less likely, since he always bent the
> model in favor of the user.

I have several problems with this. First, that approach leads to the kind of systems that make me gag -- csh, perl, rexx. Even C++, though it isn't quite as bad as the other three. They make me gag because they give a complicated and ill-defined base for building programs. I really think these abortions are so popular for the same reason Piers Anthony is. It's sad. ...

Second, as Jef pointed out, ``this sort of thing'' can't be blamed on theory in this case -- in fact, it's more characteristic of a hacked-up design.
Csh, perl, rexx on the one hand, Standard ML on the other. See? To be fair, I've hardly even touched any of the three, figuring by extrapolation that the more bogus a language is, the more I'll hate it; but maybe a *completely* bogus language is really a joy to use... and Intercal would rule the world...

I think a ``high-level theoretical model'' will tend to have this sort of problem only if its creator has a B&D sort of attitude, like in Pascal or Ada. Though there's also the Scheme syndrome of never completing a practical spec.

The problems I'm having coming up with a satisfactory string-processing language show the importance of theory. Automata theory and complexity theory for knowing what sort of things are practically computable (among other good reasons), and logic and type theory for designing genuinely declarative languages.

>In a way, C and PERL go against the principles of unix - they are general
>purpose tools which are reasonably good for anything but excel at nothing.

I think the way they go against the Unix way, to the extent they do, is by replacing the *glue*, not the tools. Shell & pipes aren't always the best kind of glue.

What is it about string-rewriting languages that makes people seem to like them for 'scripting'? Do people just have trouble thinking about a distinction between concrete and abstract syntax? Or what? And Javascript seems to have an abstract syntax anyway.

Dumb question: exactly what's the difference between objects and first-class modules?

Forth & introductory CS: Forth got me really interested in programming languages as objects of study, as well as the metalinguistic programming style. Scheme may be better for both...