Hi. I'm Troy McClure. ... Oh, wait. He was a character on The Simpsons. Who was I, again? Ah. I remember now. I'm Thor King: the author of MLPTK. If you'd like to acquaint yourself with my work, you can find it @ the following Mediafire URL for as long as Mediafire will host me and my download cap holds: (elder archive removed to make room for the new one. See top of next post at https://ipduck.wordpress.com/2016/02/16/of-stack-state/) If you can't access it, then write to me at 1433 Flannigan Creek Road, Viola, ID 83872, and I'll endeavor to send you a CD or hard copy. Please do not send money; parcels sometimes "mysteriously" fail to arrive at my mailbox. Or simply wait for the next time I refresh the archive at Mediafire; that's likely to be sooner than I could send a copy by postal mail, but I'll try if you want.

Unlike my prior development snapshots of my software, some parental discretion is advised for this one, because of the pop-up graphic when the user moves his or her mouse pointer over the Hiragana "Fu" near the top of the page. If your parents aren't discrete, then how in blazes are there two of them?

My lecture today shall advise you in regard to the theory & practice of writing LALR (look-ahead, left-to-right, rightmost-derivation) parsers. This problem set is essentially solved by the excellent Flex and Bison (or Lex and Yacc), which are part of the Free Software Foundation's GNU toolset & available on POSIX-compliant operating systems such as Linux, Unix, and MacOS 10 and greater; these are written in the C programming language, but you can write your own easily in any language once you understand the mathematical principles that underpin the technology. Please be aware that I might not accurately understand some of the theory. Seek other sources if you're really interested.

A parser is a computer program that executes instructions corresponding to its input. That is, it is a reprogrammable program, or a program that writes other programs. Parsers are ubiquitous throughout the field of computer science, and necessary in the computation of any algorithm that must alter its behavior significantly over a large input domain: for example, programming languages, network protocols, and interprocess communication. A reliable parser, such as the GNU utilities I mentioned above, is mathematically verified and saves a lot of debugging, especially when an existing schematic must be altered.

Parsers are finite state automata, a concept fundamental to computer science. For more information, see "Turing machine" at Wikipedia. Also of interest are CPU chip set architecture, circuitry, logic and memory gates, MOSFETs (metal-oxide-semiconductor field-effect transistors), Boolean logic, assembly code & instruction sets (especially Intel 80x86 and extensions), lexers / scanners, GLR parsers, Regular Expressions, POSIX & the Single Unix Specification, and any patents in regard to computer technology that you can dig up at a patent office.

The phrase "computer programming" implies that the programmer literally issues instructions directly to the machine. Indeed, parsers can do this for you, too; but compiler design is beyond the scope of my comprehension as yet. Instead, I shall treat of how to write an interpreter generator. Yet again and again: the necessary work _has_ already been done for you, via Flex and Bison. I'll write as though you're designing a similar kind of parser, so that you know exactly how much work you can save yourself by using the industry-standard technology.
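Before we proceed, a concrete miniature might help. Here is a sketch of my own, written in TypeScript for this post (it comes from neither MLPTK nor the GNU tools, and the state names and the integer-literal language are invented for illustration): the smallest finite state automaton worth looking at, which consumes one character per step and moves between a handful of named states.

    // A minimal finite state automaton that accepts decimal integer
    // literals such as "42" or "-7".  One transition per character.
    type State = "start" | "sign" | "digits" | "reject";

    // Transition function: the current state plus one input character
    // yields the next state.  "reject" is a dead state that absorbs all.
    function step(state: State, ch: string): State {
      const isDigit = ch >= "0" && ch <= "9";
      switch (state) {
        case "start":  return ch === "-" ? "sign" : isDigit ? "digits" : "reject";
        case "sign":   return isDigit ? "digits" : "reject";
        case "digits": return isDigit ? "digits" : "reject";
        case "reject": return "reject";
      }
    }

    // Run the automaton over a whole string; accept iff it halts in "digits".
    function accepts(input: string): boolean {
      let state: State = "start";
      for (const ch of input) state = step(state, ch);
      return state === "digits";
    }

    console.log(accepts("-42")); // true
    console.log(accepts("4x2")); // false

Everything that follows is this idea scaled up: more states, and tokens instead of characters.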
An interpreter executes high-level instructions as the input is being consumed, unlike a compiler, which assembles machine instructions to be executed later. A machine instruction is a sequence of electrical impulses, usually represented as binary code, that specifies the operation of a logic circuit such as a CPU. The x86 instruction set, which operated the Intel x86, Pentium, and Pentium II, was widely in use for decades. Windows included a tool named 'debug', for use within the MS-DOS prompt, that permitted the owner of the operating system to write in this language and execute the results without installing anything extra. In 2016, x86 remains a common instruction set.

The reason to write an interpreter is basically: to build a data structure. A Bison-like parser is not necessary to achieve this end, but it can significantly reduce the time necessary to specify the protocol or structure format. If you'd imagine the task of programming to be similar to the art of illustration, then your program iterates through several design phases: alpha (storyboard sketch), beta (concept sketch), maintenance (outline inking), more maintenance (shading), even more maintenance (coloring), and finished work (touch-ups, etc.), throughout a duration spanning some years. Before you commit to any such tedious procedure, you might like to gather some idea of how the finished work will look; that is what sketching is all about, and similarly shading the inked illustration. The right parser can get your "sketch" on "canvas" with much greater efficiency.

The standard parser technology is the LR parser: a finite state automaton that describes how to read data from the beginning, combine tokens (pieces of data) into sequences as they are encountered, combine the sequences into progressively more complex sequences by computing the semantic value of each sequence and representing that value as a token, and finally arrive at a solitary token that signifies a complete program or data structure. The canonical example is Bison. The full specification of the Bison parser algorithm (©) is available in texinfo format from the Ubuntu repository, from the Free Software Foundation, and from any of the many fine establishments where GNU is served daily.

LR parsers read tokens one at a time, either shifting them onto a parse stack or reducing a sequence on the stack from multiple tokens to one. When a sequence is reduced, a new semantic value is computed and stored in the token it reduced to.

1. Read one token, which becomes the lookahead token; the former lookahead token becomes the working token.
2. If the lookahead token is an operator with higher precedence than the last operator encountered, start a new sequence.
3. If the working token proceeds from the foregoing sequence, shift it onto the sequence in progress.
4. Otherwise, if the sequence reduces, then pop the sequence and unshift the reduced token back onto the input stream.
5. Otherwise, block the sequence in progress and wait for a new sequence at the working token to reduce to the token the blocked sequence expects.

Simple. Sort of. (A toy implementation of this loop appears below.) In order to know how the sequences progress, it's necessary to encode a data structure that maps the steps between beginning and end. To do this, you must write a parser that parses the rules that tell the LR parser what to do. Technically Bison does this too, although the data structure that represents the parsed rules is later discarded, because the "structure" is flattened out into a block of static code.
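To make those five steps concrete, here is a toy shift/reduce loop, again a sketch of my own in TypeScript rather than anything extracted from Bison; the Token shapes, the two-stack layout, and the PRECEDENCE table are my inventions for illustration. It parses expressions over "+" and "*", deciding between shifting and reducing by comparing the lookahead operator's precedence against the operator atop the stack.

    // A toy shift/reduce parser for the grammar
    //   E -> E "+" E | E "*" E | NUMBER
    // using operator precedence to choose between shifting and reducing.
    type Token =
      | { kind: "num"; value: number }
      | { kind: "op"; op: "+" | "*" }
      | { kind: "eof" };

    const PRECEDENCE: Record<string, number> = { "+": 1, "*": 2 };

    function parse(tokens: Token[]): number {
      const values: number[] = [];   // semantic values of reduced tokens
      const ops: ("+" | "*")[] = []; // operators awaiting a right operand

      // Reduce: pop one operator and two values, push the combined value.
      const reduce = () => {
        const op = ops.pop()!;
        const right = values.pop()!;
        const left = values.pop()!;
        values.push(op === "+" ? left + right : left * right);
      };

      for (const lookahead of tokens) {
        if (lookahead.kind === "num") {
          values.push(lookahead.value); // shift a NUMBER onto the stack
        } else {
          // Before shifting an operator (or EOF), reduce every sequence
          // whose operator has precedence >= the lookahead's.
          const p = lookahead.kind === "op" ? PRECEDENCE[lookahead.op] : 0;
          while (ops.length > 0 && PRECEDENCE[ops[ops.length - 1]] >= p) reduce();
          if (lookahead.kind === "op") ops.push(lookahead.op); // shift it
        }
      }
      return values[0]; // a solitary token: the whole expression's value
    }

    // 2 + 3 * 4 reduces to 14, because "*" binds tighter than "+".
    const tokens: Token[] = [
      { kind: "num", value: 2 }, { kind: "op", op: "+" },
      { kind: "num", value: 3 }, { kind: "op", op: "*" },
      { kind: "num", value: 4 }, { kind: "eof" },
    ];
    console.log(parse(tokens)); // 14

Step 2 of the algorithm lives in the `while` comparison: a higher-precedence lookahead starts a new sequence atop the old, while a lower-precedence one forces the pending sequences to reduce before it may shift.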
You aren't writing your parser anew each time you make an alteration to its grammar; the parser generator is writing the parser, given the rules you specified in the grammar. So you have to write something that converts the rules (which, for your sanity's sake, were written in something approaching a user-friendly language) to a data structure that encodes the parser state at each possible step through its input sequence. This can be accomplished by writing a tree, or treelike structure, where each node contains forward links to the next parser state(s), and these links are keyed on the identifier of the token that was read in next from that state. Then the data structure is flattened into static code; or, in my case, a flatter structure. The parser's state map describes what happens when the next token is consumed; mapping each possible configuration of the automaton's state data has nothing to do with the instructions executed while it operates. It is motive, not motion.

Then write a tokenizer, which converts the input data to a set of named tokens. Really this can be made part of the parsing step; but it costs practically no time to specify the tokenizer separately, which is modular and aids maintenance. A tokenizer can be a scanner, like Flex, which recognizes patterns in data; or it can be any algorithm whose output conforms to the parser's input format. If the data is already tokenized, or formed in such manner that the parser doesn't need any assistance to understand it, tokenization is unnecessary, although you might reformat the data slightly or alter some tokens before parsing them.

Tokenize the input and feed it to the parser, which executes the above algorithm. During the parse phase, additional computations are performed by the subroutines you specified to compute the semantic values of token reductions. Afterward, you have either executed your program or built a data structure with which to execute it later.

Now that you have sunk the foundation of the idea, let's cover it in concrete. Using MLPTK as a debugging console, I have written a manuscript that I named "Quadrare Lexema," which reads "to square the words." In fact, because the QL library shall eventually form the backbone of a major overhaul of MLPTK's CLI, and because the sacral pelvic articulation does conjoin the dorsal vertebrae, my new experiment is the demonstrative proof that it is hip to be square. If you would kindly accept my profuse contrition for spending an entire lecture to set up that joke, I shall, throughout the next few posts, walk you through QL and explain more-or-less exactly what it is doing. I plan to write them by June.
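Until those posts arrive, here is one last sketch of my own in TypeScript (not actual QL code; the names StateNode, buildStateMap, and tokenize are invented for this post) showing the two pieces just described: a treelike state map whose forward links are keyed on token names, and a scanner-style tokenizer that names each lexeme before the parser sees it.

    // A treelike state map: each node links forward to the next parser
    // state, keyed on the name of the token read in that state.
    interface StateNode {
      next: Map<string, StateNode>; // forward links, keyed on token name
      reducesTo?: string;           // if set, a complete rule ends here
    }

    // Build the map from rules written as (left-hand side, token sequence).
    function buildStateMap(rules: [string, string[]][]): StateNode {
      const root: StateNode = { next: new Map() };
      for (const [lhs, rhs] of rules) {
        let node = root;
        for (const tok of rhs) {
          if (!node.next.has(tok)) node.next.set(tok, { next: new Map() });
          node = node.next.get(tok)!;
        }
        node.reducesTo = lhs; // walking the whole sequence reduces to lhs
      }
      return root;
    }

    // A scanner in miniature: classify each lexeme with a token name.
    function tokenize(src: string): { name: string; text: string }[] {
      const patterns: [string, RegExp][] = [
        ["NUMBER", /^[0-9]+/], ["PLUS", /^\+/], ["TIMES", /^\*/], ["WS", /^\s+/],
      ];
      const out: { name: string; text: string }[] = [];
      let rest = src;
      while (rest.length > 0) {
        const hit = patterns.map(([name, re]) => [name, re.exec(rest)] as const)
                            .find(([, m]) => m !== null);
        if (!hit) throw new Error("cannot tokenize: " + rest);
        const [name, m] = hit;
        if (name !== "WS") out.push({ name, text: m![0] }); // drop whitespace
        rest = rest.slice(m![0].length);
      }
      return out;
    }

    const map = buildStateMap([["expr", ["NUMBER", "PLUS", "NUMBER"]]]);
    console.log(map.next.has("NUMBER"));                // true: a forward link exists
    console.log(tokenize("12 + 34").map(t => t.name));  // ["NUMBER","PLUS","NUMBER"]

Feeding the tokenizer's output through the state map, node by node, until a node with reducesTo is reached is exactly the "motive, not motion" distinction above: the map records where the automaton may go, and the parse loop is what actually moves.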