Tag Archives: Overview

Essays regarding an entire field of study.

Pan Fried Programming

(Here's the update -- nothing much is new:
MLPTK: http://www.mediafire.com/file/m3u25i445lqkztb/mlptk-2016-12-16.zip
Source Highlight: http://www.mediafire.com/file/ygxb14ie94cwcuy/mlptk-src-hilite-2016-12-16.zip
Book: http://www.mediafire.com/file/vg439qruq3do90q/mlptk-book-2016-12-16.zip

Remedial (adj.): intended to rectify, amend, heal.
Panacea (n., myth): goddess of healing, daughter of Aesculapius.
Pansear (n.): Chili's Pokémon.

This remedial lecture will tersely cover a semester's curriculum,
similar to what you have learnt in your high school algebra class,
comprising the fundamentals of programming with synthetic languages
(those that are above machine code).

If you don't know what computer programming is, I would recommend that you study
some tutorials & encyclopedia articles. Much is available on the WWW (Worldwide
Web). The Web is a part of the Internet, and it is the Web you access from your
Web browser when you navigate to a Web page. You could also try one'a them there
"<name of programming language> For Dummies" textbooks: the "For Dummies" books
are excellent "Cliff's Notes"-style crash courses, and each aims to be similar
to a "101" course in the topic advertised.

To make a beginning with any programming language, all you must know is that a
computer computes: your instructions, issued in the program you write, tell the
machine how to progress from its input or initial state to a resultant output or
final state. These instructions look different in different languages -- some
languages require more or fewer -- but every computer program is an algorithm,
or "recipe for computation."

Computers and computer programs can be characterized as finite state automata.
They're like assembly-line robots who know where to weld each sheet of metal.
Also like assembly-line robots, they are "blind" in the sense that they'll kill
you with the soldering iron should you step between it and the sheet.
Computing machines do what they're told, even when it is particularly stupid:
that's why computer viruses, worms, and computer espionage exist.

In simplest terms, the computer's processing units receive some numbers and an
instruction that says what mathematical operation to execute, then operates:
like a calculator. High-level programming languages are more synthetic, like a
human language is, and comprise such ideas as objects (amalgamations of data) &
functions (modular sub-routines). Compilers or interpreters read these languages
and translate them into machine instructions, simplifying the lengthy series of
instructions necessary to make the calculator execute these difficult tasks.

In a high-level language, there are few technical concerns.
You can begin immediately with the abstract concepts.
Here are some:

As in algebra, a variable is a name that represents a value.
As in solving a system of equations, values are typically assigned to some vars
and the value of the other variables is computed using the values given.
For example, in Javascript:
    var a = 2;
    var b = a + 2;
The variable <b> is now equal to 2 + 2. Similar operations function similarly.
In Javascript and other very-high-level languages, variables aren't only scalars
and can point at any object. They're like placeholders for procedure.
Although "variable" implies a value stored in memory, and "identifier" only its
mnemonic, the words "variable" & "identifier" used loosely mean about the same.
    "Just don't try that with the Captain."
        -- Geordi LaForge, to Data, _Star Trek: the Next Generation._

These are important ideas that are abstracted away in VHLLs. A pointer stores an
address in memory, for a later indirect read/write or similar operation. In HLLs
a pointer/reference accesses an object directly instead of copying its value.
You'll rarely have to make the distinction in Javascript; but, for example:
    var a = new Array(1, 2, 3); // a[0] == 1, a[1] == 2, a[2] == 3
    var b = a; // Incidentally, b === a, and that is why in the next line...
    b[0] = 555; // ... b[0] == 555, and now a[0] also == 555!
As opposed to:
    var c = new Array(3); // c is a new array of length 3
    c[0] = b[0]; c[1] = b[1]; c[2] = b[2]; // copy scalar values one-by-one
    c[0] = 0; // c[0] == 0, but b[0] remains == a[0], which remains == 555.
    var d = 2;
    var e = d;
    e = 4; // e == 4, but d is still == 2.
As you can see, operating on an object (such as via array subscript operation)
changes the object, even if the object is pointed by multiple variables.
Likewise, objects passed as the argument of a function are passed by reference:
they aren't simply copied, and operating on the argument within the function is
equivalent to changing the object, whose scope is above that of the function.
Some high-level languages, like C, permit you to explicitly specify what is a
pointer or reference, which eliminates some of this confusion but requires more
exacting attention to detail in your design specification.

The state of a program is the value of all its variables, the current location
within the instruction set, and the environment of the operating system (or the
interpreter). In Javascript, within Web browsers, the browser typically provides
access to some of its state via the Document Object Model.

Heuristics, or "guesswork," could not exist if there were no way to execute some
different code depending on the state of the program. Furthermore there are some
mathematics you can't write as exactly one set of instructions that produces one
semantic value: for instance, a function defined only on an interval, or an even
root of a positive number. In this circumstance, you are writing branches:
    if (5 > 10) { /* of course, the code in this block never happens. */ }
    else if (2 < 0) { /* neither does this, b/c cond'n is also false. */ }
    else { /* but all of this code happens, because the others didn't. */ }
... One of the branches executes, and the others don't.
The part in parentheses is the "conditional statement:" it's evaluated as either
"true" or "false," like in Boolean logic. 

Identifiers are only valid within the block (curly brackets, or { ... }) where
they were declared. Well, they're supposed to, anyway. Therefore, if you declare
a variable inside a function, you can't use it outside of the function or within
another function. Why would you want to, anyway? The next time you invoked the
function, the value of the variables you were using in there would change again.

Computers are great at repetition. Loops repeat a set of instructions: they are
typically written as a prefix, conditional, postfix, and body. For example:
    for (var T = 10; T > 0; T--) { alert("T minus " + T); }
... which counts down from ten to one with annoying alert popups.
While or do-while loops have only conditions & bodies.
A loop is an example of an "iterative algorithm." Each time the loop's body is
executed, it's called an "iteration." In computing fractal geometric patterns,
"iteration" means more like "recursion:" which, see below.

A function is a modular segment of your program: a sequence of computation that
is repeated a few times, or can be reused as part of another computation.
Functions are "invoked," or called by name, with values supplied as arguments,
and return a value, similarly to how functions behave in algebra. When declaring
a function, you'd typically write the name of the function followed by its args
in parentheses and then the function body. For example, again in Javascript:
    function intp (N) { return (N % 1) == 0; } // integer predicate
... which returns true if N is probably an integer, or whole number:
    if (intp(5)) { alert("Yes. 5 is probably an integer."); }
    if (intp(5.55)) { alert("This box never appears..."); }
    else { alert("... but this one does, because 5.55 is a floater."); }
(Floating-point numbers are inaccurate, in Javascript as likewise elsewhere.)

A function that invokes itself is a recursive function. Any function invoking an
other function, which subsequently causes the original function to be invoked
again, causes a recursion-like situation that I think is called "re-entrancy."
It is essential to note that _any_ and _every_ recursive function you can write
for a computer to execute can be rewritten as an iterative algorithm. The proof
of this is complex: it follows from Alan Turing's model of finite state automata
and the read-execute model of arithmetic and logic units (CPUs), and basically
asserts that you'd never be able to execute recursion if you couldn't do it by
reading one instruction at a time. In other words, each time your function calls
itself again, it's simply storing its state in memory temporarily while the
machine executes a fresh copy: after the copy is done, the former state is re-
loaded, and execution proceeds from there. This is achieved with stacks: data
structures that grow in size as more is "pushed" onto them, and shrink when some
is "popped" off of the top.

An object is a collection of data that comprises a several datum. That is, when
data are related to one another, they can be envisioned as a "shape" or "motion"
that is the sum of its parts. For example, a vector has direction and magnitude;
an individual has a first and last name; a parser has an input stream, a stack,
and a procedure. In Javascript, you'd write something like this:
    function Blah (newz) { if (newz) { this.z = newz; } return this; }
    Blah.prototype = new Object();
    Blah.prototype.z = 555;
    Blah.prototype.tell_me_z = function () { return this.z; }
    var a = new Blah(222), b = new Blah(); // a.z == 222; b.z = 555.
... syntax varies among languages. Essentially an object is a data structure
containing some members ("variables" attached to the object, like Blah::z above)
and, if the object is a class, some methods (functions, like ::tell_me_z).

Overview of LR Parsers.

Hi. I'm Troy McClure.

... Oh, wait. He was a character on The Simpsons. Who was I, again?

Ah. I remember now. I'm Thor King: the author of MLPTK.
If you'd like to acquaint yourself with my work, you can find it @ the following Mediafire URL for as long as Mediafire will host me and my download cap holds:

(elder archive removed to make room for the new one. See top of next post at https://ipduck.wordpress.com/2016/02/16/of-stack-state/)

If you can't access it, then write to me at 1433 Flannigan Creek Road, Viola,
ID 83872, and I'll endeavor to send you a CD or hard copy. Please do not send
money; parcels sometimes "mysteriously" do not arrive at my mailbox. Or simply
wait for the next time I refresh the archive at Mediafire. That's likely to be
sooner than I could send a copy by postal mail, but I'll try if you want.

Unlike my prior development snapshots of my software, some parental discretion
is advised for this one because of the pop-up graphic when the user moves his
or her mouse pointer over the Hiragana "Fu" near the top of the page.
If your parents aren't discrete, then how in blazes are there two of them?

My lecture today shall advise you in regard to the theory & practice of writing
LALR (lookahead, left-to-right) parsers. This problem set is essentially
solved by the excellent Flex and Bison (or Lex and Yacc), which are part of the
Free Software Foundation's GNU toolset & available on POSIX-compliant operating
systems such as Linux, Unix, and MacOS 10 and greater; these are written in the
C programming language, but you can write your own easily in any language once
you understand the mathematical principles that underpin the technology.

Please be aware that I might not accurately understand some of the theory.
Seek other sources if you're really interested.

A parser is a computer program that executes instructions corresponding to its
input. That is, it is a reprogrammable program, or a program that writes other
programs. Parsers are ubiquitous throughout the field of computer science, and
necessary in the computation of any algorithm that must alter its behavior
significantly over a large input domain: for example, programming languages,
network protocols, and interprocess communication. A reliable parser, such as
the GNU utilities I mentioned above, is mathematically verified and saves a lot
of debugging, especially when an existing schematic must be altered.

Parsers are finite state automata: the fundamental theorem of computer science.
For more information, see "Turing machine" at Wikipedia. Also of interest are
CPU chip set architecture, circuitry, logical and memory gates, MOSFETs (metal
oxide semiconductor ferroelectric transistors), Boolean logic, assembly code &
instruction sets (especially Intel 80x86 and extensions), lexers / scanners,
GLR parsers, Regular Expressions, POSIX & the Single Unix Specification, and any
patents in regard to computer technology that you can dig up at a patent office.

The phrase "computer programming" implies that the programmer literally issues
instructions directly to the machine. Indeed, parsers can do this for you, too;
but compiler design is beyond the scope of my comprehension as yet. Instead, I
shall treat of how to write an interpreter generator. Yet again and again: the
necessary work _has_ already been done for you, via Flex and Bison. I'll write
as though you're designing a similar kind of parser, so that you know exactly
how much work you can save yourself by using the industry standard technology.

An interpreter executes high-level instructions as the input is being consumed,
unlike a compiler which assembles machine instructions to be executed later. A
machine instruction is a sequence of electrical impulses, usually represented as
binary code, that specifies the operation of a logic circuit such as a CPU. The
x86 instruction set; which operated the Intel x86, Pentium, and Pentium II; was
widely in use for decades. Windows included a tool named 'debug', for use within
the MS-DOS prompt, that permitted the owner of the operating system to write in
this language and execute the results without installing anything extra.
In 2016, x86 remains a common instruction set.

The reason to write an interpreter is basically: to build a data structure.
A Bison-like parser is not necessary to achieve this end, but can significantly
reduce the time necessary to specify the protocol or structure format. If you'd
imagine the task of programming to be similar to the art of illustration, then
your program iterates through several design phases: alpha (storyboard sketch),
beta (concept sketch), maintenance (outline inking), more maintenance (shading),
even more maintenance (coloring), and finished work (touch-ups etc), throughout
a duration spanning some years. Before you commit to any such tedious procedure,
you might like to gather some idea of how the finished work will look; that is
what sketching is all about, and similarly shading the inked illustration. The
right parser can get your "sketch" on "canvas" with much greater efficiency.

The standard parser technology is the LR parser; a finite state automaton that
describes how to read data from the beginning, combine tokens (pieces of data)
into sequences as they are encountered, combine the sequences to progressively
more complex sequences by computing the semantic value of each sequence and
representing that value as a token, then finally arrive at a solitary token that
signifies a complete program or data structure. The canonical example is Bison.
The full specification of the Bison Parser Algorithm (©) is available in texinfo
format from the Ubuntu repository, from the Free Software Foundation, and from
any of the many fine establishments where GNU is served daily.

LR parsers read tokens one at a time, either shifting them onto a parse stack or
reducing a sequence on the stack from multiple tokens to one. When a sequence is
reduced, a new semantic value is computed and stored in the token it reduced to.
	1. Read one token, which is the lookahead token; the former lookahead token
	   becomes the working token.
	2. If the lookahead token is an operator with higher precedence than the
	   last operator encountered, start a new sequence.
	3. If the working token proceeds from the foregoing sequence, shift it onto
	   the sequence in progress.
	4. Otherwise, if the sequence reduces, then pop the sequence and unshift the
	   reduced token back onto the input stream.
	5. Otherwise, block the sequence in progress and wait for a new sequence at
	   the working token to reduce to the token the blocked sequence expects.

Sort of. In order to know how the sequences progress, it's necessary to encode
a data structure that maps the steps between beginning and end. To do this, you
must write a parser that parses the rules that tell the LR parser what to do.
Technically Bison does this too, although the data structure that represents the
parsed rules is later discarded, because the "structure" is flattened-out into a
block of static code. You aren't writing your parser anew each time you make an
alteration to its grammar; the parser generator is writing the parser, given the
rules you specified in the grammar; so you have to write something that converts
the rules (which, for your sanity's sake, were written in something approaching
a user-friendly language) to a data structure that encodes the parser state at
each possible step through its input sequence.

This can be accomplished by writing a tree / treelike structure, where each node
contains forward links to the next parser state(s) and these links are keyed on
the identifier of the token that was read-in next from that state. Then the data
structure is flattened into static code; or, in my case, a flatter structure.
The parser's state map describes what happens when the next token is consumed;
mapping each possible configuration of the automaton's state data hasn't to do
with instructions executed while it operates; it is motive, not motion.

Then write a tokenizer, which converts the input data to a set of named tokens.
Really this can be made part of the parsing step; but it costs practically no
time to specify the tokenizer separately, which is modular and aids maintenance.
A tokenizer can be a scanner, like Flex, which recognizes patterns in data. Or
it could be any algorithm whose output conforms to the parser's input format. If
the data is already tokenized, or formed in such manner as the parser mustn't
need any assistance to understand it, tokenization is unnecessary; although you
might reformat the data slightly or alter some tokens before parsing them.

Tokenize the input and feed it to the parser which executes the above algorithm.
During the parse phase, additional computations are performed by the subroutines
you specified to compute the semantic values of token reductions. Afterward, you
have either executed your program or built a data structure to execute it later.

Now that you have sunk the foundation of the idea, let's cover it in concrete.

Using MLPTK as a debugging console, I have written a manuscript that I named
"Quadrare Lexema," which reads "to square the words." In fact, because the QL
library shall eventually form the backbone of a major overhaul of MLPTK's CLI,
and because the sacral pelvic articulation does conjoin the dorsal vertebrae,
my new experiment is the demonstrative proof that it is hip to be square.

If you would kindly accept my profuse contrition for spending an entire lecture
to set up that joke; I shall, throughout the next few posts, walk you through QL
and explain more-or-less exactly what it is doing. I plan to write them by June.