Tag Archives: Theory

Essays regarding the abstract concept of a technical subject.

Pan Fried Programming

(Here's the update -- nothing much is new:
MLPTK: http://www.mediafire.com/file/m3u25i445lqkztb/mlptk-2016-12-16.zip
Source Highlight: http://www.mediafire.com/file/ygxb14ie94cwcuy/mlptk-src-hilite-2016-12-16.zip
Book: http://www.mediafire.com/file/vg439qruq3do90q/mlptk-book-2016-12-16.zip

Remedial (adj.): intended to rectify, amend, heal.
Panacea (n., myth): goddess of healing, daughter of Aesculapius.
Pansear (n.): Chili's Pokémon.

This remedial lecture will tersely cover a semester's curriculum,
similar to what you have learnt in your high school algebra class,
comprising the fundamentals of programming with synthetic languages
(those that are above machine code).

If you don't know what computer programming is, I would recommend that you study
some tutorials & encyclopedia articles. Much is available on the WWW (Worldwide
Web). The Web is a part of the Internet, and it is the Web you access from your
Web browser when you navigate to a Web page. You could also try one'a them there
"<name of programming language> For Dummies" textbooks: the "For Dummies" books
are excellent "Cliff's Notes"-style crash courses, and each aims to be similar
to a "101" course in the topic advertised.

To make a beginning with any programming language, all you must know is that a
computer computes: your instructions, issued in the program you write, tell the
machine how to progress from its input or initial state to a resultant output or
final state. These instructions look different in different languages -- some
languages require more or fewer -- but every computer program is an algorithm,
or "recipe for computation."

Computers and computer programs can be characterized as finite state automata.
They're like assembly-line robots who know where to weld each sheet of metal.
Also like assembly-line robots, they are "blind" in the sense that they'll kill
you with the soldering iron should you step between it and the sheet.
Computing machines do what they're told, even when it is particularly stupid:
that's why computer viruses, worms, and computer espionage exist.

In simplest terms, the computer's processing units receive some numbers and an
instruction that says what mathematical operation to execute, then operates:
like a calculator. High-level programming languages are more synthetic, like a
human language is, and comprise such ideas as objects (amalgamations of data) &
functions (modular sub-routines). Compilers or interpreters read these languages
and translate them into machine instructions, simplifying the lengthy series of
instructions necessary to make the calculator execute these difficult tasks.

In a high-level language, there are few technical concerns.
You can begin immediately with the abstract concepts.
Here are some:

As in algebra, a variable is a name that represents a value.
As in solving a system of equations, values are typically assigned to some vars
and the value of the other variables is computed using the values given.
For example, in Javascript:
    var a = 2;
    var b = a + 2;
The variable <b> is now equal to 2 + 2. Similar operations function similarly.
In Javascript and other very-high-level languages, variables aren't only scalars
and can point at any object. They're like placeholders for procedure.
Although "variable" implies a value stored in memory, and "identifier" only its
mnemonic, the words "variable" & "identifier" used loosely mean about the same.
    "Just don't try that with the Captain."
        -- Geordi LaForge, to Data, _Star Trek: the Next Generation._

These are important ideas that are abstracted away in VHLLs. A pointer stores an
address in memory, for a later indirect read/write or similar operation. In HLLs
a pointer/reference accesses an object directly instead of copying its value.
You'll rarely have to make the distinction in Javascript; but, for example:
    var a = new Array(1, 2, 3); // a[0] == 1, a[1] == 2, a[2] == 3
    var b = a; // Incidentally, b === a, and that is why in the next line...
    b[0] = 555; // ... b[0] == 555, and now a[0] also == 555!
As opposed to:
    var c = new Array(3); // c is a new array of length 3
    c[0] = b[0]; c[1] = b[1]; c[2] = b[2]; // copy scalar values one-by-one
    c[0] = 0; // c[0] == 0, but b[0] remains == a[0], which remains == 555.
    var d = 2;
    var e = d;
    e = 4; // e == 4, but d is still == 2.
As you can see, operating on an object (such as via array subscript operation)
changes the object, even if the object is pointed by multiple variables.
Likewise, objects passed as the argument of a function are passed by reference:
they aren't simply copied, and operating on the argument within the function is
equivalent to changing the object, whose scope is above that of the function.
Some high-level languages, like C, permit you to explicitly specify what is a
pointer or reference, which eliminates some of this confusion but requires more
exacting attention to detail in your design specification.

The state of a program is the value of all its variables, the current location
within the instruction set, and the environment of the operating system (or the
interpreter). In Javascript, within Web browsers, the browser typically provides
access to some of its state via the Document Object Model.

Heuristics, or "guesswork," could not exist if there were no way to execute some
different code depending on the state of the program. Furthermore there are some
mathematics you can't write as exactly one set of instructions that produces one
semantic value: for instance, a function defined only on an interval, or an even
root of a positive number. In this circumstance, you are writing branches:
    if (5 > 10) { /* of course, the code in this block never happens. */ }
    else if (2 < 0) { /* neither does this, b/c cond'n is also false. */ }
    else { /* but all of this code happens, because the others didn't. */ }
... One of the branches executes, and the others don't.
The part in parentheses is the "conditional statement:" it's evaluated as either
"true" or "false," like in Boolean logic. 

Identifiers are only valid within the block (curly brackets, or { ... }) where
they were declared. Well, they're supposed to, anyway. Therefore, if you declare
a variable inside a function, you can't use it outside of the function or within
another function. Why would you want to, anyway? The next time you invoked the
function, the value of the variables you were using in there would change again.

Computers are great at repetition. Loops repeat a set of instructions: they are
typically written as a prefix, conditional, postfix, and body. For example:
    for (var T = 10; T > 0; T--) { alert("T minus " + T); }
... which counts down from ten to one with annoying alert popups.
While or do-while loops have only conditions & bodies.
A loop is an example of an "iterative algorithm." Each time the loop's body is
executed, it's called an "iteration." In computing fractal geometric patterns,
"iteration" means more like "recursion:" which, see below.

A function is a modular segment of your program: a sequence of computation that
is repeated a few times, or can be reused as part of another computation.
Functions are "invoked," or called by name, with values supplied as arguments,
and return a value, similarly to how functions behave in algebra. When declaring
a function, you'd typically write the name of the function followed by its args
in parentheses and then the function body. For example, again in Javascript:
    function intp (N) { return (N % 1) == 0; } // integer predicate
... which returns true if N is probably an integer, or whole number:
    if (intp(5)) { alert("Yes. 5 is probably an integer."); }
    if (intp(5.55)) { alert("This box never appears..."); }
    else { alert("... but this one does, because 5.55 is a floater."); }
(Floating-point numbers are inaccurate, in Javascript as likewise elsewhere.)

A function that invokes itself is a recursive function. Any function invoking an
other function, which subsequently causes the original function to be invoked
again, causes a recursion-like situation that I think is called "re-entrancy."
It is essential to note that _any_ and _every_ recursive function you can write
for a computer to execute can be rewritten as an iterative algorithm. The proof
of this is complex: it follows from Alan Turing's model of finite state automata
and the read-execute model of arithmetic and logic units (CPUs), and basically
asserts that you'd never be able to execute recursion if you couldn't do it by
reading one instruction at a time. In other words, each time your function calls
itself again, it's simply storing its state in memory temporarily while the
machine executes a fresh copy: after the copy is done, the former state is re-
loaded, and execution proceeds from there. This is achieved with stacks: data
structures that grow in size as more is "pushed" onto them, and shrink when some
is "popped" off of the top.

An object is a collection of data that comprises a several datum. That is, when
data are related to one another, they can be envisioned as a "shape" or "motion"
that is the sum of its parts. For example, a vector has direction and magnitude;
an individual has a first and last name; a parser has an input stream, a stack,
and a procedure. In Javascript, you'd write something like this:
    function Blah (newz) { if (newz) { this.z = newz; } return this; }
    Blah.prototype = new Object();
    Blah.prototype.z = 555;
    Blah.prototype.tell_me_z = function () { return this.z; }
    var a = new Blah(222), b = new Blah(); // a.z == 222; b.z = 555.
... syntax varies among languages. Essentially an object is a data structure
containing some members ("variables" attached to the object, like Blah::z above)
and, if the object is a class, some methods (functions, like ::tell_me_z).

Palling around.

Pall (n.): pawl.

I couldn't write last week, and my upgrade to QL has progressed no further.
For reference, I stalled before comparing the efficiency of nested Objects to
that of nested Arrays, which I must test before experimenting further with the
prototype compiler or even refining the design. I intend to do that this month.
In the meantime, here's a snapshot of MLPTK with new experiments included.


And a correction to my brief about the grammar ("Saddlebread"): actually, the
InchoateConjugation sequence does not cause differentiation, because the OP_CAT
prevents the original from reducing. Other parts may be inaccurate. I'll revise
the grammar brief and post a new one as soon as I have fixed the QL speed bug.

I took some time out from writing Quadrare Lexema to write some code I've been
meaning to write for a very long time: pal9000, the dissociated companion.
This software design is remarkably similar to the venerable "Eggdrop," whose C
source code is available for download at various locations on the Internets.
Obviously, my code is free and within the Public Domain (as open as open source
can be); you can find pal9000 bundled with today's edition of MLPTK, beneath the
/reference/ directory.

The chatbot is a hardy perennial computer program.
People sometimes say chatbots are artificial intelligence; although they aren't,
exactly, or at least this one isn't, because it doesn't know where it is or what
it's doing (actually it makes some assumptions about itself that are perfectly
wrong) and it doesn't apply the compiler-like technique of categorical learning
because I half-baked the project. Soon, though, I hope...

Nevertheless, mathematics allows us to simulate natural language.
Even a simplistic algorithm like Dissociated Press (see "Internet Jargon File,"
maintained somewhere on the World Wide Web, possibly at Thyrsus Enterprises by
Eric Steven Raymond) can produce humanoid phrases that are like real writing.
Where DisPress fails, naturally, is paragraphs and coherence: as you'll see when
you've researched, it loses track of what it was saying after a few words.

Of course, that can be alleviated with any number of clever tricks; such as:
	1. Use a compiler.
	2. Use a compiler.
	3. Use a compiler.
I haven't done that with p9k, yet, but you can if you want.

Of meaningful significance to chat robots is the Markov chain.
That is a mathematical model, used to describe some physical processes (such as
diffusion), describing a state machine in which the probability of any given
state occurring is dependent only on the next or previous state of the system,
without regard to how that state was encountered.
Natural language, especially that language which occurs during a dream state or
drugged rhapsody (frequently and too often with malicious intent, these are
misinterpreted as the ravings of madmen), can also be modeled with something
like a Markov chain because of the diffusive nature of tangential thought.

The Markov-chain chat robot applies the principle that the state of a finite
automaton can be described in terms of a set of states foregoing the present;
that is, the state of the machine is a sliding window, in which is recorded some
number of states that were encountered before the state existent at the moment.
Each such state is a word (or phrase / sentence / paragraph if you fancy a more
precise approach to artificial intelligence), and the words are strung together
one after another with respect to the few words that fit in the sliding window.
So, it's sort of like a compression algorithm in reverse, and similar to the way
we memorize concepts by relating them to other concepts. "It's a brain. Sorta."

One problem with Markov robots, and another reason why compilers are of import
in the scientific examination of artificial intelligence, is that of bananas.
The Banana Problem describes the fact that, when a Markov chain is traversed, it
"forgets" what state it occupied before the sliding window moved.
Therefore, for any window of width W < 6, the input B A N A N A first produces
state B, then states A and N sequentially forever.
Obviously, the Banana Problem can be solved by widening the window; however, if
you do that, the automaton's memory consumption increases proportionately.

Additionally, very long inputs tend to throw a Markov-'bot for a loop.
You can sorta fix this by increasing the width of the sliding window signifying
which state the automaton presently occupies, but then you run into problems
when the sliding window is too big and it can't think of any suitable phrase
because no known windows (phrases corresponding to the decision tree's depth)
fit the trailing portion of the input.
It's a sticky problem, which is why I mentioned compilers; they're of import to
artificial intelligence, which is news to absolutely no one, because compilers
(and grammar, generally) describe everything we know about the learning process
of everyone on Earth: namely, that intelligent beings construct semantic meaning
by observing their environments and deducing progressively more abstract ideas
via synthesis of observations with abstractions already deduced.
Nevertheless, you'd be hard-pressed to find even a simple random-walk chatbot
that isn't at least amusing.
(See the "dp" module in MLPTK, which implements the vanilla DisPress algorithm.)

My chatbot, pal9000, is inspired by the Dissociated Press & Eggdrop algorithms;
the copy rights of which are held by their authors, who aren't me.
Although p9k was crafted with regard only to the mathematics and not the code,
if my work is an infringement, I'd be happy to expunge it if you want.

Dissociated Press works like this:
	1. Print the first N words (letters? phonemes?) of a body of text.
	2. Then, search for a random occurrence of a word in the corpus
	   which follows the most recently printed N words, and print it.
	3. Ad potentially infinitum, where "last N words" are round-robin.
It is random: therefore, humorously disjointed.

And Eggdrop works like this (AFAICR):
	1. For a given coherence factor, N:
	2. Build a decision tree of depth N from a body of text.
	3. Then, for a given input text:
	4. Feed the input to the decision tree (mmm, roots), and then
	5. Print the least likely response to follow the last N words
	   by applying the Dissociated Press algorithm non-randomly.
	6. Terminate response after its length exceeds some threshold;
	   the exact computation of which I can't recall at the moment.
It is not random: therefore, eerily humanoid. (Cue theremin riff, thundercrash.)

A compiler, such as I imagined above, could probably employ sliding windows (of
width N) to isolate recurring phrases or sentences. Thereby it may automatically
learn how to construct meaningful language without human interaction.
Although I think you'll agree that the simplistic method is pretty effective on
its own; notwithstanding, I'll experiment with a learning design once I've done
QL's code generation method sufficiently that it can translate itself to Python.

Or possibly I'll nick one of the Python compiler compilers that already exists.
(Although that would take all the fun out of it.)

I'll parsimoniously describe how pal9000 blends the two:

First of all, it doesn't (not exactly), but it's close.
Pal9000 learns the exact words you input, then generates a response within some
extinction threshold, with a sliding window whose width is variable and bounded.
Its response is bounded by a maximum length (to solve the Banana Problem).
Because it must by some means know when a response ends "properly," it also
counts the newline character as a word.
These former are departures from Eggdrop.
It also learns from itself (to avoid saying something twice), as does Eggdrop.

In addition, p9k's response isn't necessarily random.
If you use the database I included, or choose the experimental "generator"
response method, p9k produces a response that is simply the most surprising word
it encountered subsequent to the preceding state chain.
This produces responses more often, and they are closer to something you said
before, but of course this is far less surprising and therefore less amusing.
The classical Eggdrop method takes a bit longer to generate any reply; but, when
it does, it drinks Dos Equis.
... Uh, I mean... when it does, the reply is more likely to be worth reading.
After I have experimented to my satisfaction, I'll switch the response method
back to the classic Eggdrop algorithm. Until then, if you'd prefer the Eggdrop
experience, you must delete the included database and regenerate it with the
default values and input a screenplay or something. I think Eggdrop's Web site
has the script for Alien, if you want to use that. Game over, man; game over!

In case you're curious, the algorithmic time complexity of PAL 9000 is somewhere
in the ballpark of O(((1 + MAX_COHERENCE - MIN_COHERENCE) * N) ^ X) per reply,
where N is every unique word ever learnt and X is the extinction threshold.
"It's _SLOW._" It asymptotically approaches O(1) in the best case.

For additional detail, consult /mlptk/reference/PAL9000/readme.txt.

Pal9000 is a prototypical design that implements some strange ideas about how,
exactly, a Markov-'bot should work. As such, some parts are nonfunctional (or,
indeed, malfunction actually) and vestigial. "Oops... I broke the algorithm."
While designing, I altered multiple good ideas that Eggdrop and DisPress did
right the first time, and actually made the whole thing worse on the whole. For
a more classical computer science dish, try downloading & compiling Eggdrop.

Om. Wherefore ore? Om. Om. Ore… art thou ore?

Dynamic recompilers: gold? Maybe. But what is the significance of recompilation
in the fast-paced, high-tech world of today? Does, indeed, my idea present any
useful information -- or am I merely setting forth in search of lost CPU time?
What am I doing? What am I saying? Where am I going? What am I computing? I've
built the swing, but what about the swang? After the swang, whither the swung?
(Paraphrased from sketches performed in "Monty Python's Flying Circus.")

This lecture provides, in abstract, my concept design of a specialized Turing
machine: the compound compiler/disassembler. It is feasible via my published
work, Quadrare Lexema, which is in the public domain as is my essay.

(Actually, I think these are more appropriately called "dynamic recompilers.")

If you're just joining me, Quadrare Lexema is a Bison-like parser generator
written in the Javascript programming language for use in Web browsers (FF3.6).
It is portable to any browser employing a modern, standards-compliant Javascript
interpreter; such as Chromium, Chrome, Opera, and (hypothetically) Konqueror.
I have not yet had an opportunity to acquire a copy of Microsoft's Web browser,
Internet Explorer, because I've had no opportunity to acquire a copy of Windows
(and couldn't pay for it, even if I did). QL produces a parser like GNU Bison;
which is to say, an LALR parser for Backus-Naur Format context-free grammars.
For more information, visit the Free Software Foundation's Web site; and/or the
Wikipedia articles treating Backus-Naur Format, regular grammars, LR parsers,
and Turing machines.

Because I have already described my work with the hundreds of pages of code and
the few pages of abstract condensed lectures, this is curtate. However, observe
that my foregoing proof constructs the basis upon which this concept rests.
Besides my work, bachelors of computer science worldwide have demonstrated in
their own dissertations similar proofs to that I have presented you by now.

Again: concept, not implementation. Toward that end lies the specification of a
Turing-complete virtual machine, wherein there be dragons. I will write you one
soon, but I am pending some subsequent blueprints on my acquisition of better
laboratory equipment. In the meantime, I'm pretty sure you can write one w/ QL.

Obviously: a compiler translates the syntax of a human-interface language into
machine instructions; parser generators can be employed to construct compilers;
and such generated parsers can be attached to other parsers in a daisy-chain.

So, make a compiler that converts the abstract instructions it reads into some
progressively simpler set of other instructions, by unshifting multiple tokens
back onto the input stream during the reduction step. You could either read them
again and thereby convert them to yet other computations, or simply skip-ahead
within the input stream until it has been entirely disassembled.

Consider: you're writing an artificial intelligence script, somewhat like this:
	FOR EACH actor IN village:
		WAKE actor,
(This syntax is very similar to the A.I. language specified for the Official
 Hamster Republic Role-Playing Game Creation Engine (O.H.R.RPG.C.E.).)
Your context-free grammar is walking along der strasse minding its own business;
when, suddenly, to your surprise, it encounters the ambiguous TIME OF DAY symbol
and the parser does the mash. It does the monster mash. It does the mash, and it
is indeed a graveyard smash in the sense that your carefully-crafted artificial
intelligence -- the very stuff and pith of whose being you have painstakingly
specified as a set of atomic operations in an excruciating mathematical grammar
-- has now encountered a fatal syntax error and you would like nothing better
than to percuss the computer until it does what you say because you've used each
!@#$ free minute learning how to program this thing for the past fifteen !@#$
years and God Forfend(TM) that it should deny you the fruit of your labor.

Well, actually, the OHRRPGCE does solve that problem for you, which is nice; and
maybe it even solves it in exactly this way, which is cool; but I'll continue...
Let's say your parser lets you unshift multiple symbols upon reduction, like QL
(or like any parser that does a similar thing, of which there are surely many);
then "TIME OF DAY" could reduce to an algorithm that does nothing more than find
out what time of day it is and then unshift a symbol corresponding to the script
that most closely matches what time of day it is.

In other words, you've specified the possible computations that the language can
execute as a set of atomic operations, then decomposed the abstract instructions
into those atomic operations before executing them. The functionality is similar
to that of the C preprocessor's #include directive, which directs the CPP to add
the source code from the #included header verbatim like a copy-paste operation.
That's the most straightforward application of this thought, anyway: copy-and-
pasting artificial intelligence scripts into other AI scripts.

Another thought: let's say your compiler is, in addition to its named capacity,
also a virtual machine that's keeping track of state in some artificial world.
So, it's two virtual machines in one. Rather silly, I know, but why not follow
me a bit deeper into the Turing tar-pit where everything is possible and nothing
is NP-complete? After all, "What if?" makes everything meaningless, like magic.

So, again, it's trucking along and hits some instruction like "SLAY THE DRAGON;"
but the dragon is Pete, who cleverly hides from the unworthy. Now, you could say
just ignore this instruction and carry on with the rest of the script, and maybe
even try to slay the dragon again later after you have slung Chalupas out the TB
drive-through window for a few hours. You could even say travel back in time and
slay the dragon while there's yet a chance -- because wholesale violation of the
law of causality is okay when you can't afford to lose your heroic street cred.
But why not check the part of the script you've yet to execute, to see if maybe
there's something else you need to do while you're here, and then try doing it
while you keep an eye out in case Pete shows up?
I mean, you could write that check as part of the "SLAY" atom; or as part of the
next action in your script that happens after the "SLAY" atom; but why not write
it as part of what happens before the "SLAY" atom even executes?

Also, what if you wanted the compiler to try two entirely different sequences of
symbols upon computation of a symbol reduction? Or one reduction modifies the
input stream somehow, while another does not? Or to reorganize/polymorph itself?

All this and more is certainly possible. QL does a big chunk of it the wrong way
by gluing-together all sorts of stuff that could have been done by making your
symbols each a parser instead. Oh, and iterators in Icon also do it better. And
let us not forget the honorable Haskell, whose compiler hails from Scotland and
is nae so much a man as a blancmange. "Learn you a Haskell for great good!" --
this is both the title of a book and an amusing pearl of wisdom.

The only thing that bothers me about this idea is that the procedure necessary
to polymorph the code at runtime seems to be (a) something that I ought to be
doing by nesting context-free parsers, rather than writing a contextual one; and
(b) probably requires a heuristic algorithm, which would be awfully slow.

I haven't developed this idea very well and there's some space left.
Here is a verbatim excerpt with an example Haskell algorithm, for no reason:

  From The Free On-line Dictionary of Computing (30 January 2010) [foldoc]:

     A sorting {algorithm} with O(n log n) average time
     One element, x of the list to be sorted is chosen and the
     other elements are split into those elements less than x and
     those greater than or equal to x.  These two lists are then
     sorted {recursive}ly using the same algorithm until there is
     only one element in each list, at which point the sublists are
     recursively recombined in order yielding the sorted list.
     This can be written in {Haskell}:
        qsort               :: Ord a => [a] -> [a]
        qsort []             = []
        qsort (x:xs)         = qsort [ u | u<-xs, u<x ] ++
                               [ x ] ++
                               qsort [ u | u<-xs, u>=x ]

     [Mark Jones, Gofer prelude.]

Now that I've burst my metaphorical payload, in regard to theory, I think future
lectures shall be of a mind to walk novices through some of the smaller bits and
pieces of the algorithms I've already written. There aren't many, but they're a
real pain to translate from source code to English (which is why I haven't yet).
I might also try to explain QL in further detail, but IDK if I can make it much
more straightforward than it is (no questions == no idea if you comprehend)...
and, besides, I've already written more than ten pages about it & it's tedious.

Depending on how much Pokemon I must play to clear my head after overhauling the
parser this year, and how much time is left after that to finish the shell's new
scripting language, I'll aim to write some of those simplified lectures by July.

On Loggin’.

The post title, "On Loggin'," is a pun on the algorithmic time-complexity of an
ordered binary tree seek operation: O(n * log(n)).

This lecture provides my advice in regard to sorting data by using binary trees.
(From Bouvier's Law Dictionary, Revised 6th Ed (1856) [bouvier]:
 ADVICE, com. law. A letter [...] intended to give notice of [facts contained].)
My lectures are in the Public Domain, and are free of charge.
"By order of the Author." - Mark Twain, _Huckleberry Finn._

A tree is a directed acyclic graph. ("Tree" (1998-11-12) via FOLDOC 30jan2010.)
Other sort methods besides tree sort include quicksort, bubble sort, heap sort,
and insertion sort. ("Sort" (1997-02-12) via FOLDOC, 30th January 2010.)

This pattern can be seen almost everywhere that data are represented as discrete
notional objects: for example, the directory trees contained within the file
systema of your personal computer, the technology/skill trees that are likely to
present themselves in your favorite video game, and in any computer program that
maintains the logical order of a data set as new data are added. These aren't
necessarily treelike data structures; but, because the perceived "shape" of the
data is a tree, they're probably similar if coded in object-oriented language.
I imagine a file system as "what an object even is;" although a computer program
is volatile, whereas files are nonvolatile, the idea is much the same.
See also: object-oriented programming, procedural vs. functional languages, file
systems, {non,}volatile memory such as RAM and EEPROM, sort algorithms.

Of course, treelike structures aren't always the optimum schematic to represent
a hierarchical assortment of data; definitively, like all other object-oriented
designs, they're mere abstractions employed to delineate pieces of a contiguous
unidimensional memory space (the Turing model of a logic machine considers RAM
to be just a big empty pair of set brackets; data are populated & operated-upon
arbitrarily, but the container is merely a set) and to signify correlation.
An object is a collection of related data; which comprise a several datum.
See also: pointers, heap memory, set theory, discrete math, and Boolean logic.

¶ The following (til the next pilcrow) is nearly a verbatim excerpt from FOLDOC.
The data stored in a tree is processed by traversal. The algorithm begins at
root node, transforms or collects or computes against the data, and then repeats
itself for the root's child-nodes ("branches") in some specific order. Three
common traversal orders are pre-order, post-order, and in-order traversal.
For the binary tree illustrated below:
       / \
      I   S
     / \
    D   E
A pre-order traversal visits the nodes in the order T I D E S.
A post-order traversal visits them in the order D E I S T.
An in-order traversal visits them in the order D I E T S.
¶ "Traversal" (2001-10-01) via FOLDOC, ed. 30th January 2010.
See also: recursion, the call stack (subroutine invocation), digraph traversal.

To encode a tree walk via recursive algorithm is straightforward, and is how the
concept of data set traversal is introduced to novitiate computer programmers.
(Well, after they've learned how to loop over an array with "for.")


    function put (pNode, value) {
        if (pNode.isEmpty()) { pNode.datum = value; }
        else if (value < pNode.datum) { put(pNode.bLeft, value); }
        else if (value > pNode.datum) { put(pNode.bRight, value); }
    } // Ordered insertion.

    function tell (pNode, value) {
        if (value < pNode.datum) { return tell(pNode.bLeft, value); }
        else if (value > pNode.datum) { return tell(pNode.bRight, value); }
        else if (pNode.isEmpty()) { return NULL; }
        return pNode;
    } // Seek pointer to inserted.

_Any_ recursive algorithm can be reduced to an iterative algorithm, provided you
can use variable-length arrays to simulate the function of the call stack, which
stores the arguments to the subroutine you invoked and the address to restore to
the instruction pointer when the subroutine returns. That is, the call stack is
like bookmarks or tabbed indices to tell the routine where it was before a jump.
Replace the "instruction pointer and arguments" model of a stack frame with any
constant-width data set that remembers what you were doing before you jumped to
the next node in the digraph, and, naturally, you can remember the same stuff as
if you had utilized the convenient paradigm that is recursivity. (Stack frames
don't really all have to be the same size, but a definite frame width spares you
from doing yet another algorithm to parse frames. See: network protocol stacks.)

The "stuff you remember" is the state of the finite state automaton who tends to
the mechanism whereby the machine knows which instruction to execute next.
Recursion provides this automaton "for free," because it crops up so frequently;
but, for whatever reason (such as the 3000-call recursion limit in Firefox 3.6),
you might want to write a tree sort without using recursion at all.

Gentlemen, behold...
    corm = GenProtoTok(
        function () {
            this.tree = new this.Hop();
            return this;

        "locus", undefined,

        "seek", function (u_val) {
            while (this.locus[2] != undefined) {
                if (u_val > this.locus[2]) { this.locus = this.hop(1); }
                else if (u_val == this.locus[2]) { return false; }
                else { this.locus = this.hop(0); } // u < loc[2] || u == undef
            return true;
        }, // end function Hopper::seek

        "tellSorted", function () {
            if (this.tree == undefined) { return undefined; }
            var evaluation = new Array();
            for (var orca = new Array(this.tree), did = new Array(false);
            ) {
                this.locus = orca.pop();
                if (did.pop()) {
                    if (this.locus[1] != undefined) {
                } else {
                    if (this.locus[0] != undefined) {
            return evaluation;
        }, // end function Hopper::tellSorted

        "hop", function (index) {
            if (this.locus[index]==undefined) {
                this.locus[index] = new this.Hop();
            return this.locus[index];
        }, // end function Hopper::hop

        "addUnique", function (value) {
            this.locus = this.tree;
            if (this.seek(value)) { this.locus[2] = value; return true; }
            return false;
        }, // end function Hopper::addUnique

        "Hop", GenPrototype(Array, new Array())
... corm!

Here is how corm works: it's an Array retrofitted with a binary tree algorithm.

Each node of the tree is an Array whose members, at these indices, signify:
    0. Left-hand ("less-than") branch pointer.
    1. Right-hand ("greater-than") branch pointer.
    2. Value stored in the node.

Values are added via the corm object's addUnique method, which resets a pointer
to the algorithm's location in the tree and then seeks a place to put the datum.
Seeking causes a node to be created, if no node yet contains the datum; so, the
corm::seek method's name is misleading, but seeking is exactly what it does.

In-order traversal is executed by corm::tellSorted which returns the sorted set.
Recursion is simulated using a state stack whose frames each contain a pointer
to the "previous" node and a boolean value that signifies whether its left child
has been visited. Here is the main loop, step-by-step, in natural English:
    for (var orca = new Array(this.tree), did = new Array(false);
    ) {
"Given the stack, which I consider until it is exhausted..."
        this.locus = orca.pop();
"I am here." -^
        if (did.pop()) {
"If I did the left branch already,"
"then this value occurs at this location in the ordered set;"
            if (this.locus[1] != undefined) {
"if the right branch exists,"
"visit the right child on the next loop iteration"
"(and, naturally, I haven't visited the right child's left child yet)"
        } else {
"If I haven't done the left branch already,"
"then I have to visit this node again when I come back"
"(except I shall have visited the left child by then);"
            if (this.locus[0] != undefined) {
"if the left branch exists,"
"visit the left child on the next loop iteration"
"(and, naturally, I haven't visited the left child's left child yet)"
"... I consume pointers and bools, doing all that there stuff"

And that is how to do everything I did with the Hopper data structure
(which is here renamed corm, because I like corm) in quadrare-lexema.js.

Full disclosure: whereas; although I do put on women's clothing; and although
indeed I sleep all night and work all day; and, in addition, despite that I have
been known, of an occasion: to eat eat my lunch, go to the lavatory, pick wild
flowers, hang around in bars, and want to be a girlie; my Wednesdays have not
historically been known to involve shopping or buttered scones, I don't know how
to press a flower, I have neither suspenders nor heels nor a bra (but thanks for
offering), and am not, in fact, a lumberjack.

Of Stack & State.

Next MLPTK development snapshot is available here, courtesy of Mediafire:
My manuscript describes some fascinating new ideas worth their weight in paper.
Parental discretion is advised, until I can excerpt the swearing and easter-egg.

This discourse follows the one I issued previously, which explained the theory
behind LALR parsers. Now I shall write of my library, Quadrare Lexema, which is
the work I've been engineering so that I can teach myself about that theory.

Beware that this algorithm is inefficient and a poor teaching example.
Also, because I have described it so briefly, this post is not yet a lecture.
As with the foregoing, it's public domain. So is my software.

A general-format parser (i.e.: a parser written to parse any grammar, as opposed
to parsers that are written by hand and whose grammar is fixed; or, in yet other
words, a soft-coded parser as opposed to a hard-coded one) must be capable of
reading a formal grammar. A formal grammar specifies how the input sequence is
to be agglutinated to produce semantic meaning via iterative computation of the
"phrases" encountered as the input is consumed. In an LR parser, this could be a
one-dimensional linked list or tree; or, if you're better at math than I, simply
a recursive function that generates the parser's static code as it reads rules.

I designed QL to generate a circularly-linked tree, in the hope I could rewrite
it to be more efficient later. The tree encodes, essentially, the conditional
expressions necessary for the parser to know which state it enters next when it
encounters a new input datum. In other words, "where do I go from here?" Later,
the tree is spidered by the parser, but that's another lecture. BTW, a spider is
a program that walks a path through {hyper,}linked data; i.e., spider = fractal.

QL reads a grammar that is very similar to Backus-Naur Format, or BNF, which is
the format read by GNU Bison. (For more information, consult the FSF or Wikipe.)
There are some non-deterministic extensions and other such frippery in my script
but the format of a grammar rule is mostly the same as BNF:
    - Symbol identifiers match a symbol w/ that ID at that position in sequence.
    - The "|", or OR operator, signifies either one symbol or another etc.
    - Parentheses are used to nest mutually exclusive ("OR"ed) subsequences.
    - Operator precedence and associativity is specified globally. (Non-BNF???)
The nonstandard extensions permit precedence and associativity to be specified
per-rule; arbitrary-length look{ahead,behind} assertions; & new specifiers for
operator precedence. Of these, only the new precedences are operational -- and
they both tend to break if not handled with nitroglycerine-quality caution.
There are a few other knobs and levers that change the behavior of the parser in
subtle ways, but these have naught to do with the state mapping phase.

Each grammar rule is specified as a duplet comprising a string (sequence of IDs)
and an object constructor (the reduction deferral object). The former element of
the duplet is significant to the state mapper; the latter, to the parser itself.

The rules are tokenized with a naive scanner; it changes non-operator, non-space
substrings to objects that represent nodes in the state map, passing the stream
of mixed tokens (nodes) & character literals (operators) to another method that
implements a "recursive" algorithm to attach each node to those succeeding it.
(It's a manual-transmission stack, but tantamount to recursion.) Succinctly:
    Beginning at locus zero, the root state whence all other states proceed,
    Either handle the operators:
    1. If this item of the token/operator stream is '(', increase stack depth.
    2. Else, if it's '|', the mediate (present) loci are stored @ terminal loci
       (that is, the "split ends" of the subsequence), & mediate loci cleared.
    3. Else if it's ')', decrease stack depth & the loci termini are catenated
       onto the mediate loci: i.e., the working loci now point at the nodes who
       signify each end of each branch in the foregoing parenthetical sequence.
    4. The identifier at this location in the rule string, which is a map node,
       is entered into the map at some index; this index is keyed (by the ID) in
       to the hash tables of all nodes corresponding to present loci.
    5. The new map entry becomes the new mediate locus. Or, possibly, an entry
       was entered separately for each present loci; in which case the new map
       entries become the new mediate loci.
    6. Proceed to next token in map path specification, or return mediate loci.
Note that the operators are less problematic when implemented w/ recursion; but,
because Javascript's maximum recursion in Firefox 3 is 3,000 calls, I didn't.

Here's an illustration of what the state map looks like:
                                | ROOT |
                               /   |   \
                     | NUMBER | | ABC | | XYZ |
                    /    |     \
               | + |   | - |  | * |  ... etc ...
              /     \    |      |
    | NUMBER | |STRING| |NUM| |NUM| ("NUM" is == NUMBER, but I ran out of space)
At each node, the sequence that's agglutinated by then can either reduce or not;
that, again, is the parser; I haven't space in this essay to explain it; however
I suffice to describe it thus.

I glossed-over the details to make the procedure more obvious. Again succinctly,
    - When the scanner produces map nodes, it also encodes some data in regard
      to lookahead/lookbehind predicates; and certain special-format identifiers
      such as "¿<anonymous Javascript function returning true or false>?" & ".",
      which I don't presently use and are either broken or unstable.
    - In transition between steps 4 and 5 above, if a node at some locus already
      has the map rule's identifier keyed in, the new path could be merged with
      the old one; in fact, it is, for now, but I'll change it later...
    - A backward link to the previous node, as well as the identifier of the
      eventual reduction of the sequence, is added in step 4.
    - Assorted other state data is added to the map retroactively: such as which
      tokens can block the parser and when; where an illegal reduction @ state 0
      can be made a node-skip elsewhere in the map; & so much other frippy stuff
      that I can't even remember and hope to revise out of the schematic.
      ("Frippy," cf. "Frippery," which is garish or cast-off finery.)
As you can see, the state mapper tries to do far too much.

Observe that the "manual-transmission stack," which I wrote by hand, encodes a
recursive algorithm that looks something like this in pseudocode:
    recursive function computeMapExpansion (Token_Stream) {
        declare variable Radices, to encode mutually exclusive subsets.
        // ("Radix" means "root;" here, the beginning of the tree subset.)

        declare variable Brooms, an array containing the "split ends" of the map
            subsequence that began at Radix.
        // (From "Witch's Broom," a crotch near the end of a tree branch.)

        for each token in Token_Stream {
            if the token is '(' {
                gather the parenthetical subsequence in some variable, eg Subseq

                duplet (SubRadices, SubBrooms) = computeMapExpansion(Subseq);

                attach each SubRadix as a next node after each current Broom.

                update Brooms (Brooms[current_brooms] = SubBrooms).
            } else, if the token is '|' {
                begin a new Radix and Brooms pair.
            } else {
                attach map node as next node following each current Broom,
                and update Brooms.
            }// end if-elif-else (parens operator? Mutex operator? Or map node?)
        } // end for (over token stream)

        return duplet containing Radices and Brooms;
    } // end function computeMapExpansion
Obviously that does the same thing, via the function invocation stack.

After the map is prepared, the parser uses it as a state configuration while it
reads input tokens. I plan to brief you on the parser in my next post.

Because some space is left, here are the relevant parts of quadrare-lexema.js:

    "addPath", function (atomized_rule, reduction_id) {
        // exactly the algorithm described above.

    "_whither", function (present_state, next_symbol) {
        // essentially, "can the map node at <state> proceed to <next_symbol>?"

    "reRefactor", function () {
        // call <refactor> for each illegal reduction removed from the root node

    "refactor", function (reduction_id) {
        // essentially exactly the same algorithm as addPath; walks the tree to
        // factor the specified identifier out of any rules in which it appears.

    "deriveTerminalsAndNon", function () {
        // separate which symbols are reductions, and which are from tokenizer.

    "tellReductionTracks", function (initial_symbol_id) {
        // "if <initial_id> proceeds from state 0, to what does it reduce?"

    "derivePotentialReductions", function (init_id) {
        // "what reductions could happen via compound reduction of sequence?"

    "deriveAllPotentialReductions", function () {
        // "do that one up there a bunch of times."

    "attachBlockingPredicates", function () {
        // "at any node in the map; if the parser must block here, and given the
        //  symbol that shall appear first in the sequence that begins when the
        //  parser blocks, can any potential reduction thence be shifted onto
        //  this sequence in progress which is blocking at this node?"

    "StateNode", GenProtoTok(
        // metric tons of crap

Overview of LR Parsers.

Hi. I'm Troy McClure.

... Oh, wait. He was a character on The Simpsons. Who was I, again?

Ah. I remember now. I'm Thor King: the author of MLPTK.
If you'd like to acquaint yourself with my work, you can find it @ the following Mediafire URL for as long as Mediafire will host me and my download cap holds:

(elder archive removed to make room for the new one. See top of next post at https://ipduck.wordpress.com/2016/02/16/of-stack-state/)

If you can't access it, then write to me at 1433 Flannigan Creek Road, Viola,
ID 83872, and I'll endeavor to send you a CD or hard copy. Please do not send
money; parcels sometimes "mysteriously" do not arrive at my mailbox. Or simply
wait for the next time I refresh the archive at Mediafire. That's likely to be
sooner than I could send a copy by postal mail, but I'll try if you want.

Unlike my prior development snapshots of my software, some parental discretion
is advised for this one because of the pop-up graphic when the user moves his
or her mouse pointer over the Hiragana "Fu" near the top of the page.
If your parents aren't discrete, then how in blazes are there two of them?

My lecture today shall advise you in regard to the theory & practice of writing
LALR (lookahead, left-to-right) parsers. This problem set is essentially
solved by the excellent Flex and Bison (or Lex and Yacc), which are part of the
Free Software Foundation's GNU toolset & available on POSIX-compliant operating
systems such as Linux, Unix, and MacOS 10 and greater; these are written in the
C programming language, but you can write your own easily in any language once
you understand the mathematical principles that underpin the technology.

Please be aware that I might not accurately understand some of the theory.
Seek other sources if you're really interested.

A parser is a computer program that executes instructions corresponding to its
input. That is, it is a reprogrammable program, or a program that writes other
programs. Parsers are ubiquitous throughout the field of computer science, and
necessary in the computation of any algorithm that must alter its behavior
significantly over a large input domain: for example, programming languages,
network protocols, and interprocess communication. A reliable parser, such as
the GNU utilities I mentioned above, is mathematically verified and saves a lot
of debugging, especially when an existing schematic must be altered.

Parsers are finite state automata: the fundamental theorem of computer science.
For more information, see "Turing machine" at Wikipedia. Also of interest are
CPU chip set architecture, circuitry, logical and memory gates, MOSFETs (metal
oxide semiconductor ferroelectric transistors), Boolean logic, assembly code &
instruction sets (especially Intel 80x86 and extensions), lexers / scanners,
GLR parsers, Regular Expressions, POSIX & the Single Unix Specification, and any
patents in regard to computer technology that you can dig up at a patent office.

The phrase "computer programming" implies that the programmer literally issues
instructions directly to the machine. Indeed, parsers can do this for you, too;
but compiler design is beyond the scope of my comprehension as yet. Instead, I
shall treat of how to write an interpreter generator. Yet again and again: the
necessary work _has_ already been done for you, via Flex and Bison. I'll write
as though you're designing a similar kind of parser, so that you know exactly
how much work you can save yourself by using the industry standard technology.

An interpreter executes high-level instructions as the input is being consumed,
unlike a compiler which assembles machine instructions to be executed later. A
machine instruction is a sequence of electrical impulses, usually represented as
binary code, that specifies the operation of a logic circuit such as a CPU. The
x86 instruction set; which operated the Intel x86, Pentium, and Pentium II; was
widely in use for decades. Windows included a tool named 'debug', for use within
the MS-DOS prompt, that permitted the owner of the operating system to write in
this language and execute the results without installing anything extra.
In 2016, x86 remains a common instruction set.

The reason to write an interpreter is basically: to build a data structure.
A Bison-like parser is not necessary to achieve this end, but can significantly
reduce the time necessary to specify the protocol or structure format. If you'd
imagine the task of programming to be similar to the art of illustration, then
your program iterates through several design phases: alpha (storyboard sketch),
beta (concept sketch), maintenance (outline inking), more maintenance (shading),
even more maintenance (coloring), and finished work (touch-ups etc), throughout
a duration spanning some years. Before you commit to any such tedious procedure,
you might like to gather some idea of how the finished work will look; that is
what sketching is all about, and similarly shading the inked illustration. The
right parser can get your "sketch" on "canvas" with much greater efficiency.

The standard parser technology is the LR parser; a finite state automaton that
describes how to read data from the beginning, combine tokens (pieces of data)
into sequences as they are encountered, combine the sequences to progressively
more complex sequences by computing the semantic value of each sequence and
representing that value as a token, then finally arrive at a solitary token that
signifies a complete program or data structure. The canonical example is Bison.
The full specification of the Bison Parser Algorithm (©) is available in texinfo
format from the Ubuntu repository, from the Free Software Foundation, and from
any of the many fine establishments where GNU is served daily.

LR parsers read tokens one at a time, either shifting them onto a parse stack or
reducing a sequence on the stack from multiple tokens to one. When a sequence is
reduced, a new semantic value is computed and stored in the token it reduced to.
	1. Read one token, which is the lookahead token; the former lookahead token
	   becomes the working token.
	2. If the lookahead token is an operator with higher precedence than the
	   last operator encountered, start a new sequence.
	3. If the working token proceeds from the foregoing sequence, shift it onto
	   the sequence in progress.
	4. Otherwise, if the sequence reduces, then pop the sequence and unshift the
	   reduced token back onto the input stream.
	5. Otherwise, block the sequence in progress and wait for a new sequence at
	   the working token to reduce to the token the blocked sequence expects.

Sort of. In order to know how the sequences progress, it's necessary to encode
a data structure that maps the steps between beginning and end. To do this, you
must write a parser that parses the rules that tell the LR parser what to do.
Technically Bison does this too, although the data structure that represents the
parsed rules is later discarded, because the "structure" is flattened-out into a
block of static code. You aren't writing your parser anew each time you make an
alteration to its grammar; the parser generator is writing the parser, given the
rules you specified in the grammar; so you have to write something that converts
the rules (which, for your sanity's sake, were written in something approaching
a user-friendly language) to a data structure that encodes the parser state at
each possible step through its input sequence.

This can be accomplished by writing a tree / treelike structure, where each node
contains forward links to the next parser state(s) and these links are keyed on
the identifier of the token that was read-in next from that state. Then the data
structure is flattened into static code; or, in my case, a flatter structure.
The parser's state map describes what happens when the next token is consumed;
mapping each possible configuration of the automaton's state data hasn't to do
with instructions executed while it operates; it is motive, not motion.

Then write a tokenizer, which converts the input data to a set of named tokens.
Really this can be made part of the parsing step; but it costs practically no
time to specify the tokenizer separately, which is modular and aids maintenance.
A tokenizer can be a scanner, like Flex, which recognizes patterns in data. Or
it could be any algorithm whose output conforms to the parser's input format. If
the data is already tokenized, or formed in such manner as the parser mustn't
need any assistance to understand it, tokenization is unnecessary; although you
might reformat the data slightly or alter some tokens before parsing them.

Tokenize the input and feed it to the parser which executes the above algorithm.
During the parse phase, additional computations are performed by the subroutines
you specified to compute the semantic values of token reductions. Afterward, you
have either executed your program or built a data structure to execute it later.

Now that you have sunk the foundation of the idea, let's cover it in concrete.

Using MLPTK as a debugging console, I have written a manuscript that I named
"Quadrare Lexema," which reads "to square the words." In fact, because the QL
library shall eventually form the backbone of a major overhaul of MLPTK's CLI,
and because the sacral pelvic articulation does conjoin the dorsal vertebrae,
my new experiment is the demonstrative proof that it is hip to be square.

If you would kindly accept my profuse contrition for spending an entire lecture
to set up that joke; I shall, throughout the next few posts, walk you through QL
and explain more-or-less exactly what it is doing. I plan to write them by June.