Of Stack & State.

Next MLPTK development snapshot is available here, courtesy of Mediafire:
http://www.mediafire.com/download/476ynd7ee7cla07/mlptk-qlds16feb2016.zip
My manuscript describes some fascinating new ideas worth their weight in paper.
Parental discretion is advised, until I can excise the swearing and the easter-egg.

This discourse follows the one I issued previously, which explained the theory
behind LALR parsers. Now I shall write of my library, Quadrare Lexema, which is
the work I've been engineering so that I can teach myself about that theory.

Beware that this algorithm is inefficient and a poor teaching example.
Also, because I have described it so briefly, this post is not yet a lecture.
As with the foregoing, it's public domain. So is my software.

A general-format parser (i.e.: a parser written to parse any grammar, as opposed
to parsers that are written by hand and whose grammar is fixed; or, in yet other
words, a soft-coded parser as opposed to a hard-coded one) must be capable of
reading a formal grammar. A formal grammar specifies how the input sequence is
to be agglutinated to produce semantic meaning via iterative computation of the
"phrases" encountered as the input is consumed. In an LR parser, this could be a
one-dimensional linked list or tree; or, if you're better at math than I, simply
a recursive function that generates the parser's static code as it reads rules.

I designed QL to generate a circularly-linked tree, in the hope I could rewrite
it to be more efficient later. The tree encodes, essentially, the conditional
expressions necessary for the parser to know which state it enters next when it
encounters a new input datum. In other words, "where do I go from here?" Later,
the tree is spidered by the parser, but that's another lecture. BTW, a spider is
a program that walks a path through {hyper,}linked data; i.e., spider = fractal.
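
To make "where do I go from here?" concrete, here is roughly the shape of one
node in that tree. This is a simplified sketch of my own: the field names are
illustrative, not those of QL's actual StateNode, and a real node carries far
more state than this:
    var exampleNode = {
        id: "NUMBER",   // the symbol this node matches in the input sequence
        next: {},       // successor nodes, keyed by symbol ID ("where do I go?")
        back: null,     // backward link to the previous node along the path
        reductions: []  // IDs of the reductions this sequence may make here
    };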

QL reads a grammar that is very similar to Backus-Naur Form, or BNF, which is
the format read by GNU Bison. (For more information, consult the FSF or
Wikipedia.)
There are some non-deterministic extensions and other such frippery in my script
but the format of a grammar rule is mostly the same as BNF:
    - Symbol identifiers match a symbol w/ that ID at that position in sequence.
    - The "|", or OR operator, signifies a choice between one symbol (or
      subsequence) and another.
    - Parentheses are used to nest mutually exclusive ("OR"ed) subsequences.
    - Operator precedence and associativity are specified globally. (Non-BNF???)
The nonstandard extensions permit per-rule precedence and associativity;
arbitrary-length look{ahead,behind} assertions; & new specifiers for operator
precedence. Of these, only the new precedence specifiers are operational -- and
both of them tend to break if not handled with nitroglycerine-quality caution.
There are a few other knobs and levers that change the behavior of the parser in
subtle ways, but these have naught to do with the state mapping phase.

Each grammar rule is specified as a duplet comprising a string (a sequence of
symbol IDs) and an object constructor (which builds the reduction deferral
object). The former element of the duplet is significant to the state mapper;
the latter, to the parser itself.
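
Purely for illustration -- the identifiers and the constructor here are
invented, and QL's real rule tables may be shaped differently -- such a duplet
looks something like this:
    var additionRule = [
        "NUMBER ( + | - ) NUMBER",          // ID sequence, read by the state mapper
        function AdditionReduction (seq) {  // deferred reduction, used by the parser
            this.operands = seq;            // (placeholder body)
        }
    ];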

The rules are tokenized with a naive scanner; it changes non-operator, non-space
substrings to objects that represent nodes in the state map, passing the stream
of mixed tokens (nodes) & character literals (operators) to another method that
implements a "recursive" algorithm to attach each node to those succeeding it.
(It's a manual-transmission stack, but tantamount to recursion.) Succinctly:
    Beginning at locus zero, the root state whence all other states proceed,
    Either handle the operators:
    1. If this item of the token/operator stream is '(', increase stack depth.
    2. Else, if it's '|', the mediate (present) loci are appended to the
       terminal loci (that is, the "split ends" of the subsequence), & the
       mediate loci are cleared.
    3. Else, if it's ')', decrease stack depth & catenate the loci termini
       onto the mediate loci: i.e., the working loci now point at the nodes
       that signify the end of each branch in the foregoing parenthetical
       sequence.
    Otherwise,
    4. The identifier at this location in the rule string, which is a map node,
       is entered into the map at some index; this index is keyed (by the ID)
       into the hash tables of all nodes corresponding to the present loci.
    5. The new map entry becomes the new mediate locus. Or, possibly, an entry
       is entered separately for each present locus; in which case the new map
       entries become the new mediate loci.
    6. Proceed to next token in map path specification, or return mediate loci.
Note that the operators are less problematic when implemented w/ recursion; but,
because Javascript's maximum recursion depth in Firefox 3 is 3,000 calls, I
didn't use it.
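
For instance, using the simplified node shape sketched earlier, the hypothetical
rule "NUMBER ( + | - ) NUMBER" would reach the attachment routine as a stream of
map nodes interleaved with operator character literals:
    var atomizedRule = [
        { id: "NUMBER", next: {} },
        '(', { id: "+", next: {} }, '|', { id: "-", next: {} }, ')',
        { id: "NUMBER", next: {} }
    ];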

Here's an illustration of what the state map looks like:
                                | ROOT |
                               /   |   \
                     | NUMBER | | ABC | | XYZ |
                    /    |     \
               | + |   | - |  | * |  ... etc ...
              /     \    |      |
    | NUMBER | |STRING| |NUM| |NUM| ("NUM" is == NUMBER, but I ran out of space)
At each node, the sequence agglutinated by then can either reduce or not; that,
again, is the parser's business; I haven't space in this essay to explain it, so
let this description suffice.
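
The same picture, spelled as nested objects whose successors are keyed by symbol
ID (again in the simplified node shape, not QL's real StateNode):
    var root = { id: "ROOT", next: {
        NUMBER: { id: "NUMBER", next: {
            "+": { id: "+", next: { NUMBER: { id: "NUMBER", next: {} },
                                    STRING: { id: "STRING", next: {} } } },
            "-": { id: "-", next: { NUMBER: { id: "NUMBER", next: {} } } },
            "*": { id: "*", next: { NUMBER: { id: "NUMBER", next: {} } } }
        } },
        ABC: { id: "ABC", next: {} },
        XYZ: { id: "XYZ", next: {} }
    } };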

I glossed over the details to make the procedure more obvious. Again succinctly,
    - When the scanner produces map nodes, it also encodes some data regarding
      lookahead/lookbehind predicates, and certain special-format identifiers
      such as "¿<anonymous Javascript function returning true or false>?" & ".",
      which I don't presently use and which are either broken or unstable.
    - In transition between steps 4 and 5 above, if a node at some locus already
      has the map rule's identifier keyed in, the new path could be merged with
      the old one; in fact, it is, for now, but I'll change it later...
    - A backward link to the previous node, as well as the identifier of the
      eventual reduction of the sequence, is added in step 4.
    - Assorted other state data is added to the map retroactively: such as which
      tokens can block the parser and when; where an illegal reduction @ state 0
      can be made a node-skip elsewhere in the map; & so much other frippy stuff
      that I can't even remember and hope to revise out of the schematic.
      ("Frippy," cf. "Frippery," which is garish or cast-off finery.)
As you can see, the state mapper tries to do far too much.
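
The second point above -- merging a new path into an existing one -- amounts to
something like this sketch (the names, again, are mine rather than QL's):
    function attachNode (locus, id) {
        // If this locus already has a successor keyed by the same ID, the new
        // path simply merges into the old one; otherwise a fresh node is made,
        // carrying the backward link described above.
        if (!locus.next[id]) {
            locus.next[id] = { id: id, next: {}, back: locus, reductions: [] };
        }
        return locus.next[id];
    }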

Observe that the "manual-transmission stack," which I wrote by hand, encodes a
recursive algorithm that looks something like this in pseudocode:
    recursive function computeMapExpansion (Token_Stream) {
        declare variable Radices, to encode mutually exclusive subsets.
        // ("Radix" means "root;" here, the beginning of the tree subset.)

        declare variable Brooms, an array containing the "split ends" of the map
            subsequence that began at Radix.
        // (From "Witch's Broom," a crotch near the end of a tree branch.)

        for each token in Token_Stream {
            if the token is '(' {
                gather the parenthetical subsequence in some variable, eg Subseq

                duplet (SubRadices, SubBrooms) = computeMapExpansion(Subseq);

                attach each SubRadix as a next node after each current Broom.

                update Brooms (the SubBrooms become the current Brooms).
            } else, if the token is '|' {
                begin a new Radix and Brooms pair.
            } else {
                attach map node as next node following each current Broom,
                and update Brooms.
            }// end if-elif-else (parens operator? Mutex operator? Or map node?)
        } // end for (over token stream)

        return duplet containing Radices and Brooms;
    } // end function computeMapExpansion
Obviously that does the same thing, via the function invocation stack.
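
Here is a runnable JavaScript rendering of that pseudocode, using the simplified
node shape from earlier. It is only a sketch of the idea -- QL's own addPath does
considerably more bookkeeping -- and the recursion-depth caveat above applies to
it, too:
    function computeMapExpansion (tokens) {
        var allRadices = [], allBrooms = [];  // accumulated across '|' branches
        var radices = [], brooms = [];        // of the branch now in progress

        for (var i = 0; i < tokens.length; i++) {
            var tok = tokens[i];
            if (tok === '(') {
                // Find the matching ')' and recurse on the enclosed subsequence.
                var depth = 1, j = i + 1;
                while (depth > 0 && j < tokens.length) {
                    if (tokens[j] === '(') { depth++; }
                    else if (tokens[j] === ')') { depth--; }
                    j++;
                }
                var sub = computeMapExpansion(tokens.slice(i + 1, j - 1));
                if (radices.length === 0) {   // the branch begins with this group
                    radices = sub.radices;
                } else {                      // graft each sub-radix after each broom
                    brooms.forEach(function (b) {
                        sub.radices.forEach(function (r) { b.next[r.id] = r; });
                    });
                }
                brooms = sub.brooms;          // the sub-brooms are the new split ends
                i = j - 1;                    // skip past the matching ')'
            } else if (tok === '|') {
                // Close this branch; the next token begins a fresh one.
                allRadices = allRadices.concat(radices);
                allBrooms = allBrooms.concat(brooms);
                radices = []; brooms = [];
            } else {
                // A map node: accept either a bare ID string or a node object.
                var node = (typeof tok === "string") ? { id: tok, next: {} } : tok;
                if (radices.length === 0) { radices = [node]; }
                else { brooms.forEach(function (b) { b.next[node.id] = node; }); }
                brooms = [node];
            }
        }
        allRadices = allRadices.concat(radices);
        allBrooms = allBrooms.concat(brooms);
        return { radices: allRadices, brooms: allBrooms };
    }

    // E.g., the hypothetical rule from before:
    var expansion = computeMapExpansion(
        ["NUMBER", "(", "+", "|", "-", ")", "NUMBER"]);
    // expansion.radices -> [ the leading NUMBER node ]
    // expansion.brooms  -> [ the trailing NUMBER node, reached via '+' or '-' ]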

After the map is prepared, the parser uses it as a state configuration while it
reads input tokens. I plan to brief you on the parser in my next post.

Because some space is left, here are the relevant parts of quadrare-lexema.js:

    "addPath", function (atomized_rule, reduction_id) {
        // exactly the algorithm described above.
    },

    "_whither", function (present_state, next_symbol) {
        // essentially, "can the map node at <state> proceed to <next_symbol>?"
    },

    "reRefactor", function () {
        // call <refactor> for each illegal reduction removed from the root node
    },

    "refactor", function (reduction_id) {
        // essentially exactly the same algorithm as addPath; walks the tree to
        // factor the specified identifier out of any rules in which it appears.
    },

    "deriveTerminalsAndNon", function () {
        // separate which symbols are reductions, and which are from tokenizer.
    },

    "tellReductionTracks", function (initial_symbol_id) {
        // "if <initial_id> proceeds from state 0, to what does it reduce?"
    },

    "derivePotentialReductions", function (init_id) {
        // "what reductions could happen via compound reduction of sequence?"
    },

    "deriveAllPotentialReductions", function () {
        // "do that one up there a bunch of times."
    },

    "attachBlockingPredicates", function () {
        // "at any node in the map; if the parser must block here, and given the
        //  symbol that shall appear first in the sequence that begins when the
        //  parser blocks, can any potential reduction thence be shifted onto
        //  this sequence in progress which is blocking at this node?"
    },

    "StateNode", GenProtoTok(
        // metric tons of crap
    ),