Ubuntu Linux: the Wal-Mart(TM) Frontier. These are the voyages of the Spacecar Grosvenor. Its continuing mission: to allocate new structs & new classes, unite all people within its nation, and leak where memory has never leaked before.

Of the numerous Linux installations ("distributions"), I've used Ubuntu Linux (published by Canonical Ltd.) most. It contains the Linux kernel, the GNU core utilities, and several other items of interest, such as an automagically configured graphical user interface. It is extraordinarily user-friendly, to the point of feeling constrictive. (The desktop environment has changed since version 11: users can no longer reconfigure the taskbar or workspaces. The repository wants to be a dime-store, too, and although it's a potentially lucrative storefront, I miss the simplicity of Synaptic.)

Its installation procedure is simple: download a Live CD image from Canonical's Web site, burn it to a CD-R or RW (these days, you might even need a DVD), and reboot your machine with the disc inserted. (Don't forget to tell the BIOS -- er, whatchamacallit, the Unified Extensible Firmware Interface -- to boot from CD.) You'll be presented with an operable temporary login. Thence you can install the OS. Also available from this interface was an option to create a USB startup disk, but it has been removed in recent revisions of Ubuntu. Previously, using VirtualBox or any similar virtual machine, the user could run the Live CD and make a startup USB without even rebooting out of their present operating environment, which was useful on old machines whose optical drives had failed. You can still "install" to the USB key, but it boots slowly, and you can't install it from there to a box. The installation wizard is a no-brainer: "install alongside Windows." Easy! And it usually doesn't cause your existing Windows system to go up in smoke, either.
However, to install Ubuntu more than once per box, you must repartition manually (and may also need to reconfigure GRUB: see /boot/grub and /etc/grub.d). GParted is included within the live disc images, but must be retrieved again after install. If you'd like to make intimate friends with the manual pages, and discover where primary partitions go when they die, you can install with less assistance. This lets you specify the partitions on which to mount your home & system directories, in case you'd like to keep them segregated. (That's probably a great idea, but I never do.) You can also create and specify swap partitions, which are employed as virtual memory and, I suspect, for hibernation and hybrid suspend.

About file systems: I typically use FAT32, NTFS, ext4, and ext2. (Total newbie.) FAT32 is elderly and fragile; it's used for boot/EFI partitions, the 3DS, & 3GPs. NTFS is Microsoft's modern FS: it withstands some crashes, but has no full fsck on Linux. ext2 & ext4 are Linux's. ext4 journals; ext2 permits file undeletion (PhotoRec). The extended 4 system is harder to kill than a cockroach on steroids, so I tend to prefer it anywhere near the heart of my archives. I use ext2 or NTFS for USBs.

Be very careful not to destroy your existing data when repartitioning the drive. Any such operation carries some risk; back up anything important beforehand. One way to back up is to prepare an empty HDD (or any medium of equal or greater size) and dump the complete contents of the populated disk into the empty one:

dd if=/dev/sda of=/dev/sdb bs=4M status=progress

(Where sda is the populated disk, and sdb the empty backup disk.) Similar can be accomplished by dd'ing one of your partitions (/dev/sda1) into a disk or a file, then dd'ing the image back onto a partition of equal size. Disk image flashing is a simple and popular backup method for local machines, sparing you the time to learn rsync (which is more useful for long-term remote backups).
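All dd does, under the hood, is copy fixed-size blocks from one device (or file) to another until the input runs out. As a minimal sketch of that idea -- not a replacement for dd, and with hypothetical path names -- in Python:

```python
import os

def block_copy(src_path, dst_path, block_size=4 * 1024 * 1024):
    """Copy src to dst in fixed-size blocks, dd-style; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:           # end of input: dd stops here, too
                break
            dst.write(block)
            copied += len(block)
        dst.flush()
        os.fsync(dst.fileno())      # like dd's conv=fsync: force data to disk
    return copied

# e.g. block_copy("/dev/sda", "/dev/sdb") -- with root privileges, and the
# same caveat as dd: get the source and destination backwards and you erase
# the disk you meant to save.
```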
Far from being an annoying elder sister, dd is the Linux troll's best friend. Beware the dreaded "write a new boot/system partition" prompt: it bricked me. The problem was that I had set the system to "Legacy Support" boot mode, but the original (now unrecognized) installation was in UEFI mode. I was unable to recover until I had re-flashed several partitions.

The usual "new car smell" applies: you'll want to configure whatever settings haven't yet been forbidden to you by your GUI-toting overlords. In Ubuntu 16, access them by clicking the gear-and-wrench icon on the launcher panel. You can also search for something you're missing by using the Dash (the Super, or Windows, key pulls it up; then type), which functions similarly to the apropos command: e.g., instead of Ctrl + Alt + T and then "man -k image", Super key and then "image". It will also search your files (and, after plugins, several social media sites). Although the newfangled Dash is convenient, don't forget your terminal emulator: you can easily spend the vast majority of your working time using bash by way of gnome-terminal, without ever clicking your treasured Microsoft IntelliMouse 1.1. In Ubuntu 16, as it has been since Ubuntu 11, Ctrl + Alt + T opens the terminal.

Under the directory /usr/share/man/, you will find the online (interactive) manual. This describes the tools available to you. Begin reading it by opening a terminal window (using Ctrl + Alt + T, or the Super / Windows key and then typing "terminal"), keying the command 'man name_of_manual_page', and pressing the Enter key. In this case, the name of the manual page is the page's archive's filename before the .[0-9].gz extension.
Of particular interest: telinit, dd, printf, cat, less, sed, tee, gvfs-trash, mawk, grep, bash (if you're using the Bourne Again Shell, which is the default on Ubuntu 16), cp, rm, mv, make, sudo, chroot, chown, chmod, chgrp, touch, gunzip, gzip, zip, unzip, python, g++, apt-get (especially `apt-get source ...`), mount, kpartx, date, diff, charmap (same name on Windows!), basename, zipinfo, md5sum, pdftotext, gnome-terminal (which is _how_ you're using bash), fortune, ffmpeg, aview, dblatex, find, cut, uniq, wc, caesar, rot13, curl, wget, sort, vim, man, tr, du, nautilus, tac, column, head, tail, stat, ls, pwd, pushd, popd, gedit, source-highlight, libreoffice (a Microsoft Office killer), base64, flex, bison, regex, perl, firefox, opera, chromium-browser, konqueror, lynx, virtualbox, apropos, od, hexdump, bless, more, pg, pr, echo, rmdir, mkdir, fsck, fdisk (same name, but different function, on Windows), ln, gdm, gnome-session, dhelp, baobab, gparted, kill, locate, ps, photorec, testdisk, update-grub...

(If you haven't some of the above, don't worry: you should already have all you need. Keep in mind that the Ubuntu repository's software is divided into sections, some of which contain potentially harmful or non-free software. When venturing beyond the fortified walls of <main>, be cautious: you may be eaten by a grue.)

Beneath /usr/share/doc/ or /usr/share/help/ you'll sometimes find additional manuals. If you use Linux, you will have to memorize several manuals, and name many more; especially those of the GNU core utilities, which are a great aid to computing. There's also a software repository to assist you with various computing tasks. To acquire additional software, gnome-software (the orange shopping bag to your left, above the Amazon.com icon), the friendly storefront, will assist you. If you prefer a compact heads-up display, try the Synaptic Package Manager instead.
`apt-get install package-name` works well if you know what you're looking for, as does `apt-get source package-name` for the ponderously masculine. And, speaking of ponderous masculinity, if you retrieve source code for any of Ubuntu's mainline packages, typically all you need to do is 'cd' into the folder containing the top level of the source tree and then invoke the following:

1. ./configure (You shouldn't need to chmod u+x ./configure to accomplish this.)
2. make (You may need to install additional packages or correct minor errors.)
3. sudo make install

This can be abbreviated: ./configure && make && sudo make install

Beware that sudo is a potentially dangerous operation. Avoid it if unsure. The && operator, in bash, will only execute the next command if the previous command exited with a successful status code (i.e., zero).

But I digress. You'll occasionally want to mount your other partitions on Linux's file system, so that you can browse the files you've stored there. With external drives this is as simple as connecting them (watch the output of `tail -f /var/log/*` in a console window to observe the log messages about the procedure), but partitions on fixed disks (or others, 'cause reasons) may not be mounted automagically. So:

mount -t fs_type -o option,option,... /dev/sd?? path/to/mount/point/

where the mount point is a directory somewhere in your file system. BTW, mounts that occurred automatically will be on points beneath /media/your_username/. On a dual-boot Windows system, I often mount -t ntfs -o ro /dev/sda3 ~/Desktop/wintmp because the NTFS partition is in an unsafe state and won't mount writable. In that case, rebooting to Windows and running chkdsk /f C: from a Command Prompt with administrative privileges will sometimes clear the dirty flag if performed multiple times. (How many times before ntfs-3g mounts writable seems to vary.)
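The short-circuit behavior of && can be imitated in any language that exposes exit codes. A sketch in Python (the command lists in the comment are only illustrative):

```python
import subprocess
import sys

def run_chain(*commands):
    """Run commands in order, stopping at the first nonzero exit status,
    just as `cmd1 && cmd2 && cmd3` would in bash."""
    for cmd in commands:
        status = subprocess.run(cmd).returncode
        if status != 0:        # && short-circuits on failure
            return status
    return 0

# Equivalent in spirit to `./configure && make && sudo make install`:
# run_chain(["./configure"], ["make"], ["sudo", "make", "install"])
```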
When you've attached external media, via USB and the like, safely remove them after use: use the "Safely Remove" option in the right-click context menu in Nautilus' sidebar (be careful not to accidentally format the disk). You can also, from a shell (gnome-terminal), `sync && umount /dev/sdb*` (if sdb is the medium). Now that you've got a firm foothold in Ubuntu territory, I hope you can see your house from here, 'cause Windows seems to be dying a miserable death of attrition. Don't count it out, though: all the Linuxes are terrible at Flight Simulator.
Since my last edition in March, I've been refining the design of Videlicet: a script I hope will aid in the maintenance of digital art collections. Although I haven't yet finished work, it is near completion, as you can see in this abridged snapshot: http://www.mediafire.com/file/l0mh2dac75t63wl/TK-GreatestHits-2017-06-15.zip Presently, the only available functionality is a label-maker. Soon, it will be a label maker with tentacles. (Full disclosure: tentacles are metaphorical in nature, and the program is so described for the sole purpose of setting up this joke about octo-pythons.) Hopefully the finished work will be available to you by this July, at which time I will issue the complete edition.

In the meantime, here are some computer-related trivia. Factual computer tidbits, each in 80 columns -- the canonical console width. (These are with considerable reference to FOLDOC and the Jargon File.)

We say "computer" about machines these days, but it once meant one who computes.
Computer machines are a kind of difference engine. They calculate math rapidly.
The first such difference engine is attributed to Charles Babbage.
The first reprogrammable machine, however, is reputed to be the Jacquard loom.
Source code is a series of instructions telling a computer what to compute.
Source code is read by a compiler, which translates it to assembler code.
An electronic calculator is, in abstract, an infix notation algebra compiler.
Assemblers translate human-legible mnemonics into computers' "machine language."
Machine language, an abstraction of electrical potential (EMF), is binary code.
The volt is the unit of electromotive force (EMF).
Metal oxide semiconductor field-effect transistors (MOSFETs) switch logic gates.
Logic gates implement Boolean algebra, often inverted: NAND, NOR, and NOT.
Computer programming languages evolved from machine code to assembler mnemonics.
From assembler, programs further evolved to high-level language (such as C).
Very high level language (whatever that means -- perhaps interpreters?) was next.
Modern computer programming reads much like calculus. (Cf. ASM = arithmetic.)
Object-oriented programming arranges data in nested structures, then computes.
Functional programming computes with nested functions, then arranges the data.
File systems are tree-like data structures encoding allocation of disk memory.
Compressed archives use something like LZMA to squeeze redundant bytes in files.
Non-volatile memory (disk space) lasts longer than volatile (RAM).
Hard disk capacity is measured in gigabytes. They're magnetic platters or EAPROM.
Magnetic storage functions by "reading" and "writing" magnetic fields.
Electrically alterable programmable read-only memory blows fuses & antifuses.
Operating systems handle tasks, as process scheduling, incidental to human use.
Mainframes ("clouds") have one central OS; the clients are dumb terminals.
Dumb terminals have no independent processing or storage capability.
Personal computers, by contrast, have processing, storage, and an OS.
Graphical user interfaces (GUIs) are the modern point-and-click metaphor.
Command line interfaces (CLIs) are the "antiquated" terminal console metaphor.
Computer networks are any set of computer machines "speaking" to one another.
Sessions are by way of protocol. The World Wide Web uses HTTP over TCP/IP.
The Internet & WWW evolved via cook-offs: see the Requests for Comments.
Bandwidth on the Internet has increased from baud to megabytes per second.
All computing resources can be served on a network, bandwidth permitting.
^- Senator Ted Stevens' famous "series of tubes" quote was accurate, BTW.
Human interface devices are material tools humans use to interact w/ computers.
Graphical computer displays evolved from oscilloscopes etc., to CRTs, to LCDs.
Cathode ray tubes work by shooting electrons at a phosphorescent matrix.
Modern television-size liquid crystal displays contain millions of circuits.
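The LZMA tidbit is easy to demonstrate with Python's standard lzma module: highly redundant bytes squeeze down dramatically, and the compressed stream round-trips losslessly.

```python
import lzma

redundant = b"banana " * 10_000            # 70,000 highly repetitive bytes
squeezed = lzma.compress(redundant)

print(len(redundant), "->", len(squeezed))     # the archive is far smaller
assert lzma.decompress(squeezed) == redundant  # and nothing was lost
```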
CPUs handle arithmetic and logic. These days, there are often four on one chip.
Frequency of a CPU's oscillation is measured in gigahertz. For example, 2.5 GHz.
The chip's clock speed (oscillation) determines how fast it computes.
All arithmetic can be computed by adding: by 1 only, too, I think.
Ones' complement and two's complement are binary encodings for negative numbers.
Microchips are integrated circuits, mounted on PCBs, executing various functions.
The chips on a mainboard are connected to one another by a bus ("omnibus bar").
Computer hardware is the aforementioned assemblage of circuit boards.
"Firmware" (between hardware & software) is on-board (on-chip?) control logic.
Computer software is some instructions compiled & ready to run: a core image.
"Bootstrapping" alludes to the Baron who hauled himself up by his own bootstraps.
The Basic Input/Output System of yore was supplanted by the Unified EFI (UEFI).
Personal computer workstations are sometimes called "boxes," due to their shape.
Alan Turing's abstract machine (finite control plus tape) is a supposed computer.
Emulators, or virtual machines, are also logical computers.
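The two's-complement tidbit deserves a worked example: a negative number is stored as its value modulo 2^bits, so -1 is all ones. A sketch (the helper names are mine, assuming fixed-width integers):

```python
def to_twos(n, bits=8):
    """Encode a signed integer in two's complement: negatives wrap mod 2**bits."""
    return n & ((1 << bits) - 1)

def from_twos(u, bits=8):
    """Decode: if the sign bit is set, subtract 2**bits."""
    return u - (1 << bits) if u & (1 << (bits - 1)) else u

print(bin(to_twos(-1)))   # 0b11111111: -1 is all ones in 8 bits
print(bin(to_twos(-2)))   # 0b11111110: invert the bits of 1, then add 1
```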
(I have implemented the Trivial File Transfer Protocol, revision 2, in this milestone snapshot. If you have dealt with reprogramming your home router, you may have encountered TFTP. Although other clients presently exist on Linux and elsewhere, I have implemented the protocol with a pair of Python scripts. You'll need a Python interpreter, and possibly administrative privileges (if the server requires them to bind port 69), to run them. They can transfer files of size up to 32 megabytes between any two computers communicating via UDP/IP. Warning: you may need to pull out your metaphorical monkey wrench and tweak the network timeout, or other parameters, in both the client and server before they work to your specification. You can also use TFTP to copy files on your local machine, if for whatever reason you need some replacement for the cp command. Links, courtesy of MediaFire, follow:
Executable source code (the programs themselves, ready to run on your computer): http://www.mediafire.com/file/rh5fmfq8xcmb54r/mlptk-2017-01-07.zip
Candy-colored source code (the pretty colors help me read, maybe they’ll help you too?): http://www.mediafire.com/file/llfacv6t61z67iz/mlptk-src-hilite-2017-01-07.zip
My life in a book (this is what YOUR book can look like, if you learn to use my automatic typesetter and tweak it to make it your own!): http://www.mediafire.com/file/ju972na22uljbtw/mlptk-book-2017-01-07.zip
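(Not the scripts linked above, but for a taste of the wire format: a TFTP read request, per RFC 1350, is just a two-byte opcode followed by a filename and a transfer mode, each NUL-terminated. The 32-megabyte ceiling mentioned earlier falls out of the protocol's 16-bit block numbers times 512-byte blocks. A minimal sketch:)

```python
import struct

OP_RRQ = 1  # opcodes from RFC 1350: 1 RRQ, 2 WRQ, 3 DATA, 4 ACK, 5 ERROR

def build_rrq(filename, mode="octet"):
    """Build a TFTP read-request packet: opcode, then two NUL-terminated strings."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

packet = build_rrq("firmware.bin")
# To actually request the file, you'd send this datagram to port 69:
#   import socket
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.sendto(packet, ("192.0.2.1", 69))   # hypothetical router address
```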
Title is a tediously long pun on "Pan-Seared Programming" from the last lecture.
Key: mechanism to operate an electric circuit, as in a keyboard.
Emporium: a marketplace (German: ein Handelsplatz); or, perhaps, the brain.
Empyreuma: the smell/taste of organic matter burnt in a close vessel (as, pans).
Lignite: intermediate between peat & bituminous coal. Empyreumatic odor.
Pignite: Pokémon from Black/White. Related to Emboar & Tepig (ember & tepid).
Pygmalion (Greek myth): a king; sculptor of Galatea, whom Aphrodite animated.

A few more ideas that pop up often in the study of computer programming: which, by the way, is not computer science. (Science isn't as much artifice as record-keeping, and the records themselves are the artifact.)

MODULARITY
As Eric Steven Raymond of Thyrsus Enterprises writes in "The Art of Unix Programming," "keep it simple, stupid." If you can take your programs apart, and then put them back together like Lego(TM) blocks, you can craft reusable parts.

CLASSES
A kind of object with methods (functions) attached. These are an idiom that lets you lump together all your program's logic with all of its data: then you can take the class out of the program it's in, to put it in another one. _However,_ I have been writing occasionally for nearly twenty years (since I was thirteen) and here's my advice: don't bother with classes unless you're preparing somewhat for a team effort (in which case you're a "class" actor: the other programmers are working on other classes, or methods you aren't), think your code would gain from the encapsulation (perhaps you find it easier to read?), or figure there's a burning need for a standardized interface to whatever you've written (unlikely, because you've probably written something to suit one of your immediate needs: standards rarely evolve on their own from individual effort; they're written to the specifications of consortia, because one alone doesn't see what others need).
Just write your code however works, and save the labels and diagrams for some time when you have time to doodle pictures in the margins of your notebook, or when you _absolutely cannot_ comprehend the whole at once.

UNIONS
This is a kind of data structure in C. I bet you're thinking "oh, those fuddy-duddy old C dinosaurs, they don't know what progress is really about!" Ah, but you'll see this ancient relic time and again. Even if your language doesn't let you handle the bytes themselves, you've got some sort of interface to them; and even if you don't need to convert between an integer and four ASCII characters with zero processing time, you'll still need to convert various data, of course. Classes then arise which simulate the behavior of unions, storing the same datum in multiple different formats or converting back and forth between them. (Cue the scene from _Jurassic Park,_ the film based on Michael Crichton's book, where the velociraptor peeks its head through the curtains at a half-scaffolded tourist resort. Those damn dinosaurs just don't know when to quit!)

ACTUALLY, VOID POINTERS WERE WHAT I WAS THINKING OF HERE
The most amusing use of void*s I've imagined is to implement the type definition for parser tokens in a LALR parser. Suppose the parser is from a BNF grammar: then the productions are functions receiving tokens as arguments and returning a token. Of course nothing's stopping you from knowing their return types already, but what if you want to (slow the algorithm down) add a layer of indirection to wrap the subroutines, perhaps by routing everything via a vector table, and now for whatever reason you actually _can't_ know the return types ahead of time? Then, of course, you cast the return value of the function as whatever type fits.

ATOMICITY, OPERATOR OVERLOADING, TYPEDEF, AND WRAPPERS
Washing brights vs. darks, convenience, convenience, & convenience, respectively. Don't forget: convenience helps you later, _when_ you review your code.
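The "integer as four ASCII characters" trick that a C union pulls can be imitated in Python with the struct module (about as close as the language gets to handling the bytes themselves); a sketch, with an arbitrary value chosen so the bytes are printable:

```python
import struct

# A C union { uint32_t i; char c[4]; } stores one datum under two types.
# struct.pack/unpack reinterprets the same four bytes both ways:
number = 0x42424C54                      # arbitrary 32-bit value
as_bytes = struct.pack("<I", number)     # little-endian, like most desktops
print(as_bytes)                          # b'TLBB': the same bits, read as chars

# ... and back again, with no information lost (though, unlike the union,
# not with zero processing time):
(round_trip,) = struct.unpack("<I", as_bytes)
assert round_trip == number
```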
LINKED LISTS
These are a treelike structure, or should I say a grasslike structure. I covered binary trees at some length in my fourth post, titled "On Loggin'."

RECURSION
The reason why you need recursion is to execute depth-first searches, basically. You want to get partway through the breadth of whatever you're doing at this level of recursion, then set that stuff aside until you've dealt with something immensely more important that you encountered partway through the breadth. Don't confuse this with realtime operating systems (different from realtime priority) or with interrupt handling, because depth-first searching is far different from those other three topics (which each deserve lectures I don't plan to write).

REALTIME OPERATING SYSTEMS, REALTIME PRIORITY, INTERRUPT HANDLING
Jet airplanes, video games versus file indexing, & how not to save your sanity.

GENERATORS
A paradigm appearing in such pleasant languages as Python and Icon. Generators are functions that yield, instead of return: they act "pause-able," and that is plausible because sometimes you really don't want to copy-and-paste a block of code to compute intermediate values without losing execution context. Generators are the breadth-first search to recursion's depth-first search, but of course search algorithms aren't all these idioms are good for. Suppose you wanted to iterate an N-ary counter over its permutations. (This is similar to how you configure anagrams of a word, although those are combinations -- for which, see itertools.combinations in the Python documentation, or any of the texts on discrete mathematics that deal with combinatorics.) Now, an N-ary counter looks a lot like this, but you probably don't want a bunch of these...

var items = new Array(A, B, C, D, ...);      // ... tedious ...
var L = items.length;                        // ... lines ...
var nary = new Array(L);                     // ... of code ...
for (var i = 0; i < L; nary[i++] = 0) ;      // ... cluttering ...
for (var i = L - 1;
     i >= 0 && ++nary[i] == L;               // ... all ...
     nary[i--] = ((i < 0) ? undefined : 0)   // ... your other ...
) ; // end for (incrementation)              // ... computations ...

... in the middle of some other code that's doing something tangentially related. So, you write a generator: it takes the N-ary counter by reference, then runs an incrementation loop to update it as desired. The counter is incremented, whereupon control returns to whatever you were doing in the first place. Voila! (This might not seem important, but it is when your screen size is 80 by 24.)

NOODLES AND DOODLES, POMS ON YOUR POODLES, OODLES AND OODLES OF KITS & CABOODLES
(Boodle (v.t.): swindle, con, deceive. Boodle (n.): gimmick, device, strategy.) Because this lecture consumed only about a half of the available ten thousand characters permissible in a WordPress article, here's a PowerPoint-like summary that I was doodling in the margins because I couldn't concentrate on real work.

Modularity: perhaps w/ especial ref to The Art of Unix Programming. "K.I.S.S."
Why modularity is important: take programs apart, put them together like Legos.
Data structures: unions, classes.
Why structures are important: atomicity, op overloading, typedefs, wrappers.
Linked lists: single, double, circular. Trees. Binary trees covered in wp04??
Recursion: tree traversal, data aggregation, regular expressions -- "bookmarks"
Generators. Perhaps illustrate by reference to an N-ary counter?

AFTER-CLASS DISCUSSION WITH ONE HELL OF A GROUCHY ETHICS PROFESSOR
Suppose someone is in a coma and their standing directive requests you to play some music for them at a certain time of day. How can you be sure the music is not what is keeping them in a coma, or that they even like it at all? Having experienced death firsthand, when I cut myself & bled with comical inefficiency, I can tell you that only the dying was worth it. The pain was not, and I assure you that my entire sensorium was painful for a while there -- even though I had only a few small lacerations.
Death was less unpleasant with less sensory input. I even got sick of the lightbulb -- imagine that! I dragged myself out of the lukewarm bathtub to switch the thing off, and then realized that I was probably not going to die of exsanguination any time soon, and went for a snack instead.

AFTER-CLASS DISCUSSION WITH ONE HELL OF A GROUCH
"You need help! You are insane!" My 1,000 pages of analytical logic versus your plaintive bleat.
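(A coda to the GENERATORS section above: the N-ary counter reads far better as a Python generator, which hides the incrementation loop and hands back one state per resumption. A sketch; the names are mine.)

```python
def nary_counter(n, width):
    """Yield every state of a width-digit, base-n counter, in order,
    pausing between states so the caller's code stays uncluttered."""
    digits = [0] * width
    while True:
        yield tuple(digits)          # hand control back to the caller
        i = width - 1
        while i >= 0 and digits[i] == n - 1:
            digits[i] = 0            # carry: roll this digit over
            i -= 1
        if i < 0:
            return                   # counter wrapped all the way around: done
        digits[i] += 1

for state in nary_counter(2, 3):     # a 3-digit binary odometer
    print(state)                     # (0,0,0), (0,0,1), ... (1,1,1)
```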
Pall (n.): pawl.

I couldn't write last week, and my upgrade to QL has progressed no further. For reference, I stalled before comparing the efficiency of nested Objects to that of nested Arrays, which I must test before experimenting further with the prototype compiler or even refining the design. I intend to do that this month. In the meantime, here's a snapshot of MLPTK with new experiments included. http://www.mediafire.com/download/566ln3t1bc5jujp/mlptk-p9k-08apr2016.zip

And a correction to my brief about the grammar ("Saddlebread"): actually, the InchoateConjugation sequence does not cause differentiation, because the OP_CAT prevents the original from reducing. Other parts may be inaccurate. I'll revise the grammar brief and post a new one as soon as I have fixed the QL speed bug.

I took some time out from writing Quadrare Lexema to write some code I've been meaning to write for a very long time: pal9000, the dissociated companion. This software design is remarkably similar to the venerable Eggdrop, whose C source code is available for download at various locations on the Internets. Obviously, my code is free and within the public domain (as open as open source can be); you can find pal9000 bundled with today's edition of MLPTK, beneath the /reference/ directory.

The chatbot is a hardy perennial computer program. People sometimes say chatbots are artificial intelligence; they aren't, exactly -- or at least this one isn't, because it doesn't know where it is or what it's doing (actually, it makes some assumptions about itself that are perfectly wrong), and it doesn't apply the compiler-like technique of categorical learning, because I half-baked the project. Soon, though, I hope... Nevertheless, mathematics allows us to simulate natural language.
Even a simplistic algorithm like Dissociated Press (see the Jargon File, maintained somewhere on the World Wide Web, possibly at Thyrsus Enterprises by Eric Steven Raymond) can produce humanoid phrases that read like real writing. Where DisPress fails, naturally, is paragraphs and coherence: as you'll see when you've researched, it loses track of what it was saying after a few words. Of course, that can be alleviated with any number of clever tricks, such as:

1. Use a compiler.
2. Use a compiler.
3. Use a compiler.

I haven't done that with p9k, yet, but you can if you want.

Of meaningful significance to chat robots is the Markov chain: a mathematical model, used to describe some physical processes (such as diffusion), describing a state machine in which the probability of any given state occurring depends only on the current state of the system, without regard to how that state was reached. Natural language, especially that language which occurs during a dream state or drugged rhapsody (frequently, and too often with malicious intent, these are misinterpreted as the ravings of madmen), can also be modeled with something like a Markov chain, because of the diffusive nature of tangential thought.

The Markov-chain chat robot applies the principle that the state of a finite automaton can be described in terms of a set of states foregoing the present; that is, the state of the machine is a sliding window, in which is recorded some number of states that were encountered before the state existent at the moment. Each such state is a word (or phrase / sentence / paragraph, if you fancy a more precise approach to artificial intelligence), and the words are strung together one after another with respect to the few words that fit in the sliding window. So, it's sort of like a compression algorithm in reverse, and similar to the way we memorize concepts by relating them to other concepts. "It's a brain. Sorta."
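A sliding-window model like the one just described can be built in a few lines. A sketch in Python (a window of width 2, over words; not pal9000's actual code):

```python
from collections import defaultdict

def build_chain(text, window=2):
    """Map each window-sized tuple of words to the words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - window):
        state = tuple(words[i:i + window])      # the sliding window
        chain[state].append(words[i + window])  # the word that followed it
    return chain

chain = build_chain("the cat sat on the cat sat on the mat")
print(chain[("cat", "sat")])   # ['on', 'on']: both occurrences recorded
print(chain[("on", "the")])    # ['cat', 'mat']: two possible successors
```

To generate text, you'd repeatedly pick a successor of the last window and slide the window forward; how you pick (randomly, most likely, least likely) is exactly what distinguishes the algorithms discussed below.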
One problem with Markov robots, and another reason why compilers are of import in the scientific examination of artificial intelligence, is that of bananas. The Banana Problem describes the fact that, when a Markov chain is traversed, it "forgets" what state it occupied before the sliding window moved. Therefore, for any window of width W < 6, the input B A N A N A first produces state B, then states A and N alternating forever. Obviously, the Banana Problem can be solved by widening the window; however, if you do that, the automaton's memory consumption increases proportionately.

Additionally, very long inputs tend to throw a Markov-'bot for a loop. You can sorta fix this by increasing the width of the sliding window signifying which state the automaton presently occupies, but then you run into problems when the sliding window is too big and it can't think of any suitable phrase, because no known windows (phrases corresponding to the decision tree's depth) fit the trailing portion of the input.

It's a sticky problem, which is why I mentioned compilers; they're of import to artificial intelligence, which is news to absolutely no one, because compilers (and grammar, generally) describe everything we know about the learning process of everyone on Earth: namely, that intelligent beings construct semantic meaning by observing their environments and deducing progressively more abstract ideas via synthesis of observations with abstractions already deduced. Nevertheless, you'd be hard-pressed to find even a simple random-walk chatbot that isn't at least amusing. (See the "dp" module in MLPTK, which implements the vanilla DisPress algorithm.)

My chatbot, pal9000, is inspired by the Dissociated Press & Eggdrop algorithms, the copyrights of which are held by their authors, who aren't me. Although p9k was crafted with regard only to the mathematics and not the code, if my work is an infringement, I'd be happy to expunge it if you want.

Dissociated Press works like this: 1.
Print the first N words (letters? phonemes?) of a body of text. 2. Then, search for a random occurrence of a word in the corpus which follows the most recently printed N words, and print it. 3. Ad potentially infinitum, where "last N words" are round-robin. It is random: therefore, humorously disjointed.

And Eggdrop works like this (AFAICR):

1. For a given coherence factor, N:
2. Build a decision tree of depth N from a body of text.
3. Then, for a given input text:
4. Feed the input to the decision tree (mmm, roots), and then
5. Print the least likely response to follow the last N words, by applying the Dissociated Press algorithm non-randomly.
6. Terminate the response after its length exceeds some threshold; the exact computation of which I can't recall at the moment.

It is not random: therefore, eerily humanoid. (Cue theremin riff, thundercrash.)

A compiler, such as I imagined above, could probably employ sliding windows (of width N) to isolate recurring phrases or sentences. Thereby it might automatically learn how to construct meaningful language without human interaction. Although I think you'll agree that the simplistic method is pretty effective on its own; notwithstanding, I'll experiment with a learning design once I've done QL's code generation method sufficiently well that it can translate itself to Python. Or possibly I'll nick one of the Python compiler-compilers that already exist. (Although that would take all the fun out of it.)

I'll parsimoniously describe how pal9000 blends the two: First of all, it doesn't (not exactly), but it's close. Pal9000 learns the exact words you input, then generates a response within some extinction threshold, with a sliding window whose width is variable and bounded. Its response is bounded by a maximum length (to solve the Banana Problem). Because it must by some means know when a response ends "properly," it also counts the newline character as a word. These are departures from Eggdrop.
It also learns from itself (to avoid saying something twice), as does Eggdrop. In addition, p9k's response isn't necessarily random. If you use the database I included, or choose the experimental "generator" response method, p9k produces a response that is simply the most surprising word it encountered subsequent to the preceding state chain. This produces responses more often, and they are closer to something you said before; but of course this is far less surprising, and therefore less amusing. The classical Eggdrop method takes a bit longer to generate any reply; but, when it does, it drinks Dos Equis. ... Uh, I mean... when it does, the reply is more likely to be worth reading. After I have experimented to my satisfaction, I'll switch the response method back to the classic Eggdrop algorithm. Until then, if you'd prefer the Eggdrop experience, you must delete the included database, regenerate it with the default values, and input a screenplay or something. I think Eggdrop's Web site has the script for Alien, if you want to use that. Game over, man; game over!

In case you're curious, the algorithmic time complexity of PAL 9000 is somewhere in the ballpark of O(((1 + MAX_COHERENCE - MIN_COHERENCE) * N) ^ X) per reply, where N is every unique word ever learnt and X is the extinction threshold. "It's _SLOW._" It asymptotically approaches O(1) in the best case. For additional detail, consult /mlptk/reference/PAL9000/readme.txt.

Pal9000 is a prototypical design that implements some strange ideas about how, exactly, a Markov-'bot should work. As such, some parts are nonfunctional (or, indeed, actively malfunction) and vestigial. "Oops... I broke the algorithm." While designing, I altered multiple good ideas that Eggdrop and DisPress did right the first time, and made the whole thing worse on the whole. For a more classical computer science dish, try downloading & compiling Eggdrop.
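(Appendix: the three DisPress steps listed above fit in a dozen lines of Python. This is a sketch, not pal9000 or the dp module; note how a width-1 window fed B A N A N A settles into exactly the A-N loop the Banana Problem describes.)

```python
import random
from collections import defaultdict

def dispress(text, n=1, length=10, seed=0):
    """Dissociated Press: print the first N words, then repeatedly append a
    random word seen following the last N words in the corpus (round-robin)."""
    rng = random.Random(seed)
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    out = words[:n]                        # step 1: the first N words
    while len(out) < length:
        followers = chain.get(tuple(out[-n:]))
        if not followers:                  # dead end: no known continuation
            break
        out.append(rng.choice(followers))  # steps 2 & 3, ad potentially infinitum
    return " ".join(out)

print(dispress("B A N A N A", n=1, length=8))  # B A N A N A N A -- bananas forever
```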