NAME
Chatbot::Eliza - A clone of the classic Eliza program
SYNOPSIS
use Chatbot::Eliza;
DESCRIPTION
This module implements the classic Eliza algorithm. The original Eliza program was written by Joseph Weizenbaum and described in the Communications of the ACM in 1966. Eliza is a mock Rogerian psychotherapist. It prompts for user input, and uses a simple transformation algorithm to change user input into a follow-up question. The program is designed to give the appearance of understanding.
This program is a faithful implementation of the program described by Weizenbaum. It uses a simplified script language (devised by Charles Hayden). The content of the script is the same as Weizenbaum's.
This module encapsulates the Eliza algorithm in the form of an object. This should make the functionality easy to use in larger programs.
USAGE
This is all you need to do to launch a simple Eliza session:
use Chatbot::Eliza;
$mybot = new Chatbot::Eliza;
$mybot->command_interface;
You can also customize certain features of the session:
$myotherbot = new Chatbot::Eliza;
$myotherbot->name( "Hortense" );
$myotherbot->debug( 1 );
$myotherbot->command_interface;
These lines set the name of the bot to be "Hortense" and turn on the debugging output.
When creating an Eliza object, you can specify a name and an alternative script file:
$bot = new Chatbot::Eliza "Brian", "myscript.txt";
If you don't specify a script file, then the Eliza module will initialize the new Eliza object with a default script that the module contains within itself.
You can use any of the internal functions in a calling program. The code below takes an arbitrary string and retrieves the reply from the Eliza object:
my $string = "I have too many problems.";
my $reply = $mybot->transform( $string );
You can easily create two bots, each with a different script, and see how they interact:
use Chatbot::Eliza;

my ($harry, $sally, $he_says, $she_says);
$harry = new Chatbot::Eliza "Harry", "histext.txt";
$sally = new Chatbot::Eliza "Sally", "hertext.txt";

$he_says = "I am sad.";

# The bots answer each other forever; interrupt with Ctrl-C.
while (1) {
    $she_says = $sally->transform( $he_says );
    print $sally->name, ": ", $she_says, "\n";
    $he_says = $harry->transform( $she_says );
    print $harry->name, ": ", $he_says, "\n";
}
Of course, as with the original Eliza program, the magic of the algorithm is really in the script.
MAIN DATA MEMBERS
Each Eliza object uses the following data structures to hold the script data in memory:
%decomplist
Hash. Keys: the set of keywords; values: strings containing the decomposition rules for each keyword.
%reasmblist
Hash. Keys: strings that each join a keyword with one of its decomposition rules; values: the set of possible reassembly statements for that keyword and decomposition rule.
%keyranks
Hash. Keys: the set of keywords; values: the rank of each keyword.
@quit
"quit" words -- that is, words the user might use to try to exit the program.
@initial
Possible greetings for the beginning of the program.
@final
Possible farewells for the end of the program.
%pre
Hash. Keys: words which are replaced before any transformations; values: the respective replacement words.
%post
Hash. Keys: words which are replaced after the transformations and after the reply is constructed; values: the respective replacement words.
%synon
Hash. Keys: words which are used (prefixed with @) in decomposition rules; values: lists of synonyms that are treated just like the key word when decomposition rules are matched.
@memory
An array of user-input strings which are remembered and may be used at random moments in a dialogue.
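The relationship between %keyranks, %decomplist and %reasmblist is easiest to see with data filled in. The Perl sketch below holds the "remember" key from the sample script in FORMAT OF THE SCRIPT FILE. It assumes array references instead of the strings described above and a hypothetical tab separator for the joined %reasmblist keys; reassemblies_for is a hypothetical helper, not a method of the module.

```perl
use strict;
use warnings;

# ASSUMPTIONS (not the module's actual layout): decomposition rules are
# kept in array refs rather than strings, and a tab joins a keyword and
# a decomposition rule to form a %reasmblist key.
my $SEP = "\t";    # hypothetical separator

# Rank of each keyword.
my %keyranks = ( remember => 5 );

# Keyword => its decomposition rules.
my %decomplist = ( remember => [ '* i remember *', '* do you remember *' ] );

# "keyword + decomposition rule" => possible reassembly statements.
my %reasmblist = (
    "remember$SEP* i remember *" => [
        'Do you often think of (2) ?',
        'Does thinking of (2) bring anything else to mind ?',
    ],
    "remember$SEP* do you remember *" => [
        'Did you think I would forget (2) ?',
        'What about (2) ?',
        'goto what',
    ],
);

# Hypothetical helper: follow a keyword to its decomposition rules, then
# to the reassembly statements stored under each joined key.
sub reassemblies_for {
    my ($key) = @_;
    my %out;
    for my $decomp ( @{ $decomplist{$key} || [] } ) {
        $out{$decomp} = $reasmblist{"$key$SEP$decomp"};
    }
    return \%out;
}
```

Joining the keyword and the rule into one key keeps %reasmblist flat, at the cost of having to rebuild the joined key for every lookup.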
METHODS
my $chatterbot = new Chatbot::Eliza;
new creates a new Eliza object. This method also calls the internal _initialize method, which in turn calls the parse_script_data method, which initializes the script data.
my $chatterbot = new Chatbot::Eliza 'Ahmad', 'myfile.txt';
The Eliza object defaults to the name "Eliza", and it contains default script data within itself. However, using the syntax above, you can specify an alternative name and an alternative script file.
See the parse_script_data method and the section FORMAT OF THE SCRIPT FILE for a description of the format of the script file.
$chatterbot->command_interface;
command_interface opens an interactive session with the Eliza object, just like the original Eliza program.
If you want to design your own session format, then you can write your own while loop and your own functions for prompting for and reading user input, and use the transform method to generate Eliza's responses.
But if you're lazy and you want to skip all that, then just use command_interface. It's all done for you.
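If you do write your own loop, it can be quite small. The Perl sketch below works with any object that has transform and name methods, such as a Chatbot::Eliza object. run_session is a hypothetical helper, and the quit-word list and prompt strings are assumptions; the module itself takes its quit words from the script.

```perl
use strict;
use warnings;

# Hypothetical session loop: read lines from $in, stop when the first
# word is a quit word, otherwise print the bot's reply to $out.
sub run_session {
    my ( $bot, $in, $out, @quit ) = @_;
    my %is_quit = map { lc($_) => 1 } @quit;

    print {$out} "You: ";
    while ( defined( my $line = <$in> ) ) {
        chomp $line;
        my ($first) = split ' ', lc $line;
        last if defined $first && $is_quit{$first};
        print {$out} $bot->name, ": ", $bot->transform($line), "\n", "You: ";
    }
    print {$out} "\nGoodbye.\n";
}

# Usage (requires Chatbot::Eliza to be installed):
#   use Chatbot::Eliza;
#   my $bot = new Chatbot::Eliza;
#   run_session( $bot, \*STDIN, \*STDOUT, qw(bye goodbye quit) );
```

Passing the filehandles in, rather than hard-coding STDIN and STDOUT, lets the same loop drive a socket or a test harness.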
$string = preprocess($string);
preprocess applies simple substitution rules to the input string. Mostly this is to catch variations in spelling, misspellings, contractions and the like.
preprocess is called from within the transform method. It is applied to user-input text BEFORE any processing, and before a reassembly statement has been selected.
It uses the hash %pre, which is created during the parse of the script.
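A minimal Perl sketch of this kind of whole-word substitution follows. The %pre entries are hypothetical sample data (the "equivalent" pair mirrors the "pre: equivalent alike" line in the sample script), and the sketch is not the module's own preprocess code.

```perl
use strict;
use warnings;

# Hypothetical %pre table: misspelling/contraction => replacement.
my %pre = (
    equivalent => 'alike',
    dont       => "don't",
    recollect  => 'remember',
);

# Replace each whole word found in %pre (ignoring case); other words
# and punctuation are left untouched.
sub preprocess {
    my ($string) = @_;
    $string =~ s/\b(\w+)\b/exists $pre{ lc $1 } ? $pre{ lc $1 } : $1/ge;
    return $string;
}
```

Matching on word boundaries avoids rewriting fragments inside longer words, which a plain string replacement would do.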
$string = postprocess($string);
postprocess applies simple substitution rules to the reassembly rule. This is where "I" and "you" (and related words such as "my" and "your") are exchanged. postprocess is called from within the transform method.
It uses the hash %post, created during the parse of the script.
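The exchange has to happen in a single pass: running one s/// per pair would turn "i" into "you" and then straight back into "I". The Perl sketch below looks each word up once. The %post table is illustrative sample data, not necessarily the module's default script.

```perl
use strict;
use warnings;

# Illustrative %post table: first person <-> second person.
my %post = (
    am       => 'are',
    your     => 'my',
    me       => 'you',
    myself   => 'yourself',
    yourself => 'myself',
    i        => 'you',
    you      => 'I',
    my       => 'your',
    "i'm"    => 'you are',
);

# Each word (apostrophes allowed) is looked up exactly once, so swapped
# words are never swapped back.
sub postprocess {
    my ($string) = @_;
    $string =~ s/\b([\w']+)\b/exists $post{ lc $1 } ? $post{ lc $1 } : $1/ge;
    return $string;
}
```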
if ($self->_testquit($user_input) ) { ... }
_testquit detects words like "bye" and "quit" and returns true if it finds one of them as the first word in the sentence.
These words are listed in the script, under the keyword "quit".
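A check of this kind can be sketched in a few lines of Perl. testquit is a hypothetical stand-in for the module's internal _testquit, and @quit stands in for the "quit" entries read from a script.

```perl
use strict;
use warnings;

# Stand-in for the quit words listed in a script.
my @quit = qw(bye goodbye quit);

# True if the first word of the input (ignoring case and trailing
# punctuation) is a quit word.
sub testquit {
    my ($input) = @_;
    my ($first) = split ' ', lc $input;
    return 0 unless defined $first;
    $first =~ s/\W+$//;    # "bye!" -> "bye"
    return scalar grep { $_ eq $first } @quit;
}
```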
$reply = $chatterbot->transform( $string );
transform applies transformation rules to the user input string. It invokes preprocess, does transformations, then invokes postprocess. It returns the transformed output string, called $reasmb.
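One step inside the "does transformations" part is choosing which keyword to work with. A minimal Perl sketch, assuming the keyword with the highest rank in %keyranks wins; the ranks below are made-up sample values and best_keyword is a hypothetical helper.

```perl
use strict;
use warnings;

# Made-up sample ranks.
my %keyranks = ( remember => 5, dream => 3, mother => 2 );

# Among the keywords that occur in the input, return the highest-ranked
# one, or undef if none occurs.
sub best_keyword {
    my ($input) = @_;
    my @found = grep { exists $keyranks{$_} } split /\W+/, lc $input;
    my ($best) = sort { $keyranks{$b} <=> $keyranks{$a} } @found;
    return $best;
}
```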
$self->parse_script_data;
parse_script_data is invoked from the _initialize method. It opens the script file, if any, and reads in the script data.
FORMAT OF THE SCRIPT FILE
This module includes a default script file within itself, so it is not necessary to explicitly specify a script file when instantiating an Eliza object.
Each line in the script file can specify a key, a decomposition rule, or a reassembly rule, or one of the other entry types such as pre, post, synon, quit, initial, or final.
key: remember 5
  decomp: * i remember *
    reasmb: Do you often think of (2) ?
    reasmb: Does thinking of (2) bring anything else to mind ?
  decomp: * do you remember *
    reasmb: Did you think I would forget (2) ?
    reasmb: What about (2) ?
    reasmb: goto what
pre: equivalent alike
synon: belief feel think believe wish
The number after the key specifies the rank. If a user's input contains the keyword, then the "transform" function will try to match one of the decomposition rules for that keyword. If one matches, then it will select one of the reassembly rules at random. The number (2) here means "use whatever set of words matched the second asterisk in the decomposition rule."
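One way to implement this matching is to translate each decomposition rule into a regular expression. The Perl sketch below assumes that every "*" becomes a capture group and every other word must appear as a whole word; decomp_regex and reassemble are hypothetical helpers, not the module's code, and the sketch skips postprocessing of the captured text.

```perl
use strict;
use warnings;

# Turn "* i remember *" into a case-insensitive regex with one capture
# group per "*" and word boundaries around literal words.
sub decomp_regex {
    my ($decomp) = @_;
    my @parts = map { $_ eq '*' ? '(.*)' : '\b' . quotemeta($_) . '\b' }
                split ' ', $decomp;
    my $pattern = join '\s*', @parts;
    return qr/^\s*$pattern\s*$/i;
}

# Return the reassembled reply, or undef if the rule does not match.
sub reassemble {
    my ( $input, $decomp, $reasmb ) = @_;
    my @matches = $input =~ decomp_regex($decomp) or return;
    s/^\s+|\s+$//g for @matches;    # trim each captured fragment
    # Replace (n) with the text matched by the n-th "*".
    $reasmb =~ s/\((\d+)\)/defined $matches[$1 - 1] ? $matches[$1 - 1] : ''/ge;
    return $reasmb;
}
```

For example, "I remember my mother" against "* i remember *" captures an empty string for (1) and "my mother" for (2).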
If you specify a list of synonyms for a word, then you should prefix that word with @ when you use it in a decomposition rule:
decomp: * i @belief i *
reasmb: Do you really think so ?
reasmb: But you are not sure you (3).
Otherwise, the script will never check to see if there are any synonyms for that word.
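Synonym handling can be sketched by expanding "@word" into an alternation of the word and its synonyms. In this Perl sketch the @ group is captured like a "*", which is consistent with the sample rule above using (3) for the text after the second "i". %synon holds the sample "synon: belief feel think believe wish" line; expand_token and match_decomp are hypothetical helpers.

```perl
use strict;
use warnings;

# Sample synonym table from "synon: belief feel think believe wish".
my %synon = ( belief => [qw(feel think believe wish)] );

# "*" -> capture anything; "@word" -> capture the word or a synonym;
# anything else -> that literal word.
sub expand_token {
    my ($tok) = @_;
    return '(.*)' if $tok eq '*';
    if ( $tok =~ /^\@(\w+)$/ ) {
        my @alts = ( $1, @{ $synon{$1} || [] } );
        return '\b(' . join( '|', map { quotemeta } @alts ) . ')\b';
    }
    return '\b' . quotemeta($tok) . '\b';
}

# Return the captured parts, (1), (2), (3) ..., or an empty list.
sub match_decomp {
    my ( $input, $decomp ) = @_;
    my $pattern = join '\s*', map { expand_token($_) } split ' ', $decomp;
    my @parts = $input =~ /^\s*$pattern\s*$/i;
    s/^\s+|\s+$//g for @parts;
    return @parts;
}
```

With "I think I am smart" and the rule "* i @belief i *", (2) is "think" and (3) is "am smart".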
HOW THE SCRIPTFILE IS PARSED
Each line in the script file contains an "entrytype" (key, decomp, reasmb, synon, and so on) and an "entry", separated by a colon. In turn, each "entry" can itself be composed of a "key" and a "value", separated by a space. The parse_script_data function reads each line and splits it into its "entrytype" and "entry" portions, which it stores in the variables "$entrytype" and "$entry".
Next, it uses the string "$entrytype" to determine what sort of stuff to expect in the "$entry" variable, if anything, and parses it accordingly. In some cases, there is no second level of key-value pair, so the function does not even bother to isolate or create "$key" and "$value".
"$key" is always a single word. "$value" can be null, or one single word, or a string composed of several words, or an array of words.
Based on all these entries and keys and values, the function creates two giant hashes: %decomplist, which holds the decomposition rules for each keyword, and %reasmblist, which holds the reassembly phrases for each decomposition rule. It also creates %keyranks, which holds the ranks for each key.
Five other data structures are created: the hashes %pre, %post, and %synon, and the arrays @initial and @final.
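The parsing steps above can be sketched as a single loop in Perl. This is not the module's own parse_script_data: parse_script is a hypothetical function, it reads from a string instead of a file, and it assumes array references and a tab-joined %reasmblist key. It handles key, decomp, reasmb, pre, post, synon, initial and final lines.

```perl
use strict;
use warnings;

sub parse_script {
    my ($text) = @_;
    my ( %decomplist, %reasmblist, %keyranks, %pre, %post, %synon );
    my ( @initial, @final );
    my ( $thiskey, $thisdecomp );

    for my $line ( split /\n/, $text ) {
        # Split "entrytype: entry"; skip blank or malformed lines.
        next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
        my ( $entrytype, $entry ) = ( $1, $2 );

        if ( $entrytype eq 'key' ) {
            # "key: remember 5" -> keyword plus optional rank.
            my ( $key, $rank ) = split ' ', $entry;
            ( $thiskey, $thisdecomp ) = ( $key, undef );
            $keyranks{$key} = defined $rank ? $rank : 0;
        }
        elsif ( $entrytype eq 'decomp' ) {
            next unless defined $thiskey;
            $thisdecomp = $entry;
            push @{ $decomplist{$thiskey} }, $entry;
        }
        elsif ( $entrytype eq 'reasmb' ) {
            next unless defined $thisdecomp;
            push @{ $reasmblist{"$thiskey\t$thisdecomp"} }, $entry;
        }
        elsif ( $entrytype eq 'pre' or $entrytype eq 'post' ) {
            # "pre: equivalent alike" -> word and its replacement.
            my ( $word, $replacement ) = split ' ', $entry, 2;
            my $table = $entrytype eq 'pre' ? \%pre : \%post;
            $table->{$word} = $replacement;
        }
        elsif ( $entrytype eq 'synon' ) {
            # "synon: belief feel think ..." -> word and its synonyms.
            my ( $word, @synonyms ) = split ' ', $entry;
            $synon{$word} = \@synonyms;
        }
        elsif ( $entrytype eq 'initial' ) { push @initial, $entry }
        elsif ( $entrytype eq 'final' )   { push @final,   $entry }
    }

    return {
        decomplist => \%decomplist, reasmblist => \%reasmblist,
        keyranks   => \%keyranks,   pre        => \%pre,
        post       => \%post,       synon      => \%synon,
        initial    => \@initial,    final      => \@final,
    };
}
```

Tracking the current key and decomposition rule in $thiskey and $thisdecomp is what lets an unlabelled reasmb line be filed under the right rule.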
John Nolan (jnolan@n2k.com) November 1997