NAME
jsFind - generate index for a full-text search engine in JavaScript
SYNOPSIS
use jsFind;
my $t = new jsFind(B => 4);
my $f = 1;
foreach my $k (qw{minima ut dolorem sapiente voluptatem}) {
    $t->B_search(Key => $k,
        Data => {
            "path" => {
                t => "word $k",
                f => $f },
        },
        Insert => 1,
        Append => 1,
    );
}
DESCRIPTION
This module can be used to create index files for jsFind, a powerful tool for adding a search engine to a CD-ROM archive or catalog without requiring the user to install anything.
The main differences between this module and the scripts delivered with jsFind are:
- you don't need to use swish-e to create the index
- you can create an index for jsFind programmatically (and incrementally)
- you can create more than one index and search them all using the same search.html page
You can also examine the examples which come as tests with this module, for example t/04words.t or t/10homer.t.
jsFind
The jsFind search engine was written by Shawn Garbett from eLucid Software. The search engine itself is a small piece of JavaScript (JavaScript 1.2 with DOM Level 2). It is easily customized to fit into an existing set of HTML pages. This JavaScript searches an XML index dataset for the appropriate links, and can filter and sort the results.
The JavaScript code distributed with this module is based on version 0.0.3, which was current when development of this module started. Various changes were made to the JavaScript code to fix bugs, add features and remove warnings. For a complete list, see the Changes file which comes with the distribution.
This module has been tested using html/test.html with the following browsers:
- Mozilla Firefox 0.8 to 1.0: using DOM 2 document.implementation.createDocument
- Internet Explorer 5.5 and 6.0: using ActiveX Microsoft.XMLDOM or MSXML2.DOMDocument
- Konqueror 3.3: using DOM 2 document.implementation.createDocument
- Opera 7.54 (without Java): using the experimental iframe implementation, which is much slower than the other methods.
If searching doesn't work for your combination of operating system and browser, please open the html/test.html file and wait a while. It will search the sample file included with the distribution and report the results. Bug reports which include the test's debugging output are welcome.
jsFind methods
jsFind is the module implementing the methods which you, the user, are going to use to create indexes.
new
Creates a new tree. Arguments are B, which is the maximum number of keys in each node, and an optional Root node. Each root node may have child nodes.
All nodes are objects of class jsFind::Node.
my $t = new jsFind(B => 4);
B_search
Searches, inserts, appends or replaces data in the B-Tree.
$t->B_search(
    Key => 'key value',
    Data => {
        "path" => {
            "t" => "title of document",
            "f" => 99,
        },
    },
    Insert => 1,
    Append => 1,
);
Semantics:
- If the key is not found, insert it if and only if the Insert argument is present.
- If the key is found, replace the existing data if and only if the Replace argument is present, or add the new datum to the existing data if and only if the Append argument is present.
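The Insert/Replace/Append rules above can be illustrated without building a real B-tree. The following is only a toy sketch: it assumes a plain Perl hash in place of the tree, and toy_b_search() is a hypothetical sub that is not part of jsFind. It mirrors the documented semantics, with Data shaped as { path => { t => ..., f => ... } } as in the examples.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the B-tree: a plain hash mapping each key to
# a data hash of the form { path => { t => ..., f => ... } }.
my %index;

# toy_b_search() is NOT part of jsFind; it only mimics the documented
# Insert / Replace / Append semantics of B_search.
sub toy_b_search {
    my %arg = @_;
    my $key = $arg{Key};
    if (!exists $index{$key}) {
        # key not found: insert it only when Insert is present
        $index{$key} = { %{ $arg{Data} } } if $arg{Insert};
    }
    elsif ($arg{Replace}) {
        # key found: Replace overwrites the existing data
        $index{$key} = { %{ $arg{Data} } };
    }
    elsif ($arg{Append}) {
        # key found: Append merges the new path entries into the existing data
        my %new = %{ $arg{Data} };
        @{ $index{$key} }{ keys %new } = values %new;
    }
    return $index{$key};    # undef if the key is still absent
}

# Two documents containing the word "homer" end up under a single key.
toy_b_search(Key => 'homer', Data => { 'a.html' => { t => 'Iliad',   f => 3 } }, Insert => 1, Append => 1);
toy_b_search(Key => 'homer', Data => { 'b.html' => { t => 'Odyssey', f => 5 } }, Insert => 1, Append => 1);
```

In this toy, Append keeps both 'a.html' and 'b.html' under the key homer, while a later call with Replace would leave only the new path.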
B
Returns B (the maximum number of keys in a node).
my $max_size = $t->B;
root
Returns the root node.
my $root = $t->root;
node_overfull
Returns true if the node is overfull.
if ($t->node_overfull($node)) {
    # node holds more than B keys and has to be split
}
to_string
Returns your tree as a formatted string.
my $text = $root->to_string;
Mostly useful for debugging, as the output leaves much to be desired.
to_dot
Creates a Graphviz graph (in dot format) of your tree.
my $dot_graph = $root->to_dot;
to_jsfind
Creates XML index files for jsFind. This should be called after your B-Tree has been filled with data.
$t->to_jsfind(
    dir => '/full/path/to/index/dir/',
    data_codepage => 'ISO-8859-2',
    index_codepage => 'UTF-8',
    output_filter => sub {
        my $xml = shift || return;
        $xml =~ s/è/e/g;
        # return the modified text, not the number of substitutions
        return $xml;
    }
);
All options except dir are optional.
Returns the number of nodes in the created tree.
Options:
- dir: Full path to the directory for the index (which will be created if needed).
- data_codepage: If your input data isn't in ISO-8859-1 encoding, you will have to specify this option.
- index_codepage: If your index encoding is not UTF-8, use this option. If you are not using the supplied JavaScript search code, or your browser is terribly broken and thinks that the index shouldn't be in UTF-8 encoding, use this option to specify the encoding for the created XML index.
- output_filter: This is just a draft of the documentation for an option which is not implemented! Code ref to a sub which can modify the resulting XML file for a node. The encoding of this data will be index_codepage, and you have to take care not to break the XML structure. Calling xmllint on your resulting index (as t/90xmllint.t does in this distribution) is a good idea after using this option. This option is also the right place to plug in an unaccenting function using Text::Unaccent.
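Since output_filter is not implemented, the following is only a sketch of what an unaccenting callback might look like. It assumes the callback receives and returns a decoded character string (the option as described would see data in index_codepage), and it swaps in the core Unicode::Normalize module for Text::Unaccent: decompose to NFD, then drop the combining marks.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical output_filter-style callback, NOT jsFind code.
# Assumption: it works on decoded character strings, not on bytes
# in index_codepage, and uses Unicode::Normalize instead of Text::Unaccent.
my $unaccent = sub {
    my $t = shift;
    return unless defined $t;
    $t = NFD($t);          # split e.g. "č" into "c" + combining caron
    $t =~ s/\p{Mn}//g;     # drop the combining (non-spacing) marks
    return $t;             # return the modified text, not the s/// count
};
```

Because the marks are removed after decomposition, one substitution covers every accented Latin letter instead of one s/// per character.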
_recode
This is an internal function to recode data between character sets.
It will also try to decode entities in the data using HTML::Entities.
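_recode itself is internal, but the basic charset conversion it describes can be sketched with the core Encode module. recode() below is a hypothetical helper, not jsFind's code, and unlike _recode it does not decode HTML entities.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, NOT jsFind's _recode: converts a byte string
# from charset $from (e.g. data_codepage) to charset $to (e.g. index_codepage).
# HTML entity decoding is intentionally left out of this sketch.
sub recode {
    my ($bytes, $from, $to) = @_;
    my $chars = decode($from, $bytes);   # bytes -> Perl characters
    return encode($to, $chars);          # characters -> bytes in $to
}
```

The byte 0xE8 is "č" in ISO-8859-2 but "è" in ISO-8859-1, so the same input byte yields different UTF-8 output depending on the source charset; this is why data_codepage has to match the real encoding of your data.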
jsFind::Node methods
Each node has k key-data pairs, with B <= k <= 2B, and each node has k+1 subnodes, which might be null.
The node is a blessed reference to a list with three elements:
($keylist, $datalist, $subnodelist)
each of which is a reference to a list.
The null node is represented by a blessed reference to an empty list.
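To make the node layout concrete, here is a hypothetical Sketch::Node class that follows the representation described above. It is not jsFind::Node; its size(), is_empty() and is_leaf() bodies are assumptions written for illustration.

```perl
use strict;
use warnings;

package Sketch::Node;    # hypothetical stand-in, NOT jsFind::Node

# A node is a blessed reference to ($keylist, $datalist, $subnodelist);
# called with no arguments, new() returns the null node (an empty list).
sub new {
    my ($class, @lists) = @_;
    return bless [@lists], $class;
}

sub is_empty { my $self = shift; return !@$self }

# number of keys; the null node has none
sub size {
    my $self = shift;
    return $self->is_empty ? 0 : scalar @{ $self->[0] };
}

# assumption: a leaf is a node without any non-null subnodes
sub is_leaf {
    my $self = shift;
    return 1 if $self->is_empty;
    return !grep { defined $_ && !$_->is_empty } @{ $self->[2] };
}

package main;

# k = 2 keys, so k+1 = 3 subnodes, all null here
my $leaf = Sketch::Node->new([qw(dolorem minima)], [{}, {}], [undef, undef, undef]);
# one key; its left subtree is $leaf and its right subtree is the null node
my $root = Sketch::Node->new(['sapiente'], [{}], [$leaf, Sketch::Node->new]);
```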
new
Creates a new node.
my $node = new jsFind::Node ($keylist, $datalist, $subnodelist);
You can also omit the argument list to create an empty node.
my $empty_node = new jsFind::Node;
locate_key
Locates a key in the node using linear search. This should probably be replaced by binary search for better performance.
my ($found, $index) = $node->locate_key($key, $cmp_coderef);
Argument $cmp_coderef
is an optional reference to a custom comparison operator.
Returns (1, $index) if $key[$index] eq $key.
Returns (0, $index) if the key is not in this node but could be found in $subnode[$index].
In scalar context, just returns 1 or 0.
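The binary search suggested above could keep locate_key's return convention. locate_key_binary() is a hypothetical sketch, not jsFind code; it assumes the keys are stored sorted under the same comparison as $cmp (string cmp by default). On a miss it returns the index of the subnode that would hold the key, i.e. the number of keys smaller than it.

```perl
use strict;
use warnings;

# Hypothetical binary-search variant of locate_key (NOT jsFind code).
# Assumes @$keys is sorted according to $cmp.
sub locate_key_binary {
    my ($keys, $key, $cmp) = @_;
    $cmp ||= sub { $_[0] cmp $_[1] };
    my ($lo, $hi) = (0, scalar @$keys);    # search the half-open range [lo, hi)
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c = $cmp->($key, $keys->[$mid]);
        return wantarray ? (1, $mid) : 1 if $c == 0;    # exact match
        if ($c < 0) { $hi = $mid } else { $lo = $mid + 1 }
    }
    # not found: $lo is the subnode that would contain $key
    return wantarray ? (0, $lo) : 0;
}
```

This takes O(log k) comparisons per node instead of O(k); with small B the gain is modest, which may be why the linear search was kept.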
emptynode
Creates a new empty node.
$node = $root->emptynode;
$new_node = $node->emptynode;
is_empty
Tests if the node is empty.
if ($node->is_empty) {
    # node has no keys
}
key
Returns the $i-th key from the node.
my $key = $node->key($i);
data
Returns the $i-th data item from the node.
my $data = $node->data($i);
kdp_replace
Sets the key/data pair for the $i-th element in the node.
$node->kdp_replace($i, "key value" => {
    "data key 1" => "data value 1",
    "data key 2" => "data value 2",
});
kdp_insert
Inserts a key/data pair into the tree.
$node->kdp_insert("key value" => "data value");
No return value.
kdp_append
Adds new data keys and values to the $i-th element in the node.
$node->kdp_append($i, "key value" => {
    "added data key" => "added data value",
});
subnode
Sets a new subnode or returns an existing one.
# return subnode 4
my $my_node = $node->subnode(4);
# set subnode 5 to $my_node
$node->subnode(5, $my_node);
is_leaf
Tests if the node is a leaf.
if ($node->is_leaf) {
    # node has no child nodes
}
size
Returns the number of keys in the node.
my $nr = $node->size;
halves
Splits the node into two halves so that keys 0 .. $n-1 are in one node and keys $n+1 .. $size-1 are in the other. The key/data pair at position $n is returned as $kdp.
my ($left_node, $right_node, $kdp) = $node->halves($n);
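The index arithmetic of halves can be checked on plain arrays. split_at() is a hypothetical helper, not jsFind's code; it assumes that a node with k keys has k+1 subnodes, so the left half keeps subnodes 0 .. $n and the right half keeps subnodes $n+1 .. k.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT jsFind code) splitting plain key, data and
# subnode lists at position $n, following the halves() description:
# keys 0 .. $n-1 go left, keys $n+1 .. end go right, and the pair at
# $n is returned separately so it can move up into the parent node.
sub split_at {
    my ($keys, $data, $subs, $n) = @_;
    my $last = $#$keys;    # index of the last key, i.e. size - 1
    my @left  = ([ @$keys[0 .. $n - 1] ],
                 [ @$data[0 .. $n - 1] ],
                 [ @$subs[0 .. $n] ]);
    my @right = ([ @$keys[$n + 1 .. $last] ],
                 [ @$data[$n + 1 .. $last] ],
                 [ @$subs[$n + 1 .. $last + 1] ]);
    return (\@left, \@right, [ $keys->[$n], $data->[$n] ]);
}

# five keys a .. e with subnodes s0 .. s5, split at position 2 ("c")
my ($l, $r, $kdp) = split_at([qw(a b c d e)], [1 .. 5], [map { "s$_" } 0 .. 5], 2);
```

Each half again has one more subnode than keys (a b with s0 s1 s2, d e with s3 s4 s5), so both remain valid nodes.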
to_string
Dumps the tree as a string.
my $str = $root->to_string;
to_dot
Recursively walks the nodes of the tree.
to_xml
Escapes <, >, & and " to produce valid XML.
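A minimal escaping routine for the characters listed above might look like this. xml_escape() is hypothetical and is not jsFind's to_xml, which may do more than escaping.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT jsFind's to_xml: escapes the four characters
# that would otherwise break XML text or attribute values.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub xml_escape {
    my $text = shift;
    # a single pass replaces each character once, so the "&" inside an
    # entity that was just inserted is never escaped a second time
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}
```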
base_x
Converts a number to base x (used for jsFind index filenames).
my $n = $tree->base_x(50);
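A sketch of the base conversion idea follows. The digit alphabet (0-9 then a-z, so base 36 by default) is an assumption made for illustration; jsFind's own base_x may use a different alphabet or base, so the filenames it produces may differ from this sketch's output.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT jsFind's base_x.
# Assumption: digits are 0-9 followed by a-z (at most base 36).
my @digit = ('0' .. '9', 'a' .. 'z');

sub to_base_x {
    my ($n, $base) = @_;
    $base ||= scalar @digit;
    die "need a non-negative integer\n" if $n < 0 || $n != int($n);
    die "base must be between 2 and " . scalar(@digit) . "\n"
        if $base < 2 || $base > @digit;
    my $out = '';
    do {
        $out = $digit[$n % $base] . $out;    # prepend the least significant digit
        $n   = int($n / $base);
    } while ($n > 0);
    return $out;
}
```

Under these assumptions to_base_x(50) gives "1e" (1 * 36 + 14), so node numbers stay short when used as filenames.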
to_jsfind
Creates jsFind XML files.
my $nr = $tree->to_jsfind('/path/to/index', '0');
Returns the number of elements created.
SEE ALSO
jsFind web site http://www.elucidsoft.net/projects/jsfind/
B-Trees in perl web site http://perl.plover.com/BTree/
This module web site http://www.rot13.org/~dpavlin/jsFind.html
AUTHORS
Mark-Jason Dominus <mjd@pobox.com> wrote BTree.pm
which was the base for this module
Shawn P. Garbett <shawn@elucidsoft.net> wrote jsFind
Dobrica Pavlinusic <dpavlin@rot13.org> wrote this module
COPYRIGHT AND LICENSE
Copyright (C) 2004 by Dobrica Pavlinusic
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.