Now that we have an index, it's time to come up with a way for
users to access it. I'm going to implement a simple search that
finds only the pages that contain every word the user typed in
the form.
The search form is simple enough:
<form action="/search.cgi">
<p><input name=s><input type=submit value="Search">
</form>
The search.cgi script reads the form variable and splits it into words:
my $query = $ENV{'QUERY_STRING'} || '';
$query =~ s/^s=//;                 # strip the field name
$query =~ s/%[0-9a-fA-F]{2}/ /g;   # treat escaped characters as separators
my @words = ($query =~ /\w+/g);
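To see what these steps produce, they can be wrapped in a sub and fed a sample query string. This is only a sketch: the name parse_query and the sample input are made up for illustration, and the form is assumed to have the single field s.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the parsing steps, so they can be tried
# without a web server.
sub parse_query {
    my ($query) = @_;
    $query = '' unless defined $query;   # no query string at all
    $query =~ s/^s=//;                   # drop the field name
    $query =~ s/%[0-9a-fA-F]{2}/ /g;     # escaped punctuation becomes a space
    return ($query =~ /\w+/g);           # '+' is not a word character, so it separates words
}

# A browser submitting "Perl DBM!" requests /search.cgi?s=Perl+DBM%21
my @sample_words = parse_query('s=Perl+DBM%21');
print join(',', @sample_words), "\n";    # prints "Perl,DBM"
```

Because only runs of word characters are kept, the plus signs that browsers use for spaces need no special handling.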
Next, it opens the DBM file containing the inverted index:
use DB_File;
my %db;
dbmopen(%db, "search_index.db", 0) or die "Cannot open search_index.db: $!";
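The lookups that follow imply a particular layout for the index, which can be sketched with an ordinary hash standing in for the DBM file. The layout here is inferred from the search code, and the words and URLs are invented for illustration.

```perl
use strict;
use warnings;

# Stand-in for the tied DBM hash (assumed layout, sample data).
my %sample_db = (
    # word => concatenated page IDs, each written as a negative number
    'perl'   => '-1-2',
    'search' => '-2-3',
    # page ID => URL of that page
    '-1' => '/perl-intro.html',
    '-2' => '/perl-search.html',
    '-3' => '/search-tips.html',
);

# Splitting a word's entry back into page IDs:
my @ids = ($sample_db{'perl'} =~ /(-\d+)/g);
print "@ids\n";                      # prints "-1 -2"
print $sample_db{ $ids[1] }, "\n";   # prints "/perl-search.html"
```

Since a page ID starts with a minus sign and a word extracted with \w+ never contains one, the two kinds of key can share a single file without colliding.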
Our strategy for implementing the query is to keep a counter for each
relevant document. We look up each word in turn and increment the
counter of every document that contains it.
my %counters;
for my $word (@words) {
    my $pages = $db{lc $word};
    next unless defined $pages;   # word is not in the index
    for my $page ($pages =~ /(-\d+)/g) {
        $counters{$page}++;
    }
}
A document that contains every word will have its counter incremented each
time through the loop, so its count will be equal to the number of
words. The following loop finds those documents and prints them:
for my $page (sort keys %counters) {
    if ($counters{$page} == scalar(@words)) {
        my $href = $db{$page};
        print "$href<br>\n";
    }
}
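The counting and filtering steps can also be tried together without a DBM file. The following is a minimal sketch, not the article's script: the sub name find_all_words and the sample data are hypothetical, and the index is assumed to list each page at most once per word.

```perl
use strict;
use warnings;

# Hypothetical helper: return the URLs of pages containing every word.
# $index is a hash reference laid out like the DBM file (assumed):
# word => "-1-2...", page ID => URL.
sub find_all_words {
    my ($index, @words) = @_;
    return () unless @words;             # an empty query matches nothing

    my %counters;
    for my $word (@words) {
        my $pages = $index->{lc $word};
        next unless defined $pages;      # word is not in the index
        $counters{$_}++ for ($pages =~ /(-\d+)/g);
    }

    # Keep only the pages that were counted once per query word.
    return map  { $index->{$_} }
           grep { $counters{$_} == @words }
           sort keys %counters;
}

my %sample = (
    'perl' => '-1-2', 'search' => '-2-3',
    '-1' => '/perl-intro.html',
    '-2' => '/perl-search.html',
    '-3' => '/search-tips.html',
);
print join("\n", find_all_words(\%sample, 'Perl', 'search')), "\n";
# prints "/perl-search.html"
```

A query word that is missing from the index never raises any counter, so no page can reach the required count and the result is empty.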
And that does it.
Of course, there are many options that could be added to
this little search engine to make it friendlier, but they're just a
matter of programming.