
Roll Your Own Search Engine

Page 4 — Building the Search CGI

Now that we have an index, it's time to come up with a way for users to access it. I'm going to implement a simple search that finds only the pages that contain every word the user typed in the form.

The search form is simple enough:

  <form action="/search.cgi">
  <p><input name=s><input type=submit value="Search">
  </form>

The script, search.cgi, reads the form field from the query string and splits it into words:

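  my $query = $ENV{'QUERY_STRING'} // '';   # empty if no query string was sent
  $query =~ s/^s=//;                        # drop the field name at the start
  $query =~ s/%[0-9a-fA-F]{2}/ /g;          # treat %XX escapes as word breaks
  my @words = ($query =~ /\w+/g);           # "+" and other non-word chars separate words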
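The substitutions above discard the %XX escapes and strip only a leading "s=", which works as long as s is the only field in the query string. A slightly sturdier variant might look like the sketch below. query_words is a hypothetical helper name, and the sketch assumes a GET request whose query string uses the usual form encoding ("+" for spaces, %XX for other bytes). For plain ASCII queries it should yield the same words as the code above; the main difference is that it picks out the s field explicitly, so other fields in the URL cannot leak into the search.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the article): extract only the "s" field
# and decode it before splitting it into words.
sub query_words {
    my ($qs) = @_;
    $qs = '' unless defined $qs;                   # no query string at all
    my ($value) = $qs =~ /(?:^|&)s=([^&]*)/;       # value of the s field only
    return () unless defined $value;               # field not present
    $value =~ tr/+/ /;                             # "+" encodes a space
    $value =~ s/%([0-9a-fA-F]{2})/chr(hex $1)/ge;  # decode %XX escapes
    return $value =~ /\w+/g;                       # same word rule as above
}

my @words = query_words($ENV{'QUERY_STRING'});
```

For example, a query string of s=perl+%22cgi%22+scripts gives the words perl, cgi and scripts, and page=2&s=hello+world gives hello and world. Like the original, it treats only \w characters as part of a word.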

Next, it opens the DBM file containing the inverted index:

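  use DB_File;
  my %db;
  dbmopen(%db, "search_index.db", 0)    # mode 0: fail rather than create a missing index
      or die "Can't open search_index.db: $!";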
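dbmopen is Perl's older interface to DBM files; the DB_File documentation describes the same operation with tie, which makes read-only access explicit. The following is a minimal sketch of that approach. open_index is a hypothetical name, and $DB_HASH assumes the index is a DB_File hash file, which is what dbmopen should produce when DB_File is the loaded DBM library.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDONLY);

# Hypothetical helper: open an existing index read-only and return a
# reference to the tied hash. Without O_CREAT the tie fails if the file
# is missing, much like passing a mode of 0 to dbmopen.
sub open_index {
    my ($file) = @_;
    my %db;
    tie %db, 'DB_File', $file, O_RDONLY, 0644, $DB_HASH
        or die "Can't open index $file: $!";
    return \%db;    # the hash stays tied as long as this reference lives
}
```

You would call it as my $db = open_index('search_index.db'); and look words up with $db->{lc $word}. Opening the index read-only also means a buggy search script cannot modify it.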

Our strategy for implementing the query is to keep a counter for each relevant document. We look up each query word in the index in turn and increment the counter of every document listed under that word.

  my %counters;
  for my $word (@words) {
      my $pages = $db{lc $word};           # string of page ids, e.g. "-1-3"
      next unless defined $pages;          # word isn't in the index
      for my $page ($pages =~ /(-\d+)/g) {
          $counters{$page}++;
      }
  }
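To try the counting logic without a DBM file, the sketch below substitutes an ordinary hash for the index. The layout is an assumption inferred from the lookups in the loop: each word maps to a string of page ids such as "-1-3", and each page id maps to that page's URL. matching_pages is a hypothetical helper that combines the counting loop with the all-words check, and the sample URLs are made up.

```perl
use strict;
use warnings;

# A plain hash standing in for the DBM index (hypothetical sample data).
my %db = (
    'perl'   => '-1-2-3',
    'cgi'    => '-1-3',
    'search' => '-3',
    '-1'     => '/intro.html',
    '-2'     => '/modules.html',
    '-3'     => '/engine.html',
);

# Return the URLs of the pages that contain every word in @words.
sub matching_pages {
    my ($db, @words) = @_;
    my %counters;
    for my $word (@words) {
        my $pages = $db->{lc $word};
        next unless defined $pages;        # unknown word: nothing to count
        $counters{$_}++ for $pages =~ /(-\d+)/g;
    }
    # A page qualifies only if it was counted once per query word.
    return map { $db->{$_} }
           grep { $counters{$_} == @words }
           sort keys %counters;
}

print "$_\n" for matching_pages(\%db, qw(Perl CGI));
```

With this sample data, a search for "Perl CGI" matches /intro.html and /engine.html; /modules.html is counted only once and is dropped. A word that isn't in the index leaves every counter short, so nothing matches.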

A document that contains every word has its counter incremented once for each query word, so its count equals the number of words; a document missing any word falls short. The following code finds those documents and prints their URLs:

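  print "Content-type: text/html\n\n";    # a CGI must send a header before any output
  for my $page (sort keys %counters) {
      if ($counters{$page} == scalar(@words)) {
          my $href = $db{$page};           # page id -> URL
          print "$href<br>\n";
      }
  }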
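The loop above prints bare URLs. If you want clickable results, one possible variation is to escape each URL and wrap it in a link. html_escape and render_results are hypothetical helpers written for this sketch (CGI.pm's escapeHTML does a similar job, but CGI.pm is no longer bundled with Perl). The sketch also prints a short message when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
# "&" must be replaced first so the other entities aren't double-escaped.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build the complete CGI response: header, blank line, then one link per hit.
sub render_results {
    my @hrefs = @_;
    my $out = "Content-type: text/html\n\n";
    return $out . "<p>No pages matched.</p>\n" unless @hrefs;
    for my $href (@hrefs) {
        my $h = html_escape($href);
        $out .= qq{<a href="$h">$h</a><br>\n};
    }
    return $out;
}

print render_results('/intro.html', '/a&b.html');
```

Building the whole response as a string keeps the header and body together and makes the output easy to check before it is printed.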

And that does it. Of course, there are plenty of features that could make this little search engine friendlier, such as ranking the results or showing page titles instead of bare URLs, but they're just a matter of programming.