It wouldn't be hard to store the inverted index in an ordinary file and
grep through it for the words we want. Since the index would be
smaller than the site, this would be an improvement over using
find and
grep to search the whole site.
For large sites, however, even the index file can be huge, and
reading the whole file on every search just to find a few words would
be rather wasteful. So, let's use a DBM file. It so happens that
both the table of Web pages and the inverted index consist entirely of
(name => value) records, which makes it quite easy to map
them to strings in the DBM file. In Perl's notation, our example
index would now look like this:
%dbm = (
'-1' => '<a href="/one.html">Doc One</a>',
'-2' => '<a href="/two.html">Doc Two</a>',
'another' => '-2',
'doc' => '-1-2',
'document' => '-1-2',
'here' => '-2',
'is' => '-1-2',
'one' => '-1',
'this' => '-1',
'two' => '-2'
);
(I've switched to using negative numbers so any numbers that appear
as words within Web pages won't conflict with the document index.)
next page»