Reparsing a Site


Suppose you have indexed only the HTML documents on a site, using the following command:

vspider -cmdfile /verity/spider/allhtml.cmd

where allhtml.cmd consists of:

-collection allhtml.coll
-start http://www.mysite.com
-mimeinclude text/html

Now you want to update the collection with all of the other document types linked from those HTML pages.

NOTE: The following command must be issued as a single line from the command line. It is broken up here for readability.

vspider -collection allhtml.coll
-start http://www.mysite.com
-reparse
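
Because the command has to be entered on one line, one way to avoid retyping it is to keep the options in a command file, as the first job did with allhtml.cmd. The following is a minimal sketch, not a documented procedure: it assumes -reparse is accepted in a command file in the same way as the other options, and the file name reparse.cmd and its location are hypothetical.

```shell
#!/bin/sh
# Write a command file for the reparse job.
# ASSUMPTION: the name and location of reparse.cmd are hypothetical.
cmdfile="${TMPDIR:-/tmp}/reparse.cmd"

# One option per line, mirroring allhtml.cmd but without -mimeinclude.
cat > "$cmdfile" <<'EOF'
-collection allhtml.coll
-start http://www.mysite.com
-reparse
EOF

# Run the job only if vspider is installed; otherwise show the command.
if command -v vspider >/dev/null 2>&1; then
    vspider -cmdfile "$cmdfile"
else
    echo "vspider not found; would run: vspider -cmdfile $cmdfile"
fi
```

The command file deliberately leaves out -mimeinclude, so the second job is not limited to text/html.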

Case-specific Options

Option      Reason
-reparse    Forces vspider to crawl the HTML documents again, indexing any documents that are allowed.

Unnecessary Options for this Case

Option          Reason
-mimeinclude    For vspider to have anything to do when you use -reparse, you must either omit the previous exclusion criteria or introduce new inclusion criteria. In this case, specifying only HTML in the original job excluded all other file types. By omitting -mimeinclude from the second job, which uses -reparse, you index all document types that the HTML documents link to.





Copyright © 1998, Verity, Inc. All rights reserved.