Restarting an Interrupted Job


You are spidering a single web site, gathering its pages so that you can index only the linked Adobe Acrobat PDF documents that reside on that host. You do not want to index the parent files that link to them. A corrupt document causes the spidering job to fail, so you restart the indexing job.

Original indexing command:

% vspider -cmdfile /verity/vspider/htmlpdf.cmd

where htmlpdf.cmd consists of:

-collection icd.coll
-start http://www.website.com
-indmimeinclude application/pdf

Reissued indexing command, entered on the command line:

% vspider -collection icd.coll -restart -host www.website.com -indmimeinclude application/pdf
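
The two commands above can be combined into a small wrapper, sketched here under stated assumptions. The function name run_with_restart is hypothetical, and the sketch assumes that vspider returns a nonzero exit status when a job fails, which this page does not confirm. Every option used is one that appears in the commands above.

```shell
# Sketch only: run the original job and, if it fails, reissue it with -restart.
# Assumption: vspider exits with a nonzero status when the spidering job fails.
run_with_restart() {
    cmdfile=$1      # for example /verity/vspider/htmlpdf.cmd
    collection=$2   # for example icd.coll
    host=$3         # for example www.website.com

    # First attempt: the original command, driven by the command file.
    if vspider -cmdfile "$cmdfile"; then
        return 0
    fi

    echo "vspider failed; reissuing with -restart" >&2

    # Second attempt: -restart takes the place of -start, and -host keeps
    # the spider on the one host. The MIME filter is repeated unchanged.
    vspider -collection "$collection" -restart -host "$host" \
        -indmimeinclude application/pdf
}
```

For this case the call would be: run_with_restart /verity/vspider/htmlpdf.cmd icd.coll www.website.com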

Case-specific Options

Option      Reason
-restart    This option indicates that vspider should read the persistent store for the specified collection and continue following and indexing only those documents which were not previously followed and indexed.
-host       This option forces vspider to gather only from the specified host, rather than following links which may lead elsewhere. When you use -restart, you must use at least one of -host, -domain, -nofollow, or -unlimited.

Unnecessary Options for this Case

Option                      Reason
-start, -resync, -refresh   When you use the -restart option, you cannot use the -start, -resync, or -refresh options.
-indmimeexclude             When you specify inclusion criteria with -indmimeinclude, all MIME types that are not listed are implicitly excluded, so a separate exclusion list is unnecessary.
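
The -restart rules in the two tables above can be checked before a command is issued. The following is a minimal sketch, not part of vspider: the function name check_restart_args is hypothetical, and it inspects only the option names mentioned on this page.

```shell
# Sketch only: check a vspider argument list against the -restart rules.
# Prints "ok" or a description of the first problem found.
check_restart_args() {
    restart=no
    scope=no
    conflict=""
    for arg in "$@"; do
        case $arg in
            -restart) restart=yes ;;
            # At least one of these must accompany -restart.
            -host|-domain|-nofollow|-unlimited) scope=yes ;;
            # None of these may accompany -restart; remember the first one seen.
            -start|-resync|-refresh) conflict=${conflict:-$arg} ;;
        esac
    done

    if [ "$restart" = no ]; then
        echo "ok"   # the rules apply only when -restart is used
    elif [ -n "$conflict" ]; then
        echo "error: $conflict cannot be combined with -restart"
    elif [ "$scope" = no ]; then
        echo "error: -restart needs -host, -domain, -nofollow, or -unlimited"
    else
        echo "ok"
    fi
}
```

Passing the reissued command's options (-collection icd.coll -restart -host www.website.com -indmimeinclude application/pdf) prints ok, while adding -start http://www.website.com to them reports the conflict.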





Copyright © 1998, Verity, Inc. All rights reserved.