Web Sites and Proxy Servers


You want to index only static documents on both an internal and an external web site. The external web site must be accessed through a proxy server that requires authentication, and the internal web site can be accessed without a proxy server.

vspider -cmdfile /verity/vspider/proxy.cmd

where proxy.cmd consists of:

-collection icd.coll
-start http://host.verity.com:8015 -start http://www.company.com
-noproxy '*.verity.com'
-proxy proxyhost:8080
-proxyauth jcameron:1912sunk
-timeout 10 -jumps 20 -pathlen 8 -indexers 4 -connections 10
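A command file like this is a list of options and values, with single quotes protecting any value that contains special characters. The following is a minimal Python sketch of how such a file can be tokenized. It is not part of vspider: `parse_cmdfile` is a hypothetical helper, and it assumes whitespace-separated tokens, shell-style single quoting, and exactly one value per option. Those assumptions hold for proxy.cmd but not for every vspider option.

```python
import shlex

# Contents of proxy.cmd from the example above, with the -noproxy value in
# straight single quotes.
PROXY_CMD = """\
-collection icd.coll
-start http://host.verity.com:8015 -start http://www.company.com
-noproxy '*.verity.com'
-proxy proxyhost:8080
-proxyauth jcameron:1912sunk
-timeout 10 -jumps 20 -pathlen 8 -indexers 4 -connections 10
"""


def parse_cmdfile(text):
    """Split a command file into (option, value) pairs.

    Hypothetical helper, not part of vspider. Assumes every option takes
    exactly one value.
    """
    # shlex.split strips the single quotes, leaving the literal token
    # *.verity.com; it performs no wildcard expansion.
    tokens = shlex.split(text)
    if len(tokens) % 2:
        raise ValueError("found an option without a value")
    pairs = []
    for option, value in zip(tokens[0::2], tokens[1::2]):
        if not option.startswith("-"):
            raise ValueError(f"expected an option, got {option!r}")
        pairs.append((option, value))
    return pairs


for option, value in parse_cmdfile(PROXY_CMD):
    print(f"{option:<12} {value}")
```

Note that -start appears twice, once for each site, so the result is kept as an ordered list of pairs rather than a dictionary keyed by option name.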

Case-specific Options

Option        Reason
-noproxy      Because the internal site can be reached without a proxy server, you can optimize the indexing job by explicitly instructing vspider not to use a proxy server for it. Note that the argument value must be enclosed in single quotes because it contains a wildcard character (*).
-proxy        Because the external site can be reached only through a proxy server, you explicitly instruct vspider to use the indicated host and port.
-proxyauth    The proxy server specified with -proxy requires authentication, so you must supply a username and password, in the form username:password, with -proxyauth.
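The per-URL decision that -noproxy, -proxy, and -proxyauth describe together can be sketched in Python. This is an illustration under stated assumptions, not vspider's implementation. It assumes the -noproxy pattern is matched against the host name with shell-style wildcards, and that the -proxyauth credentials are sent using the HTTP Basic scheme in a Proxy-Authorization header. This manual does not confirm either assumption. `route` and `proxy_authorization` are hypothetical names.

```python
import base64
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Values taken from proxy.cmd.
NOPROXY_PATTERN = "*.verity.com"   # -noproxy
PROXY = "proxyhost:8080"           # -proxy
PROXYAUTH = "jcameron:1912sunk"    # -proxyauth (username:password)


def route(url):
    """Return None for a direct connection, or the host:port of the proxy.

    Assumption: the -noproxy pattern is matched against the URL's host name
    with shell-style wildcards, ignoring case.
    """
    host = (urlsplit(url).hostname or "").lower()
    if fnmatchcase(host, NOPROXY_PATTERN.lower()):
        return None
    return PROXY


def proxy_authorization(credentials):
    """Build a Proxy-Authorization header value from "username:password".

    Assumption: the proxy uses the HTTP Basic scheme, which base64-encodes
    the username:password string.
    """
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


for start in ("http://host.verity.com:8015", "http://www.company.com"):
    proxy = route(start)
    if proxy is None:
        print(f"{start}: direct connection")
    else:
        print(f"{start}: via {proxy}, with Proxy-Authorization header")
```

Matching on the host name rather than on the full URL keeps the port out of the comparison, so http://host.verity.com:8015 still falls under *.verity.com and is fetched directly.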

Unnecessary Options for this Case

Option                  Reason
-cgiok                  You want to index only static documents, so omit -cgiok, which would cause documents served by CGI scripts to be indexed as well.
-abspath, -prefixmap    These options affect only the indexing of file systems.
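vspider decides which documents count as CGI output. The Python sketch here only illustrates what a static-only crawl filters out, using a hypothetical heuristic: a URL is treated as dynamic if it has a query string, passes through a /cgi-bin/ directory, or names a .cgi script. `looks_dynamic` and `static_only` are illustrative names, and the rule is an assumption, not vspider's documented behavior.

```python
from urllib.parse import urlsplit


def looks_dynamic(url):
    """Hypothetical heuristic: treat a URL as CGI output if it carries a
    query string, runs through a /cgi-bin/ directory, or names a .cgi script.
    This is an assumption for illustration, not vspider's actual rule."""
    parts = urlsplit(url)
    path = parts.path.lower()
    return bool(parts.query) or "/cgi-bin/" in path or path.endswith(".cgi")


def static_only(urls):
    """Keep only the URLs that the heuristic classifies as static documents."""
    return [url for url in urls if not looks_dynamic(url)]


candidates = [
    "http://www.company.com/index.html",
    "http://www.company.com/cgi-bin/search?q=proxy",
    "http://www.company.com/catalog?item=7",
    "http://www.company.com/tools/report.cgi",
    "http://host.verity.com:8015/docs/guide.htm",
]
for url in static_only(candidates):
    print(url)
```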

Copyright © 1998, Verity, Inc. All rights reserved.