By default, the Verity Spider is not restricted in which links it follows (during web crawling) or which directories it walks through (during directory walking). Web crawling starts at a specified URL and follows links regardless of the directory hierarchy implied by the URL. Directory walking starts at a named directory and walks through the subdirectories it finds.
During Web Crawling
Using the -include option, you can limit the Verity Spider's web crawling behavior. If you start the indexing task at a URL without any limitation, the Verity Spider follows links anywhere, including to locations "above" the starting directory. For example, the following command starts at a URL under /region2/sales and uses -include to limit the crawl to URLs that match that path:
vspider -collection mycoll.col
    -start http://www.some.site.com/region2/sales
    -include '*/region2/sales/*'
See also the -indinclude option. For more information, see Chapter 2, "Verity Spider Reference."
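The effect of an include pattern on a crawl can be sketched in Python. This is an illustration only, not Verity Spider's implementation: the link graph and every page under the starting URL are hypothetical, the crawl function is invented for this sketch, and Python's fnmatch module is assumed to approximate how the '*/region2/sales/*' pattern is matched. The sketch also assumes the starting URL is always visited.

```python
from collections import deque
from fnmatch import fnmatchcase

START = "http://www.some.site.com/region2/sales"

# Hypothetical link graph (page URL -> URLs it links to), invented for this sketch.
LINKS = {
    START: [
        "http://www.some.site.com/region2/sales/q1.html",
        "http://www.some.site.com/region1/index.html",
    ],
    "http://www.some.site.com/region2/sales/q1.html": [
        "http://www.some.site.com/region2/sales/archive/2001.html",
        "http://www.some.site.com/",
    ],
    "http://www.some.site.com/region2/sales/archive/2001.html": [],
    "http://www.some.site.com/region1/index.html": [],
    "http://www.some.site.com/": ["http://www.some.site.com/region1/index.html"],
}


def crawl(start, include=None):
    """Breadth-first crawl from `start`.

    When `include` is given, a link is followed only if it matches the
    glob pattern. The start URL itself is always visited (an assumption).
    """
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link in seen:
                continue  # already visited or queued
            if include is not None and not fnmatchcase(link, include):
                continue  # outside the include pattern, so not followed
            seen.append(link)
            queue.append(link)
    return seen


if __name__ == "__main__":
    print("unrestricted:")
    for url in crawl(START):
        print("  ", url)
    print("with include pattern:")
    for url in crawl(START, include="*/region2/sales/*"):
        print("  ", url)
```

Without a pattern, the sketch reaches the region1 page and the site root, both outside the starting directory. With the pattern, it stays within /region2/sales. Note that under fnmatch semantics the bare starting URL (no trailing slash) does not itself match '*/region2/sales/*'; whether the Verity Spider treats the starting URL the same way is not established here.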
During Directory Walking
Using the -prunedir option, you can specify directories for the Verity Spider to skip during directory walking. This option takes one or more regular expression patterns. These are C-shell style regular expressions, not grep style. For example, the following is a valid expression:
-prunedir expressions, as follows: