When web crawling to gather HTML documents for indexing, Verity Spider looks for the date the document was last modified, in the form of a field named Last-Modified. The value of Last-Modified is used to determine if documents should be indexed again.
How Last-Modified is Used
For HTML documents which have been indexed, and for which a value exists in the last_modified_date field in the persistent store, Verity Spider compares the retrieved document's Last-Modified value with last_modified_date. What happens to the document depends on the outcome of the comparison.
When a document is indexed again, its last_modified_date field is updated in the persistent store.
You can view the last_modified_date value for documents already in a collection's persistent store by running the report utility, vsdb, with the -date option. For more information, see "Verity Spider Reporting" earlier in this chapter.
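The comparison described in this section can be sketched in Python. This is an illustration only, not Verity Spider's implementation: the function name should_reindex is hypothetical, and the rule that a newer Last-Modified value triggers indexing again is an assumption. The stored value must be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def should_reindex(header_value: Optional[str], stored: Optional[datetime]) -> bool:
    """Decide whether a retrieved document should be indexed again.

    header_value: the raw Last-Modified value of the retrieved document.
    stored: the last_modified_date kept for the document (timezone-aware).
    """
    if header_value is None or stored is None:
        # Nothing to compare: a new document, or one without a
        # Last-Modified value, is treated as needing indexing.
        return True
    retrieved = parsedate_to_datetime(header_value)
    if retrieved.tzinfo is None:
        # Treat a date without zone information as UTC (assumption).
        retrieved = retrieved.replace(tzinfo=timezone.utc)
    # Assumption: only a strictly newer date means the document changed.
    return retrieved > stored
```

For example, with a stored date of 21 Oct 2015 07:28:00 UTC, a retrieved value of "Wed, 21 Oct 2015 07:28:01 GMT" leads to indexing again, while an identical value does not.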
New Documents
For HTML documents that have never been indexed, the value for Last-Modified is irrelevant. The Last-Modified date, if it exists, is stored in the last_modified_date field of the persistent store for the collection into which the document is being indexed. Use -refreshtime to ensure that recently indexed documents are not indexed again. For more information, see Chapter 2, "Verity Spider Reference."
Dynamic Documents
If you are dealing with dynamically generated HTML documents, then there may never be a Last-Modified date and so the document may always be indexed. A workaround is to incorporate a meta tag into the processing of the dynamic documents and take advantage of the -metafile option. See "Using a Custom Last-Modified Value" below.
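As a rough illustration of the workaround, the code that generates a dynamic page could emit a meta tag carrying the modification date. The sketch below is in Python and rests on assumptions: the function name last_modified_meta, the tag name "Last-Modified", and the HTTP-style date format are all hypothetical, and the map file that -metafile reads is not shown. Use whatever tag name and date format your map file expects.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html import escape


def last_modified_meta(modified: datetime, name: str = "Last-Modified") -> str:
    """Build a meta tag that carries a document's modification date.

    The default tag name and the HTTP-style date format are assumptions
    made for this sketch, not values required by Verity Spider.
    """
    # Convert to UTC so the date can be rendered with a GMT suffix.
    stamp = format_datetime(modified.astimezone(timezone.utc), usegmt=True)
    return f'<meta name="{escape(name)}" content="{escape(stamp)}">'
```

The returned string would be written into the head of the generated HTML document.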
How Last-Modified is Determined
When indexing web sites, Verity Spider reads the Last-Modified HTTP header field. The value of Last-Modified is normally provided by the web server from which the document is served. In some cases, though, the web server may be configured not to provide a Last-Modified value. Verity Spider recognizes only the standard HTTP header field, Last-Modified.
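To make the header lookup concrete, the following Python sketch reads Last-Modified from a set of HTTP response headers. It is not Verity Spider code: the function name read_last_modified is hypothetical, and returning None for a missing or unparseable value is a choice made for this sketch. The lookup ignores letter case because HTTP header field names are case-insensitive.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def read_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified date from HTTP response headers, if any."""
    for key, value in headers.items():
        # Only the standard field name is recognized.
        if key.lower() == "last-modified":
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):
                # The server sent a value that is not a valid HTTP date.
                return None
    # The server did not provide the header at all.
    return None
```

A response that carries the date under a nonstandard field name yields None here, which mirrors the case of a server that is configured not to provide Last-Modified.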
Using a Custom Last-Modified Value
Two scenarios involve using a custom value for Last-Modified: supplying a value for documents that have none, such as the dynamic documents described above, and overriding an existing value.
Overriding an Existing Last-Modified Value
When indexing web sites, you may want to use your own date/time values to specify when a document was last indexed. Review the "Example" below and keep in mind that you must specify the "Y" override flag in your map file to ensure that your value is always used.
Example
To incorporate a custom value for Last-Modified when indexing web sites, do the following:
-refreshtime and -refresh with inclusion or exclusion criteria. For more information on these options, see "Reference of Command-line Options" in Chapter 2.
-metafile option.