Thursday, 25 September 2014

WebSphere Commerce static crawler configuration tips

A simple way to run crawler.sh is to use the syntax below:

crawler.sh -cfg /usr/WebSphere/AppServer70/profiles/search/solr/home/droidConfig.xml -instance <instancename> 

This does the following:

  • Crawls your static content, starting from the home URL defined by location in droidConfig.xml
  • Invokes a delta index by pushing documents directly to the index if autoindex is enabled (again as defined in droidConfig.xml)

This works, but it has a major drawback: if you remove existing static HTML files, their entries are not cleaned up from the index. Search results that are no longer valid remain, and customers who click any of them get the famous 404.

In that case, it makes more sense to use the steps below:

crawler.sh -cfg /usr/WebSphere/AppServer70/profiles/search/solr/home/droidConfig.xml -instance <instancename> -dbuser <dbuser> -dbuserpwd <password> -dbhost <db2_hostname> -dbname <databasename> -dbport 50000 -dbtype db2

di-buildindex.sh -instance <wcs_instance> -masterCatalogId <catalogId> -indexSubType WebContent -dbuser <dbuser> -dbuserpwd <password> -fullbuild true -statusInterval <interval> -localename <lang> -force true -webcontentDelete true

di-buildindex.sh -instance <wcs_instance> -masterCatalogId <catalogId> -indexSubType WebContent -dbuser <dbuser> -dbuserpwd <password> -fullbuild true -statusInterval <interval> -localename <lang> -force true

What happens in this case?

  • If crawler.sh is given the database connection parameters, it connects to the database and updates the SRCHCONFEXT record where indexsubtype equals WebContent, setting the location of the newly crawled files in the config column.
  • Running di-buildindex.sh with -webcontentDelete set to true forces a cleanup of the index content.
  • Running di-buildindex.sh again without this option (it is false by default) then builds a clean index.
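To confirm that the crawler really updated the WebContent record, you could look at the row directly. The snippet below is only a sketch: it assumes the DB2 command line processor (db2) is on the PATH and that the database is cataloged on the machine where you run it. The database name, user and password defaults are hypothetical placeholders, and your tables may need a schema prefix. If db2 is not available, it just prints the SQL so you can run it in whatever client you prefer.

```shell
#!/bin/sh
# Sketch: inspect the SRCHCONFEXT row that crawler.sh is expected to update.
# Table and column names come from the description above; everything else
# (database name, user, password) is a placeholder you must replace.
set -eu

DBNAME="${DBNAME:-mall}"           # hypothetical database name
DBUSER="${DBUSER:-wcsuser}"        # hypothetical database user
DBUSERPWD="${DBUSERPWD:-changeme}" # hypothetical password

# Print the query for the WebContent record.
webcontent_query() {
  printf "SELECT INDEXSUBTYPE, CONFIG FROM SRCHCONFEXT WHERE INDEXSUBTYPE = '%s'\n" "WebContent"
}

# Connect, run the query, and close the connection again.
show_webcontent_config() {
  db2 connect to "$DBNAME" user "$DBUSER" using "$DBUSERPWD"
  db2 "$(webcontent_query)"
  db2 connect reset
}

# Only talk to the database when the db2 client exists; otherwise show the SQL.
if command -v db2 >/dev/null 2>&1; then
  show_webcontent_config
else
  webcontent_query
fi
```

After a crawl with the database parameters, the config column should reflect the location of the newly crawled files.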

Remarks:

  • The process is expected to run on staging, with the index then propagated to production, so cleaning up the index shouldn't affect your production data.
  • You can combine full index cleanups with delta index updates by setting autoIndex to true. In that case, you have more freedom to set a different schedule for the WebContent index update.
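If you want to schedule the full cleanup, the three commands can be wrapped in one script. The following is a minimal sketch, not an official IBM script: the flags are exactly the ones shown earlier in this post, but every default value (instance name, catalog ID, locale, status interval, database settings) is a hypothetical placeholder, and it assumes crawler.sh and di-buildindex.sh are on the PATH. It runs in dry-run mode by default and only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: crawl, wipe the WebContent index, then rebuild it.
# All defaults below are placeholders (assumptions); override them
# through environment variables for your own environment.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them
CFG="${CFG:-/usr/WebSphere/AppServer70/profiles/search/solr/home/droidConfig.xml}"
INSTANCE="${INSTANCE:-demo}"            # hypothetical instance name
CATALOG_ID="${CATALOG_ID:-10001}"       # hypothetical master catalog ID
LOCALE="${LOCALE:-en_US}"               # hypothetical locale
INTERVAL="${INTERVAL:-10000}"           # hypothetical status interval
DBUSER="${DBUSER:-wcsuser}"
DBUSERPWD="${DBUSERPWD:-changeme}"
DBHOST="${DBHOST:-localhost}"
DBNAME="${DBNAME:-mall}"

# Print the command in dry-run mode, otherwise run it and stop on failure.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || { echo "step failed: $1" >&2; exit 1; }
  fi
}

# Shared di-buildindex.sh call; extra arguments are appended at the end.
build_index() {
  run di-buildindex.sh -instance "$INSTANCE" -masterCatalogId "$CATALOG_ID" \
    -indexSubType WebContent -dbuser "$DBUSER" -dbuserpwd "$DBUSERPWD" \
    -fullbuild true -statusInterval "$INTERVAL" -localename "$LOCALE" \
    -force true "$@"
}

refresh_webcontent_index() {
  # Step 1: crawl and record the new file location in the database.
  run crawler.sh -cfg "$CFG" -instance "$INSTANCE" -dbuser "$DBUSER" \
    -dbuserpwd "$DBUSERPWD" -dbhost "$DBHOST" -dbname "$DBNAME" \
    -dbport 50000 -dbtype db2
  # Step 2: clean up the existing WebContent index.
  build_index -webcontentDelete true
  # Step 3: rebuild a clean index (webcontentDelete is false by default).
  build_index
}

refresh_webcontent_index
```

Note that dry-run output includes the password as typed, so avoid redirecting it to a shared log.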

