- Create a file named seedlistcrawler_ext.xml with the following content:
  <?xml version="1.0" encoding="UTF-8"?>
  <ExtendedProperties>
    <AppendChild Xpath="/Crawler/DataSources/Server" Name="HttpTraceSeedlist">/tmp/seed.log</AppendChild>
  </ExtendedProperties>
- Put the file into ES_NODE_ROOT/master_config/<Crawler ID>/
- Restart the crawler session, then perform a full crawl.
- All seedlist-related HTTP activity should then be logged to the specified file (/tmp/seed.log), including dumps of all pages retrieved.
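If you need to enable this trace on several crawlers, the first two steps (creating the file and placing it) can be scripted. The following is a minimal Python sketch, not an official tool: the function name write_seedlist_trace_config is hypothetical, and you pass in the actual ES_NODE_ROOT path and crawler ID for your installation. It deliberately refuses to create the crawler directory, on the assumption that a missing directory most likely means a mistyped crawler ID. Restarting the crawler session and running the full crawl are still done by hand.

```python
import os
import xml.etree.ElementTree as ET


def write_seedlist_trace_config(es_node_root: str, crawler_id: str,
                                trace_file: str = "/tmp/seed.log") -> str:
    """Hypothetical helper: write seedlistcrawler_ext.xml into
    ES_NODE_ROOT/master_config/<crawler_id>/ and return the file path."""
    target_dir = os.path.join(es_node_root, "master_config", crawler_id)
    # Assumption: the crawler's config directory already exists; a missing
    # directory is treated as a wrong crawler ID rather than created silently.
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(f"crawler config directory not found: {target_dir}")

    # Build <ExtendedProperties><AppendChild .../></ExtendedProperties>
    # with the same element names and attributes as the XML above.
    root = ET.Element("ExtendedProperties")
    child = ET.SubElement(root, "AppendChild", {
        "Xpath": "/Crawler/DataSources/Server",
        "Name": "HttpTraceSeedlist",
    })
    child.text = trace_file  # file that will receive the HTTP trace

    path = os.path.join(target_dir, "seedlistcrawler_ext.xml")
    # Writes an XML declaration with UTF-8 encoding, as in the manual step.
    ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)
    return path
```

For example, write_seedlist_trace_config("/path/to/ES_NODE_ROOT", "<Crawler ID>") with your real values substituted. Note that ElementTree writes the declaration with single quotes (version='1.0'), which any XML parser treats the same as the double-quoted form.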