Validation of crawler traps or regular expressions

When you add one or more regular expression ("regexp") crawler traps in the NAS GUI, either on the domain screen or in the Global Crawler Traps GUI, their syntax is NOT validated.

An invalid regexp string can stop all harvesting activity in the system if you add it to the Global Crawler Trap list by mistake. You typically get this Heritrix log error message: "2011-03-01T11:39:10.400Z -5 - http://www.sf.dk/Default.aspx - - no-type #049 - - - err=java.util.regex.PatternSyntaxException"

The only place where your regexps are validated before they are inserted into NAS is the "Edit Harvest Templates" menu, when you upload an order.xml with regexp lists.

So please check your regexp strings on your test system before you upload them to the production system.
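
If you want to check a list of regexp strings yourself before entering them, here is a minimal sketch. The error above comes from java.util.regex, so the sketch compiles each string with the same Java package and reports the ones that fail. The class name RegexpCheck, the method findInvalid and the sample strings are made up for this example and are not part of NAS or Heritrix. Note that it only checks the syntax, not whether the regexp matches the URLs you intend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of NAS: checks regexp syntax only.
class RegexpCheck {

    /** Returns the entries that java.util.regex cannot compile. */
    static List<String> findInvalid(List<String> regexps) {
        List<String> invalid = new ArrayList<>();
        for (String regexp : regexps) {
            try {
                Pattern.compile(regexp);
            } catch (PatternSyntaxException e) {
                // Same exception type as in the Heritrix log message.
                invalid.add(regexp);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Made-up crawler trap candidates; the second has an unclosed group.
        List<String> candidates = Arrays.asList(
                ".*/calendar/.*",
                ".*(unclosed.*");
        for (String bad : findInvalid(candidates)) {
            System.out.println("Invalid regexp: " + bad);
        }
    }
}
```

Any string that this prints should be corrected before you add it as a crawler trap.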

You can also check your regexp strings online, for example here: http://www.javaregex.com/test.html

More about regexps can be found here: http://en.wikipedia.org/wiki/Regular_expression

Best regards Tue