Using Lucene/Solr tokenizers with LingPipe