Natural Language Processing with Java and LingPipe Cookbook

Chapter 2. Finding and Working with Words

In this chapter, we cover the following recipes:

  • Introduction to tokenizer factories – finding words in a character stream
  • Combining tokenizers – lowercase tokenizer
  • Combining tokenizers – stop word tokenizers
  • Using Lucene/Solr tokenizers
  • Using Lucene/Solr tokenizers with LingPipe
  • Evaluating tokenizers with unit tests
  • Modifying tokenizer factories
  • Finding words for languages without white spaces