
Chapter 2. Finding and Working with Words
In this chapter, we cover the following recipes:
- Introduction to tokenizer factories – finding words in a character stream
- Combining tokenizers – lowercase tokenizer
- Combining tokenizers – stop word tokenizers
- Using Lucene/Solr tokenizers
- Using Lucene/Solr tokenizers with LingPipe
- Evaluating tokenizers with unit tests
- Modifying tokenizer factories
- Finding words for languages without white spaces