Step 2 - Creating the vocabulary and token counts to train the LDA model after text pre-processing
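
As a rough illustration of this step, the sketch below builds a vocabulary and a per-document token count matrix (bag of words) using only the Python standard library. The sample `docs`, the helper names `build_vocabulary` and `count_tokens`, and the `min_df` parameter are all assumptions made for the example. They are not taken from the original pipeline, which may well use a library vectorizer instead.

```python
from collections import Counter

# Hypothetical pre-processed documents: each one is a list of cleaned tokens.
docs = [
    ["topic", "model", "text"],
    ["text", "data", "text"],
    ["model", "data"],
]

def build_vocabulary(tokenized_docs, min_df=1):
    """Map every token found in at least `min_df` documents to an integer id."""
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # Use a set so each token is counted once per document.
        doc_freq.update(set(tokens))
    kept = sorted(tok for tok, df in doc_freq.items() if df >= min_df)
    return {tok: idx for idx, tok in enumerate(kept)}

def count_tokens(tokenized_docs, vocab):
    """Return one row of token counts per document, ordered by vocabulary id."""
    rows = []
    for tokens in tokenized_docs:
        counts = Counter(tok for tok in tokens if tok in vocab)
        # dicts keep insertion order, so iterating vocab follows the ids 0..V-1.
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows

vocab = build_vocabulary(docs)
counts = count_tokens(docs, vocab)

print(vocab)   # token -> column index
print(counts)  # document-term count matrix
```

The resulting `counts` matrix has one row per document and one column per vocabulary entry, which is the input shape LDA implementations generally expect. In practice a library class such as scikit-learn's `CountVectorizer` produces the same kind of matrix in sparse form, and raising `min_df` is one way to drop very rare tokens before training.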