Slovak Tokenizer

The purpose of this tool is to normalize any text to a form appropriate for natural language processing.


Getting started

  • Get and extract the sources
  • Create a build directory and switch to it
  • Run CMake to generate the build scripts
  • Run your compiler to build the project
  • The program reads text from standard input and prints the result to standard output
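
The build steps above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `build_tokenizer` is hypothetical, CMake is assumed to be installed, and `CMakeLists.txt` is assumed to sit at the top of the extracted source tree.

```shell
# Hypothetical helper (not part of the project): configure and build the
# sources in the given directory using an out-of-source "build" directory.
build_tokenizer() {
    src_dir="${1:-.}"                 # source directory, defaults to the current one
    mkdir -p "$src_dir/build" || return 1
    (
        # A subshell keeps the caller's working directory unchanged.
        cd "$src_dir/build" || exit 1
        cmake .. || exit 1            # generate the build scripts
        cmake --build .               # run the underlying compiler/build tool
    )
}
```

After calling `build_tokenizer path/to/sources`, the resulting executable should be in the `build` directory. Its name is not given here, so check the build output. Because the program works as a filter, it can be run with input and output redirection, for example with `< input.txt > output.txt` appended to the executable's path.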

Source Code


If you use this tool, please cite our paper on the Slovak Categorized News Corpus.