Slovak Tokenizer

The purpose of this tool is to normalize any text to a form appropriate for natural language processing.

Requirements

Getting started

  • Get and extract the sources
  • Create a build directory and switch to it
  • Run CMake to generate the build scripts
  • Run your build tool or compiler to build the project
  • The program accepts text on standard input and prints the result on standard output
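
The steps above can be sketched as the following shell session. This is a minimal sketch, not the project's official build procedure: it assumes the extracted sources (with a top-level CMakeLists.txt) are in the current directory, and the executable name `./tokenizer` is a placeholder, since the real name of the built binary is not stated here.

```shell
# Assumption: the current directory is the extracted source tree
# and contains the top-level CMakeLists.txt.
mkdir -p build
cd build

# Generate the build scripts for the default generator.
cmake ..

# Build with whatever tool CMake generated scripts for (make, ninja, ...).
cmake --build .

# Placeholder executable name: replace ./tokenizer with the binary
# that the build actually produces.
echo "Toto je veta." | ./tokenizer
```

Because the program reads standard input and writes standard output, it can also be used with file redirection, for example `./tokenizer < input.txt > output.txt`, again with the placeholder name replaced by the real one.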

Source Code

Bibliography

If you use this tool, please cite our paper on the Slovak Categorized News Corpus.