The Slovak Categorized News Corpus is a first attempt to create a representative sample of contemporary Slovak from various domains that supports easy searching and automated processing.
It contains a selection of news articles processed with our NLP tools.
The second part of the effort is an information retrieval evaluation set for the corpus.
It is the first Slovak information retrieval evaluation set. It contains a set of queries (information needs), each paired with the corresponding relevant documents from the Slovak Categorized News Corpus.
To obtain a download link, please send a request to daniel.hladek@tuke.sk.
Hládek, Daniel, Ján Staš, and Jozef Juhár. "Slovak Categorized News Corpus." LREC. 2014. (poster)
Hládek, Daniel, Ján Staš, and Jozef Juhár. "Evaluation Set for Slovak News Information Retrieval." LREC. 2016.