Slovak Financial Exam

This dataset contains 1,334 multiple-choice questions from the financial domain in the Slovak language. It was created to address the scarcity of language resources for Slovak and provides a benchmark for evaluating language models on a specialized domain in a low-resource language.

The questions are sourced from the official certification exams for financial advisors in Slovakia and cover topics such as insurance, capital markets, deposits, loans, and pensions. Each question has five possible answers, exactly one of which is correct.

Dataset Structure

Data Instances

A typical data instance looks like this:

{
  "id": 1,
  "type": "multiple-choice",
  "prompt": "Blízkou osobou v priamom rade je:",
  "level": "1",
  "area": "General questions",
  "sector": "Všeobecná časť",
  "label": 3,
  "answers": [
    "Bratranec",
    "Otcov brat",
    "Druh-družka",
    "Syn",
    "Neter"
  ]
}
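
As an illustration of how such an instance might be presented to a model, the sketch below renders it as a lettered multiple-choice prompt. The helper name format_question and the "A." to "E." lettering are assumptions made for this example; the dataset itself does not prescribe a prompt format.

```python
from string import ascii_uppercase

# The instance shown above, reproduced so the sketch is self-contained.
example = {
    "id": 1,
    "type": "multiple-choice",
    "prompt": "Blízkou osobou v priamom rade je:",
    "level": "1",
    "area": "General questions",
    "sector": "Všeobecná časť",
    "label": 3,
    "answers": ["Bratranec", "Otcov brat", "Druh-družka", "Syn", "Neter"],
}


def format_question(instance: dict) -> str:
    """Render one instance as a lettered prompt (hypothetical helper)."""
    lines = [instance["prompt"]]
    # Pair each answer with a letter: A, B, C, ...
    for letter, answer in zip(ascii_uppercase, instance["answers"]):
        lines.append(f"{letter}. {answer}")
    return "\n".join(lines)


print(format_question(example))
```

With this lettering, the correct answer (label 3) corresponds to option D.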

Data Fields

  • id: A 64-bit integer identifying the question.
  • type: The type of question, e.g., "multiple-choice".
  • prompt: The question prompt text in Slovak.
  • level: A string representing the difficulty level of the question (e.g., "1", "2", "3").
  • area: The English translation of the category name.
  • sector: The name of the category in Slovak.
  • label: A 64-bit integer representing the 0-based index of the correct answer in the answers list.
  • answers: A list of strings representing the possible answers.
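
Because label is a 0-based index into answers, the correct answer text can be recovered with a plain list lookup. The following is a minimal sketch; the function name correct_answer and the bounds check are additions for illustration, not part of the dataset.

```python
def correct_answer(instance: dict) -> str:
    """Return the answer text that the 0-based label points to (hypothetical helper)."""
    answers = instance["answers"]
    label = instance["label"]
    # Guard against labels that do not index into the answers list.
    if not isinstance(label, int) or not 0 <= label < len(answers):
        raise ValueError(
            f"label {label!r} is not a valid index into {len(answers)} answers"
        )
    return answers[label]


# Fields taken from the sample instance in this card.
instance = {
    "prompt": "Blízkou osobou v priamom rade je:",
    "level": "1",
    "label": 3,
    "answers": ["Bratranec", "Otcov brat", "Druh-družka", "Syn", "Neter"],
}

print(correct_answer(instance))  # prints: Syn

# level is stored as a string, so convert it before comparing numerically.
level = int(instance["level"])
```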

Data Splits

The dataset consists of a single split, test, containing all 1,334 examples. It is intended for evaluation purposes.
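
Since each question has exactly one correct answer, accuracy over the test split is a natural way to score a model, although the card does not prescribe a metric. The sketch below assumes predictions are 0-based answer indices in the same order as the examples; the accuracy function and the toy records are made up for illustration and are not taken from the dataset.

```python
from typing import Sequence


def accuracy(examples: Sequence[dict], predictions: Sequence[int]) -> float:
    """Fraction of examples whose predicted index equals the gold label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(
        1 for example, predicted in zip(examples, predictions)
        if example["label"] == predicted
    )
    return correct / len(examples)


# Toy records carrying only the label field (illustrative, not real data).
toy_examples = [{"label": 3}, {"label": 0}, {"label": 2}, {"label": 4}]
toy_predictions = [3, 1, 2, 4]  # the second prediction is wrong

print(accuracy(toy_examples, toy_predictions))  # prints: 0.75
```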

Dataset Creation

The dataset was created by parsing publicly available questions from the financial advisor certification exams administered by the National Bank of Slovakia (NBS). The source material was valid until August 5, 2023. The authors extracted the questions, answers, and associated metadata, such as the topic category and difficulty level, to create a structured dataset for NLP model evaluation.

Licensing Information

The dataset is licensed under CC BY-NC-SA 4.0 (license identifier: cc-by-nc-sa-4.0).