Projekt PERO

For the ICDAR 2021 Competition on Historical Document Classification, we created our own dataset splits. The links below point to the annotation files we used. The script (handwriting type) dataset provided for this competition consisted of two previously published datasets, for which we unified the ground truth.

Each line of every file contains the annotations for one document (page) in the dataset. A line starts with the name of the document, followed by the annotations, all separated by spaces. For dating, the annotations are in "not-before not-after" format, which defines the interval of years in which the document was created. For localization, the annotation for each document is the name of the place where the document originated. For font and script, the main font/script is annotated first, followed by a list of any others that appear in the document.
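The line format described above can be read with a few lines of Python. The following is only a minimal sketch, not the code used in the competition: the function names (split_line, parse_date, parse_location, parse_labels) are hypothetical, and it assumes that document names contain no spaces and that a place name is everything after the document name.

```python
from typing import List, Tuple


def split_line(line: str) -> Tuple[str, List[str]]:
    """Split one annotation line into the document name and its annotation fields."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected a document name and at least one annotation: {line!r}")
    return parts[0], parts[1:]


def parse_date(line: str) -> Tuple[str, Tuple[float, float]]:
    """Date files: '<name> <not-before> <not-after>' (an interval of years)."""
    name, fields = split_line(line)
    if len(fields) != 2:
        raise ValueError(f"expected exactly two year values: {line!r}")
    return name, (float(fields[0]), float(fields[1]))


def parse_location(line: str) -> Tuple[str, str]:
    """Location files: '<name> <place>'.

    Assumption: if a place name contained spaces, it would be
    everything after the document name, so the fields are re-joined.
    """
    name, fields = split_line(line)
    return name, " ".join(fields)


def parse_labels(line: str) -> Tuple[str, str, List[str]]:
    """Font/script files: '<name> <main label> [<other label> ...]'."""
    name, fields = split_line(line)
    return name, fields[0], fields[1:]


if __name__ == "__main__":
    print(parse_date("3545_1100_1199.jpg 1100.0 1199.0"))
    print(parse_location("b27efa208a95f5da6b877bec7608e23a.jpg Fonteney"))
    print(parse_labels("IRHT_P_001909.tif Semihybrida Textualis"))
```

To process a whole file such as date.trn, one would apply the matching function to each non-empty line. The year values are kept as floats because that is how they appear in the annotation files.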

Download links

Font: font.trn (2.1 MB), font.val (13 KB)
Script: script.trn (232 KB), script.val (14 KB)
Location: location.trn (250 KB), location.val (3.1 KB)
Date: date.trn (325 KB), date.val (32 KB)

Examples of annotations

Task	Example annotation line
Font & script	IRHT_P_001909.tif Semihybrida Textualis
Location	b27efa208a95f5da6b877bec7608e23a.jpg Fonteney
Date	3545_1100_1199.jpg 1100.0 1199.0