Within the ICDAR 2021 Competition on Historical Document Classification, we created our own dataset splits. The links below point to the annotation files we used. The script (handwriting type) dataset provided for this competition consisted of two previously published datasets, for which we unified the ground truth.
Each line of every file contains the annotations for one document (page) in the dataset. A line starts with the name of the document, followed by the annotations, separated by spaces. For dating, the annotations are in "not-before not-after" format, which defines the interval of years in which the document was created. For localization, the annotation for each document is the name of the place where the document originated. For font and script, the main font or script is annotated first, followed by a list of any others that appear in the document.
- Font: font.trn (2.1 MB), font.val (13 KB)
- Script: script.trn (232 KB), script.val (14 KB)
- Location: location.trn (250 KB), location.val (3.1 KB)
- Date: date.trn (325 KB), date.val (32 KB)
Examples of annotations
- Font & script: IRHT_P_001909.tif Semihybrida Textualis
- Date: 3545_1100_1199.jpg 1100.0 1199.0
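The line format described above can be read with a few lines of Python. The following is only a minimal sketch, not the code used in the competition: the function names (`split_annotation`, `parse_date`, `parse_labels`, `parse_location`) are hypothetical, and it assumes that document names contain no spaces, that blank lines are filtered out beforehand, and that a place name spanning several words should be rejoined with single spaces.

```python
from typing import List, Tuple


def split_annotation(line: str) -> Tuple[str, List[str]]:
    """Split one non-empty annotation line into (document name, fields).

    Assumption: the document name itself contains no whitespace.
    """
    name, *fields = line.split()
    return name, fields


def parse_date(line: str) -> Tuple[str, float, float]:
    """Parse a dating line: 'name not-before not-after'."""
    name, fields = split_annotation(line)
    not_before, not_after = (float(value) for value in fields)
    return name, not_before, not_after


def parse_labels(line: str) -> Tuple[str, str, List[str]]:
    """Parse a font or script line: main label first, then any others."""
    name, fields = split_annotation(line)
    main, *others = fields
    return name, main, others


def parse_location(line: str) -> Tuple[str, str]:
    """Parse a localization line.

    Assumption: everything after the document name is the place name.
    """
    name, fields = split_annotation(line)
    return name, " ".join(fields)
```

Applied to the examples above, `parse_date("3545_1100_1199.jpg 1100.0 1199.0")` returns `("3545_1100_1199.jpg", 1100.0, 1199.0)`, and `parse_labels("IRHT_P_001909.tif Semihybrida Textualis")` returns `("IRHT_P_001909.tif", "Semihybrida", ["Textualis"])`.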