For the ICDAR 2021 Competition on Historical Document Classification, we created our own dataset splits. The annotation files we used are available at the links below. The script (handwriting type) dataset provided for this competition consisted of two previously published datasets, whose ground truth we unified.

Each line of every file contains the annotations for one document (page) in the dataset. A line starts with the name of the document, followed by its annotations, separated by spaces. For dating, the annotations are in "not-before not-after" format, which defines the interval of years in which the document was created. For localization, the annotation is the name of the place where the document originated. For font and script classification, the main font/script is annotated first, followed by a list of any others that appear in the document.
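The line format described above can be read with a few lines of Python. The following is only a minimal sketch, not the code used in the competition: the function names (`parse_line`, `parse_date`, `parse_location`, `parse_script`) are hypothetical, and it assumes that document names contain no spaces and that a place name is whatever follows the document name on the line.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split one annotation line into (document name, list of annotations)."""
    parts = line.split()  # fields are separated by spaces
    if not parts:
        raise ValueError("empty annotation line")
    return parts[0], parts[1:]


def parse_date(line: str) -> Tuple[str, float, float]:
    """Dating: 'name not-before not-after' (an interval of years)."""
    name, labels = parse_line(line)
    if len(labels) != 2:
        raise ValueError(f"expected two years, got {labels!r}")
    not_before, not_after = float(labels[0]), float(labels[1])
    return name, not_before, not_after


def parse_location(line: str) -> Tuple[str, str]:
    """Localization: 'name place'.

    Assumption: everything after the document name is the place name.
    """
    name, labels = parse_line(line)
    if not labels:
        raise ValueError("missing place name")
    return name, " ".join(labels)


def parse_script(line: str) -> Tuple[str, str, List[str]]:
    """Font/script: 'name main [other ...]' (main label first)."""
    name, labels = parse_line(line)
    if not labels:
        raise ValueError("missing main font/script label")
    return name, labels[0], labels[1:]


if __name__ == "__main__":
    print(parse_script("IRHT_P_001909.tif Semihybrida Textualis"))
    print(parse_location("b27efa208a95f5da6b877bec7608e23a.jpg Fonteney"))
    print(parse_date("3545_1100_1199.jpg 1100.0 1199.0"))
```

Keeping one small function per task mirrors the three annotation files: the shared `parse_line` step is the same for all of them, and only the interpretation of the remaining fields differs.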

Download links

Examples of annotations

Task            Annotation format
Font & script   IRHT_P_001909.tif Semihybrida Textualis
Location        b27efa208a95f5da6b877bec7608e23a.jpg Fonteney
Date            3545_1100_1199.jpg 1100.0 1199.0