Projekt PERO

The dataset consists of 19 manuscripts in various European languages and scripts.

More information, together with experiments on adapting a general model trained on a large handwriting dataset by fine-tuning, can be found in the paper: Finetuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition

There are two directories in the dataset archive: data and runs.

data contains images of text lines and their respective transcriptions. The images come in three crop modes: tight, medium, and wide. The crop mode indicates how much space was left around the baseline during cropping. Transcriptions are in the format ID TRANS, where ID corresponds to the name of the respective text line image and TRANS is the transcription.
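
The ID TRANS format can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset tooling. It assumes that the ID contains no spaces, that the first space on a line separates the ID from the transcription, and that the files are UTF-8 encoded; none of this is stated in the description. The function names and the sample IDs are hypothetical.

```python
def parse_transcription_line(line):
    """Split one 'ID TRANS' line into (line_id, transcription).

    Assumption: the first space separates the ID from the transcription,
    so the transcription itself may contain further spaces.
    """
    line = line.rstrip("\r\n")
    line_id, _, transcription = line.partition(" ")
    return line_id, transcription


def parse_transcriptions(lines):
    """Build a dict mapping each line ID to its transcription.

    Blank lines are skipped. `lines` can be any iterable of strings,
    for example an open text file.
    """
    result = {}
    for raw in lines:
        if not raw.strip():
            continue
        line_id, transcription = parse_transcription_line(raw)
        result[line_id] = transcription
    return result


if __name__ == "__main__":
    # Hypothetical sample lines; real IDs follow the image names in data.
    sample = [
        "line_0001.jpg An example transcription\n",
        "line_0002.jpg Another line of text\n",
    ]
    for line_id, text in parse_transcriptions(sample).items():
        print(line_id, "->", text)
```

To read an actual transcription file, pass an open file object, for example `parse_transcriptions(open(path, encoding="utf-8"))`, where `path` points to a transcription file in data.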

runs contains the partitions used for the fine-tuning runs; see Section 5 of the referenced paper for more information.

Download links

The dataset is available on Zenodo.