The Brno Mobile OCR Dataset (B-MOD) is a collection of 2 113 templates (pages of scientific papers). The templates were captured with 23 different mobile devices under unrestricted conditions, so the resulting photographs vary in blurriness, illumination, and other factors. In total, the dataset contains 19 725 photographs and more than 500k text lines with precise transcriptions. The template pages are divided into three subsets (training, validation, and testing).

This dataset may be used for non-commercial research purposes only. If you publish material based on this dataset, we request that you include a reference to the paper:

M. Kišš, M. Hradiš, and O. Kodym, “Brno Mobile OCR Dataset” in 2019 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019.

You can download the dataset and evaluate your OCR system below.




You can evaluate your OCR system using the form below. Fill in your name or the name of your team to identify your results. Please enter a short description of your system or a link to the description.

Please upload a single text file in which each line corresponds to one transcribed line of the test set, formatted the same way as the text files for the training and validation lines in the "Cropped lines with transcriptions" ZIP archive. The formatting must follow this pattern:

filename transcription

For example:

6149958838f466bbb508399a83bbeb5c.jpg_rec_l0004.jpg Theorems 1 and 2 show that, in checking for deadlock or
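The line format above can be produced and read back with a few lines of Python. This is only a minimal sketch, not an official tool: the function names (`format_line`, `parse_line`, `write_submission`) are hypothetical, and it assumes that the file name and the transcription are separated by a single space (as the example line suggests), that file names contain no whitespace, and that the file is encoded as UTF-8.

```python
from pathlib import Path


def format_line(filename: str, transcription: str) -> str:
    """Return one submission line: file name, a single space, then the text."""
    # Assumption: file names never contain whitespace, so the first space
    # on a line unambiguously separates the name from the transcription.
    if any(ch.isspace() for ch in filename):
        raise ValueError(f"file name must not contain whitespace: {filename!r}")
    if "\n" in transcription or "\r" in transcription:
        raise ValueError("transcription must fit on a single line")
    return f"{filename} {transcription}"


def parse_line(line: str) -> tuple[str, str]:
    """Split a line at its first space into (filename, transcription)."""
    filename, _, transcription = line.rstrip("\r\n").partition(" ")
    return filename, transcription


def write_submission(predictions: dict[str, str], path: str) -> None:
    """Write one line per transcribed text line to a text file."""
    # Assumption: UTF-8 encoding; the page does not state the encoding.
    lines = [format_line(name, text) for name, text in predictions.items()]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```

A call such as `write_submission(predictions, "submission.txt")`, where `predictions` maps each test line image name to the text recognized by your system, would then give a single file in the pattern described above.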



Name           Description    Date        Easy         Medium        Hard           Overall
Baseline LSTM  CNN_LSTM_CTC   30.06.2019  0.33 / 1.93  5.65 / 22.39  32.28 / 72.63  3.15 / 10.71
Baseline Conv  CNN_CTC        30.06.2019  0.50 / 2.79  7.82 / 28.50  39.76 / 80.69  4.19 / 13.39
