The Brno Mobile OCR Dataset (B-MOD) is a collection of 2 113 templates (pages of scientific papers). The templates were captured with 23 different mobile devices under unconstrained conditions, so the photographs exhibit varying degrees of blur, uneven illumination, and other distortions. In total, the dataset contains 19 725 photographs and more than 500k text lines with precise transcriptions. The template pages are divided into three subsets (training, validation, and testing).
This dataset may be used for non-commercial research purposes only. If you publish material based on this dataset, please include a reference to the paper:
M. Kišš, M. Hradiš, and O. Kodym, “Brno Mobile OCR Dataset,” in 2019 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019.
You can download the dataset and evaluate your OCR system below. Our OCR system is available on GitHub. If you have any questions, please contact firstname.lastname@example.org or email@example.com.
- Original page templates (1.45 GB)
- Photographs and associated PAGE XML annotations, parts 1, 2, 3, 4 (31.62 GB total)
- Rectified photographs and associated PAGE XML annotations, parts 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 (89.65 GB total)
- Cropped text lines with transcriptions (5.29 GB)
- List of photographs with used device name (947 kB)
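The PAGE XML annotations can be read with any XML library. Below is a minimal Python sketch for extracting text-line transcriptions; it assumes the usual PAGE layout (`TextLine` elements containing `TextEquiv`/`Unicode`), and it reads the schema namespace from the file itself rather than hard-coding a version, since PAGE files may use different schema revisions:

```python
import xml.etree.ElementTree as ET

def extract_text_lines(page_xml_path):
    """Yield (line id, transcription) pairs from a PAGE XML file.

    The namespace is recovered from the root element, so the sketch
    works regardless of which PAGE schema version the file declares.
    """
    root = ET.parse(page_xml_path).getroot()
    # Root tag looks like '{namespace}PcGts'; strip out the namespace URI.
    ns = root.tag.split('}')[0].strip('{')
    for line in root.iter(f'{{{ns}}}TextLine'):
        unicode_el = line.find(f'{{{ns}}}TextEquiv/{{{ns}}}Unicode')
        if unicode_el is not None and unicode_el.text:
            yield line.get('id'), unicode_el.text
```

The exact element nesting in the released files may differ slightly; treat this as a starting point, not the official loader.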
You can evaluate your OCR system using the form below. Enter your name or your team's name to identify your results, and please provide a short description of your system or a link to one.
Please upload a single text file in which each line corresponds to one transcribed line of the test set, using the same formatting as the text files for the training and validation lines in the "Cropped text lines with transcriptions" ZIP archive. Each line must follow the pattern:
6149958838f466bbb508399a83bbeb5c.jpg_rec_l0004.jpg Theorems 1 and 2 show that, in checking for deadlock or
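Assuming only the format shown above (image file name, a single space, then the transcription, which may itself contain spaces), a submission file can be read with a few lines of Python:

```python
def load_transcriptions(path):
    """Parse a transcription file in the format used by the dataset:
    each line is an image file name, one space, then the line's text."""
    transcriptions = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue
            # Split on the first space only; the text may contain spaces.
            name, _, text = line.partition(' ')
            transcriptions[name] = text
    return transcriptions
```

Writing a submission is the reverse: one `f"{name} {text}\n"` line per recognized test line.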
| Name | Description | Date | Easy CER | Easy WER | Medium CER | Medium WER | Hard CER | Hard WER | All CER | All WER |
|---|---|---|---|---|---|---|---|---|---|---|
| Thales of Miletus | Baseline CRNN (random splitting) | 10.09.2019 | 0.07 | 0.37 | 1.39 | 6.04 | 14.73 | 39.83 | 1.03 | 3.61 |
| Michal Hradis | Original CTC LSTM network from the paper, decoded using beam search and a dictionary generated from the train/val dataset splits. Implementation from https://github.com/githubharald/CTCWordBeamSearch. | 10.09.2019 | 1.70 | 6.99 | 5.46 | 16.19 | 33.37 | 60.42 | 4.05 | 11.81 |
| Tesseract | https://github.com/tesseract-ocr/tesseract, config = ("-l eng --oem 1 --psm 7") | 12.09.2019 | 12.32 | 24.21 | 45.00 | 71.06 | 79.17 | 100.87 | 24.47 | 40.91 |
| Thales of Miletus - 1 | Baseline CRNN (random splitting) + WordBeamSearch | 13.09.2019 | 0.06 | 0.37 | 1.38 | 6.00 | 14.64 | 39.50 | 1.03 | 3.58 |
| Attention Conv-LSTM | Fairly standard seq2seq Conv-LSTM model with attention. | 16.09.2019 | 0.70 | 1.23 | 3.97 | 10.66 | 20.19 | 47.82 | 2.42 | 5.84 |
| Sayan Mandal StaquResearch | customCNN_LSTM_CTC. No augmentation or LM. | 13.11.2019 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
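The leaderboard metrics are error rates in percent. Character error rate (CER) is conventionally the Levenshtein (edit) distance between the recognized text and the reference, normalized by the reference length; WER is the same computed over words, which is why it can exceed 100%. A minimal sketch of CER (the official evaluation script may differ in details such as normalization):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences, via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (r != h))) # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate in percent: edit distance / reference length."""
    return 100.0 * levenshtein(reference, hypothesis) / max(len(reference), 1)
```

Passing lists of words instead of strings to `levenshtein` gives the word-level distance used for WER.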