Project PERO is a collaboration between Brno University of Technology and the Moravian Library, financed by the Czech Ministry of Culture as an applied research project focused on cultural and national identity (NAKI II). The results of the project are applied in the PERO-OCR Application.
The project aims to create technology and tools that improve the accessibility of digitized historic documents. These tools, based on state-of-the-art methods from computer vision, machine learning, and language modeling, will enable existing digital archives and libraries to provide full-text search and content extraction for low-quality historic printed documents and all handwritten documents, which cannot be automatically processed by currently available tools.
The project extends the automation and capabilities of the digitization pipeline by providing tools for automated quality assessment and control, quality improvement, automated text transcription of historic printed documents, semi-automated transcription of handwritten text, and automatic extraction of semantic information from semi-structured documents (e.g. library catalogs and birth records). The created tools and techniques will be validated by processing selected collections of digitized materials and through a pilot operation in cooperation with the Moravian Library.