Optical character recognition (OCR) extracts text from images and drawings and converts it into a machine-readable format. OCR is often used for faxes or scanned documents. i-net PDFC uses this technology as the basis for several pre-installed filters. The individual scenarios are described in the documentation of the respective filters.
i-net PDFC's basic OCR plug-in uses the open source software Tesseract as its default text recognition engine. The current version of Tesseract uses trained neural networks for recognition and thus achieves a very high recognition rate for printed text. Handwritten text is not supported.
The configuration specifies which variant of Tesseract is used and which languages are available for text recognition.
For Tesseract to deliver the best possible results, the input needs to meet a number of requirements.
The letter x has a height of about 10 pixels.
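
The pixel height of the letter x depends on the font size and on the resolution at which a page is scanned or rendered. The following Python sketch relates the two. It is only a rough estimate: the function names are hypothetical and not part of i-net PDFC or Tesseract, and the default ratio of 0.5 between x-height and font size is an assumption that varies by typeface.

```python
POINTS_PER_INCH = 72.0  # typographic points per inch


def x_height_pixels(font_size_pt: float, dpi: float,
                    x_height_ratio: float = 0.5) -> float:
    """Estimate the height of the letter x in pixels.

    x_height_ratio is an assumed fraction of the font size that the
    lowercase x occupies; the real value depends on the typeface.
    """
    if font_size_pt <= 0 or dpi <= 0 or x_height_ratio <= 0:
        raise ValueError("all arguments must be positive")
    # Font size in pixels at the given resolution.
    em_pixels = font_size_pt * dpi / POINTS_PER_INCH
    return em_pixels * x_height_ratio


def min_dpi_for_x_height(font_size_pt: float, target_px: float = 10.0,
                         x_height_ratio: float = 0.5) -> float:
    """Estimate the resolution at which the letter x reaches target_px."""
    if font_size_pt <= 0 or target_px <= 0 or x_height_ratio <= 0:
        raise ValueError("all arguments must be positive")
    # Inverse of x_height_pixels, solved for dpi.
    return target_px * POINTS_PER_INCH / (font_size_pt * x_height_ratio)


if __name__ == "__main__":
    print(x_height_pixels(12, 300))
    print(min_dpi_for_x_height(8))
```

Under these assumptions, small print needs a noticeably higher scan resolution than body text to reach the same x height, so the smallest font on a page is the one worth checking.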