This filter uses the optical character recognition (OCR) plugin to extract text content from images and drawings. The OCR plugin must be active and the required language files must be installed. For further details, please refer to the OCR plugin documentation.
Optical character recognition is prone to errors caused by small fonts, poor scan quality, noise from background images, or ambiguous characters. To overcome these errors, a tolerance level can be defined.
Name | Description |
---|---|
FILTERS | Add 'OCR' to the comma-separated list to enable this filter. It is disabled by default. |
NORMALIZATION_LEVEL | 0 - None - compare all characters exactly as recognized (not recommended).<br>1 - Similar characters only - tolerate errors on characters with the same appearance, like a Latin 'a' and a Cyrillic 'а'. A full list of tolerated characters can be found here: http://www.unicode.org/reports/tr36/confusables.txt<br>2 - Common recognition errors - tolerate errors on characters with a similar appearance, especially on a noisy background. This tolerance is based on experience and testing, as there is no public recommendation. An example is the German sharp s 'ß' and the upper case letter 'B', which are very similar in some fonts.<br>3 - Common recognition errors caused by distortion - same as 'Common recognition errors', but extended for slightly rotated or distorted images. Such distortions usually occur when scanning pages. |
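
The idea behind the tolerance levels can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the tables `SIMILAR_CHARS` and `COMMON_ERRORS` and the functions `normalize` and `matches` are hypothetical names, the tables contain only a few example pairs, and the sketch assumes that each level includes the tolerances of the lower levels. Level 3 is not modelled separately because no example pairs are given for it, so it behaves like level 2 here.

```python
# Illustrative tables only; the plugin's real tables are not shown in this document.
SIMILAR_CHARS = {
    "\u0430": "a",  # Cyrillic small a -> Latin a
    "\u0435": "e",  # Cyrillic small ie -> Latin e
    "\u043e": "o",  # Cyrillic small o -> Latin o
}
COMMON_ERRORS = {
    "\u00df": "B",  # German sharp s -> upper case B
}


def normalize(text: str, level: int) -> str:
    """Fold characters that the given tolerance level treats as equal."""
    table = {}
    if level >= 1:
        table.update(SIMILAR_CHARS)
    if level >= 2:
        # Assumption: level 2 also includes the level 1 tolerances.
        table.update(COMMON_ERRORS)
    return text.translate(str.maketrans(table))


def matches(recognized: str, expected: str, level: int) -> bool:
    """Compare OCR output with an expected term after normalization."""
    return normalize(recognized, level) == normalize(expected, level)


# "p\u0430ssword" contains a Cyrillic 'а' in place of the Latin 'a'.
print(matches("p\u0430ssword", "password", 0))  # False
print(matches("p\u0430ssword", "password", 1))  # True
print(matches("Stra\u00dfe", "StraBe", 1))      # False
print(matches("Stra\u00dfe", "StraBe", 2))      # True
```

Folding both strings into the same form before comparing means a higher level tolerates more recognition errors, at the cost of more false matches.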