What does it mean when we get low OcrConfidenceTexts?

Question

What does it mean when we receive a low OCR confidence text? Has a text field been read incorrectly?

Answer

A low OCR confidence score does not necessarily mean that the text field was read incorrectly. It only indicates that the value might be incorrect.
There are several recommended ways to work with this information, depending on the document content. It is easier if the document contains more than one data source (visual zone, MRZ, barcode, or NFC).

1. If there is more than one source (for example, visual zone + MRZ), you can use cross-checks and apply the following logic:
If "/document/inspect" returns low confidence for a text field, compare the values of that field across the available sources, for example the visual zone and the MRZ.

If they match, you can ignore the low OCR confidence.

If they don't match, the OCR result was most probably inaccurate.

2. If there is no other source for a text field and that field is significant for you, you can apply your own rules. For example, the gender field is hard to misread because its value is mostly M or F. Even with a low OCR confidence, it was most probably read correctly, so you can ignore low confidence for such selected fields.
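The two approaches above can be combined into one decision rule. The following Python sketch is only an illustration and is not part of the DIS API: the function names, the LOW_RISK_FIELDS set, and the idea that your application has already extracted the field values and the low-confidence flag from the "/document/inspect" response are all assumptions. The normalization is deliberately simple and would likely need adapting to real MRZ formatting (filler characters, transliteration).

```python
from typing import Optional

# Hypothetical allowlist: fields whose low-confidence warnings we choose to
# ignore when no second source exists (gender is mostly just M or F).
LOW_RISK_FIELDS = {"gender"}


def normalize(value: str) -> str:
    """Uppercase and drop whitespace so formatting differences are not mismatches."""
    return "".join(value.split()).upper()


def needs_review(
    field: str,
    visual_value: str,
    mrz_value: Optional[str],
    low_confidence: bool,
) -> bool:
    """Return True if a field should be flagged for manual review.

    All arguments are assumed to be prepared by the caller; this function
    does not parse any DIS response itself.
    """
    if not low_confidence:
        # OCR confidence was acceptable, nothing to check.
        return False
    if mrz_value is not None:
        # Approach 1: a second source exists, so cross-check the values.
        return normalize(visual_value) != normalize(mrz_value)
    # Approach 2: no second source, fall back to per-field rules.
    return field not in LOW_RISK_FIELDS
```

For example, a low-confidence surname that matches the MRZ value is accepted, while a low-confidence field with no second source is flagged unless it is on the allowlist.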

Additional Notes

If needed, you can lower the threshold for OCR confidence in your <application.yml> file.

Relevant Product / Version

DIS server, all versions
