You need to provide at least 8 images for each model: two different documents of the same type, with 2 front pictures and 2 back pictures of each.

Training Data Set Requirements

Please make sure that the training data set for a new document complies with the requirements below.

  • 8 images are needed to train one document (with two sides):
    • 2 images of each side of the same document (4 in total for the two sides of the ID)
    • 2 such sets of images, taken from the documents of two different people (that is, pictures of two physical copies of the same document type)
  • Image quality requirements (if they are not met, the training will not be successful)
    • Resolution of at least 3000 × 2000 pixels
    • The document should take up the majority of the space in the image
    • No part of the document is covered by any object (paper, fingers, ...)
    • The document is sharp, not blurred, and free of glare; all fields are clearly readable
    • The whole document is visible in the image, including all 4 corners
    • The smallest letter in each field is at least 12 px high
    • All fields are clearly legible to a human (even the most similar letters and diacritical marks can be distinguished without doubt)
    • The document is photographed directly from above (or scanned on a high-resolution scanner), not at an angle
  • Physical requirements for the documents
    • The document is rectangular, with sharp or only slightly rounded corners (think credit cards or passports)
    • The size of the document does not exceed the ISO 216 international standard paper size A6 (105 × 148 mm), and the ratio of its sides is between 1:1 and 5:2
    • The document is made of a solid material; otherwise, users are instructed not to let the document bend while taking the image (e.g. by laying the document on a flat surface)
    • The character set of every field that should be processed is extended Latin, including diacritics and common special characters
    • The content of the fields does not move around (as it does when fields are filled in on a typewriter, custom printed on a standard office printer, or when photographs are glued in)
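
The numeric requirements above (image count, resolution, document size and side ratio) can be pre-checked before a training set is submitted. The following is only a minimal Python sketch under stated assumptions: all names (`TrainingImage`, `check_training_set`, `resolution_ok`, `physical_size_ok`) are hypothetical and not part of any product API. It assumes "3000 × 2000" means the longer image side is at least 3000 px and the shorter at least 2000 px, and that the A6 limit applies to the longer and shorter document sides respectively. Pixel dimensions and the measured document size in millimetres are passed in by the caller. Sharpness, glare, occlusion and legibility still have to be reviewed by a person.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds taken from the requirements list; their orientation-independent
# interpretation (long side / short side) is an assumption of this sketch.
MIN_LONG_PX, MIN_SHORT_PX = 3000, 2000     # "at least 3000 x 2000 pixels"
MAX_LONG_MM, MAX_SHORT_MM = 148.0, 105.0   # ISO 216 A6
MAX_ASPECT_RATIO = 2.5                     # 5:2 (1:1 is the natural lower bound)


@dataclass(frozen=True)
class TrainingImage:
    copy_id: str    # hypothetical label of the physical copy, e.g. "A" or "B"
    side: str       # "front" or "back"
    width_px: int
    height_px: int


def resolution_ok(width_px, height_px):
    """True if the image meets the minimum resolution in either orientation."""
    long_px, short_px = max(width_px, height_px), min(width_px, height_px)
    return long_px >= MIN_LONG_PX and short_px >= MIN_SHORT_PX


def physical_size_ok(width_mm, height_mm):
    """True if the document fits within A6 and its side ratio is at most 5:2."""
    long_mm, short_mm = max(width_mm, height_mm), min(width_mm, height_mm)
    if short_mm <= 0:
        return False
    return (long_mm <= MAX_LONG_MM
            and short_mm <= MAX_SHORT_MM
            and long_mm / short_mm <= MAX_ASPECT_RATIO)


def check_training_set(images, sides=("front", "back")):
    """Return a list of problems; an empty list means the set passes."""
    problems = []
    counts = Counter((img.copy_id, img.side) for img in images)
    copies = sorted({img.copy_id for img in images})
    if len(copies) < 2:
        problems.append(f"need 2 physical copies, found {len(copies)}")
    for copy_id in copies:
        for side in sides:
            found = counts[(copy_id, side)]
            if found < 2:
                problems.append(
                    f"copy {copy_id!r}: need 2 {side} images, found {found}")
    for index, img in enumerate(images):
        if not resolution_ok(img.width_px, img.height_px):
            problems.append(
                f"image {index}: {img.width_px}x{img.height_px} px "
                f"is below {MIN_LONG_PX}x{MIN_SHORT_PX}")
    return problems
```

For example, a set built from two copies, two sides and two 4000 × 3000 px images per side yields an empty problem list, while removing any one image produces a message naming the copy and side that is short. A credit-card-sized document (about 85.6 × 54 mm) passes `physical_size_ok`, whereas an A5 sheet (148 × 210 mm) does not.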