Issue:
The function IFACE_CreateTemplateBatch fails as soon as we send more than 42 faces, and our application is forced to stop. We have 4 GPUs and need to detect one or more faces in a data set of 500 images, and this function is claimed to be up to 20 times faster. Are we using it correctly?
Cause:
IFACE_CreateTemplateBatch is intended for processing an array of face entities extracted from a multi-person (i.e. multi-face) image. I suppose you are extracting faces from single portraits and then passing them to the function together in the array Face[].
The main problem with this approach is that processing all 500 of your portraits (280x350) at once is equivalent to processing a single 49MPx image. According to approximate measurements, a 100-image batch takes 4.8GB of RAM, a 200-image batch takes 9.5GB, and a 500-image batch takes about 20GB. So the most probable cause of the problem is insufficient memory (GDDR) on your graphics card.
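The figures above can be checked with a little arithmetic. The following is only a rough sketch: it assumes memory use grows roughly linearly with batch size, takes the most expensive per-face figure from the three measurements as a conservative estimate, and uses a hypothetical 4.0GB of free GPU memory purely for illustration. The function names are made up for this example and are not part of IFace.

```python
# Portrait size and approximate batch measurements quoted in the text above.
WIDTH, HEIGHT = 280, 350
MEASURED_GB = {100: 4.8, 200: 9.5, 500: 20.0}  # batch size -> approx. memory (GB)


def total_megapixels(n_images, width=WIDTH, height=HEIGHT):
    """Total pixel count of a batch, expressed in megapixels."""
    return n_images * width * height / 1_000_000


def gb_per_face(measured=MEASURED_GB):
    """Conservative per-face memory estimate: the largest measured ratio."""
    return max(gb / n for n, gb in measured.items())


def max_batch_size(free_gpu_gb, per_face_gb=None):
    """Largest batch that fits, ASSUMING memory scales linearly with batch size."""
    if per_face_gb is None:
        per_face_gb = gb_per_face()
    return int(free_gpu_gb // per_face_gb)


print(total_megapixels(500))         # 49.0 (the 49MPx figure)
print(round(gb_per_face() * 1000))   # 48 (about 48MB per face)
print(max_batch_size(4.0))           # 83 for a hypothetical 4.0GB of free memory
```

Treat the result as an upper bound to start testing from rather than a guarantee, since the measurements themselves are approximate.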
Solution:
Since you have 4 GPUs, I would suggest dividing the dataset into 4 parts, creating 4 IFace instances, and processing each part on a different GPU, using face batches as large as the GPU can process (sized for each GPU individually, or based on the weakest one). The most recommended batch sizes are 32, 48 and 64, but the right size depends on the image size and the available memory.
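The splitting described above might be planned along these lines. This is a minimal sketch that only builds the work schedule: the helper names and file names are hypothetical, and the actual IFace calls (creating an instance per GPU and calling IFACE_CreateTemplateBatch on each batch) are deliberately left out, because their signatures are not given here.

```python
def split_evenly(items, n_parts):
    """Split items into n_parts contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def plan(items, n_gpus=4, batch_size=32):
    """Return {gpu_index: [batch, ...]}, one list of batches per GPU."""
    return {
        gpu: list(batches(part, batch_size))
        for gpu, part in enumerate(split_evenly(items, n_gpus))
    }


# Hypothetical file names standing in for the 500 portraits.
images = [f"img_{i:03d}.jpg" for i in range(500)]
schedule = plan(images)
for gpu, gpu_batches in schedule.items():
    # Each GPU gets 125 images: three batches of 32 and one of 29.
    print(gpu, [len(b) for b in gpu_batches])
```

Each entry of the schedule would then be handed to the IFace instance bound to that GPU, one batch per call. If the GPUs differ in memory, a separate batch_size per GPU could be used instead of the single value shown here.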