How much memory is needed for TensorFlow Serving?
What server configuration is optimal for serving a single model in production via TensorFlow Serving?
As I understand it, the engine adapts itself to the available hardware: how many CPU cores there are, how many requests to process in parallel.
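For what it's worth, the thread pools don't have to be left to auto-detection: `tensorflow_model_server` exposes flags to pin them explicitly. A sketch (model name and path are hypothetical; the ports shown are TF Serving's defaults):

```shell
# Hypothetical model name and base path; 8500/8501 are the default
# gRPC and REST ports. --tensorflow_session_parallelism caps the TF
# session thread pools instead of letting them size from the CPU count.
tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=classifier \
  --model_base_path=/models/classifier \
  --tensorflow_session_parallelism=2
```

This is a config fragment, not a tuning recommendation; the right thread count depends on the core count of the machine you end up choosing.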
The model is an image classifier, and I need to understand which server to get so that at most two requests are processed simultaneously, with no more than 500 ms per response.
The model is 76 MB, though that hardly matters.
While running in test mode on a minimal DigitalOcean droplet with 512 MB RAM, single requests took 4-6 seconds, but swap was being used, and with several requests in a row or simultaneously things got bad. A few other lightly loaded services run on the same droplet.
The production server will have a GPU.
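To check the "two concurrent requests under 500 ms" target on a candidate server, a small load-test sketch helps. The stub below simulates the inference call with a sleep so the script runs standalone; in real use you would replace `send_request` with an HTTP POST to TF Serving's REST endpoint (by default `http://localhost:8501/v1/models/<name>:predict`; the model name is yours):

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def send_request():
    """Stand-in for one inference call.

    Replace with a real request to the TF Serving REST endpoint,
    e.g. POST http://localhost:8501/v1/models/classifier:predict
    (model name 'classifier' is a placeholder).
    """
    time.sleep(0.05)  # simulated 50 ms model latency

def benchmark(concurrency=2, total_requests=20):
    """Fire total_requests calls with at most `concurrency` in flight;
    return per-request latencies in milliseconds."""
    latencies = []

    def timed_call():
        start = time.perf_counter()
        send_request()
        # list.append is thread-safe in CPython, so no lock is needed here
        latencies.append((time.perf_counter() - start) * 1000)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(total_requests):
            pool.submit(timed_call)
    return latencies

if __name__ == "__main__":
    lat = benchmark()
    print(f"p50={statistics.median(lat):.0f} ms  max={max(lat):.0f} ms")
```

Running this against the real endpoint with `concurrency=2` shows directly whether the max latency stays under the 500 ms budget on a given machine.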