How much memory is needed for TensorFlow Serving?
What server configuration is optimal for serving a single model in production via TensorFlow Serving?
As I understand it, the engine adapts itself to the available hardware: how many CPU cores there are, how many requests to process in parallel.
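For what it's worth, the thread pools don't have to be left to auto-detection: `tensorflow_model_server` exposes flags to pin them explicitly. A sketch (model name and path are hypothetical; the ports shown are TF Serving's defaults):

```shell
# Hypothetical model name and base path; 8500/8501 are the default
# gRPC and REST ports. --tensorflow_session_parallelism caps the TF
# session thread pools instead of letting them size from the CPU count.
tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=classifier \
  --model_base_path=/models/classifier \
  --tensorflow_session_parallelism=2
```

This is a config fragment, not a tuning recommendation; the right thread count depends on the core count of the machine you end up choosing.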
The model is an image classifier, and I need to understand which server to get so that at most two requests are processed simultaneously, with no more than 500 ms per response.
The model is 76 MB, though that hardly matters.
While running in test mode on a minimal DigitalOcean droplet with 512 MB RAM, single requests took 4-6 seconds, but swap was being used, and with several requests in a row or simultaneously things got bad. A few other lightly loaded services run on the same droplet.
The production server will have a GPU.
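To check the "two concurrent requests under 500 ms" target on a candidate server, a small load-test sketch helps. The stub below simulates the inference call with a sleep so the script runs standalone; in real use you would replace `send_request` with an HTTP POST to TF Serving's REST endpoint (by default `http://localhost:8501/v1/models/<name>:predict`; the model name is yours):

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def send_request():
    """Stand-in for one inference call.

    Replace with a real request to the TF Serving REST endpoint,
    e.g. POST http://localhost:8501/v1/models/classifier:predict
    (model name 'classifier' is a placeholder).
    """
    time.sleep(0.05)  # simulated 50 ms model latency

def benchmark(concurrency=2, total_requests=20):
    """Fire total_requests calls with at most `concurrency` in flight;
    return per-request latencies in milliseconds."""
    latencies = []

    def timed_call():
        start = time.perf_counter()
        send_request()
        # list.append is thread-safe in CPython, so no lock is needed here
        latencies.append((time.perf_counter() - start) * 1000)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(total_requests):
            pool.submit(timed_call)
    return latencies

if __name__ == "__main__":
    lat = benchmark()
    print(f"p50={statistics.median(lat):.0f} ms  max={max(lat):.0f} ms")
```

Running this against the real endpoint with `concurrency=2` shows directly whether the max latency stays under the 500 ms budget on a given machine.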