Web service monitoring?

Maxim, 2017-11-02 12:39:42

Monitoring

Hello.
I have a general question about monitoring web services. This is new to me, so I have questions. My question is not about monitoring tools but about the approach.
Web services are multi-layered: a web server, one or more applications, a database, caching services, and so on.
How to monitor each component individually is clear: pick a convenient tool, set up triggers and alerts, and receive notifications. But monitoring done this way is flat: nothing relates the checks to one another. When something breaks, a flood of alerts pours in, and the root cause is hard to identify.
So my question is this: how do you monitor a web service so that you see it the way the user sees it?
The obvious things come to mind.
Basic graphs for the big picture: RPS together with the share of error responses (4xx, 5xx).
API response time.
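To make these big-picture numbers concrete, here is a minimal sketch of how they could be computed from request records for one time window. The `Request` type, the `summarize` function, and the choice of a nearest-rank 95th percentile are all assumptions made for illustration; a real setup would normally get these from its metrics system rather than compute them by hand.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code of the response
    duration_ms: float  # time taken to produce the response


def summarize(requests, window_seconds):
    """Return RPS, the share of 4xx/5xx responses, and p95 latency for one window."""
    total = len(requests)
    if total == 0:
        return {"rps": 0.0, "error_ratio": 0.0, "p95_ms": None}

    errors = sum(1 for r in requests if r.status >= 400)
    durations = sorted(r.duration_ms for r in requests)

    # Nearest-rank percentile: the smallest value such that at least
    # 95% of the samples are less than or equal to it.
    rank = (95 * total + 99) // 100  # integer ceil(0.95 * total)
    p95 = durations[rank - 1]

    return {
        "rps": total / window_seconds,
        "error_ratio": errors / total,
        "p95_ms": p95,
    }
```

Plotting these three values per minute gives a rough view of what users experience, independent of which internal component is failing.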
Please share your experience. What other de facto standards exist for judging whether a service is alive?
Thank you.

1 answer(s)

Philipp, 2017-11-02
@zoonman

There is a great tool called Sentry. It lets you track almost any error, on both the backend and the frontend.
It can also attach user context, so you can see which users hit an error. This is extremely convenient.
Usually, when the backend crashes, something breaks on the frontend too.
Sentry also supports breadcrumbs, which let you trace the chain of user actions leading up to an error. This requires code changes, but the result is well worth it.
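As a rough illustration of user context and breadcrumbs, here is a small sketch assuming the current Python SDK, `sentry_sdk`. The helper `report_with_context` and its arguments are hypothetical; the SDK module is passed in as `sdk` only so that the helper can be tried with a stand-in object. The calls it makes (`set_user`, `add_breadcrumb`, `capture_exception`) are the SDK's top-level functions.

```python
def report_with_context(sdk, user, breadcrumbs, error):
    """Attach user context and breadcrumbs to the scope, then report the error.

    `sdk` is expected to be the `sentry_sdk` module after `sentry_sdk.init(...)`
    has been called; any object with the same three functions also works.
    """
    # Tie subsequent events to a specific user, e.g. {"id": "42"}.
    sdk.set_user(user)

    # Record the user's recent actions so they appear alongside the error.
    for crumb in breadcrumbs:
        sdk.add_breadcrumb(
            category=crumb["category"],
            message=crumb["message"],
            level="info",
        )

    # Send the exception together with the context collected above.
    sdk.capture_exception(error)


# Hypothetical usage; the DSN and all values are placeholders:
#
#   import sentry_sdk
#   sentry_sdk.init(dsn="https://<key>@<host>/<project>")
#   try:
#       process_order(order)
#   except Exception as exc:
#       report_with_context(
#           sentry_sdk,
#           {"id": "42"},
#           [{"category": "ui", "message": "clicked Pay"}],
#           exc,
#       )
```

In a real application the breadcrumbs would be added at the point where each action happens, not all at once just before reporting.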
Sorry for starting with a tool, but diagnosing any problem starts with analyzing symptoms and errors. If you don't have enough error data, or errors aren't being tracked at all, you will struggle to diagnose anything.
If the problem is server response time, you need to monitor and profile requests.
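One simple way to start collecting response-time data is to time each handler yourself. The sketch below is only an illustration: the `timed` decorator, the in-memory `timings` store, and `slow_endpoints` are hypothetical names, and a production service would usually export these measurements to a metrics system instead of keeping them in a dictionary.

```python
import time
from collections import defaultdict
from functools import wraps

# Endpoint name -> list of observed durations in seconds (in-memory, for illustration).
timings = defaultdict(list)


def timed(endpoint):
    """Decorator that records how long each call to a handler takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the handler raised.
                timings[endpoint].append(time.perf_counter() - start)
        return wrapper
    return decorator


def slow_endpoints(threshold_seconds):
    """Return the endpoints whose average duration exceeds the threshold."""
    return sorted(
        name
        for name, samples in timings.items()
        if samples and sum(samples) / len(samples) > threshold_seconds
    )
```

Once a slow endpoint stands out, a profiler such as the standard library's `cProfile` can show where inside the handler the time is spent.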
For example, suppose the bounce rate on a page suddenly jumps. You can set up an alert in Google Analytics for a sharp increase in bounces. Next, look at the monitoring for the server responses used on that page. You find that one of the API calls is taking longer than usual. Profile it, and you see a slow database call. Check the slow-query monitoring for the database and correlate it with the queries this API uses. Find the query, run EXPLAIN, add the missing indexes or refactor the API. Most of these steps require judgment and experience. Something like NewRelic can help you walk through all of this in one place.
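The last step of that chain, running EXPLAIN and adding an index, can be tried in miniature. This sketch uses SQLite from the Python standard library purely as a stand-in, since the database in question is not named above; the `orders` table, the index name, and the `query_plan` helper are made up for illustration. SQLite's form of the command is `EXPLAIN QUERY PLAN`, and the exact wording of its output varies between versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the textual description of the step.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE user_id = ?"

# Without an index on user_id, the plan is a scan of the whole table.
plan_before = query_plan(conn, sql, (42,))

# Add the missing index and look at the plan again.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The plan before the index reports a scan of `orders`, while the plan afterwards mentions `idx_orders_user_id`. On MySQL or PostgreSQL the same check is done with their own `EXPLAIN` statement.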