PHP
Alexey Potapenko, 2019-10-04 22:43:32

How to selectively remove obsolete cached results from the database?

Let's say there are methods in the UserRepository class:
findByFirstname($firstname)
findByCity($city)
findByAge($minAge, $maxAge)
Also, let's say there is a PostRepository with the methods:
findByUser($id)
findByName($postName)
Let's say all these methods have already been executed and their results are already in the cache. Note that if, for example, findByCity() was called several times with different parameters, each call added a separate result to the cache.
Suppose a user (User) deletes his account, and his posts are deleted along with it (I'm not sure it works exactly that way, but that's beside the point). Now we need to remove from the cache every result in which this user appears, so that no one gets stale data. But how do we do that properly? Does Doctrine have functionality to automatically remove only those cached results that contain this user, who no longer exists?
I know that Doctrine has a QueryBuilder, which is used in the methods above. It builds a database query and returns a Query object, on which you can call useResultCache() and pass it an ID; that ID can later be used to delete the result(s) of the query from the cache. I also know that the same ID can be passed to useResultCache() on different Query objects, so that all of their results can be deleted from the cache by that one ID.
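A minimal sketch of that mechanism, assuming Doctrine ORM 2.x (where useResultCache() takes an explicit cache ID as its third argument); the class, key scheme, and lifetime are illustrative:

```php
<?php
// Sketch only: assumes Doctrine ORM 2.x. The 'users_by_city_' key
// scheme is an invented convention, not a Doctrine feature.
class UserRepository extends \Doctrine\ORM\EntityRepository
{
    public function findByCity(string $city): array
    {
        return $this->createQueryBuilder('u')
            ->where('u.city = :city')
            ->setParameter('city', $city)
            ->getQuery()
            // third argument is the result-cache ID we can later delete by
            ->useResultCache(true, 3600, 'users_by_city_' . md5($city))
            ->getResult();
    }
}

// Later, e.g. after a User from Moscow has been removed, the stale
// entry can be evicted through the configured result-cache driver:
$cache = $entityManager->getConfiguration()->getResultCacheImpl();
$cache->delete('users_by_city_' . md5('Moscow'));
```

The eviction only works if every repository method uses a predictable, reconstructible ID, which is exactly the bookkeeping burden the question describes.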
I used an EventManager and created a listener whose method fires after EntityManager::flush(), but the sad part is that you have to write the deletion logic yourself: look at which entities were inserted, deleted, or updated and, based on that, remove the cached results that now contain obsolete data. I doubt Doctrine can handle this by itself, but maybe there are ways to make it easier?
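For concreteness, a hedged sketch of such a listener, assuming Doctrine ORM 2.x events and assuming repositories tagged their cached results with predictable IDs (the key scheme and App\Entity\User class are illustrative):

```php
<?php
use Doctrine\ORM\Event\OnFlushEventArgs;

// Sketch only: the hand-written invalidation logic the question
// complains about. Key names here are invented conventions.
class StaleCacheListener
{
    public function onFlush(OnFlushEventArgs $args): void
    {
        $em    = $args->getEntityManager();
        $uow   = $em->getUnitOfWork();
        $cache = $em->getConfiguration()->getResultCacheImpl();

        foreach ($uow->getScheduledEntityDeletions() as $entity) {
            if ($entity instanceof \App\Entity\User) {
                // evict only the result sets that can contain this user
                $cache->delete('users_by_city_' . md5($entity->getCity()));
                $cache->delete('posts_by_user_' . $entity->getId());
            }
        }
    }
}

// Registration, e.g. in bootstrap code:
// $entityManager->getEventManager()->addEventListener(
//     [\Doctrine\ORM\Events::onFlush], new StaleCacheListener()
// );
```

Note that the listener must know, for every entity type, which cached queries could contain it; Doctrine gives you the list of scheduled deletions but not that mapping.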
And one more very important detail: suppose the cache holds two different results returned by findByCity(). The first, from findByCity('Moscow'), contains a User who has actually been deleted; the second, from findByCity('Kyiv'), does not contain that User, because he is from Moscow. The second result therefore should not be deleted: it contains no nonexistent User and thus no stale data.
But to selectively remove only the results that definitely contain stale data, you have to check whether the deleted User is present in each of them. That is far too messy, so it seems better to skip such checks and simply delete every result that could potentially contain outdated data. Even that, though, looks like an unpleasant job that will take a lot of time and invite a lot of mistakes.
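The coarse-grained "delete everything that could be stale" approach is awkward because Doctrine's cache abstraction has no tagging. One possible workaround, sketched below with invented names, is to record in the cache itself which result IDs belong to a logical group, and wipe the whole group at once:

```php
<?php
// Sketch only: a hand-rolled "tag" on top of Doctrine\Common\Cache.
// CacheGroup and the group naming are illustrative inventions.
class CacheGroup
{
    public function __construct(
        private \Doctrine\Common\Cache\Cache $cache,
        private string $group, // e.g. 'users_by_city'
    ) {}

    // Call this every time a result is saved under $id.
    public function remember(string $id): void
    {
        $ids = $this->cache->fetch($this->group) ?: [];
        $ids[$id] = true;
        $this->cache->save($this->group, $ids);
    }

    // Wipe every result that was ever recorded in this group.
    public function flush(): void
    {
        foreach (array_keys($this->cache->fetch($this->group) ?: []) as $id) {
            $this->cache->delete($id);
        }
        $this->cache->delete($this->group);
    }
}
```

This over-invalidates (the Kyiv result goes too), but it avoids inspecting each cached result's contents; the trade-off is exactly the one discussed above.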
If there is literature or an article where this is covered properly, please point me to it. I found this article https://www.gregfreeman.io/2012/invalidating-the-r... but haven't tried it yet, because I still hope there are better ways.


1 answer(s)
Grigory Vasilkov, 2019-10-04
@mcakaneo

That's right, cache invalidation is quite a "fun" thing to do.
In my opinion, grouping caches makes this a little easier to work with: instead of storing an entry in a file or in Redis under an opaque hash key, store it under a structured key like "models.users.find.city". I also use the __METHOD__ magic constant, which gives you a ready-made namespace (though that mostly helps with later manual cleanup rather than automatic invalidation). Then mix the parameters into the key as an "admixture"; that admixture can, for example, later be base64-encoded (user:1 -> base64) and serve as a searchable key fragment, so you can find every entry containing it. At minimum you get the ability to find all the caches associated with a given city and wipe them by hand.
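A small sketch of that key scheme; the class and the exact separator are illustrative, and note that __METHOD__ must be used inside the repository method itself, not in a shared helper, or it will name the helper instead:

```php
<?php
// Sketch only: structured cache keys built from __METHOD__ plus a
// base64 "admixture" of the parameters, as described above.
class UserRepository
{
    public function findByCity(string $city): array
    {
        // e.g. "UserRepository.findByCity.TW9zY293" for 'Moscow'
        $key = str_replace('::', '.', __METHOD__)
             . '.' . base64_encode($city);

        // ... look $key up in the cache, or run the query and
        // store the result under $key before returning it
        return [];
    }
}
```

With keys like this, a backend that supports key scanning (e.g. Redis SCAN with a pattern) can locate every entry for "UserRepository.findByCity.*" or for one specific base64 fragment.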
Then a fun game begins: add one tag too many and the cache becomes useless; leave a tag out and you end up clearing, over and over, things that didn't need clearing. I think it can be handled the way you said, with listeners.
There is also a hacky trick on this topic: store which "hundred" a user belongs to (for user 912 that is bucket 9) and, on deletion, wipe the hundred or so caches in that bucket. Or a thousand. Anything is better than wiping 100% of the cache. And of course, think about whether user data needs caching at all: maybe cache only the part that doesn't change, or don't cache at all the things users change constantly.
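The bucket idea above can be sketched in a couple of lines; the key format is illustrative:

```php
<?php
// Sketch only: group users into buckets of 100, so deleting one user
// invalidates at most ~100 users' caches instead of all of them.
function userCacheBucket(int $userId): string
{
    return 'users.bucket.' . intdiv($userId, 100);
}

// userCacheBucket(912) yields "users.bucket.9": on deleting user 912,
// evict everything stored under that bucket prefix.
```

The trade-off is deliberate: coarser buckets mean fewer keys to track but more collateral invalidation.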
For example, a category tree that is shown to everyone makes sense to cache and clear. A per-user profile cache, though, means a million caches for a million users; better to let profiles load from the database than to burn a gigabyte of RAM storing data that is about to change anyway.
The more tags, the less efficient the cache and the more memory spent storing the same things, but also the easier it is to find an entry and evict it precisely. On the other hand, the more listeners you add, even very simple ones, the more the system wants to turn into a message bus (good luck figuring out which listener did what when the logs are analyzed by hand after "something broke") and eventually forces you to write a saga.
Still, I think the general principle is simply that deleting something triggers an additional deletion of the related caches. If you want it kept separate, use listeners and so on. But I would put it in the same place where the deletion itself happens: if you know the deletion is here, it is more sensible for the cache to be evicted right here too, not somewhere off in a folder full of listeners.
But surely someone knows an easier way.
