Protecting JSON data transmitted from the server?
On modern sites, more and more rendering moves to the browser, and the server side only transfers data (in JSON format, for example). This turns the server into a kind of API (request -> structured data).
The problem I ran into is that this makes it easy to pull a lot of data from the site without parsing any HTML pages.
On sites with authorization this is easier to deal with, but what about sites without authorization?
For example, a bulletin board site: competitors can easily and conveniently pull out all the information and put it on their own site.
Is there any mechanism to determine that the data is being served to a freshly loaded page from my site and not to a third-party data grabber?
You need to understand that if data is published openly, all these tricks are an attempt to hold water in a sieve. Beat competitors with a higher level of service on your own resource, rather than putting spokes in visitors' wheels and risking hitting a legitimate user, who will give up in disgust and go to a competitor.
As already mentioned above, every protection can be bypassed. But here are a few options:
1) The simplest is a limit on the number of requests from one IP.
2) When the HTML page is loaded, write a timestamp into the session, and when JSON is requested, check how long the user has had the page open.
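A minimal sketch of both options in plain JavaScript, with no framework. Every name and threshold here (`allowByRate`, `markPageLoaded`, `allowByPageAge`, the limits) is an assumption for illustration, and `session` stands in for whatever session object the server really uses:

```javascript
// Sketch only: names and thresholds are illustrative assumptions.
const MAX_REQUESTS_PER_WINDOW = 30; // assumed limit per IP
const WINDOW_MS = 60 * 1000;        // assumed window: one minute
const MIN_PAGE_AGE_MS = 500;        // assumed minimum delay between page load and JSON request

const hitsByIp = new Map();         // ip -> timestamps of recent requests

// Option 1: sliding-window limit on requests from one IP.
function allowByRate(ip, now) {
  const recent = (hitsByIp.get(ip) || []).filter((t) => now - t < WINDOW_MS);
  const allowed = recent.length < MAX_REQUESTS_PER_WINDOW;
  if (allowed) recent.push(now);
  hitsByIp.set(ip, recent);
  return allowed;
}

// Option 2: the HTML handler records when the page was loaded...
function markPageLoaded(session, now) {
  session.pageLoadedAt = now;
}

// ...and the JSON handler checks how long the page has been open.
function allowByPageAge(session, now) {
  if (typeof session.pageLoadedAt !== 'number') return false; // JSON requested without loading the page
  return now - session.pageLoadedAt >= MIN_PAGE_AGE_MS;
}
```

Timestamps are passed in as `now` rather than read from `Date.now()` so the checks are easy to test. Keep in mind that many legitimate users can share one IP, so the limit should be generous.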
And perhaps the most important point:
When you detect requests from a bot, do not block all of them, only random ones (or rather, return bogus data for them). Then it will be harder for the bot's author to track which changes in the bot's algorithm lead to an error.
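Roughly, this could look like the sketch below. The function name `respondTo`, the 30% probability, and the record fields `price` and `phone` are all assumptions, not anything from the original site:

```javascript
// Sketch only: probability and field names are illustrative assumptions.
const BOGUS_PROBABILITY = 0.3; // assumed share of bot requests that get altered data

// `random` is injectable so the behaviour can be tested deterministically.
function respondTo(record, isSuspectedBot, random = Math.random) {
  if (!isSuspectedBot || random() >= BOGUS_PROBABILITY) {
    return record; // people, and most bot requests, get the real record
  }
  // Return a plausible-looking but wrong copy instead of an error,
  // so the bot's author cannot tell which request was "caught".
  return {
    ...record,
    price: record.price + 1 + Math.floor(random() * 100),
    phone: '000-000-0000',
  };
}
```

The original record is never mutated, so a false positive on one request does not corrupt what later visitors see.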
1) And how would you protect data on a site without authorization from being pulled by an HTML parser? That is, how would you protect your example if it did not use AJAX?
2) You can generate a token for each page, which is sent along with the request to the API. But what prevents the bot from loading the page first, parsing the token out of it, and using it to request the API? So we return to point (1): how would you protect the site without user authorization if it did not use AJAX?
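For reference, a per-page token of this kind could be sketched in Node.js as a timestamp signed with HMAC. The names `issueToken` and `verifyToken`, the secret, and the ten-minute lifetime are assumptions. As noted above, a bot that loads the page first still obtains a valid token, so this only forces it to fetch the HTML as well:

```javascript
const crypto = require('crypto');

// Sketch only: secret and lifetime are illustrative assumptions.
const SECRET = 'replace-with-a-server-side-secret'; // must never be sent to the client
const TOKEN_TTL_MS = 10 * 60 * 1000;                // assumed token lifetime: ten minutes

function sign(payload) {
  return crypto.createHmac('sha256', SECRET).update(payload).digest('hex');
}

// Called when the HTML page is rendered; the token is embedded in the page.
function issueToken(now) {
  const payload = String(now);
  return payload + '.' + sign(payload);
}

// Called by the API handler before it returns JSON.
function verifyToken(token, now) {
  const [payload, signature] = String(token).split('.');
  if (!payload || !signature) return false;
  const expected = Buffer.from(sign(payload));
  const given = Buffer.from(signature);
  // timingSafeEqual requires equal lengths, so compare lengths first.
  if (expected.length !== given.length || !crypto.timingSafeEqual(expected, given)) {
    return false;
  }
  const issuedAt = Number(payload);
  return now >= issuedAt && now - issuedAt <= TOKEN_TTL_MS;
}
```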
You look like a fanatic who doesn't know much about programming but considers his idea a stroke of genius and immediately worries about protecting his data. There is no point in bothering with this. You were given excellent advice: come up with something that means you need not fear competitors.
If you want to protect yourself from grabbers, come up with rules by which your application can distinguish people from robots and ban the latter. The rules can be based on the order of actions, the frequency of requests (a person usually takes time to read the information), the user agent, the referrer (or rather, its presence or absence), etc.
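Such rules could be combined into a simple score, as in this sketch. The request fields, weights, and cut-off are hypothetical and would need tuning against real logs. The user agent and referrer are supplied by the client and are trivial to fake:

```javascript
// Sketch only: fields, weights, and cut-off are illustrative assumptions.
function botScore(request) {
  let score = 0;
  // Missing or tool-like user agent.
  if (!request.userAgent || /curl|wget|python|bot|spider/i.test(request.userAgent)) score += 2;
  // JSON requested without a referring page.
  if (!request.referrer) score += 1;
  // Requests arrive faster than a person could read.
  if (request.requestsLastMinute > 60) score += 2;
  // Unusual order of actions: API called before any HTML page was loaded.
  if (!request.loadedHtmlFirst) score += 2;
  return score;
}

function looksLikeBot(request) {
  return botScore(request) >= 3; // assumed cut-off
}
```

No single rule decides the outcome, so one odd but legitimate visitor (for example, a browser that strips the referrer) is not banned on that basis alone.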
Believe me, I have done data scraping myself, and I can tell you for sure: for every protection of such data there is a new loophole to get it out. Forget it, it's impossible. The "villain" can fake anything the client sends, and given the protocol, you won't be able to check its authenticity.
There is no difference between HTML and JSON; hooking up something like simplehtmldom.sourceforge.net/manual.htm takes 2 extra minutes. If you really want security through obscurity, rework the JSON syntax into something like <!!Key!!>Value<*><!!Key2!!>Value2<*> and change it every day, but it's better to find yourself a more useful occupation.
I agree with those who say that you should not protect the data but make grabbing it unnecessary.
But one thought still keeps circling in my head. The idea is the following.
Make a module, for example in Flash (or something else that can't easily be opened to see what's inside), which makes requests to the server and calls a specified JS function when a response is received. The server returns content encrypted in some way, and the module decrypts it and passes the result to the callback.
That is, if jQuery was used for AJAX requests before, something like this:
$.get('/get.php', {module: 'module'}, function (data) {
    console.log(data);
}, 'json');
Now it can be replaced by a call to a wrapper method around this Flash module, without changing the logic of the rest of the application.
Unfortunately, I'm not familiar with Flash at all, so I'm not sure such a scheme can be implemented. Also, the Flash solution will not work where Flash is not installed.
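Without Flash, the encrypt / decrypt / callback flow could be sketched in Node.js roughly as follows. This swaps in AES-256-GCM from Node's `crypto` module. The names `encryptJson`, `decryptJson`, and `secureGet`, the demo key, and the `transport` parameter are assumptions. The key has to ship with the client module, so anyone who extracts it can decrypt the data as well, which makes this obfuscation rather than real protection:

```javascript
const crypto = require('crypto');

// Sketch only: a shared demo key baked into the client module (an assumption).
const KEY = crypto.createHash('sha256').update('demo-key').digest(); // 32 bytes

// Server side: encrypt the JSON payload into base64(iv + authTag + ciphertext).
function encryptJson(data) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', KEY, iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(data), 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString('base64');
}

// Client module: decrypt the response and parse it back into an object.
function decryptJson(encoded) {
  const raw = Buffer.from(encoded, 'base64');
  const decipher = crypto.createDecipheriv('aes-256-gcm', KEY, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  const text = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString('utf8');
  return JSON.parse(text);
}

// Hypothetical wrapper with the same shape as the $.get call above;
// `transport` stands in for whatever actually performs the request.
function secureGet(transport, url, params, callback) {
  transport(url, params, (encoded) => callback(decryptJson(encoded)));
}
```

The application code keeps its callback unchanged and only swaps the call, for example `secureGet(transport, '/get.php', {module: 'module'}, function (data) { console.log(data); })`.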
The best defense against grabbing is to detect bots, for example through logs, and then ban them / redirect to a captcha page / file an abuse report if the server is in Germany / send curses / serve them corrupted data.
You can use JavaScript to decrypt the data (ugh).
You can display data in text form, for example showing times as "yesterday" or "4 hours ago"; this can make a grabber's life a little harder.
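A small sketch of such a formatter. The function name `relativeTime` and the exact wording of the labels are assumptions, and "yesterday" here simply means 24 to 48 hours ago rather than the previous calendar day:

```javascript
// Sketch only: converts a timestamp into an imprecise, human-readable label.
// Both arguments are milliseconds since the epoch.
function relativeTime(then, now) {
  const minutes = Math.floor((now - then) / 60000);
  if (minutes < 1) return 'just now';
  if (minutes < 60) return minutes + (minutes === 1 ? ' minute ago' : ' minutes ago');
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return hours + (hours === 1 ? ' hour ago' : ' hours ago');
  const days = Math.floor(hours / 24);
  if (days === 1) return 'yesterday'; // 24 to 48 hours ago, not the calendar day
  return days + ' days ago';
}
```

For this to hide anything, the server has to send only the label. If the exact timestamp is also in the JSON, the grabber simply reads that instead.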
Another working option is to make viewing the data, or parts of it, paid.
In general, you are unlikely to be able to defend yourself. I'm sure I could strip your site in no time if I ever took up grabbing.