Parsing
Alexander, 2020-08-31 14:28:49

What can be used to scrape data from a personal account on a website?

I need to pull data from my personal account on copart.com into my own system.
I tried to solve this with cURL, but it doesn't work: the site has bot protection, and even if you get past it once, it may demand a captcha the second time. In short, it is far too unstable for production.

I started researching the issue and realized that I need a dedicated tool such as Selenium, PhantomJS, SlimerJS ...
I liked SlimerJS, but it doesn't work with current browser versions (it seems to be outdated).
Selenium seemed like overkill, both in functionality and in resource usage.

What tools could I use to solve this problem?
Since I'm a complete beginner in this area, something with a low entry barrier would be ideal (I work with PHP and JS).

There is so much information online that I just get lost, so I'd like to narrow down the search.


2 answers
d-sem, 2020-08-31
@alexmixaylov

Puppeteer, and preferably the original rather than a PHP port.
https://github.com/puppeteer/puppeteer
It provides an almost complete API to a real browser (Chromium by default, but a single line of config switches it to a regular Chrome installation).
Captchas are harder, and a lot depends on the captcha itself.
In this scenario, though, there shouldn't be any problems, especially if you save cookies.
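
To make the suggestion above concrete, here is a minimal sketch of launching Puppeteer against an installed Chrome and reusing cookies between runs. The cookie file name, the Chrome path and the helper names `loadCookies` / `saveCookies` are assumptions for illustration, not something from the thread. The login step is left as a placeholder, and depending on your Puppeteer version the page-level cookie methods may be marked as deprecated in favour of browser-level ones.

```javascript
const fs = require('fs');

// Hypothetical locations; adjust them for your machine.
const COOKIE_FILE = './cookies.json';
const CHROME_PATH = '/usr/bin/google-chrome';

// Restore cookies saved by a previous run, if the file exists.
async function loadCookies(page, file = COOKIE_FILE) {
  if (!fs.existsSync(file)) return 0;
  const cookies = JSON.parse(fs.readFileSync(file, 'utf8'));
  if (cookies.length > 0) await page.setCookie(...cookies);
  return cookies.length;
}

// Persist the current cookies so the next run can reuse the session.
async function saveCookies(page, file = COOKIE_FILE) {
  const cookies = await page.cookies();
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

async function main() {
  // Required lazily so the helpers above work without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    headless: true,
    // The "one line" that swaps the bundled Chromium for an installed Chrome.
    executablePath: CHROME_PATH,
  });
  try {
    const page = await browser.newPage();
    await loadCookies(page);
    await page.goto('https://www.copart.com/', { waitUntil: 'networkidle2' });
    // ... log in here if the restored cookies are no longer valid ...
    await saveCookies(page);
  } finally {
    await browser.close();
  }
}

// Only start the browser when explicitly asked: node scrape.js --run
if (process.argv.includes('--run')) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Saving the cookies after every run means a fresh login (and a possible captcha) should only be needed when the session actually expires.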

I need to log in to my personal account on a cron schedule and pull the JSON response,
2-3 times a day.
That creates less load than opening the site in a browser and copying the data by hand, so from an ethical point of view everything is OK.
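
Pulling a JSON response this way could look roughly like the sketch below. It assumes an already logged-in Puppeteer `page`. The function names `isTargetJson` and `fetchAccountJson`, the URL fragment, and the timeout are hypothetical, since the thread does not name the actual endpoint.

```javascript
// Decide whether a network response is the JSON payload we are waiting for.
// Puppeteer exposes header names in lower case.
function isTargetJson(response, urlPart) {
  const type = response.headers()['content-type'] || '';
  return (
    response.url().includes(urlPart) &&
    response.ok() &&
    type.includes('application/json')
  );
}

// Open the account page and return the parsed body of the first matching
// JSON response. `page` is a Puppeteer Page that already has a valid session.
async function fetchAccountJson(page, pageUrl, urlPart, timeoutMs = 30000) {
  const [response] = await Promise.all([
    // Start waiting before navigating so an early response is not missed.
    page.waitForResponse((res) => isTargetJson(res, urlPart), { timeout: timeoutMs }),
    page.goto(pageUrl),
  ]);
  return response.json();
}
```

A script built around this can be run from cron 2-3 times a day. For example, the schedule `0 8,14,20 * * *` would start it at 08:00, 14:00 and 20:00.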

FanatPHP, 2020-08-31
@FanatPHP

If you don't have the site's permission to scrape it, then no solution will work "for production" at all. You will be detected and blocked anyway.
Not to mention that using someone else's site data "for production" is disgusting. All you will achieve in the end is a worse experience for other users, because the site will tighten its protection even further.
