What should I write a site parser in: PHP or Ruby?
I know PHP well, but I don't know Ruby at all :)
The parser should be multi-threaded and fast,
so I'm wondering: do I need to learn Ruby for this,
or is PHP enough?
Sensible people don't go looking for adventure: they take Python and Scrapy and get a parser that fetches pages concurrently out of the box.
Write it in whatever you know best. You know PHP, so write it in PHP. If you want to pick up Ruby along the way, write it in Ruby. For this task the two languages offer practically the same capabilities.
In general, PHP will run very slowly here, especially multi-threaded.
I would write it as a desktop application in some other language rather than in PHP.
But just in case, here is a link that makes life much easier:
simplehtmldom.sourceforge.net
In my opinion, it doesn't matter much which language you use, so I'd advise writing in whichever one you like best.
For parallel requests in PHP you can use the cURL extension and its curl_multi_exec function. I think the same library can be used from Ruby.
A long time ago I wrote a parser like this: bash + curl. I parsed the response in bash as well and cut out the piece I needed, then passed it through the console to a PHP script. It worked very quickly, even on large volumes.
These days I would love to try something ready-made.
Java + jsoup. If the site's content is generated by JavaScript, use Selenium instead of jsoup.
Multithreading is easy to do in Java, and after PHP, Java is easy to write.
// parse() stands for your own parsing method
Thread t = new Thread(new Runnable() {
    @Override
    public void run() {
        parse();
    }
});
t.start(); // runs parse() on a separate thread
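If you need more than one thread, a thread pool is usually more convenient than creating each Thread by hand. Here is a rough sketch of that idea, not a finished parser. The class name ParallelParser, the parseAll helper, and the regex-based parse are all made up for illustration. In a real project jsoup would do the parsing, and the pages would be downloaded rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParallelParser {
    // Illustrative stand-in for real parsing: pulls the text of the <title> tag.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String parse(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Parses every page on a fixed-size pool; results keep the input order.
    static List<String> parseAll(List<String> pages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String page : pages) {
                futures.add(CompletableFuture.supplyAsync(() -> parse(page), pool));
            }
            List<String> titles = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                titles.add(f.join()); // waits for this page's result
            }
            return titles;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        // Hard-coded sample pages (assumption: real code would fetch them over HTTP).
        List<String> pages = Arrays.asList(
                "<html><title>Home</title></html>",
                "<html><title>About</title></html>");
        System.out.println(parseAll(pages, 2)); // prints [Home, About]
    }
}
```

The pool size caps how many pages are processed at once, which also helps you avoid hammering the target site.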
Good afternoon. Parsers can be written in literally any language, but some languages are particularly well suited to it. Of these programming languages I would recommend: 1. Python, 2. PHP, 3. JavaScript, 4. Ruby, 5. Java and .NET.
You can choose any of these options!