Semisonic, 2012-02-18 14:17:22
PHP

Receiving and parsing HTML, sending JSON requests, Tor anonymization - help me decide on the tools

Hello!

Due to circumstances far removed from the IT world, plus natural curiosity, I need to solve a seemingly simple but specific task. In short, I need to periodically request a page from a remote server, analyze the received HTML (pull out several links that sit in a known place, then isolate fragments of those links that also sit in a known place), and then, based on the results of the analysis and some internal logic, send a JSON request of a known form back through a local Tor proxy.

I have no web development skills as such, but I do have programming experience and a desire to learn. So I would be grateful to established web developers, and simply to knowledgeable people, for ideas about the tools that could solve this task. The solution does not have to be universal or elegant; quick hacks that get a working version going relatively fast are fine too.

I have come up with two possible approaches myself. The first is a JS script that I could run from within Firefox. The script would parse the HTML and send the JSON requests, and Tor could be handled by configuring Firefox accordingly. But, as far as I understand, pure JS cannot fetch the source of a remote page.
The second is to write a script in PHP or Python that does all the work. Googling showed that the problem is solvable in principle, but I can't decide what to use, and it's not clear how to use Tor in this case.

In general, if anyone has done something similar, please share your experience, and I'll try to figure out the rest myself =).

Thank you!

1 answer(s)
WEBIVAN, 2012-02-18
@WEBIVAN

In PHP:

$ch = curl_init();
// Fetch the page we need into the variable $data
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
// Your internal logic goes here; put the resulting JSON string into, for example, $json
$ch = curl_init();
// Where to send the request
curl_setopt($ch, CURLOPT_URL, "http://example.net");
// IP and port of the Tor proxy (Tor's SOCKS port is 9050 by default; adjust to your setup)
curl_setopt($ch, CURLOPT_PROXY, "127.0.0.1:9050");
// Proxy login and password, if any
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'username:pass');
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $json);
// Declare the body as JSON; otherwise cURL sends it as a form-encoded POST
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_exec($ch);
curl_close($ch);
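
The "internal logic" step in the snippet above is left open, so here is a minimal sketch of how it might look, assuming the links sit inside an element you can address with XPath and the interesting piece of each link can be captured with a regular expression. The function name extractLinkParts, the XPath query, the regex and the sample HTML are all made up for illustration; substitute whatever matches the real page.

```php
<?php
// Hypothetical helper: pull fragments out of the links found at a known place.
// $xpathQuery and $pattern stand in for the "known place" on the real page.
function extractLinkParts(string $html, string $xpathQuery, string $pattern): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $parts = [];
    foreach ($xpath->query($xpathQuery) as $node) {
        $href = $node->getAttribute('href');
        // Keep only the first capture group of links that match the pattern.
        if (preg_match($pattern, $href, $m)) {
            $parts[] = $m[1];
        }
    }
    return $parts;
}

// Inline sample HTML standing in for $data returned by curl_exec().
$data = '<html><body><div id="list">'
      . '<a href="/item/101/view">A</a><a href="/item/202/view">B</a>'
      . '</div><a href="/about">About</a></body></html>';

$ids  = extractLinkParts($data, '//div[@id="list"]//a[@href]', '#^/item/(\d+)/#');
$json = json_encode(['ids' => $ids]);
echo $json, "\n"; // prints {"ids":["101","202"]}
```

The resulting $json string can then be passed to CURLOPT_POSTFIELDS as in the second request. DOMDocument with DOMXPath is only one option; for a page with a very rigid layout, a single preg_match_all over $data may be enough.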
