Decoding data encoded with a custom algorithm
I am parsing a site written in ASP.NET (www.iaai.com/Vehicles/VehicleAdvSearch.aspx?savepreference=true#anchorSearchResults). On almost every click the JS function __doPostBack() is called; it sends the form via AJAX and receives something like this in response: http://pastebin.com/c7eb9XhH. After a bit of analysis, I reduced it to the following task:
Suppose the response looks like this (Привет мир is Russian for "Hello world", Элементов не найдено for "No items found"):
10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||
I wrote this line myself, but it follows the same principle as the response ASP.NET returns. Take the first block:
10|updatePanel|h2.header|Привет мир|
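This block consists of four subblocks separated by |: the length of the content (10 is the number of characters in Привет мир), the block type, the element selector, and the content itself. From the whole response I need to get an array like this:
$array = array(
    'updatePanel' => array(
        'h2.header' => 'Привет мир',
        'p.search' => 'Элементов не найдено'
    ),
    'deleteBox' => array(
        '#results' => ''
    )
);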
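To make the format concrete, here is a minimal sketch of the reverse direction: building such a string from (type, selector, content) triples. The function name encodeBlocks is hypothetical and is not part of ASP.NET; the sketch only illustrates the assumption, taken from the sample line above, that the prefix is the character count of the content.

```php
<?php
// Hypothetical helper: builds a string in the "length|type|id|content|" format
// used by the sample response, to show how the length prefix is computed.
function encodeBlocks(array $blocks): string
{
    $out = '';
    foreach ($blocks as [$type, $id, $content]) {
        // The prefix is the number of characters (not bytes) in the content.
        $out .= mb_strlen($content, 'UTF-8') . "|$type|$id|$content|";
    }
    return $out;
}

$sample = encodeBlocks([
    ['updatePanel', 'h2.header', 'Привет мир'],
    ['updatePanel', 'p.search', 'Элементов не найдено'],
    ['deleteBox', '#results', ''],
]);
echo $sample, "\n";
```

Running it prints the sample line from the question, which also gives a quick round-trip check for any decoder.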
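<?php
$s = '10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||';
mb_internal_encoding('UTF-8');

// Reads one "length|type|id|content|" block starting at character $offset.
// Returns the offset of the next block and the parsed block, or null at the end.
function fetchBlock($str, $offset)
{
    // 1st field: length of the content, in characters
    $p = mb_strpos($str, '|', $offset);
    if (false === $p) {
        return null;
    }
    $size = (int)mb_substr($str, $offset, $p-$offset);
    $offset = $p+1;
    // 2nd field: block type (updatePanel, deleteBox, ...)
    $p = mb_strpos($str, '|', $offset);
    $name = mb_substr($str, $offset, $p-$offset);
    $offset = $p+1;
    // 3rd field: element selector
    $p = mb_strpos($str, '|', $offset);
    $key = mb_substr($str, $offset, $p-$offset);
    $offset = $p+1;
    // 4th field: exactly $size characters, so it may itself contain '|'
    $value = mb_substr($str, $offset, $size);
    $offset = $offset+mb_strlen($value);
    // +1 skips the '|' that terminates the content
    return array('offset' => $offset+1, 'block' => array($name => array($key => $value)));
}

$result = array();
$block = array('offset' => 0);
while ($block = fetchBlock($s, $block['offset'])) {
    // array_merge_recursive groups blocks of the same type under one key
    $result = array_merge_recursive($result, $block['block']);
}
print_r($result);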
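A short illustration of why the code above uses mb_* functions throughout: in the sample, the length prefix counts characters, while strlen(), substr() and strpos() count bytes, and each Cyrillic letter takes two bytes in UTF-8. The sketch assumes the PHP file itself is saved as UTF-8.

```php
<?php
// The length prefix counts characters; strlen()/substr() count bytes.
// In UTF-8 each Cyrillic letter is 2 bytes, so byte-based slicing cuts values short.
$value = 'Привет мир';

$bytes = strlen($value);              // 19: 9 letters x 2 bytes + 1 space
$chars = mb_strlen($value, 'UTF-8');  // 10: matches the prefix in "10|..."

// Taking 10 bytes yields only the first 5 letters.
$byteSlice = substr($value, 0, 10);              // "Приве"
$charSlice = mb_substr($value, 0, 10, 'UTF-8');  // "Привет мир"

printf("%d bytes, %d chars\n%s\n%s\n", $bytes, $chars, $byteSlice, $charSlice);
```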
In situations like this, it is easier to use a headless browser engine such as PhantomJS. You get the rendered HTML as output, so it doesn't matter what the intermediate responses look like or how they are encoded: everything available in the browser is available to such a parser.
Maybe wget, or wget + Python, etc.?
As for PHP, if I understood the task correctly, I don't see any difficulty, at least if the response strictly consists of blocks of four parts each. Use regular expressions and off you go, or explode() plus assorted substr() calls.
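A minimal sketch of the regexp route, under a loud assumption: no content value contains |. A regexp cannot use the captured length as a quantifier, so the sketch only checks the prefix against the matched content and throws if they disagree. The helper name parseWithRegex is hypothetical.

```php
<?php
// Hypothetical helper: parses "length|type|id|content|" blocks with one regexp.
// ASSUMPTION: no content value contains '|'. The length prefix is only checked,
// not used for slicing, so a '|' inside a value triggers the exception below.
function parseWithRegex(string $text): array
{
    $result = [];
    // The /u flag makes the pattern treat the subject as UTF-8.
    preg_match_all('/(\d+)\|([^|]*)\|([^|]*)\|([^|]*)\|/u', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // $m[1] = length, $m[2] = type, $m[3] = selector, $m[4] = content
        if (mb_strlen($m[4], 'UTF-8') !== (int)$m[1]) {
            throw new UnexpectedValueException("Length mismatch in block {$m[2]}|{$m[3]}");
        }
        $result[$m[2]][$m[3]] = $m[4];
    }
    return $result;
}

$response = '10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||';
print_r(parseWithRegex($response));
```

For real responses whose content is HTML that may contain |, a length-driven parser such as fetchBlock above is the safer choice.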
The data itself can be fetched with cURL, and the same kind of regexp can find the JS function call.
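A sketch of that second step: finding the __doPostBack() calls in the page HTML with a regexp. __EVENTTARGET and __EVENTARGUMENT are the standard ASP.NET postback field names; the helper findPostBacks and the sample control ID are hypothetical, and the pattern assumes the calls are written without spaces, as ASP.NET typically renders them.

```php
<?php
// Hypothetical helper: finds __doPostBack('target','argument') calls in HTML
// and turns each one into the two standard ASP.NET postback fields.
function findPostBacks(string $html): array
{
    // Newer ASP.NET versions often encode the quotes as &#39; inside href
    // attributes, so decode entities first (ENT_QUOTES is needed for &#39;).
    $html = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    preg_match_all("/__doPostBack\\('([^']*)','([^']*)'\\)/", $html, $matches, PREG_SET_ORDER);

    $postBacks = [];
    foreach ($matches as $m) {
        $postBacks[] = [
            '__EVENTTARGET'   => $m[1],
            '__EVENTARGUMENT' => $m[2],
        ];
    }
    return $postBacks;
}

$html = '<a href="javascript:__doPostBack(&#39;ctl00$Main$lnkNext&#39;,&#39;&#39;)">Next</a>';
print_r(findPostBacks($html));
```

In a real request these fields would still need to be combined with the page's hidden inputs (such as __VIEWSTATE) before being posted with cURL.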
Wouldn't this approach work?
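mb_internal_encoding('UTF-8');
$text = "10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||";
$array = array();
while (mb_strlen($text) > 0) {
    // Only the first three fields are needed to locate the content.
    $data = explode('|', $text, 4);
    $str = "{$data[0]}|{$data[1]}|{$data[2]}|";
    // The length prefix counts characters, so use mb_substr (substr counts bytes).
    $data[3] = mb_substr($text, mb_strlen($str), (int)$data[0]);
    $array[$data[1]][$data[2]] = $data[3];
    // Cut the parsed block and its trailing '|' off the front of the string.
    // (str_replace() has no limit argument; its 4th parameter is a by-reference count.)
    $text = mb_substr($text, mb_strlen($str) + mb_strlen($data[3]) + 1);
}
print_r($array);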