PHP
La2ha, 2013-03-09 23:58:52

Decode data encoded with a custom algorithm

I am parsing a site written in ASP.NET ( www.iaai.com/Vehicles/VehicleAdvSearch.aspx?savepreference=true#anchorSearchResults ). On almost every click the JS function __doPostBack() fires, submits the form via AJAX, and gets back a response like this: http://pastebin.com/c7eb9XhH After a bit of analysis, I reduced the problem to the following task.
Let's assume the response looks like this:

10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||
I wrote this line myself, but it follows the same principle as the one ASP.NET returns to me.
The principle is as follows:
The line consists of three blocks built the same way, which I eventually want to save into an associative array. Let's take the first block and break it down:
10|updatePanel|h2.header|Привет мир|
This block consists of four sub-blocks separated by |
1) 10 is the number of characters in the fourth sub-block, presumably so that it does not need to be escaped
2) updatePanel is, let's suppose, the action we need to perform
3) h2.header is the selector the action is performed on
4) Привет мир is a string of 10 characters (the length given in the first sub-block)
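A quick sketch of how the length prefix can be used on that first block. It assumes the string is UTF-8 and that the prefix counts characters rather than bytes, which is what the example suggests; the variable names are only for illustration.

```php
<?php
// Assumption: the input is UTF-8 and the prefix counts characters, not bytes.
$block = '10|updatePanel|h2.header|Привет мир|';

// Split off only the first three sub-blocks; $rest is the value plus the closing '|'.
list($size, $action, $selector, $rest) = explode('|', $block, 4);

// Take exactly $size characters, so the value could itself contain '|'.
$value = mb_substr($rest, 0, (int)$size, 'UTF-8');

echo mb_strlen($value, 'UTF-8'), "\n"; // 10 characters
echo strlen($value), "\n";             // 19 bytes: each Cyrillic letter takes 2 bytes in UTF-8
```

The difference between the two numbers is why the byte-oriented substr()/strlen() would cut the value in the wrong place here.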
The task is to represent all of this as an array like this:

$array = array(
    'updatePanel' => array(
        'h2.header' => 'Привет мир',
        'p.search'  => 'Элементов не найдено'
    ),
    'deleteBox'   => array(
        '#results' => ''
    )
);

This array should be built by PHP itself rather than by hand, but for some reason I can't figure out where to even start writing such a function.


5 answer(s)
Alexey Akulovich, 2013-03-10
@La2ha

Well, if a PHP solution is all you're after, you can start with this:
<?php
$s = '10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||';

mb_internal_encoding('UTF-8');

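// Reads one "size|name|key|value|" block starting at character offset $offset.
// Returns the parsed block plus the offset of the next block, or null if no block is left.
function fetchBlock($str, $offset)
{
    // Sub-block 1: length of the value, in characters
    $p = mb_strpos($str, '|', $offset);
    if (false === $p) {
        return null;
    }
    $size = (int)mb_substr($str, $offset, $p-$offset);
    $offset = $p+1;

    // Sub-block 2: action name
    $p = mb_strpos($str, '|', $offset);
    if (false === $p) {
        return null;
    }
    $name = mb_substr($str, $offset, $p-$offset);
    $offset = $p+1;

    // Sub-block 3: selector
    $p = mb_strpos($str, '|', $offset);
    if (false === $p) {
        return null;
    }
    $key = mb_substr($str, $offset, $p-$offset);
    $offset = $p+1;

    // Sub-block 4: read exactly $size characters, so the value may itself contain '|'
    $value = mb_substr($str, $offset, $size);
    $offset = $offset+mb_strlen($value);

    // +1 skips the '|' that closes the block
    return array('offset' => $offset+1, 'block' => array($name => array($key => $value)));
}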

$result = array();

$block = array('offset' => 0);
while ($block = fetchBlock($s, $block['offset'])) {
    $result = array_merge_recursive($result, $block['block']);
}

print_r($result);

With regular expressions it could be written more compactly, but on large texts (not a single line like here) this version will be faster.

Returns exactly what is specified in the question.
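For comparison, the more compact regular-expression variant mentioned above might look roughly like this. It is only a sketch: parseDelta() is a made-up name, the input is assumed to be UTF-8, and the length prefix is assumed to count characters.

```php
<?php
// Hypothetical helper, not part of the answer above.
function parseDelta($s)
{
    $result = array();
    $pos = 0; // byte offset into $s
    // Header "<length>|<action>|<selector>|", anchored at the current offset by \G.
    while (preg_match('/\G(\d+)\|([^|]*)\|([^|]*)\|/u', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        // The length counts characters, so cut by characters but advance by bytes.
        $value = mb_substr(substr($s, $pos), 0, (int)$m[1], 'UTF-8');
        $pos += strlen($value) + 1; // skip the value and the '|' that closes the block
        $result[$m[2]][$m[3]] = $value;
    }
    return $result;
}

$s = '10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||';
print_r(parseDelta($s));
```

Note that substr($s, $pos) copies the rest of the string on every iteration, which is one reason this variant is likely to be slower on long responses.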

Alexey Sundukov, 2013-03-10
@alekciy

In a situation like this it is easier to use a server-side browser engine such as PhantomJS. The output is the generated HTML, and it doesn't matter what was encoded or how. Everything that is available in the browser is available to such a parser.

Nicholas, 2013-03-10
Sumrak @NikolasSumrak

Maybe wget, wget + Python, etc.?
As for PHP, if I understood the task correctly, I don't see any difficulty, hmm.
Provided the response strictly consists of several parts of 4 sub-blocks each, as in the example.
Regular expressions, and off you go. Or explode plus assorted substr calls.

Nicholas, 2013-03-10
Sumrak @NikolasSumrak

And fetch the data with cURL, plus the same regexp approach to find the JS function calls.
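As a rough sketch of the regexp part: the helper below collects the arguments of every __doPostBack() call found in a piece of HTML. findPostBackCalls() and the sample markup are made up for illustration, and it assumes both arguments are plain single-quoted strings without escaped quotes.

```php
<?php
// Hypothetical helper: returns the arguments of each __doPostBack('target','argument') call.
// Assumption: both arguments are single-quoted and contain no escaped quotes.
function findPostBackCalls($html)
{
    $pattern = "/__doPostBack\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)/";
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $calls = array();
    foreach ($matches as $m) {
        $calls[] = array('target' => $m[1], 'argument' => $m[2]);
    }
    return $calls;
}

// Made-up markup; on a real page the HTML would come from the cURL response.
$html = '<a href="javascript:__doPostBack(\'ctl00$grid\',\'Page$2\')">2</a>';
print_r(findPostBackCalls($html));
```

The extracted pairs are what would then have to be sent back in the form POST to reproduce a click.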

Nicholas, 2013-03-10
Sumrak @NikolasSumrak

Wouldn't this option work? Hmm.

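mb_internal_encoding('UTF-8');

$text = "10|updatePanel|h2.header|Привет мир|20|updatePanel|p.search|Элементов не найдено|0|deleteBox|#results||";
$array = array();
while(mb_strlen($text)>0) {
  // Split off only the first three sub-blocks; the value may itself contain '|'
  $data = explode('|', $text, 4);
  if (count($data) < 4) {
    break; // malformed or leftover data
  }
  $str = "{$data[0]}|{$data[1]}|{$data[2]}|";
  // The length is in characters, so use mb_substr rather than the byte-based substr
  $data[3] = mb_substr($text, mb_strlen($str), (int)$data[0]);
  $array[$data[1]][$data[2]] = $data[3];
  // Cut off the processed block and its closing '|' (str_replace cannot limit replacements)
  $text = mb_substr($text, mb_strlen($str) + mb_strlen($data[3]) + 1);
}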
