Is it possible with Node.js/PHP to get the hidden field values from a specific page of a specific site?

PHP

Bowen, 2019-02-08 00:12:08

Hello.
There is an ordinary forum where you have to log in to view a topic.
The login page contains these hidden fields:

<input type="hidden" name="formType" value="mainLoginForm">
<input type="hidden" name="formOid" value="517260832138604146">
<input type="hidden" name="formOidMd5" value="0C2A68950BF4FE825888F2A32BAAB69E">
<input type="hidden" name="redirect" value="https://website.com">
<input type="hidden" name="showLoginForm" value="false">

The values of the formOid and formOidMd5 fields are regenerated on every request to this page.
I spent two days trying to do this and only ended up more confused. As I understand it, I need to make two requests: one to get the values of the fields listed above, and a second to log in. How can I do that?

2 answer(s)

nowm, 2019-02-08
@Bowen

I usually use the DOM extension in PHP. You can, of course, pull out individual fields with regular expressions if you know their names, but there is a good chance that one input will have `type` before `name` and another will have them the other way round, so you end up writing separate extraction code for each input. It is easier to treat them all as entities with the same characteristics and hand the parsing over to a specialized library.
As far as I can tell from the distinctive input names, this is a forum running on hoop.la (not to be confused with hoopla). I would organize the parsing like this:

$dom = new \DOMDocument();

libxml_use_internal_errors(true);
if (!@$dom->loadHTML('page content as HTML')) {
    /** @var \LibXMLError|false $error */
    $error = libxml_get_last_error();
    // libxml_get_last_error() returns false when no error was recorded
    if ($error && $error->level > LIBXML_ERR_ERROR) {
        throw new \Exception($error->message);
    }
}

$xpath = new \DOMXPath($dom);

/** @var \DOMNodeList $form */
$form = $xpath->query('//form[@name="mainLoginForm"]');
if (!$form->length) {
    throw new \Exception('Form not found');
}

$post_data = [];

/** @var \DOMElement $input */
// Only inputs with a name attribute are submitted with the form
foreach ($xpath->query('.//input[@name]', $form->item(0)) as $input) {
    $post_data[$input->getAttribute('name')] = $input->getAttribute('value');
}

$post_data['email'] = 'login';
$post_data['password'] = 'password';

// $post_data now holds all the data that needs to be sent

The point of this approach is that you do not need to care which fields exist or what they are called. We walk over every input in a specific form and put them into an array, then overwrite the login and password entries in that array. That's it: we have all the necessary data, including any tokens.
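To tie this back to the two requests from the question, here is a minimal sketch of the whole flow with cURL. It assumes the session is tracked with cookies, so both requests go through the same handle. The URLs and the function names `collectFormInputs`, `request` and `login` are made up for illustration; the form name and the `email`/`password` field names are taken from the code above and may differ on the real site.

```php
<?php
// Hypothetical URLs: replace with the forum's real login page and form action.
const LOGIN_PAGE_URL = 'https://website.com/login';
const LOGIN_POST_URL = 'https://website.com/login';

/** Collect name => value for every named <input> in the first form matching $formXPath. */
function collectFormInputs(string $html, string $formXPath): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $forms = $xpath->query($formXPath);
    if ($forms === false || $forms->length === 0) {
        throw new RuntimeException('Form not found');
    }

    $data = [];
    foreach ($xpath->query('.//input[@name]', $forms->item(0)) as $input) {
        $data[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $data;
}

/** Send a GET, or a POST when $postFields is given, on an existing cURL handle. */
function request($ch, string $url, ?array $postFields = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($postFields === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

/** Request 1 fetches the fresh tokens, request 2 posts them back with the credentials. */
function login(string $email, string $password): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string turns on in-memory cookies for this handle
    ]);

    $loginPage = request($ch, LOGIN_PAGE_URL);
    $postData  = collectFormInputs($loginPage, '//form[@name="mainLoginForm"]');
    $postData['email']    = $email;
    $postData['password'] = $password;

    return request($ch, LOGIN_POST_URL, $postData);
}
```

Calling `login('login', 'password')` would return the HTML of whatever page the site shows after the POST. Reusing one handle matters here: if formOid is tied to the session, a second request without the first request's cookies would most likely be rejected.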
In other situations you would also have to look for selects, handle a choice between several radio buttons, or deal with even more complex cases. For hoop.la specifically this is not needed: the form contains only inputs, and they do not appear to be modified dynamically by JS (I have not tested this much).
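For those other situations, a rough sketch of a collector that also covers radio buttons, checkboxes, selects and textareas could look like this. The function name `collectFormFields` is made up, and the rules only approximate what a browser submits (multi-selects and disabled fields are ignored, for example).

```php
<?php
/** Collect the fields a browser would roughly submit for $form (hypothetical helper). */
function collectFormFields(DOMXPath $xpath, DOMElement $form): array
{
    $data = [];

    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        $type = strtolower($input->getAttribute('type'));

        // Buttons and file inputs are left out of this sketch.
        if (in_array($type, ['submit', 'button', 'reset', 'image', 'file'], true)) {
            continue;
        }

        if (in_array($type, ['radio', 'checkbox'], true)) {
            // Only the checked radio/checkbox is submitted.
            if (!$input->hasAttribute('checked')) {
                continue;
            }
            // Browsers send "on" when a checked box has no value attribute.
            $data[$input->getAttribute('name')] = $input->hasAttribute('value')
                ? $input->getAttribute('value')
                : 'on';
            continue;
        }

        $data[$input->getAttribute('name')] = $input->getAttribute('value');
    }

    foreach ($xpath->query('.//select[@name]', $form) as $select) {
        // Use the selected option, or fall back to the first one.
        $option = $xpath->query('.//option[@selected]', $select)->item(0)
            ?? $xpath->query('.//option', $select)->item(0);
        if ($option !== null) {
            $data[$select->getAttribute('name')] = $option->hasAttribute('value')
                ? $option->getAttribute('value')
                : trim($option->textContent);
        }
    }

    foreach ($xpath->query('.//textarea[@name]', $form) as $textarea) {
        $data[$textarea->getAttribute('name')] = $textarea->textContent;
    }

    return $data;
}
```

It takes the same `$xpath` object and form element as the code above, so `collectFormFields($xpath, $form->item(0))` could stand in for the `foreach` loop there.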
The code may not work as is. I copied half of it from my own projects and wrote the other half directly in the answer box here. What matters is that the idea is clear.

Alexander, 2019-02-08
@AleksandrB

$str = '<input type="hidden" name="formOidMd5" value="0C2A68950BF4FE825888F2A32BAAB69E"> <input type="hidden" name="formOid" value="517260832138604146">';
preg_match_all('#name="formOidMd5" value="(.+?)">#is', $str, $arr);
preg_match_all('#name="formOid" value="(.+?)">#is', $str, $arr2);
// $arr[0] holds the full matches, $arr[1] holds the captured values
print_r($arr[1][0]);  // 0C2A68950BF4FE825888F2A32BAAB69E
print_r($arr2[1][0]); // 517260832138604146

To get the page's HTML in the first place, use
$str = file_get_contents("http://example.com/");
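If you stay with regular expressions, the patterns above only match when `name` comes immediately before `value`. A small sketch that does not depend on attribute order is below. The helper name `hiddenValue` is made up, and it assumes the attributes are double-quoted, as in the question.

```php
<?php
/** Return the value of the <input> named $name, or null if it is not found (hypothetical helper). */
function hiddenValue(string $html, string $name): ?string
{
    // Step 1: find the whole <input ...> tag that carries the requested name.
    $tagPattern = '#<input\b[^>]*\bname="' . preg_quote($name, '#') . '"[^>]*>#i';
    if (!preg_match($tagPattern, $html, $tag)) {
        return null;
    }

    // Step 2: read value="..." from inside that tag, wherever it appears.
    if (!preg_match('#\bvalue="([^"]*)"#i', $tag[0], $value)) {
        return null;
    }

    return html_entity_decode($value[1], ENT_QUOTES);
}
```

For example, `hiddenValue($str, 'formOid')` works on the string fetched with `file_get_contents` regardless of whether `value` comes before or after `name` in the tag.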