How can I parse a site with the Python Grab library when the site returns HTML that depends on JavaScript?

Dmitry Vyatkin, 2016-08-09 00:56:51

Python


I know that Selenium can be used together with PhantomJS for this, but how can it be done with the Python Grab library?
Here is what I get when I request the page via Grab:

<!DOCTYPE html>
<!-- The line above switches the browser into standard mode, see also
http://en.wikipedia.org/wiki/Internet_Explorer_box_model_bug
http://en.wikipedia.org/wiki/Document_type_declaration

This line must be the first in the data stream that is sent to the browser. Add
onload="alert('mode: ' + document.compatMode);"
to the body tag to see which mode the browser has chosen. If the output is "BackCompat" then
some other part of the server system might have added code in front of this file rendering
the DOCTYPE setting useless. Check with "View page source" or such in the browser.
-->
<html>
  <head>
    <meta Http-Equiv="X-UA-Compatible" Content="IE=Edge">
    <title>NOP Network Operations Portal</title>
    <meta Http-Equiv="Cache-Control" Content="no-cache">
    <meta Http-Equiv="Pragma" Content="no-cache">
    <meta Http-Equiv="Expires" Content="0">
  </head>
  <!-- 964px is the width of the header image,
         margin: 0 auto; centers on Firefox,
         text-align: center; centers on IEx7 -->
  <body id="main" style="margin: 0 auto; width: 964px; background: #C5CDD7; text-align: center;">
    <script language='javascript' src='/PORTAL/gateway/spec/PORTAL.20.0.0.4.51/gwt/MainPages/MainPages.nocache.js'></script>
    <iframe src="javascript:''" id="__gwt_historyFrame" style="position:absolute;width:0;height:0;border:0"></iframe>
  </body>
</html>

2 answers

asd111, 2016-08-09
@asd111

Open the Network tab in Chrome DevTools (F12) and look at which requests the page makes and where they go. Then repeat the same requests with the Grab library to get the responses you need.
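
Replaying a captured request could look roughly like this. It is only a sketch: it assumes the Grab 0.6-style API (`Grab().go()` and `g.doc.unicode_body()`), and the function name `replay_request`, the URL, the headers and the payload in the commented example are placeholders, not values from the site in the question. Copy the real ones from the Network tab.

```python
def replay_request(url, body=None, headers=None, client=None):
    """Repeat a request seen in the browser's Network tab and return the response text.

    `client` is optional and exists only so the function can be tried
    without the network; by default a real Grab instance is created.
    """
    if client is None:
        from grab import Grab  # third-party: pip install grab
        client = Grab()

    options = {}
    if body is not None:
        # A string body is sent as the raw POST payload.
        options["post"] = body
    if headers:
        # Extra HTTP headers copied from the captured request.
        options["headers"] = headers

    client.go(url, **options)
    return client.doc.unicode_body()


# Example call (all values below are hypothetical placeholders):
# text = replay_request(
#     "http://example.com/path/copied/from/network/tab",
#     body="<payload copied from the Network tab>",
#     headers={"Content-Type": "<copied from the Network tab>"},
# )
```

If the server checks cookies or a session, the same Grab instance probably has to load the start page first so the cookies are set before the data request is sent.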

shamanovski, 2016-08-09
@shamanovski

Use the dryscrape module, which loads the page in a headless WebKit browser and runs its JavaScript (it only works on Unix systems).
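
A minimal sketch of that approach, assuming dryscrape's `Session`, `visit()` and `body()` calls. The helper name `render_html` and the example URL are made up for illustration.

```python
def render_html(url, session=None):
    """Load `url` in a headless WebKit session and return the HTML after scripts have run.

    `session` is optional; by default a real dryscrape session is created.
    """
    if session is None:
        import dryscrape  # third-party, Unix only: pip install dryscrape
        session = dryscrape.Session()

    session.visit(url)     # loads the page and executes its JavaScript
    return session.body()  # HTML source of the page in its current state


# Example call (hypothetical URL):
# html = render_html("http://example.com/portal")
```

The returned HTML can then be parsed with whatever parser you already use. Content that the page loads asynchronously after the initial load may not be present yet when `body()` is called, so some extra waiting may be needed.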