Python
Lyova Matyushkin, 2013-06-08 11:18:49

Get html from web page loaded by javascript request

I ran into the following problem. There is a web page with a known URL whose HTML I can fetch. From this page you can move on to the pages that follow it (2, 3, 4, ...); the transition is carried out by the following function:

function goto_page(pnum){
    var frm = document.forms["results"];
    frm.pagenum.value=pnum;
    frm.action="author_items.asp";
    frm.target="";
    frm.submit()
}

It submits the form to http://elibrary.ru/author_items.asp, whose content changes depending on the selected page number.

I load the page myself using Qt:

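import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage


class Render(QWebPage):
    """Load a URL in a headless WebKit page and block until it has finished."""

    def __init__(self, url):
        # The QApplication must exist before any other Qt object is created.
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Run the event loop; _loadFinished stops it once the page is loaded.
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep the loaded frame so the caller can read it afterwards.
        self.frame = self.mainFrame()
        self.app.quit()


url = 'http://elibrary.ru/author_items.asp?authorid=xxxxx'
r = Render(url)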


Question: how can I navigate to the desired page number from Python and get its HTML?


2 answers
KEKSOV, 2013-06-08
@LeoMat

1. Find the “results” form in the page's HTML and check which method it uses, POST or GET.
2. Find out which other fields it has besides pagenum. If there are none, great. If there are, you need to find out which values are written to them (the application may have complex logic: one of the fields might hold a session code, a user code, etc.). Besides the form, the application may also send data to the server through cookies; track those in the browser's network activity inspector too (F12 in Chrome, Network tab).
3. Once all the request parameters are known, make the same request in Python and get the desired page.
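Steps 1 and 2 can also be done from Python instead of by eye. The following is only a sketch using the standard library's html.parser: FormFieldCollector is a made-up helper name, and SAMPLE_HTML is invented markup standing in for the real page source, whose actual fields are not known here.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the method, action and input fields of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.inside = False
        self.method = None
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.form_name in (attrs.get("name"), attrs.get("id")):
            self.inside = True
            # HTML forms default to GET when no method attribute is given.
            self.method = (attrs.get("method") or "get").upper()
            self.action = attrs.get("action")
        elif tag == "input" and self.inside and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


# Hypothetical markup; the real form may contain other or additional fields.
SAMPLE_HTML = """
<form name="results" method="post" action="author_items.asp">
  <input type="hidden" name="authorid" value="xxxxx">
  <input type="hidden" name="pagenum" value="1">
</form>
"""

collector = FormFieldCollector("results")
collector.feed(SAMPLE_HTML)
print(collector.method, collector.action, collector.fields)
```

With the real page you would feed it the HTML you already fetched. Fields that JavaScript fills in at runtime will not show up this way, so the browser's network inspector is still the final check.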

Valentine, 2013-06-08
@vvpoloskin

Apparently the page contains an HTML form, and submitting it triggers the action. Since you could not see it happening (nothing shows up in the URL), it is most likely sent with the POST method. Use Firebug to see which parameters are sent, how, and where, then make a POST or GET from Python with the same parameters.
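Repeating the request from Python might look roughly like this with the standard library's urllib. This is a sketch under assumptions: build_page_request and fetch_page are made-up helper names, and the authorid field is only a guess taken from the URL in the question; the real form may need other fields or cookies.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

FORM_URL = "http://elibrary.ru/author_items.asp"


def build_page_request(fields, page_number, method="POST"):
    """Build a request that mimics what goto_page(page_number) submits."""
    # Copy the known form fields and override the page number.
    params = dict(fields, pagenum=str(page_number))
    encoded = urlencode(params)
    if method.upper() == "GET":
        return Request(FORM_URL + "?" + encoded)
    # Passing data makes urllib send the request as a POST.
    return Request(FORM_URL, data=encoded.encode("ascii"))


def fetch_page(opener, fields, page_number, method="POST"):
    """Send the request through an opener that keeps cookies between calls."""
    with opener.open(build_page_request(fields, page_number, method)) as response:
        return response.read()  # raw bytes; decode with the page's encoding


# One opener shared by all requests, so session cookies are sent back.
opener = build_opener(HTTPCookieProcessor(CookieJar()))

# Assumed field set; replace it with what the network inspector shows.
request = build_page_request({"authorid": "xxxxx"}, 2)
print(request.get_method(), request.data)
```

Calling fetch_page(opener, {"authorid": "xxxxx"}, 2) would then perform the actual request. If the site depends on a session, opening the first page with the same opener beforehand lets the cookie jar pick up the cookies.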
