Python
Asya, 2019-11-12 00:21:10

How to parse data from a site with infinite scroll?

How can I get the data from the table cells on this site?
I tried this:

from bs4 import BeautifulSoup
import requests
import json


page = requests.get('http://www.trafficengland.com/traffic-alerts', json={"key": "value"})
print(page)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)
print(soup.find('tr'))

output:
<!DOCTYPE HTML SYSTEM "about:legacy-compat">

<html version="2.0"><head><meta charset="utf-8"/><meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/><meta content="Traffic England" name="description"/><meta content="Traffic England, Highways England" name="keywords"/><link href="/resources/images/favicon/apple-touch-icon-57x57.png?v=1556431329287" rel="apple-touch-icon" sizes="57x57"/><link href="/resources/images/favicon/apple-touch-icon-60x60.png?v=1556431329287" rel="apple-touch-icon" sizes="60x60"/><link href="/resources/images/favicon/apple-touch-icon-72x72.png?v=1556431329287" rel="apple-touch-icon" sizes="72x72"/><link href="/resources/images/favicon/apple-touch-icon-76x76.png?v=1556431329287" rel="apple-touch-icon" sizes="76x76"/><link href="/resources/images/favicon/favicon-32x32.png?v=1556431329287" rel="icon" sizes="32x32" type="image/png"/><link href="/resources/images/favicon/favicon-96x96.png?v=1556431329287" rel="icon" sizes="96x96" type="image/png"/><link href="/resources/images/favicon/favicon-16x16.png?v=1556431329287" rel="icon" sizes="16x16" type="image/png"/><link href="/resources/images/favicon/manifest.json?v=1556431329287" rel="manifest"/><link color="#5bbad5" href="/resources/images/favicon/safari-pinned-tab.svg?v=1556431329287" rel="mask-icon"/><link href="/resources/images/favicon/favicon.ico?v=1556431329287" rel="shortcut icon"/><meta content="#2b5797" name="msapplication-TileColor"/><meta content="/resources/images/favicon/browserconfig.xml" name="msapplication-config"/><meta content="#ffffff" name="theme-color"/><link href="/resources/css/compiled/style.css?v=1556431329287" rel="stylesheet"/><link href="/resources/lib/openlayers3/ol.css?v=1556431329287" rel="stylesheet" type="text/css"/><link href="/resources/lib/openlayers3/ol3-popup.css?v=1556431329287" rel="stylesheet" type="text/css"/><link href="/resources/lib/jQuery/jquery-ui.css?v=1556431329287" rel="stylesheet" type="text/css"/><title>Welcome to Traffic England</title></head><body><div version="2.0"><script>
        var isUserAuthenticated = false;
        var isManager = false;
        var isDataManager = false;
        var isSubscriberManager = false;
        var isPro = false;
        var isSubscriber = false;
        var rssToken = "";
        </script><div class="container header"><div class="row"><div class="logo-te"><a href="/"><img alt="Traffic England a service from Highways England" height="67" src="/resources/images/traffic-england-logo.png" width="546"/></a></div><div id="authentication-options"></div></div></div></div><div version="2.0"><div class="navbar"><div class="navbar-inner"><ul class="nav"><li class=""><a href="/">Map</a></li><li class=""><a href="/traffic-report">Report</a></li><li class="active"><a href="/traffic-alerts">Alerts</a></li><li class=""><a href="/faq">FAQs</a></li><li class=""><a href="/help">Help</a></li></ul></div></div></div><div version="2.0"><div class="container"><div class="row"><div class="span12 ta-top-menu-placeholder"></div></div><div class="row"><div class="span12 ta-display-placeholder"></div></div></div><div class="ajax-loader" id="loadmoreajaxloader" style="display: none;"><div class="ajax-loader-header" id="header"><center><span>Please wait</span></center></div><div class="ajax-loader-spinner" id="spinner"><center><img src="/resources/images/common/ajaxloader.gif"/><p>Loading...</p></center></div></div></div><div id="footer" version="2.0"><div class="container footer"><div class="row"><div class="span6"><a href="/help">Help</a> |
        <a href="/faq">FAQ</a> |
        <a href="/cookies">Cookies</a> |
        <a href="/privacy-policy#Disclaimer">Disclaimer</a> |
        <a href="/accessibility">Accessibility</a> |
        <a href="/privacy-policy">Privacy Policy</a> |
        <a href="/subscribers">Subscribers</a></div></div></div><script type="text/javascript">
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create','UA-46903933-2', 'auto');
    ga('send', 'pageview');
  </script></div></body><script data-main="/resources/js/app.js" src="/resources/lib/requirejs/require.js?v=1556431329287"></script><script>
      require.config({
          urlArgs: 'v=1556431329287'
      });
    </script></html>

3 answer(s)
Igor, 2019-11-12
@asyaevloeva

Nowhere else...

http://www.trafficengland.com/api/events/getAlerts?start=0&step=100&order=Severity&is_current=1&events=CONGESTION,FULL_CLOSURES,ROADWORKS,INCIDENT,WEATHER,MAJOR_ORGANISED_EVENTS,ABNORMAL_LOADS&unconfirmed=false&completed=false&includeUnconfirmedRoadworks=true&_=1573508267254
http://www.trafficengland.com/api/events/getAlerts?start=100&step=100&order=Severity&is_current=1&events=CONGESTION,FULL_CLOSURES,ROADWORKS,INCIDENT,WEATHER,MAJOR_ORGANISED_EVENTS,ABNORMAL_LOADS&unconfirmed=false&completed=false&includeUnconfirmedRoadworks=true&_=1573508267255

Pay attention to the paging parameters:
start=0&step=100
start=100&step=100
start=200&step=100
...
There is nothing to scroll: each request returns the next batch of records.
Everything is served to you on a plate; just take it.
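
The start/step pattern above can be sketched as a simple paging loop. This is a minimal sketch, not the site's documented API: it assumes the endpoint returns a JSON list, that a page shorter than step means there is no more data, and that the trailing "_" value is only a cache-busting timestamp that can be dropped. The names build_url, fetch_page and fetch_all, the User-Agent string and the max_pages limit are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://www.trafficengland.com/api/events/getAlerts"

# Query parameters copied from the URLs above, minus the "_" value
# (assumed to be a cache-busting timestamp).
BASE_PARAMS = {
    "order": "Severity",
    "is_current": "1",
    "events": "CONGESTION,FULL_CLOSURES,ROADWORKS,INCIDENT,WEATHER,"
              "MAJOR_ORGANISED_EVENTS,ABNORMAL_LOADS",
    "unconfirmed": "false",
    "completed": "false",
    "includeUnconfirmedRoadworks": "true",
}


def build_url(start, step=100):
    """Build the request URL for one page of alerts."""
    params = {"start": start, "step": step, **BASE_PARAMS}
    # safe="," keeps the comma-separated event list readable in the URL.
    return API_URL + "?" + urlencode(params, safe=",")


def fetch_page(start, step=100):
    """Download one page and decode it (assumed to be a JSON list)."""
    request = Request(build_url(start, step), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(get_page=fetch_page, step=100, max_pages=50):
    """Request pages at start=0, step, 2*step, ... until a short page arrives."""
    alerts = []
    for page in range(max_pages):  # hard limit so the loop cannot run forever
        batch = get_page(page * step, step)
        alerts.extend(batch)
        if len(batch) < step:  # assumption: a short or empty page is the last one
            break
    return alerts
```

Calling fetch_all() with no arguments would hit the live endpoint. Passing a different get_page function makes the loop easy to try out offline.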

dollar, 2019-11-12
@dollar

Reverse engineer the site and hook into its infinite-loading mechanism.
This is a fairly difficult task, so you are unlikely to get a complete answer here.

Sergey Yavin, 2019-11-12
@sjaserds

Open the site, launch the browser's developer tools (F12 in most browsers), go to the Network tab and filter by XHR. There you will see all the API requests exchanged between the client and the server. Find the request whose response contains the data you need.
Example:

http://www.trafficengland.com/api/events/getAlerts?start=0&step=100&order=Severity&is_current=1&events=CONGESTION,INCIDENT&unconfirmed=false&completed=false&includeUnconfirmedRoadworks=true&_=1573554890656
Open this link in the browser to see what the server returns.
Then work with that data in your code.
Example:
import requests
from fake_useragent import UserAgent

API_URL = "http://www.trafficengland.com/api/events/getAlerts?start=0&step=100&order=Severity&is_current=1&events=CONGESTION,INCIDENT&unconfirmed=false&completed=false&includeUnconfirmedRoadworks=true&_=1573554890656"

def request_json():
    # Send a browser-like User-Agent and decode the JSON body of the response.
    response = requests.get(API_URL, timeout=5, headers={"User-Agent": UserAgent().chrome})
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()

def test_met(response):
    # Print the "gdp" field of the first record in the returned list.
    print(response[0]["gdp"])

test_met(request_json())

Result:
[screenshot: 5dca8d482784e897093705.png]
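
Since the question asks for table cells, the records returned by the API can be flattened into rows. This is a minimal sketch, assuming the response is a list of JSON objects, as the response[0]["gdp"] lookup above suggests. The function name alerts_to_csv is made up, and the sample records are invented for illustration: apart from "gdp", the field names are not taken from the real API.

```python
import csv
import io


def alerts_to_csv(alerts, columns=None):
    """Render a list of alert dicts as CSV text, one row per alert."""
    if columns is None:
        # Collect every key that appears, preserving first-seen order.
        columns = []
        for alert in alerts:
            for key in alert:
                if key not in columns:
                    columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip keys that are not in the chosen columns
        restval="",             # leave the cell empty when a record lacks a key
    )
    writer.writeheader()
    writer.writerows(alerts)
    return buffer.getvalue()


# Hypothetical sample data; only the "gdp" key comes from the answer above.
sample = [
    {"gdp": "example value 1", "road": "M1"},
    {"gdp": "example value 2", "road": "A1", "severity": "High"},
]
print(alerts_to_csv(sample))
```

With real data, print the keys of the first record first to see which columns are worth keeping, then pass them as columns. Nested objects, if any, are written as their string form.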
