Python
KlassT, 2016-11-16 21:34:39

Why does the site block my IP when scraping?

I need to scrape a site.

import urllib.request

from bs4 import BeautifulSoup

def get_html(url):
    request = urllib.request.Request(url)
    request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1.3) Gecko/20190824 Firefox/3.5.3')

    # Request keeps one value per header name, so all cookies go into a
    # single Cookie header instead of repeated add_header('cookie', ...) calls.
    cookies = [
        '__unam=4145b67-1586e230cee-29d69ac3-2',
        '__utma=27978091.2013216964.1479228851.1479316541.1479316541.',
        '__utmb=27978091.1.10.1479316541',
        '__utmc=27978091',
        '__utmt=1',
        '__utmz=27978091.1479316541.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
        '_ga=GA1.2.2013216964.1479228851',
        '_gat=1',
        'PHPSESSID=3r9u07ddc62stqf6dph3592pq7'
    ]
    request.add_header('Cookie', '; '.join(cookies))

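    # The target URL is plain http, so the proxy is registered for 'http' too;
    # with only an 'https' entry the request would bypass the proxy.
    # The address is kept as posted; it is not a valid IPv4 address.
    proxy = '196.145.458.269:8000'
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({'http': proxy, 'https': proxy}))
    urllib.request.install_opener(opener)
    res = urllib.request.urlopen(request, timeout=600)
    return res.read()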

def get_soup(html):
    # Name the parser explicitly so the result does not depend on what is installed.
    return BeautifulSoup(html, 'html.parser')

def get_states():
    soup = get_soup(get_html('http://freeemailtrace.com'))
    # the data is processed here

def main():
    get_states()

if __name__ == '__main__':
    main()

get_states() already processes the data. What else could be making the site block me?
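
One way to narrow this down is to inspect what the request actually carries before it is sent. The sketch below is only an illustration using the standard library; the helper name `build_cookie_header`, the `example.com` URL and the sample cookie values are made up for the demo. It relies on the fact that `urllib.request.Request` keeps a single value per header name, so repeated `add_header('cookie', ...)` calls leave only the last cookie, while joining the pairs with `; ` puts them all into one `Cookie` header.

```python
import urllib.request


def build_cookie_header(pairs):
    """Join 'name=value' strings into a single Cookie header value."""
    return '; '.join(pairs)


# Hypothetical sample values, for illustration only.
pairs = ['a=1', 'b=2', 'c=3']

# Repeated add_header calls with the same name overwrite each other.
overwritten = urllib.request.Request('http://example.com')
for pair in pairs:
    overwritten.add_header('cookie', pair)

# One combined header keeps every cookie.
combined = urllib.request.Request('http://example.com')
combined.add_header('Cookie', build_cookie_header(pairs))

# No network traffic happens here; the headers are only inspected locally.
print(overwritten.get_header('Cookie'))  # c=3
print(combined.get_header('Cookie'))     # a=1; b=2; c=3
```

Printing `request.header_items()` inside `get_html` before the call to `urlopen` gives the same kind of check for the real request.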
