Python
DarkByte2015, 2016-12-09 15:59:16

Why is the content not getting through?

Here's the idea: I want to download a lot of music from the zaycev.net service. I started writing a parser in Python and got stuck at one point. In short, the site is integrated with mail.ru; more precisely, it uses mail.ru's search. For example, I send a request such as "go.mail.ru/zaycev?q=Hilary+Duff" and receive incomplete content in response, i.e. more than 300 lines of the page's code are literally missing. I sniffed the same request with Fiddler, and there everything is fine. Then, just in case, I copied all the request headers from Fiddler and made a request with them in Postman: same garbage as in Python, part of the content is missing. How can this be? I thought the content might be generated by JavaScript, but Fiddler received all of it, and I don't think it executes any scripts. So that explanation is ruled out. Some sort of magic...
P.S. The problem is not specific to Python, because Postman also receives less content.
Still, I don't mind sharing the code, so here it is:

import sys
from pyquery import PyQuery as pq
import urllib
import json

SEARCH_URL = 'http://go.mail.ru/zaycev'

def get_search_results(query):
    q = urllib.urlencode({'q': query})
    url = '%s?%s' % (SEARCH_URL, q)
    d = pq(url = url)
    print([ a.text() for a in d('.page-navig a').items()])

    for e in d('li.result__li').items():
        a = e('.result__title a')
        yield {
            'track': a.text(),
            'link': a.attr('href')
        }

def main(args):
    r = list(get_search_results('Hillary Duff'))

    with open('output.json', 'wb') as o:
        json.dump(r, o, sort_keys = True, indent = 4)

if __name__ == '__main__':
    main(sys.argv)

The downloaded page is missing the page-navig block (the forward/back buttons, which I need in order to determine whether there are more pages), as well as some of the results (result__li blocks). For example, for the request I tested (the one in the code) there are 6 of them, while the browser shows many more (and more than one page of them).

1 answer(s)
eudj1n, 2016-12-09
@DarkByte2015

def main(args):
    r = list(get_search_results('Hilary Duff'))

    with open('output.json', 'wb') as o:
        json.dump(r, o, sort_keys = True, indent = 4)

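There is a typo in the artist's name in the code itself: it searches for 'Hillary Duff' (with a double "l"), while the URL in the question uses 'Hilary Duff'. Compare the output of go.mail.ru/zaycev?q=Hillary+Duff and go.mail.ru/zaycev?q=Hilary+Duff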
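A quick way to catch this kind of mismatch is to print the URL the script actually requests and compare it with the one opened in the browser. The sketch below is only an illustration: build_search_url is a hypothetical helper, not part of the original code, and it assumes Python 3, where urlencode lives in urllib.parse (the code in the question uses the Python 2 urllib.urlencode).

```python
from urllib.parse import urlencode

SEARCH_URL = 'http://go.mail.ru/zaycev'


def build_search_url(query):
    """Hypothetical helper: build the search URL the same way the question's code does."""
    return '%s?%s' % (SEARCH_URL, urlencode({'q': query}))


# Spelling used in main() of the question's code (double "l").
url_in_code = build_search_url('Hillary Duff')
# Spelling used in the URL that was checked in the browser.
url_in_browser = build_search_url('Hilary Duff')

print(url_in_code)
print(url_in_browser)
print('same request:', url_in_code == url_in_browser)
```

If the two printed URLs differ, the script and the browser are not making the same request, so differences in the returned content are to be expected.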
