What are the errors in the parser?


Python

hardwellZero, 2015-01-28 23:32:42

Good evening.
I've started learning Python. I decided to practice a little by writing a simple parser, and since I like watching TV shows from Lostfilm, I chose their site.
In general the script works, but it sends a message on every pass of the check. What kind of check should I add so that it behaves like this:
if already sent, do not send again?
Please forgive the "redundant" code. I'm just getting started ;)

# -*- coding: utf-8 -*-
from grab import Grab
import time
import smtplib
import email.utils
from email.mime.text import MIMEText

from_addr = 'sender_name@gmail.com'
to_addrs  = 'recipient_name@yandex.ru'

text = 'A new episode is out! Check it out!'

msg = MIMEText(text, "", "utf-8")

msg['To'] = email.utils.formataddr(('Hey you', to_addrs))
msg['From'] = email.utils.formataddr(('Fresh TV series', from_addr))
msg['Subject'] = 'Fresh stuff'

username = 'login'
pwd = 'password'

server = smtplib.SMTP('smtp.gmail.com:587')
server.starttls()
server.login(username, pwd)

url = Grab()
url.go('http://www.lostfilm.tv/browse.php')

old_list_serials = [u'\u0412\u043e\u043d\u043d\u0430\u044f \u043b\u043e\u0449\u0438\u043d\u0430', u'\u041f\u0435\u0440\u0432\u043e\u0440\u043e\u0434\u043d\u044b\u0435', u'\u041a\u043e\u0432\u0430\u0440\u043d\u044b\u0435 \u0433\u043e\u0440\u043d\u0438\u0447\u043d\u044b\u0435', u'\u0427\u0435\u0440\u043d\u044b\u0435 \u043f\u0430\u0440\u0443\u0441\u0430', u'\u0411\u0438\u0431\u043b\u0438\u043e\u0442\u0435\u043a\u0430\u0440\u0438']
new_list_serials = []

url_select = url.doc.select('//span[@style="font-family:arial;font-size:14px;color:#000000"]')[:5]
check = 0
while check == 0:
    for serials in url_select:
        new_list_serials.append(serials.text())

    if new_list_serials == old_list_serials:
        print "EQUAL"
    elif new_list_serials != old_list_serials:
        server.sendmail(from_addr, to_addrs, msg.as_string())
        print "NOT EQUAL"
        del old_list_serials[:]
        for serials in url_select:
            old_list_serials.append(serials.text())
        print old_list_serials
    time.sleep(10)
    check = 0


2 answers

Vitaly Belikov, 2015-01-29
@hardwellZero

new_list_serials is never reset, so it grows with every iteration.
It needs to be reset to an empty list at the start of each pass through the loop.
One more thing:

del old_list_serials[:]
for serials in url_select:
    old_list_serials.append(serials.text())

can be replaced by
old_list_serials = new_list_serials
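
To make both suggestions concrete, here is a minimal sketch of the loop body, not the original script. The names `poll`, `fetch_titles`, `notify` and `passes` are made up for illustration: `fetch_titles` stands in for whatever re-reads the page and returns the titles, and `notify` stands in for the `server.sendmail(...)` call.

```python
def poll(fetch_titles, known_titles, notify, passes):
    """Call notify() only on passes where the fetched titles changed.

    fetch_titles and notify are hypothetical callables standing in for
    the page scraping and the email sending from the question.
    """
    for _ in range(passes):
        # Build a brand-new list on every pass instead of appending
        # to a list that lives outside the loop.
        new_titles = list(fetch_titles())

        if new_titles != known_titles:
            notify(new_titles)
            # Plain rebinding is enough here: new_titles is rebuilt on
            # the next pass, so the two names never share a list that
            # keeps changing.
            known_titles = new_titles
    return known_titles
```

Note that in the question the page is fetched and `url_select` is computed once, before the `while` loop, so a real version would presumably also have to re-fetch the page inside the loop; otherwise the list of titles can never change.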

Ilya, 2015-01-29
@FireGM

You always get new_list_serials != old_list_serials, because the new list is filled with items taken from the page, while in the old list they are simply appended.
It would be better to write the entries to a file, and then check for each element whether it is already in the file; if it is not, add it to the file. Once all the checks are done, you can send the email.
And it is better to run the script from something like crontab.
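
A rough sketch of the file-based check described above, under some assumptions: the function name `find_new_titles` and the one-title-per-line file format are invented for this example, and the titles are assumed to be already extracted as unicode strings.

```python
import io
import os


def find_new_titles(titles, seen_path):
    """Return the titles not yet recorded in seen_path, and record them.

    seen_path is a hypothetical plain-text file with one title per line.
    """
    seen = set()
    if os.path.exists(seen_path):
        # io.open accepts an encoding on both Python 2 and Python 3.
        with io.open(seen_path, encoding="utf-8") as f:
            seen = set(line.rstrip(u"\n") for line in f)

    new_titles = []
    for title in titles:
        if title not in seen:
            seen.add(title)  # also guards against duplicates in titles
            new_titles.append(title)

    if new_titles:
        with io.open(seen_path, "a", encoding="utf-8") as f:
            for title in new_titles:
                f.write(title + u"\n")
    return new_titles
```

If the returned list is non-empty, one email can be sent for the whole batch. Run from cron, the script would call this once and exit, so the `while` loop and `time.sleep` are no longer needed, and the state survives restarts because it lives in the file.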
upd:
I was looking right at the code and still misread it ... I beg your pardon, I completely missed that in the code.