Answer the question
In order to leave comments, you need to log in
How to get data?
import lxml.html as html
import requests
import time
url = "http://www.world-art.ru/animation/manga.php?id="
folder = ""
counter = 501
info = {}
page = html.parse(url+str(counter)).getroot()
info["name"] = page.xpath("html/body/table/tbody/tr[1]/td/center/table[7]/tbody/tr/td/table/tbody/tr/td[5]/table[2]/tbody/tr/td[3]/font[1]/b")[0].text
info["year"] = page.xpath("html/body/table/tbody/tr[1]/td/center/table[7]/tbody/tr/td/table/tbody/tr/td[5]/table[2]/tbody/tr/td[3]/b/font[1]")[0].text
info["name1"] = page.xpath("html/body/table/tbody/tr[1]/td/center/table[7]/tbody/tr/td/table/tbody/tr/td[5]/table[2]/tbody/tr/td[3]")[0].text
print(info["name1"])
Actually, none of the elements is missing. The path is correct, because used FirePath. I don't know how else to get it. import lxml.html as html
import requests
import time
from lxml import etree
from lxml.html import HTMLParser
# url = "http://animanga.ru/default.aspx?a=book&id="
url = "http://www.world-art.ru/animation/manga.php?id="
folder = ""
counter = 520
info = {}
r = requests.get(url+str(counter))
if r.ok:
page = etree.fromstring(r.text, parser=HTMLParser())
name = page.xpath("//font/b")
for element in name:
if (element.text and element.text.find("манга")!=-1):
string = element.text
string = string[:string.find("(")-1]
print(string)
name_eng = page.xpath("//tr/td/text()")
i = 1
for element in name_eng:
if (i==40):
print(element)
i += 1
year = page.xpath("//font")
for element in year:
string = element.text
if (string and string.isnumeric()):
print(string)
I know that the code is terrible and that it does not receive name_eng, but at least something.
Answer the question
In order to leave comments, you need to log in
download the file to a local one and carefully read the article and video, it will change your
approach
prostoitblog.ru/xpath-i-css/kak-sostavlyat-xpath-i
... /td/center/ table[7] /tbody/tr/td/table/tbody/tr/ td[5]
is a very bad sign - one change per page/ in the table, and all the numbering goes through the forest.
Learn to compose expressions like regexps: build a path that matches on your site
Use Beautiful Soup, it's very convenient to parse with it
Here's a guide, even if you don't know English, everything is clear
https://www.youtube.com/watch?v=3xQTJi2tqgk
Never had to parse such sites, I do not envy you.
As you have already been advised, open the source of the page, there is a slightly different structure, for example, there is no tbody.
Also, the root element is html, so it doesn't need to be specified in the xpath.
I got the same shitty code:
import lxml.html as html
import requests
from lxml import etree
from lxml.html import HTMLParser
info = {}
r = requests.get("http://www.world-art.ru/animation/manga.php?id=501")
if r.ok:
tree = etree.fromstring(r.text, parser=HTMLParser())
info["name"] = tree.xpath("body/table/tr[1]/td/center/table[7]/tr/td/table/tr/td[5]/table[2]/tr/td[3]/font[1]/b")[0].text
info["year"] = tree.xpath("body/table/tr[1]/td/center/table[7]/tr/td/table/tr/td[5]/table[2]/tr/td[3]/font[2]")[0].text
info["name1"] = str(etree.tostring(tree.xpath("body/table/tr[1]/td/center/table[7]/tr/td/table/tr/td[5]/table[2]/tr/td[3]")[0])).split('<br/>')[1]
print(info)
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question