How do I match across a newline using the re library?
Good evening!
I am writing a simple parser to display some fields from the page of an online store. For example, like this:
import re

rx_image = r'class="jshop_img (.*)" src="(.*)" alt='
image = re.compile(rx_image)
# page is an iterable of lines (for example, an open file)
for line in page:
    img_obj = image.search(line)
    if img_obj:
        img_item = img_obj.group(2)
        print("Picture:", img_item)
<img class="jshop_img second-image" src="/components/com_jshopping/files/img_products/thumb_________________________1_.jpg" alt="">
<div class="name">
<a href="/component/jshopping/product/view/97/334?Itemid=101">Коктейль молочный малый</a>
</div>
rx_name = r'<a href="(.*)">(.*)</a>'
image = re.compile(rx_image, re.DOTALL)
In the end, I decided not to bother and cobbled together a workaround.
rx_name_f = r'<div class="name">'
rx_name = r'<a href=.*>(.*)</a>'
name = re.compile(rx_name)
name_f = re.compile(rx_name_f)
# i is True only when the previous line was <div class="name">
i = False
for line in page:
    name_obj = name.search(line)
    namef_obj = name_f.search(line)
    if i and name_obj:
        name_item = name_obj.group(1)
        print("Name:", name_item)
        i = False
    else:
        i = False
    if namef_obj:
        i = True
Classic
Don't hammer nails with a microscope. Spend a day learning to parse with lxml or BeautifulSoup and you'll be much happier.
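To give a feel for what parser-based extraction looks like, here is a minimal sketch. It uses the standard library's html.parser rather than lxml or BeautifulSoup, purely so that it runs without installing anything. The class name NameParser and its attributes are made up for illustration, and the sample markup is the fragment from the question.

```python
from html.parser import HTMLParser


class NameParser(HTMLParser):
    """Collect link text found inside <div class="name"> blocks.

    Hypothetical helper for illustration; it does not handle nested divs.
    """

    def __init__(self):
        super().__init__()
        self.in_name_div = False  # currently inside <div class="name">
        self.in_link = False      # currently inside an <a> within that div
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "name":
            self.in_name_div = True
        elif tag == "a" and self.in_name_div:
            self.in_link = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "div":
            self.in_name_div = False

    def handle_data(self, data):
        if self.in_link:
            self.names[-1] += data


# Sample markup from the question; the tags sit on separate lines.
html = '''<div class="name">
<a href="/component/jshopping/product/view/97/334?Itemid=101">Коктейль молочный малый</a>
</div>'''

parser = NameParser()
parser.feed(html)
parser.close()
names = [n.strip() for n in parser.names]
print(names)
```

Line breaks between the tags do not matter here, because the parser works on tags and text rather than on lines.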
You can preprocess the HTML first: re-split it, strip the newlines, and so on.
If you don't want to split it into lines at all, you can run re.finditer over the whole text.
You can also collect all the img tags first and then filter them by class manually.
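A rough sketch of both suggestions, assuming the whole page is held in one string (called html here, a name chosen for the example). Note that re.DOTALL has no visible effect when the pattern is applied to one line at a time, because no single line contains the newline that needs to be crossed. The patterns below are illustrative and tuned only to the fragment from the question.

```python
import re

# Sample markup from the question (with the src quote restored). In practice
# `html` would be the whole page read into one string, e.g. with f.read().
html = '''<img class="jshop_img second-image" src="/components/com_jshopping/files/img_products/thumb_________________________1_.jpg" alt="">
<div class="name">
<a href="/component/jshopping/product/view/97/334?Itemid=101">Коктейль молочный малый</a>
</div>'''

# \s* absorbs the newline between the two tags; the non-greedy .*? stops at
# the first closing delimiter instead of running to the last one.
name_rx = re.compile(r'<div class="name">\s*<a href="(.*?)">(.*?)</a>', re.DOTALL)
names = [m.group(2).strip() for m in name_rx.finditer(html)]

# Alternative: grab every <img ...> tag, then filter by class by hand.
img_rx = re.compile(r'<img\b[^>]*>')
class_rx = re.compile(r'class="([^"]*)"')
src_rx = re.compile(r'src="([^"]*)"')
images = []
for tag in img_rx.findall(html):
    cls = class_rx.search(tag)
    src = src_rx.search(tag)
    if cls and src and "jshop_img" in cls.group(1).split():
        images.append(src.group(1))

print("Names:", names)
print("Pictures:", images)
```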
I once had success with a home-grown approach, and something similar would work for you. The point is that full parsing is not always needed; sometimes you just need to pull a couple of words out of the HTML. In my case, a regex worked fine and was 100-1000 times faster than lxml (and its equivalents), because only 1% of the document had to be processed rather than the whole thing parsed.
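One way to touch only a small part of the document is to locate the interesting block with plain string search and run the regex on that slice alone. This is only a sketch of the idea, not the code from the answer above and not a benchmark; the function name names_from_fragment and the marker strings are assumptions chosen to fit the markup in the question.

```python
import re

LINK_RX = re.compile(r'<a\b[^>]*>(.*?)</a>', re.DOTALL)


def names_from_fragment(html, start_marker='<div class="name">', end_marker='</div>'):
    """Yield link texts, applying the regex only between the two markers."""
    pos = 0
    while True:
        start = html.find(start_marker, pos)
        if start == -1:
            break
        end = html.find(end_marker, start)
        if end == -1:
            break
        # Only this small slice is ever handed to the regex engine.
        for m in LINK_RX.finditer(html[start:end]):
            yield m.group(1).strip()
        pos = end + len(end_marker)


# Sample markup from the question.
html = '''<div class="name">
<a href="/component/jshopping/product/view/97/334?Itemid=101">Коктейль молочный малый</a>
</div>'''

found = list(names_from_fragment(html))
print(found)
```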