Answer the question
In order to leave comments, you need to log in
How to add unicode \xa0 to mask in python?
You need to search by mask.
Now 99% of the products are displayed, but a few are not displayed due to the code in the html.
Removed it with replace, added it to the mask and $nbsp; and \xa0 (and other variations), but still does not want to display this product.
Previously, I added other unicode characters in the mask like this (\u2122, etc.) and everything works.
This one doesn't want to. Tell.
Now the mask looks like this:
name_tovar = re.findall('\<b\>+[a-zA-Z0-9_/.?"= \>\< \\ \/-\: &]+\<\/b',page)
Answer the question
In order to leave comments, you need to log in
1. Inside the square brackets of the regexp, all characters are perceived either one at a time or as a range through a minus. Therefore & amp; means either &, or a, or m ... And there is no need to repeat the space in square brackets.
2. If you do not put r in front of the string constant, then the \ character is processed depending on the next character, which easily confuses the programmer. Most often, a regexp is defined with it, i.e. r'\< etc.' is more convenient.
3. To simply find the first product in tags, read about the non-greedy operator +
4. Still, parsing HTML with regexps can only be done where there is no way to put an HTML parsing library, for example, lxml.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question