Python
roman01, 2020-01-31 08:03:37

Finding data in HTML using regular expressions?

Good day, colleagues! I would appreciate advice or sample code.

I need to find some data in an HTML page. Specifically, I need to find a block like this:

s = "<span class=\"company__segment__change\"> <span class=\"company__segment__inner\">$$$</span> </span>"


where $$$ is any number. I don't know its format in advance: it may be an integer or a decimal such as 0.11 or 0.1111. In other words, the separator is unknown, and there may be no separator at all.

I started implementing this myself, but the result is cumbersome, and I can tell it is neither optimal nor correct.

Can this problem be solved with regular expressions? I have no real experience with Python's re module; so far I have only used regular expressions for very primitive tasks.

Please post a sample if possible.

Regards, Roman

2 answer(s)
KraydenSharp, 2020-01-31
@roman01

Hello.
Here is one possible regular expression: https://regex101.com/r/KRX5Qe/1
It matches numbers with or without a minus sign, both integers and reals, with either a dot or a comma as the separator.
If you need PHP code, use the "Generated Code" tool in the menu on the left.
For Python, it is enough to remove the possessive quantifiers: https://regex101.com/r/KRX5Qe/3
If this solution does not suit you, describe your task in more detail and I'll try to help.
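
For reference, the idea could look roughly like this in Python. This is only a sketch: the pattern below is an assumed one written for illustration, not the pattern behind the regex101 links, and the names NUMBER, SEGMENT_RE and find_segment_values are made up. It assumes the inner span carries exactly the class shown in the question.

```python
import re

# Assumed pattern (not the one from the regex101 links): optional minus,
# digits, then an optional fractional part after a dot or a comma.
NUMBER = r"-?\d+(?:[.,]\d+)?"

# Anchor the number to the inner span from the question so that unrelated
# digits elsewhere on the page are ignored.
SEGMENT_RE = re.compile(
    r'<span\s+class="company__segment__inner"\s*>\s*(' + NUMBER + r")\s*</span>"
)


def find_segment_values(html):
    """Return every number found inside a company__segment__inner span."""
    # Normalize a comma separator to a dot before converting to float.
    return [float(m.replace(",", ".")) for m in SEGMENT_RE.findall(html)]


s = ("<span class=\"company__segment__change\"> "
     "<span class=\"company__segment__inner\">-0,1111</span> </span>")
print(find_segment_values(s))
```

Tying the number to the surrounding tag keeps the match from picking up other numbers on the page, but it will break if the site changes the attribute order or adds a second class to the span.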

Andrew, 2020-01-31
@freiman

Try using a library designed for working with HTML.
I haven't used it myself, but selectolax is said to be fast: https://github.com/rushter/selectolax
If processing speed is not critical, use lxml and search with XPath.
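
To illustrate the parser-based approach without installing anything, here is a rough sketch that swaps in the standard library's html.parser in place of selectolax or lxml. The class name SegmentParser and the function extract_segments are hypothetical, and the class name company__segment__inner is taken from the question.

```python
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect the text of every <span class="company__segment__inner">."""

    def __init__(self):
        super().__init__()
        self.values = []   # text collected from matching spans
        self._depth = 0    # > 0 while we are inside a matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1            # a span nested inside a match
        elif "company__segment__inner" in classes:
            self._depth = 1
            self.values.append("")      # start a new collected value

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.values[-1] += data


def extract_segments(html):
    """Return the stripped text of each matching span, as strings."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    return [v.strip() for v in parser.values]


s = ("<span class=\"company__segment__change\"> "
     "<span class=\"company__segment__inner\">0.11</span> </span>")
print(extract_segments(s))
```

With lxml, the same lookup would presumably be an XPath query such as //span[@class="company__segment__inner"]/text(). Either way the number arrives as a string, so converting it (and handling a comma separator) is still a separate step.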
