How to extract data from HTML code using regular expressions in Notepad++?

Valery Pavlenko, 2018-01-23 11:14:15

Regular Expressions

Hello, colleagues!
I need to extract data from HTML code. There are actually a lot of lines, but I will give just a couple as an example.
Code:

<p class="pic"><a href="/film/stalnoy-alkhimik-2009-452838/sr/1/" class="js-serp-metrika" data-url="/film/stalnoy-alkhimik-2009-452838/" data-id="452838" data-type="series"><img class='flap_img' src="https://st.kp.yandex.net/images/spacer.gif"  title="/images/sm_film/452838.jpg" alt="Стальной алхимик" title="Стальной алхимик" /></a></p>
<p class="pic"><a href="/film/inicial-di-1998-230874/sr/1/" class="js-serp-metrika" data-url="/film/inicial-di-1998-230874/" data-id="230874" data-type="series"><img class='flap_img' src="https://st.kp.yandex.net/images/spacer.gif"  title="/images/sm_film/230874.jpg" alt="Инициал &laquo;Ди&raquo;" title="Инициал &laquo;Ди&raquo;" /></a></p>

Desired output:

Стальной алхимик 2009
Инициал Ди 1998

That is, I need to extract the text from the alt or title attribute and the year from the link.
I would be very grateful for your help!

2 answer(s)

Hosting Yaroslavl, 2018-01-23
@trudogolik

Replace
with
Then, if desired, replace the HTML entities (such as "laquo") with the characters you need.
If the plugin is not installed, install it by name via Plugins -> Plugin Manager.
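
For anyone doing the entity clean-up outside Notepad++, here is a minimal Python sketch of that last step. The function name clean_title is hypothetical, and dropping the « » quotes is an assumption based on the desired output in the question ("Инициал Ди").

```python
import html


def clean_title(raw):
    """Decode HTML entities, then drop the guillemet quotes."""
    # html.unescape turns named entities such as &laquo; into real characters.
    text = html.unescape(raw)
    # Assumption: the quotes should be removed, as in the question's desired output.
    return text.replace("«", "").replace("»", "")


print(clean_title("Инициал &laquo;Ди&raquo;"))
```

If the quotes should be kept, the replace calls can simply be dropped.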

Ilyas, 2018-01-23
@id2669099

It seems to me that Notepad++ is not well suited to this kind of task. I think it is easier to use a language that has an HTML parsing library and extract the attributes with it.
In principle, the years can be pulled out with a regular expression like this:
-[1-9][0-9]{3}-
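
A rough sketch of that approach in Python, using only the standard library (html.parser and re). The names FilmParser and extract are hypothetical, and the sample markup is a shortened version of the lines in the question. The year pattern is the one above with a capture group added. Removing the « » quotes is an assumption taken from the desired output.

```python
import re
from html.parser import HTMLParser

# The pattern from above, with a capture group around the four digits.
YEAR_RE = re.compile(r"-([1-9][0-9]{3})-")


class FilmParser(HTMLParser):
    """Collects "title year" strings from <a href=...><img alt=...></a> markup."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._year = None  # year found in the most recent <a href>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            match = YEAR_RE.search(attrs.get("href") or "")
            self._year = match.group(1) if match else None
        elif tag == "img" and self._year and attrs.get("alt"):
            # Attribute values arrive with entities already decoded.
            title = attrs["alt"].replace("«", "").replace("»", "")
            self.results.append(f"{title} {self._year}")


def extract(html_text):
    parser = FilmParser()
    parser.feed(html_text)
    return parser.results


sample = (
    '<p class="pic"><a href="/film/stalnoy-alkhimik-2009-452838/sr/1/">'
    '<img class=\'flap_img\' alt="Стальной алхимик" /></a></p>\n'
    '<p class="pic"><a href="/film/inicial-di-1998-230874/sr/1/">'
    '<img class=\'flap_img\' alt="Инициал &laquo;Ди&raquo;" /></a></p>'
)

for line in extract(sample):
    print(line)
```

One caveat: the pattern takes the first four-digit group between hyphens, so a film whose slug itself contains such a number would need a stricter pattern.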