Regular expression to find all links in html markup?
I have been looking for a regular expression, or any other way, to find all links on a page for a long time.
By "page" I mean literally the entire content of document.querySelector('html').innerHTML. I need to collect all of these links into an array of strings. Building the array is not a problem; the regular expression is.
Nearly every result on the internet treats "find all links on a page" as "find all <a href="...">...</a> tags in the HTML markup", which made searching quite difficult.
Still, I found a few decent candidates, but the first does not work with query parameters, the second fails on atypical characters, and the third does not find links of the form //google.com. If I add support for //google.com, then script lines of the form //document.querySelector() end up in the list as well.
I tried to write a regular expression myself, and I tried building several and applying them one by one, but it did not work out.
My knowledge of regular expressions only got me as far as something like (http?s:\/\/|\.\/|\/\/).{0,}), which is very far from ideal.
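To make the problem with protocol-relative links concrete, here is a minimal sketch. The sample string and the naive pattern are assumptions written for illustration, not one of the expressions quoted in this question. The pattern accepts an optional scheme followed by //, so it picks up the script comment as well as the real link.

```javascript
// Hypothetical sample: one real protocol-relative link and one script comment.
const sample = [
  '<a href="//google.com"></a>',
  '<script>',
  "//document.querySelectorAll( 'a' )",
  '</script>',
].join('\n');

// Naive pattern (illustrative assumption): optional http/https scheme,
// then "//", then a run of characters that are not whitespace, quotes or angle brackets.
const naive = /(?:https?:)?\/\/[^\s"'<>]+/g;

// Both the href value and the start of the comment line are matched.
const naiveMatches = sample.match(naive);
console.log(naiveMatches);
```

The pattern only looks at the shape of the text, so it cannot tell a URL from a comment: both start with //.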
/((http?s|ftp):\/\/|\.\/)[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]/gi
/(((http?s:)|)\/\/\w+\.\w{2,3})(\.\w{2})?(\/\S*)?/gi
/((?:(http|https|Http|Https|rtsp|Rtsp):\/\/(?:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,64}(?:\:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,25})?\@)?)?((?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,64}\.)+(?:(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])|(?:biz|b[abdefghijmnorstvwyz])|(?:cat|com|coop|c[acdfghiklmnoruvxyz])|d[ejkmoz]|(?:edu|e[cegrstu])|f[ijkmor]|(?:gov|g[abdefghilmnpqrstuwy])|h[kmnrtu]|(?:info|int|i[delmnoqrst])|(?:jobs|j[emop])|k[eghimnrwyz]|l[abcikrstuvy]|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])|(?:name|net|n[acefgilopruz])|(?:org|om)|(?:pro|p[aefghklmnrstwy])|qa|r[eouw]|s[abcdeghijklmnortuvyz]|(?:tel|travel|t[cdfghjklmnoprtvwz])|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))|(?:(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])))(?:\:\d{1,5})?)(\/(?:(?:[a-zA-Z0-9\;\/\?\:\@\&\=\#\~\-\.\+\!\*\'\(\)\,\_])|(?:\%[a-fA-F0-9]{2}))*)?(?:\b|$)/gi
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta
name="viewport"
content="width=device-width, initial-scale=1.0"
>
<title>test 1</title>
</head>
<body>
<h3>test 1</h3>
<a href="https://google.com">google.com</a>
<a href="//google.com"></a>
<a href="//google.com/in/someelse/food.html">in/someelse/food</a>
<a href="./testPage2.html">test 2</a>
<a href="./weakPage.html?q=test">weak</a>
<script>
//document.querySelectorAll( 'a' ).forEach( l => l.onclick = function () { return false; } );
document.querySelector( 'h3' ).addEventListener( 'click', () => location.href = 'https://google.com' );
</script>
</body>
</html>
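One possible direction, offered only as a sketch: accept a link only when it is the entire content of a quoted string. This assumes every link of interest appears as a whole quoted value, such as an attribute value or a string literal in a script. The function name extractQuotedLinks and the shortened sample page are hypothetical. A comment that starts with // is not preceded by a quote, so it is skipped.

```javascript
// Hypothetical helper: return link-like values that fill a whole quoted string.
function extractQuotedLinks(html) {
  // Group 1: the opening quote. Group 2: either an absolute or protocol-relative
  // URL (optional http/https scheme, then "//"), or a path starting with ./ or ../
  // \1 requires the same quote character to close the value.
  const quoted = /(["'])((?:https?:)?\/\/[^\s"'<>]+|\.{1,2}\/[^\s"'<>]*)\1/g;
  return Array.from(html.matchAll(quoted), (m) => m[2]);
}

// Shortened copy of the test page from the question.
const page = `
<a href="https://google.com">google.com</a>
<a href="//google.com"></a>
<a href="./weakPage.html?q=test">weak</a>
<script>
//document.querySelectorAll( 'a' ).forEach( l => l.onclick = function () { return false; } );
document.querySelector( 'h3' ).addEventListener( 'click', () => location.href = 'https://google.com' );
</script>
`;

// Three href values plus the URL string inside the script; the comment is skipped.
const links = extractQuotedLinks(page);
console.log(links);
```

In a browser the same function could be called on document.querySelector('html').innerHTML. The sketch has known gaps: unquoted attribute values, root-relative paths such as /page.html, and schemes other than http and https are not matched.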