Regular Expressions
Oleg Sulima, 2018-08-27 15:13:23

How to exclude links from text using regex?

I need a single JavaScript regular expression that finds all the words in a text but skips anything inside square brackets and skips links. The words will be passed to a spell checker. Only one regular expression may be used.
Test text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit
esse cillum dolore eu :fugiat: nulla pariatur. Excepteur sint
occaecat :cupidatat [non] proident:, sunt in culpa qui officia
deserunt mollit anim id est laborum.

abcdefghijklmnopqrstuvwxyz [ABCDEFGHIJKLMNO] PQRSTUVWXYZ
0123456789 _+-.,[email protected]#$%^&*();\/|<>"'
12345 -98.7 3.141 .6180 9,000 +42
555.123.4567    +1-(800)-555-2468
[email protected]    <[email protected]>
www.demo.com    {http://foo.co.uk/  }  
[https://marketplace.visualstudio.com/ite]   ms?itemName=chrmarti.regex
https://github.com/chrmarti/vscode-regex asdfasdf

Suppose all the words can be found with
/\b[^\s]+\b/g
and the text in square brackets with
/[^\[]*\]/g
Combining them gives
/(\b[^\s]+\b)(?!([^\[]*\]))/g
If I understand correctly, the second group filters the first. For links, I sketched this expression:
/\b(http)[^\s]+\b/g
By analogy, I tried adding it as a third group with a negative lookahead, so that the first group finds all the words, the second removes everything in brackets, and the third removes the links. The result is
/(\b[^\s]+\b)(?!([^\[]*\]))(?!(\bhttp[^\s]+\b))/g
But the links are not removed. Do groups work the way I want them to, or am I going in the wrong direction?
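
A minimal reproduction of the behaviour described above. The sample string `linkSample` is a shortened, hypothetical stand-in for the test text, not part of it:

```javascript
// The combined expression from the question, unchanged.
const attempt = /(\b[^\s]+\b)(?!([^\[]*\]))(?!(\bhttp[^\s]+\b))/g;

// Hypothetical short sample: two ordinary words around one link.
const linkSample = "see https://a.b/c now";

// With the g flag, match() returns every full match as a string.
const found = linkSample.match(attempt);
console.log(found); // [ 'see', 'https://a.b/c', 'now' ]
```

Both lookaheads are checked at the position where the word has already ended, so they only inspect what comes after the word. The link itself has been consumed by the first group by then and is returned as a match.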


1 answer(s)
Nick Sdk, 2018-08-27
@Oleg3474

/(\b[^\s]+\b)(?!([^\[]*\]))/g
If I understand correctly, the second group filters the first

Try an online regex editor, for example
https://regex101.com/r/jB5JrR/1
and in general it is worth reading a manual on regular expressions :)
Your second "group" is not a group but a negative lookahead: (?!([^\[]*\]))
In other words, you find "words" that are not followed by text matching the pattern inside that lookahead.
It does not work the way you described.
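
One possible direction, sketched under assumptions and not taken from the answer above: instead of filtering with lookaheads, a single expression can match the unwanted pieces first and capture only real words in a group. The names `wordsOrSkipped` and `extractWords` are hypothetical. Links are assumed to start with http://, https:// or www., and brackets are assumed not to be nested.

```javascript
// Alternatives are tried left to right at each position, so bracketed text
// and links are consumed by the first two branches and never reach the
// capturing group. Only the third branch fills group 1.
const wordsOrSkipped = /\[[^\]]*\]|\b(?:https?:\/\/|www\.)\S+|(\b[^\s]+\b)/g;

// Hypothetical helper: collect group 1 from every match that has it.
function extractWords(text) {
  const words = [];
  for (const m of text.matchAll(wordsOrSkipped)) {
    if (m[1] !== undefined) words.push(m[1]);
  }
  return words;
}

// Fragments borrowed from the test text in the question.
const mixedSample =
  "occaecat :cupidatat [non] proident:, https://github.com/chrmarti/vscode-regex asdfasdf";
console.log(extractWords(mixedSample));
// [ 'occaecat', 'cupidatat', 'proident', 'asdfasdf' ]
```

This still uses only one regular expression. The selection happens by checking whether group 1 took part in the match, not by a lookahead. Email addresses, phone numbers and plain numbers from the test text are not treated specially here and would need their own skip branches.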
