Regular Expressions
Timur, 2013-02-13 20:19:55

Strange "negative lookahead" behavior. Regex example inside

I can't figure out why this happens; please explain. The task: find every background property in a CSS file whose url() holds a relative image path and change it to an absolute one. The URL may be enclosed in either single or double quotes. A URL starting with a slash is considered absolute.

Here is the regular expression:
url\(("|')?(?!\w*?:?//|/)(.+?)("|')?\)

And here is the result:

Example #1

.class {
  background: url(images/bubbles.png) top repeat-x;
}

returns everything correctly:
Group 1: no match
Group 2: images/bubbles.png
Group 3: no match

Example #2
.class {
  background: url(/images/bubbles.png) top repeat-x;
  background: url(http://images/bubbles.png) top repeat-x;
}

returns everything correctly: no match

Example #3
.class {
  background: url('images/bubbles.png') top repeat-x;
}

returns everything correctly:
Group 1: '
Group 2: images/bubbles.png
Group 3: '

Example #4: trouble. As soon as the absolute URL is enclosed in quotes
.class {
  background: url('http://images/bubbles.png') top repeat-x;
}

we get
Group 1: no match
Group 2: 'http://images/bubbles.png
Group 3: '

Question: why is the first match group empty, the second contains a quote and a path, while the third one works correctly?

P.S. Testing here. PHP 5.3 produces the same result.
P.P.S. This is not for scraping other people's sites. I am finishing up ExtendedClientScripts for Yii.


2 answer(s)
egorinsk, 2013-02-13
@egorinsk

I advise you to make the regular expression more stable and reliable: instead of a dot, explicitly restrict the characters allowed in the URL: [^'"\s]+?
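A minimal sketch of this suggestion, written in Python only for illustration (the question itself targets PHP, and the two engines treat these constructs alike). The only change to the original pattern is that the dot in group 2 becomes the class that excludes quotes and whitespace; the helper name `relative_url` is hypothetical.

```python
import re

# Original pattern with the dot in group 2 replaced by a class that
# forbids quotes and whitespace, as suggested above.
PATTERN = re.compile(r"""url\(("|')?(?!\w*?:?//|/)([^'"\s]+?)("|')?\)""")


def relative_url(css_line):
    """Return the captured relative path, or None if nothing matches."""
    m = PATTERN.search(css_line)
    return m.group(2) if m else None


print(relative_url("background: url(images/bubbles.png) top repeat-x;"))
print(relative_url("background: url('http://images/bubbles.png') top repeat-x;"))
```

With the quote excluded from group 2, the path can no longer begin with the opening quote, so the quoted absolute URL from example #4 should stop matching.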

Wott, 2013-02-14
@Wott

Question: why is the first group of matches empty, the second contains a quote and a path, while the third one works correctly?

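An optional group can be given up by backtracking. In your case the engine first lets group 1 take the quote, but the negative lookahead then sees http:// and fails. The engine backtracks and leaves group 1 empty, so the lookahead is now checked starting at the quote itself, where neither alternative matches. From there everything works: the quote lands in group 2 together with the path, and the regexp matches successfully.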
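The behavior described above can be reproduced with a short sketch, here in Python for illustration (Python reports a group that did not take part in the match as None). `ORIGINAL` is the pattern from the question. `GUARDED` is a hypothetical variant, not taken from the thread, whose lookahead also rejects a quote, so the match cannot succeed with group 1 empty and the quote pushed into group 2.

```python
import re

# Pattern from the question, unchanged.
ORIGINAL = re.compile(r"""url\(("|')?(?!\w*?:?//|/)(.+?)("|')?\)""")

# Hypothetical variant: the lookahead additionally forbids a quote, so the
# engine cannot skip group 1 and start the path at the opening quote.
GUARDED = re.compile(r"""url\(("|')?(?!["']|\w*?:?//|/)(.+?)("|')?\)""")

css = "background: url('http://images/bubbles.png') top repeat-x;"

original_groups = ORIGINAL.search(css).groups()
guarded_match = GUARDED.search(css)

print(original_groups)  # group 1 is None, group 2 starts with the quote
print(guarded_match)    # None: the quoted absolute URL is rejected
```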
In your case, where everything is optional, I would
include url() in the search, since it is more efficient to have a reliable static anchor (sorry, I somehow missed that it is already there),
spell out the conditions for an absolute URL more strictly, like |/
match the path with something like ([^'")]+), since a greedy match on a class of forbidden characters is faster and more reliable
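A rough sketch of how these suggestions might be combined to do the original task, in Python for illustration. The greedy class `([^'")]+)` is from the answer above. Everything else is an assumption: the `\w+:` scheme check as a stricter "absolute" condition, the `\1` backreference that requires the closing quote to equal the opening one, and the `make_absolute` helper with its `base` parameter.

```python
import re

# Greedy negated class for the path (from the answer above).
# Assumptions: "\w+:" treats any scheme as absolute, and \1 forces the
# closing quote to match the opening one.
RELATIVE_URL = re.compile(r"""url\((["']?)(?!\w+:|/)([^'")]+)\1\)""")


def make_absolute(css, base):
    """Prefix every relative url(...) path with base, keeping the quoting."""
    def repl(m):
        quote, path = m.group(1), m.group(2)
        return "url({0}{1}{2}{0})".format(quote, base, path)
    return RELATIVE_URL.sub(repl, css)


css = ".class { background: url('images/bubbles.png') top repeat-x; }"
print(make_absolute(css, "/static/"))
```

Because the path class forbids quotes, the empty-group-1 fallback from example #4 fails immediately instead of swallowing the opening quote, and absolute URLs are left untouched.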
