HTML
Roman Koff, 2014-11-10 14:59:43

How do I replace links in text in C# while skipping links inside tag attributes?

I need to wrap every link in the text in an <a href=""></a> tag. Links that are already tag attributes must stay unchanged (for example: <a href="http:// toster. ru/">Toster</a> or <img src="http:// mysite. com/photo. jpg" />). (I added the spaces because Toster otherwise turns these URLs into links automatically.)
Sample text:

The principle of perception http://google.ru impartially creates www.ya.ru
palliative intelligence, [email protected] conditionally. The concept
<a href="http://mail.ru">mentally</a> enables
<img src="http://bing.com/images/01.jpg" /> the law of the external world.

Desired result:
The principle of perception <a href="http://google.ru/">http://google.ru</a>
impartially creates <a href="http://www.ya.ru/">www.ya.ru</a>
palliative intelligence, [email protected] conditionally. The concept
<a href="http://mail.ru">mentally</a> enables
<img src="http://bing.com/images/01.jpg" /> the law of the external world.

I currently have code that does the link replacement more or less correctly, but I don't know how to change it so that links inside attributes are ignored (I'm not strong in regular expressions).
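using System;
using System.Text.RegularExpressions;

public static class LinkExtensions
{
  // Three alternatives: a link wrapped in parentheses, a link wrapped in a pair
  // of identical markers (= ~ | _ #), or any bare link starting with http(s):// or www.
  private static Regex regExHttpLinks = new Regex(@"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)|\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

  public static string ParseHtml(this string source)
  {
    if (string.IsNullOrEmpty(source))
      return source;
    // Temporarily mask dots that sit between two digits; they are restored at the end.
    var periodReplacement = "[]";
    source = Regex.Replace(source, @"(?<=\d)\.(?=\d)", periodReplacement);
    var linkMatches = regExHttpLinks.Matches(source);
    foreach (Match match in linkMatches)
    {
      var m = match.ToString();
      // "www." links get an explicit http:// scheme for the href.
      string s = (m.Contains("://")) ? m : "http://" + m;
      // Dots in the generated tag are masked as well, so a later Replace call
      // for the same link text cannot wrap the new tag a second time.
      source = source.Replace(m,
        String.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>",
        s.Replace(".", periodReplacement).ToLower(),
        m.Replace(".", periodReplacement)));
    }
    // Restore every masked dot.
    source = source.Replace(periodReplacement, ".");
    return source;
  }
}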
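To see where the problem comes from, it can help to list what the question's pattern actually matches before any replacement happens. The following is a minimal sketch: the class name QuestionRegexCheck and the method FindLinks are hypothetical, and the pattern is the question's, unchanged, only split into its three alternatives for readability.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lists what the question's pattern matches, without replacing anything.
public static class QuestionRegexCheck
{
    // The question's pattern, unchanged; split into its three '|' alternatives for readability.
    private static readonly Regex Links = new Regex(
        @"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))" +
        @"|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)" +
        @"|\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
        RegexOptions.IgnoreCase);

    // Returns the text of every match, in order of appearance.
    public static List<string> FindLinks(string text)
    {
        var result = new List<string>();
        foreach (Match match in Links.Matches(text))
            result.Add(match.Value);
        return result;
    }
}
```

On an ASCII version of the sample text (with the <a href=...> and <img src=...> parts included), this should return four matches: the two bare links plus http://mail.ru and http://bing.com/images/01.jpg from the attributes. All four come from the third alternative, since the character before each link (a space or a ") satisfies neither the first nor the second alternative's lookbehind, and the third alternative does not look at the context at all.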

1 answer(s)
Andrey Panov, 2014-11-13
@Zarinov

Good afternoon.
The problem is the third alternative of the regular expression, which does not check the context it appears in at all. If you add a negative lookbehind that checks for a preceding tag attribute, you should get what you need.
Try it like this:

(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)|(?<!((a\shref=\")|(img src=")))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]
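Plugged into the question's code, the pattern above needs one adjustment: the question stores its regex in a C# verbatim string (@"..."), where every " has to be written as "". The sketch below assumes the rest of ParseHtml stays as in the question; the class name AnswerLinkExtensions and the split of the pattern into three concatenated strings are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name; the method body is the question's ParseHtml with only the regex swapped.
public static class AnswerLinkExtensions
{
    // The answer's pattern. Inside a C# verbatim string each '"' is written as '""',
    // so the regex text (a\shref=\") becomes (a\shref=\"") and (img src=") becomes (img src="").
    private static readonly Regex RegExHttpLinks = new Regex(
        @"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))" +
        @"|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)" +
        @"|(?<!((a\shref=\"")|(img src="")))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string ParseHtml(this string source)
    {
        if (string.IsNullOrEmpty(source))
            return source;

        // Mask dots between digits, exactly as in the question.
        const string periodReplacement = "[]";
        source = Regex.Replace(source, @"(?<=\d)\.(?=\d)", periodReplacement);

        foreach (Match match in RegExHttpLinks.Matches(source))
        {
            string m = match.Value;
            // "www." links get an explicit scheme in the href.
            string s = m.Contains("://") ? m : "http://" + m;
            // Dots in the new tag stay masked until the end, so a repeated link is not wrapped twice.
            source = source.Replace(m, String.Format(
                "<a href=\"{0}\" title=\"{0}\">{1}</a>",
                s.Replace(".", periodReplacement).ToLower(),
                m.Replace(".", periodReplacement)));
        }

        // Restore every masked dot.
        return source.Replace(periodReplacement, ".");
    }
}
```

On the sample text this should wrap only http://google.ru and www.ya.ru and leave the existing <a> and <img> tags untouched. Two limitations remain: the lookbehind only inspects the characters right before the point where a match starts, so inside an attribute such as href="http://www.example.com" the engine can still start a match at www.; and string.Replace rewrites every occurrence of a matched link, so a URL that appears both as bare text and inside an attribute is wrapped in both places.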
