C++ / C#
Junior007, 2016-11-14 20:22:34

Is this algorithm implemented professionally?

Hello!
I'm working on text parsing and set myself the task of extracting all links (the link addresses) from a string.
I implemented it like this:

  std::string str = "<html><body><a href=\"url1\">name link1</a><div><a href=\"url2\">name link2</a></body></html>";
  std::regex reg("(<a href=\")([\\w\\s]*)(\">)(.*)(</a>)");
  std::smatch res;
  std::vector<std::string> arr;

  std::string tmp_str = str;
  while (std::regex_search(tmp_str, res, reg))
  {
    arr.push_back(res[2]);
    tmp_str = tmp_str.substr(res.position(2));
  }

I'm interested in the "professional" approach, so to speak. Of course, I'll wrap everything in a class and tidy it up later; for now it's the algorithm itself that interests me. Can it be done faster or better?


5 answer(s)
Artem Spiridonov, 2016-11-14
@Junior007

More general checklist:
All for sim. The rest is consequences.

Rou1997, 2016-11-14
@Rou1997

Clearly unprofessional, since you don't have a real task that you are being paid for. No, I'm not confusing anything: the word "professional" has two meanings, but they are closely related.
And if you do have such a task, but cannot check the solution against it yourself and instead ask for advice on Toster, then that is unprofessional too.
On the topic:
1) use ready-made libraries instead of regular expressions, which are inflexible, amount to reinventing the wheel (there is a lot you will fail to account for), and are hard to read
2) most likely, don't use C++; this is not quick to do in it
But that holds for most tasks, not for all of them.

sim3x, 2016-11-14
@sim3x

Professionals use someone else's code instead of reinventing the wheel:
https://github.com/google/gumbo-parser

Rsa97, 2016-11-14
@Rsa97

No. Test cases:

<a href="#hello">hello</a>
<a href="site.ru?12">hello</a>
<a href="hello-1">hello</a><a href="hello-2">hello</a>
<a href="hello">hello<a href="hello-2">hello</a>

Also, a lot of those parentheses are there for no reason.
P.S. A couple more examples:
<a title="привет" href="#hello">hello</a>
<a      href="test">test</a>
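
For illustration only, here is a minimal sketch of a regex that would accept the test cases above. The helper name extract_hrefs is hypothetical, and the sketch assumes lowercase tags and double-quoted attribute values. It captures only the URL, and it is still a regex rather than an HTML parser, so it will happily report links inside comments or scripts.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collects the value of
// every double-quoted href attribute found inside an <a ...> opening tag.
std::vector<std::string> extract_hrefs(const std::string& html)
{
    // "<a", whitespace, optionally other attributes followed by whitespace,
    // then href="...". The URL is the only capture group.
    static const std::regex link_pattern(
        "<a\\s+(?:[^>]*?\\s)?href=\"([^\"]*)\"");

    std::vector<std::string> hrefs;
    // sregex_iterator walks over all non-overlapping matches without
    // copying the remainder of the string on each step.
    for (std::sregex_iterator it(html.begin(), html.end(), link_pattern), end;
         it != end; ++it)
    {
        hrefs.push_back((*it)[1].str());
    }
    return hrefs;
}
```

Called on the third test case, this should yield hello-1 and hello-2; on the unclosed fourth case it yields hello and hello-2, because the pattern looks only at opening tags and does not depend on a matching </a>.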

abcd0x00, 2016-11-18
@abcd0x00

The program (code) must be correct, understandable and easily changeable.
Correctness suffers, because the code misses links it should pick up and accepts things that are not links at all. (If you put an HTML comment with a link inside it, the code will happily report that link as valid. This happens often on real pages: markup gets commented out from time to time and is still sent to the client.)
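
As a rough illustration of the comment problem, one option is to drop comment blocks before searching for links. The helper name strip_html_comments is hypothetical. This is a sketch for small inputs, not a substitute for a real parser: it knows nothing about scripts, CDATA sections, or malformed comments.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): removes <!-- ... -->
// blocks so that links inside commented-out markup are not reported.
std::string strip_html_comments(const std::string& html)
{
    // [\s\S] matches any character including newlines; *? stops at the
    // first closing "-->" instead of the last one.
    static const std::regex comment_pattern("<!--[\\s\\S]*?-->");
    return std::regex_replace(html, comment_pattern, "");
}
```

The link search would then run on the returned string instead of the raw page text.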
Clarity suffers from poor, non-self-explanatory variable names and a confusing regular expression. If you are extracting a link, then only the link should be in a capture group, not everything in sight. (Because this regex is obfuscated, a bug can easily creep in unnoticed: reading it is a chore every time, so you simply won't.)
Ease of change is not really compromised here, but only because the code is small. If it were bigger, that would make itself felt too.
In general, putting std:: everywhere doesn't make code look professional; it is just sloppy code with std:: added everywhere.
