PHP -> Regular expressions -> How to find something extra in the code?
Hello!
I've been racking my brain over this for three days. I know regular expressions well, but I can't solve this problem. The gist is this: there are two versions of the same HTML code, one from the file and the other from the browser.
The HTML code from the browser may differ, because scripts append their own tags to the DOM.
Task: find the extra tags added by scripts and remove them, so that the HTML code from the browser matches the code in the file.
An illustrative example:
HTML code from the file:
<ul class="socials">
<li><a class="facebook" href="#"></a></li>
<li><a class="twitter" href="#"></a></li>
<li><a class="googleplus" href="#"></a></li>
</ul>
HTML code from the browser:
<ul class="socials screen">
<li style="position: absolute; top: 10px;"><a class="facebook" href="#"></a></li>
<li style="position: absolute; top: 30px;"><a class="twitter" href="#"></a></li>
<li style="position: absolute; top: 50px;"><a class="googleplus" href="#"></a></li>
</ul>
What needs to be found and replaced:
[<ul class="socials screen">] => <ul class="socials">
[<li style="position: absolute; top: 10px;">] => <li>
[<li style="position: absolute; top: 30px;">] => <li>
[<li style="position: absolute; top: 50px;">] => <li>
My attempt so far:
@header('Content-Type: text/plain; charset=utf-8');
$sourse = '
<div id="logo" class="234234234">
<a href="#" title="Site Title">
<p>3</p>
<span>Your Site Title</span>
</a>
</div>
<ul class="isotope" style="position: relative; overflow: hidden; height: 360px;" id="screens">
<li style="position: absolute; left: 0px; top: 0px;" class="screen usage isotope-item">
<a href="images/sample_showcase/640x480_usage.png" class="boxer" rel="gallery" title="Photo">
<img src="images/sample_showcase/140x140_usage.png" alt="Shot 1">
<div></div>
</a>
</li>
</ul>
';
$sourse_php = '
<div id="logo">
<a href="#" title="Site Title">
<span>Your Site Title</span>
</a>
</div>
<ul id="screens">
<li class="screen usage">
<a href="images/sample_showcase/640x480_usage.png" class="boxer" rel="gallery" title="Photo">
<img src="images/sample_showcase/140x140_usage.png" alt="Shot 1" />
<div></div>
</a>
</li>
</ul>
';
function strsoot($sourse_php){
$sourse_php = trim($sourse_php);
//$sourse_php = preg_replace('#([\s]{1,})#siu', ' ', $sourse_php);
$sourse_php = str_replace('/', '\/', $sourse_php);
$sourse_php = str_replace("'", "\'", $sourse_php);
$sourse_php = str_replace(".", "\.", $sourse_php);
$sourse_php = str_replace('$', '\$', $sourse_php);
$sourse_php = str_replace("^", "\^", $sourse_php);
$sourse_php = str_replace("?", "\?", $sourse_php);
$sourse_php = str_replace('*', '\*', $sourse_php);
$sourse_php = str_replace("!", "\!", $sourse_php);
$sourse_php = str_replace("#", "\#", $sourse_php);
$sourse_php = str_replace("№", "\№", $sourse_php);
$sourse_php = str_replace(":", "\:", $sourse_php);
$sourse_php = str_replace("+", "\+", $sourse_php);
return trim(preg_replace_callback('#(<[^>]*?>)#msiu', 'recode', $sourse_php));
}
function recode($sourse_ph){
$sourse_ph = $sourse_ph[0];
//$sourse_ph = preg_replace('#([\s]{2,})#msiu', ' ', $sourse_ph);
$sourse_ph = str_replace(' ', '(.*?)', $sourse_ph);
$sourse_ph = str_replace('>', '>(.*?)', $sourse_ph);
$sourse_ph = str_replace('<', '<(.*?)', $sourse_ph);
$sourse_ph = str_replace('"', '(.*?)"(.*?)', $sourse_ph);
$sourse_ph = str_replace('(.*?)(.*?)(.*?)', '(.*?)', $sourse_ph);
$sourse_ph = str_replace('(.*?)(.*?)', '(.*?)', $sourse_ph);
if (substr($sourse_ph, -5, 5) == '(.*?)') $sourse_ph = substr($sourse_ph, 0, (strlen($sourse_ph)-5));
return trim($sourse_ph);
}
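// Split both versions into lines so they can be compared line by line.
$match_php = explode("\n", trim($sourse_php));
$match = explode("\n", trim($sourse));
// Initialize the result arrays so the later loops never touch undefined variables.
$reg = array();
$reg_so = array();
// Browser lines that do not occur verbatim in the file version are candidates for cleanup.
foreach($match as $n=>$str){
if (!preg_match('/'.preg_quote($str, '/').'/msiu', trim($sourse_php))){
$reg_so[trim($str)] = strsoot($str);
}
}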
foreach($match_php as $n=>$str){
if((preg_match('/'.strsoot($str).'/msiu', trim($sourse))) and (!preg_match('/'.preg_quote($str, '/').'/msiu', trim($sourse)))){
$reg[trim($str)] = trim(strsoot($str));
}
}
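// Map every changed browser line to the first file line whose pattern matches it.
// Lines with no match at all are marked for deletion.
foreach($reg_so as $k=>$v){
$reg_so[$k] = 'delete';
foreach($reg as $n=>$s){
if(preg_match('/'.$s.'/msiu', trim($k))){
$reg_so[$k] = $n;
break; // stop at the first match so later patterns do not overwrite it
}
}
}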
echo '$n=>$str
';
print_r($reg);
echo '
$k=>$v
';
print_r($reg_so);
A strange task... I wonder, what is it for?
In any case, this can't be solved with regular expressions. Use diff or some kind of diff tool.
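To illustrate the diff suggestion, here is a minimal sketch of a line-based diff in PHP. The function name `extraLines` is made up for this example, and it assumes both versions are formatted one tag per line, as in the question. It only reports the browser lines that are not in the file version; it does not decide how to repair them.

```php
<?php
// Hypothetical helper: returns the lines of $browser that are not part of the
// longest common subsequence (LCS) with $file, i.e. the lines a diff tool
// would report as added or changed on the browser side.
function extraLines(string $file, string $browser): array
{
    $a = array_map('trim', explode("\n", trim($file)));
    $b = array_map('trim', explode("\n", trim($browser)));
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] holds the LCS length of $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table and collect the browser lines that get skipped.
    $extra = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $i++;               // line exists only in the file version
        } else {
            $extra[] = $b[$j];  // line exists only in the browser version
            $j++;
        }
    }
    while ($j < $m) {
        $extra[] = $b[$j++];
    }
    return $extra;
}
```

Calling `extraLines($sourse_php, $sourse)` with the variables from the question would list the browser lines that differ from the file. A line with changed attributes is reported just like a completely new line, which is one reason a plain diff only gets you part of the way.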
When you edit content in a WYSIWYG editor and the content contains JS or Yandex.Maps, for example, they add extra tags. Those tags need to be removed.
It's simple: regular expressions are not suited to tasks like this. For processing and filtering HTML, it is better to traverse the DOM tree. That approach is much simpler and more reliable.
If you want to know in more detail why regular expressions are not suitable, I suggest reading this: habrahabr.ru/post/171667
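As a rough sketch of what DOM tree traversal could look like here, the code below uses PHP's built-in `DOMDocument`. The function names (`loadBody`, `elementChildren`, `syncToFile`, `cleanHtml`) are invented for this example. It assumes that scripts only add elements and attributes and never reorder existing elements, so children are matched greedily by tag name in document order. Real pages may need a smarter matching rule.

```php
<?php
// Hypothetical helper: parse an HTML fragment and return its <body> element.
function loadBody(string $html): DOMElement
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy or HTML5 markup
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    return $doc->getElementsByTagName('body')->item(0);
}

// Returns the element children of a node as a plain array (a snapshot).
function elementChildren(DOMNode $node): array
{
    $out = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $out[] = $child;
        }
    }
    return $out;
}

// Makes $browser look like $file: same attributes, no unmatched child elements.
function syncToFile(DOMElement $file, DOMElement $browser): void
{
    // Reset the attributes to exactly those of the file version.
    $names = [];
    foreach ($browser->attributes as $attr) {
        $names[] = $attr->name;
    }
    foreach ($names as $name) {
        $browser->removeAttribute($name);
    }
    foreach ($file->attributes as $attr) {
        $browser->setAttribute($attr->name, $attr->value);
    }

    // Match children by tag name in order; anything unmatched is assumed
    // to have been injected by a script and is removed.
    $fileKids = elementChildren($file);
    $i = 0;
    foreach (elementChildren($browser) as $child) {
        if ($i < count($fileKids) && $fileKids[$i]->tagName === $child->tagName) {
            syncToFile($fileKids[$i], $child);
            $i++;
        } else {
            $browser->removeChild($child);
        }
    }
}

// Returns the browser HTML reduced to the structure of the file HTML.
function cleanHtml(string $fileHtml, string $browserHtml): string
{
    $file = loadBody($fileHtml);
    $browser = loadBody($browserHtml);
    syncToFile($file, $browser);

    $out = '';
    foreach ($browser->childNodes as $child) {
        $out .= $browser->ownerDocument->saveHTML($child);
    }
    return trim($out);
}
```

With the variables from the question, `cleanHtml($sourse_php, $sourse)` would strip the added `style` attributes, the extra classes and the injected `<p>3</p>`. The serialized output may differ from the file in whitespace, since `saveHTML` formats the markup its own way.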