PHP
Andrey Silaev, 2014-11-08 19:11:45

PHP -> Regular expressions -> How to find something extra in the code?

Hello!
I've been racking my brain over this for 3 days now. I understand regular expressions well, but I cannot solve this problem. The essence is this: there are two pieces of HTML, one from the file and the other from the browser.
The HTML from the browser may differ, because scripts add their own tags to the DOM.
Task: find the extra tags that scripts have added and remove them, so that the HTML from the browser has the same form as the HTML in the file.
An illustrative example.
HTML from the file:

<ul class="socials">
    <li><a class="facebook" href="#"></a></li>
    <li><a class="twitter" href="#"></a></li>
    <li><a class="googleplus" href="#"></a></li>
</ul>

HTML in the browser, after the scripts have added their styles:

<ul class="socials screen">
    <li style="position: absolute; top: 10px;"><a class="facebook" href="#"></a></li>
    <li style="position: absolute; top: 30px;"><a class="twitter" href="#"></a></li>
    <li style="position: absolute; top: 50px;"><a class="googleplus" href="#"></a></li>
</ul>

As a result, I need to compare these two pieces of HTML and get, as output, an array of what must be replaced with what in the second one to bring it back to its original form.
That is:
[<ul class="socials screen">] => <ul class="socials">
[<li style="position: absolute; top: 10px;">] => <li>
[<li style="position: absolute; top: 30px;">] => <li>
[<li style="position: absolute; top: 50px;">] => <li>

Then, with this array in hand, the replacement itself is easy and everything will work out.
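
A minimal sketch of that last replacement step, assuming the map has already been built by hand from the example above. The variable names `$browserHtml`, `$replacements` and `$restored` are illustrative only; building the map automatically is the open part of the question.

```php
<?php
// HTML as rendered in the browser (taken from the example above).
$browserHtml = '<ul class="socials screen">
    <li style="position: absolute; top: 10px;"><a class="facebook" href="#"></a></li>
    <li style="position: absolute; top: 30px;"><a class="twitter" href="#"></a></li>
    <li style="position: absolute; top: 50px;"><a class="googleplus" href="#"></a></li>
</ul>';

// Replacement map: tag as seen in the browser => tag as written in the file.
$replacements = array(
    '<ul class="socials screen">'                    => '<ul class="socials">',
    '<li style="position: absolute; top: 10px;">'    => '<li>',
    '<li style="position: absolute; top: 30px;">'    => '<li>',
    '<li style="position: absolute; top: 50px;">'    => '<li>',
);

// strtr() with an array performs all replacements in a single pass,
// so text that has already been replaced is never matched again.
$restored = strtr($browserHtml, $replacements);

echo $restored, "\n";
```

Using `strtr()` rather than a chain of `str_replace()` calls avoids one replacement accidentally matching the output of an earlier one.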
Here is an example you can run to try it:
@header('Content-Type: text/plain; charset=utf-8');
$sourse = '
<div id="logo" class="234234234">
<a href="#" title="Site Title">
<p>3</p> 
<span>Your Site Title</span>
</a>
</div>
<ul class="isotope" style="position: relative; overflow: hidden; height: 360px;" id="screens">
<li style="position: absolute; left: 0px; top: 0px;" class="screen usage isotope-item">
<a href="images/sample_showcase/640x480_usage.png" class="boxer" rel="gallery" title="Photo">
<img src="images/sample_showcase/140x140_usage.png" alt="Shot 1">
<div></div>
</a>
</li>
</ul>   
';
                            
                            
$sourse_php = '
<div id="logo">
<a href="#" title="Site Title">
<span>Your Site Title</span>
</a>
</div>
<ul id="screens">
<li class="screen usage">
<a href="images/sample_showcase/640x480_usage.png" class="boxer" rel="gallery" title="Photo">
<img src="images/sample_showcase/140x140_usage.png" alt="Shot 1" />
<div></div>
</a>                                
</li>
</ul> 
';


function strsoot($sourse_php){
$sourse_php = trim($sourse_php);
//$sourse_php = preg_replace('#([\s]{1,})#siu', ' ', $sourse_php);
$sourse_php = str_replace('/', '\/', $sourse_php);
$sourse_php = str_replace("'", "\'", $sourse_php);
$sourse_php = str_replace(".", "\.", $sourse_php);
$sourse_php = str_replace('$', '\$', $sourse_php);
$sourse_php = str_replace("^", "\^", $sourse_php);
$sourse_php = str_replace("?", "\?", $sourse_php);
$sourse_php = str_replace('*', '\*', $sourse_php);
$sourse_php = str_replace("!", "\!", $sourse_php);
$sourse_php = str_replace("#", "\#", $sourse_php);
$sourse_php = str_replace("№", "\№", $sourse_php);
$sourse_php = str_replace(":", "\:", $sourse_php);
$sourse_php = str_replace("+", "\+", $sourse_php);

return trim(preg_replace_callback('#(<[^>]*?>)#msiu', 'recode', $sourse_php));
}

function recode($sourse_ph){
    $sourse_ph = $sourse_ph[0];
    //$sourse_ph = preg_replace('#([\s]{2,})#msiu', ' ', $sourse_ph);
    $sourse_ph = str_replace(' ', '(.*?)', $sourse_ph);
    $sourse_ph = str_replace('>', '>(.*?)', $sourse_ph);
    $sourse_ph = str_replace('<', '<(.*?)', $sourse_ph);
    $sourse_ph = str_replace('"', '(.*?)"(.*?)', $sourse_ph);
    $sourse_ph = str_replace('(.*?)(.*?)(.*?)', '(.*?)', $sourse_ph);
    $sourse_ph = str_replace('(.*?)(.*?)', '(.*?)', $sourse_ph);    
    if (substr($sourse_ph, -5, 5) == '(.*?)') $sourse_ph = substr($sourse_ph, 0, (strlen($sourse_ph)-5));
    return trim($sourse_ph);     
}



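// Initialize the result arrays so the loops below never read undefined variables.
$reg = array();
$reg_so = array();

$match_php = explode("\n", trim($sourse_php));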


$match = explode("\n", trim($sourse));


foreach($match as $n=>$str){
    if (preg_match('/'.preg_quote($str, '/').'/msiu', trim($sourse_php))){
    }
    else{
        $reg_so[trim($str)] = strsoot($str);
    }
}

foreach($match_php as $n=>$str){
    if((preg_match('/'.strsoot($str).'/msiu', trim($sourse))) and (!preg_match('/'.preg_quote($str, '/').'/msiu', trim($sourse)))){
        $reg[trim($str)] = trim(strsoot($str));
    }
}

foreach($reg_so as $k=>$v){
    
    foreach($reg as $n=>$s){
        if(preg_match('/'.$s.'/msiu', trim($k))){
            $reg_so[$k] = $n;
        }else{
            $reg_so[$k] = $n.'delete';
        }    
    }
     
}

echo '$n=>$str
';
print_r($reg);
echo '
$k=>$v
';
print_r($reg_so);

4 answer(s)
Alexander Madzhugin @Suntechnic, 2014-11-08

A strange task... I wonder what it is for?
In any case, this cannot be solved with regular expressions. Use diff or some kind of diff tool.

Andrey Silaev @Anderseno, 2014-11-08

When you edit content in a WYSIWYG editor and that content contains JS or, for example, Yandex.Maps, the scripts add extra tags. Those are what need to be removed.

Sergey Protko @Fesor, 2014-11-08

It's simple: regular expressions are not suitable for tasks like this. For processing and filtering HTML it is better to traverse the DOM tree. That approach is much more reliable and also simpler.
If you want to know in more detail why regular expressions are not suitable, I suggest reading this: habrahabr.ru/post/171667
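
A rough sketch of what such a DOM traversal could look like with PHP's built-in `DOMDocument`. The function names (`loadFragment`, `elementChildren`, `syncElement`, `restoreHtml`) are made up for this illustration, and the matching rule is a deliberate simplification: child elements are paired by position and tag name, and any browser element without a counterpart in the file is assumed to be script-injected and removed.

```php
<?php
// Parse an HTML fragment; loadHTML() wraps it in <html><body> automatically.
function loadFragment($html)
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc;
}

// Return only the element children of a node (text nodes are skipped).
function elementChildren(DOMNode $node)
{
    $children = array();
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[] = $child;
        }
    }
    return $children;
}

// Make $browser look like $file: same attributes, no extra child elements.
function syncElement(DOMElement $browser, DOMElement $file)
{
    // Collect names first: removing attributes while iterating the live map is unsafe.
    $names = array();
    foreach ($browser->attributes as $attr) {
        $names[] = $attr->nodeName;
    }
    foreach ($names as $name) {
        if (!$file->hasAttribute($name)) {
            $browser->removeAttribute($name); // e.g. style="..." added by a script
        }
    }
    // Restore values that a script changed, e.g. class="socials screen".
    foreach ($file->attributes as $attr) {
        $browser->setAttribute($attr->nodeName, $attr->nodeValue);
    }

    $fileChildren = elementChildren($file);
    $i = 0;
    foreach (elementChildren($browser) as $child) {
        if (isset($fileChildren[$i]) && $fileChildren[$i]->tagName === $child->tagName) {
            syncElement($child, $fileChildren[$i]);
            $i++;
        } else {
            $browser->removeChild($child); // no counterpart in the file
        }
    }
}

function restoreHtml($browserHtml, $fileHtml)
{
    $browserDoc = loadFragment($browserHtml);
    $fileDoc = loadFragment($fileHtml);
    $browserBody = $browserDoc->getElementsByTagName('body')->item(0);
    $fileBody = $fileDoc->getElementsByTagName('body')->item(0);
    syncElement($browserBody, $fileBody);

    $out = '';
    foreach ($browserBody->childNodes as $child) {
        $out .= $browserDoc->saveHTML($child);
    }
    return $out;
}

$fileHtml = '<ul class="socials"><li><a class="facebook" href="#"></a></li>'
    . '<li><a class="twitter" href="#"></a></li>'
    . '<li><a class="googleplus" href="#"></a></li></ul>';
$browserHtml = '<ul class="socials screen">'
    . '<li style="position: absolute; top: 10px;"><a class="facebook" href="#"></a></li>'
    . '<li style="position: absolute; top: 30px;"><a class="twitter" href="#"></a></li>'
    . '<li style="position: absolute; top: 50px;"><a class="googleplus" href="#"></a></li></ul>';

$restored = restoreHtml($browserHtml, $fileHtml);
echo $restored, "\n";
```

Unlike the regex approach, this does not depend on attribute order, quoting style or whitespace inside tags, because both inputs are compared as trees rather than as strings.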

KorsaR-ZN @KorsaR-ZN, 2014-11-08

Try the PHPDiff class: it produces an array of the lines that differ, so you can then replace those lines in the second document with the ones from the file, thereby bringing both to the same form. Or this
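
To illustrate the idea of a line diff without relying on any particular library, here is a small hand-rolled sketch based on the longest common subsequence. It is not the PHPDiff API; the functions `lcsTable`, `diffLines` and `replacementMap` are hypothetical names. Pairing removed and added lines by position within each changed hunk is a heuristic that happens to fit the example from the question.

```php
<?php
// Table of longest-common-subsequence lengths for the suffixes of $a and $b.
function lcsTable(array $a, array $b)
{
    $n = count($a);
    $m = count($b);
    $t = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $t[$i][$j] = ($a[$i] === $b[$j])
                ? $t[$i + 1][$j + 1] + 1
                : max($t[$i + 1][$j], $t[$i][$j + 1]);
        }
    }
    return $t;
}

// Returns a list of array(op, line): '=' common, '-' only in $a, '+' only in $b.
function diffLines(array $a, array $b)
{
    $t = lcsTable($a, $b);
    $ops = array();
    $i = 0;
    $j = 0;
    while ($i < count($a) && $j < count($b)) {
        if ($a[$i] === $b[$j]) {
            $ops[] = array('=', $a[$i]);
            $i++;
            $j++;
        } elseif ($t[$i + 1][$j] >= $t[$i][$j + 1]) {
            $ops[] = array('-', $a[$i++]);
        } else {
            $ops[] = array('+', $b[$j++]);
        }
    }
    while ($i < count($a)) {
        $ops[] = array('-', $a[$i++]);
    }
    while ($j < count($b)) {
        $ops[] = array('+', $b[$j++]);
    }
    return $ops;
}

// Pair removed and added lines by position inside each changed hunk.
function replacementMap(array $ops)
{
    $map = array();
    $removed = array();
    $added = array();
    $ops[] = array('=', null); // sentinel: flushes the final hunk
    foreach ($ops as $op) {
        if ($op[0] === '-') {
            $removed[] = $op[1];
        } elseif ($op[0] === '+') {
            $added[] = $op[1];
        } else {
            foreach ($removed as $k => $line) {
                // A removed line with no partner maps to '' (delete it).
                $map[$line] = isset($added[$k]) ? $added[$k] : '';
            }
            $removed = array();
            $added = array();
        }
    }
    return $map;
}

$browserLines = array(
    '<ul class="socials screen">',
    '<li style="position: absolute; top: 10px;"><a class="facebook" href="#"></a></li>',
    '<li style="position: absolute; top: 30px;"><a class="twitter" href="#"></a></li>',
    '<li style="position: absolute; top: 50px;"><a class="googleplus" href="#"></a></li>',
    '</ul>',
);
$fileLines = array(
    '<ul class="socials">',
    '<li><a class="facebook" href="#"></a></li>',
    '<li><a class="twitter" href="#"></a></li>',
    '<li><a class="googleplus" href="#"></a></li>',
    '</ul>',
);

$map = replacementMap(diffLines($browserLines, $fileLines));
print_r($map);
```

Note that this map works on whole lines rather than on individual tags, so it only helps when both documents are broken into lines in the same way.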
