PHP
doublench21, 2014-10-12 18:34:06

Are there any critical errors?

I am writing a parser. It works slowly, but that doesn't matter.
Question: Why is no success message displayed for the first successful pass? Specifically the first one.
Question: Why does it simply stop after a couple of passes?
If you can spot the cause in the code, please point out the errors.
P.S. Please don't be too harsh, I'm still learning. And yes, I use the old functions.

<?php
require_once 'libs/simple_html_dom.php';

$server = ini_get("mysql.default_host");
@mysql_connect($server, "u360508016_root", "********") or die();
mysql_select_db("u360508016_base");
mysql_set_charset("utf8");

$url = "http://minecraft.gamepedia.com/Minecraft_Wiki";
$i = 1;
do {
    parser($url);
    if ($i >= 2) {
        $q = mysql_query("SELECT * FROM `indexing_link` WHERE `id`='" . $i . "'") or die();
        $url = mysql_result($q, 0, 1);
    }
    $i++;
    $q = mysql_query("SELECT * FROM `indexing_link` WHERE `id`='" . $i . "'") or die();
    echo "SUCCESS! " . $i . "\n";
} while (@mysql_num_rows($q));

function parser($url)
{
    $html = file_get_html($url);

    /**
     * Get all internal links
     */
    /*if ($html->innertext != '' and count($html->find('a'))) {
        foreach ($html->find('a[href^=/] ') as $a) {
            echo "<a href='http://minecraft.gamepedia.com" . $a->href . "'>" . $a->plaintext . "</a></br>";
        }
    }*/

    /**
     * Get the title/description pair for the main link
     */
    $title = "sdasd";
    /* $title = $title->plaintext;*/
    $short = "zxcxzc";/*$html->find('#mw-content-text');
    $short = $short->find('p', 0);
    $short = $short->plaintext;*/

    /**
     * Write all internal links to the DB
     */
    $link_id = mysql_query("SELECT * FROM `indexing_link` WHERE `url`='" . $url . "'") or die();
    if (mysql_num_rows($link_id) == 0) {
        mysql_query("INSERT INTO `indexing_link` (`url`, `title`, `short`) VALUES ('" . $url . "', '" . $title . "', '" . $short . "')") or die();
        $link_id = mysql_query("SELECT * FROM `indexing_link` WHERE `url`='" . $url . "'") or die();
    }

    if ($html->innertext != '' and count($html->find('a'))) {
        foreach ($html->find('a[href^=/] ') as $a) {
            $q = mysql_query("SELECT * FROM `indexing_link` WHERE `url`='http://minecraft.gamepedia.com" . $a->href . "'") or die();
            if (mysql_num_rows($q) == 0) {
                mysql_query("INSERT INTO `indexing_link` (`url`) VALUES ('http://minecraft.gamepedia.com" . $a->href . "')") or die();
                $link_id1 = mysql_query("SELECT * FROM `indexing_link` WHERE `url`='http://minecraft.gamepedia.com" . $a->href . "'") or die();
            }
            /**
             * Write the from/to pair to the DB
             */
            if (@mysql_result($link_id1, 0, 0)) {
                mysql_query("INSERT INTO `indexing_how_where` (`how`, `where`) VALUES  ('" . mysql_result($link_id,
                        0, 0) . "', '" . mysql_result($link_id1, 0, 0) . "')") or die();
            }
        }
    }

    /**
     * Get all text blocks in the HTML
     */
    $plaintext = $html->plaintext;
    /*echo $plaintext, "<br><br>";*/

    /**
     * Keep only Latin letters and spaces
     */
    $pattern = '/[A-Za-z]|[ \t]/';
    preg_match_all($pattern, $plaintext, $matches);
    foreach ($matches[0] as $key => $value) {
        if ($value == " ") {
            $matches[0][$key] = "\t";
        }
    }

    $arr = array();
    $flag = 0;

    foreach ($matches[0] as $value) {

        if ($value != "\t") {
            $arr[] = $value;
            $flag = 0;
        } elseif ($flag == 0) {
            $arr[] = " ";
            $flag = 1;
        }
    }
    $str = implode($arr);
    $word = explode(" ", $str);

    /**
     * Write the words to the DB and associate them with the link
     */
    foreach ($word as $value) {
        $q = mysql_query("SELECT * FROM `indexing_word` WHERE `word`='" . $value . "'") or die();
        if (mysql_num_rows($q) == 0) {
            mysql_query("INSERT INTO `indexing_word` (`word`) VALUES ('" . $value . "')") or die();
            $word_id = mysql_query("SELECT * FROM `indexing_word` WHERE `word`='" . $value . "'") or die();
        }
        /**
         * Write the word/link pair to the DB
         */
        if (@mysql_result($word_id, 0, 0)) {
            mysql_query("INSERT INTO `indexing_link_word` (`word_id`, `link_id`) VALUES  ('" . mysql_result($word_id,
                    0, 0) . "', '" . mysql_result($link_id, 0, 0) . "')") or die();
        }
    }


    $html->clear();
    unset($html);
}

1 answer(s)
Roquie, 2014-10-12
@doublench21

1. Don't use the mysql_* functions. At all. Ever!
Advice: use PDO or the mysqli_* functions instead.
2. Don't run queries in a loop. Collect the values in an array, turn it into a single SQL string, and then execute one query. This will speed the script up significantly. It applies to ALL of your SELECT and INSERT queries.
3. To fetch the HTML from the site, it is better to use
https://github.com/php-curl-class/php-curl-class
4. To extract information from the page, this works great:
webcache.googleusercontent.com/search?q=cache:Qvfn...
You get an array as output, which is not that hard to process.
5. Question: Why does it simply stop after a couple of passes?
Because the script runs for more than 30 seconds. You can lift the limit like this:
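A minimal sketch of lifting that limit, assuming the 30 seconds refers to PHP's max_execution_time setting (its default is 30) and that the host allows the script to change it:

```php
<?php
// Assumption: the limit being hit is max_execution_time (30 seconds by default).
// Passing 0 removes the time limit for the current script run.
set_time_limit(0);

// Equivalent ini-based form; either of the two calls is enough on its own.
ini_set('max_execution_time', '0');

// Read the effective value back; 0 means "no limit".
$limit = (int) ini_get('max_execution_time');
```

For a crawler it is usually safer to pass a generous finite value (for example set_time_limit(600)) so that a runaway loop still gets stopped eventually.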
As @FanatPHP mentioned, there really is a lot of code and it is not written in the best way, so reading it is not the most interesting task.
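Points 1 and 2 can be sketched roughly as follows. This is only an illustration, not a drop-in fix: buildBulkInsert() and insertWords() are hypothetical helper names, the indexing_word table is taken from the question, and INSERT IGNORE assumes a UNIQUE index on the word column, which the question does not show.

```php
<?php
/**
 * Build one multi-row INSERT with positional placeholders (hypothetical helper).
 * Table and column names are interpolated into the SQL, so they must come
 * from your own code, never from user input.
 *
 * @param string   $table   table name
 * @param string[] $columns column names
 * @param array[]  $rows    rows, each a list of values in column order
 * @return array   [SQL text, flat list of parameters]
 */
function buildBulkInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }

    // One "(?, ?, ...)" group per row, joined with commas.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $placeholders = implode(', ', array_fill(0, count($rows), $rowPlaceholder));
    $columnList = '`' . implode('`, `', $columns) . '`';

    // Flatten the row values into a single parameter list.
    $params = [];
    foreach ($rows as $row) {
        if (count($row) !== count($columns)) {
            throw new InvalidArgumentException('Row width does not match column count');
        }
        foreach ($row as $value) {
            $params[] = $value;
        }
    }

    // Assumption: a UNIQUE index exists, so IGNORE silently skips duplicates.
    $sql = "INSERT IGNORE INTO `$table` ($columnList) VALUES $placeholders";

    return [$sql, $params];
}

/**
 * Insert all words of a page with a single prepared statement.
 * Assumes $pdo uses PDO::ERRMODE_EXCEPTION, so failures throw.
 */
function insertWords(PDO $pdo, array $words): void
{
    // Drop empty strings and duplicates before touching the database.
    $words = array_values(array_unique(array_filter($words, 'strlen')));
    if ($words === []) {
        return;
    }
    $rows = array_map(fn ($word) => [$word], $words);
    [$sql, $params] = buildBulkInsert('indexing_word', ['word'], $rows);
    $pdo->prepare($sql)->execute($params);
}

// Demo without a database: show the generated SQL for two words.
[$sql, $params] = buildBulkInsert('indexing_word', ['word'], [['creeper'], ['zombie']]);
echo $sql, "\n";
// prints: INSERT IGNORE INTO `indexing_word` (`word`) VALUES (?), (?)
```

With a real connection, insertWords($pdo, $word) would replace the whole foreach over $word in the question: one round trip instead of two or three per word, and the placeholders take care of quoting, which the string concatenation in the original code does not.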
