PHP
PO6OT, 2015-10-05 12:55:12

How efficient is this script?

Here is a primitive DB script to store links for indexing:

<?php

define('xreservedmaxstrings', 1000000); // number of records (lines) per list file
define('xreservedsize', 128); // length of one record in bytes

function mkfile($path){
    fclose(fopen($path, 'a'));
}

function putdata($data){

    $lock=fopen(__FILE__.'.1.lock', 'w+');
    flock($lock, LOCK_EX); // lock the lock file to avoid collisions

    $dir='./list/';
    if(!file_exists($dir))
        mkdir($dir); // create the directory if it does not exist

    $r=false;
    for($i=1; $i<=255; $i++){ // at most 255 lists: 01.list .. ff.list
        $list=$dir.sprintf('%02x', $i).'.list'; // path to the list, hex number with a leading zero

        if(!file_exists($list))
            mkfile($list); // create the list if it does not exist

        clearstatcache(true, $list); // PHP caches filesize() results between calls
        if(filesize($list)<(xreservedmaxstrings*(xreservedsize+1))){ // the list is not full yet
            // pad data with spaces and truncate it so that it is exactly xreservedsize bytes
            $data=substr($data.str_repeat(' ', xreservedsize), 0, xreservedsize);
            file_put_contents($list, $data."\n", LOCK_EX|FILE_APPEND);
            $r=true;
            break;
        }
    }

    if(!$r)
        echo '<t>[u01] Lists are full.</t>'."\n"; // all 255 lists are full, report an error

    flock($lock, LOCK_UN);
    fclose($lock);

    return $r;
}

function getdata($index){

    $lock=fopen(__FILE__.'.1.lock', 'w+');
    flock($lock, LOCK_EX); // lock the lock file to avoid collisions

    $listindex=(int)ceil($index/xreservedmaxstrings); // number of the list (1-based)
    $liststring=($index-($listindex-1)*xreservedmaxstrings); // number of the line inside the list (1-based)

    $list='./list/'.sprintf('%02x', $listindex).'.list'; // path to the list, hex number with a leading zero

    clearstatcache(); // PHP caches file_exists() and filesize() results between calls

    if(!file_exists($list)){
        echo '<t>[u02] Requested list does not exist.</t>'."\n";
        $r=false;
    } // the requested list does not exist, report an error
    elseif(filesize($list)<($liststring*(xreservedsize+1))){
        echo '<t>[u03] Requested string does not exist.</t>'."\n";
        $r=false;
    } // the requested line does not exist, report an error
    else{
        $listf=fopen($list, 'r');
        fseek($listf, (($liststring-1)*(xreservedsize+1)));
        $string=fread($listf, xreservedsize); // fgets($listf, n) would return at most n-1 bytes
        fclose($listf);
        $r=rtrim($string, ' ');
    } // return the record without the padding spaces

    flock($lock, LOCK_UN);
    fclose($lock);

    return $r;
}

How efficient would this be with a total data volume of about 3 GB (xreservedmaxstrings will be larger)?
Can you recommend technologies that give faster lookups and lower memory consumption? I know next to nothing about databases.

2 answer(s)
Alex Safonov, 2015-10-05
@woonem

Your opening and closing curly braces are not balanced.
Is xreservederror a constant? And what is $GLOBALS[xreservederror] then?
I would replace hexdec($i) with intval($i, 16), and hexdec('ff') simply with 255.
I would also move this piece into a variable:

if($i<16)
      $pre=0;

    if(!file_exists('./db/'.$pre.dechex($i).'.list'))
      mkfile('./db/'.$pre.dechex($i).'.list');

If $i>=16, the $pre variable will not exist. And, as 27cm pointed out, this part is odd:
if($i>hexdec('ff')){
      return false;
      $GLOBALS[xreservederror].='<t>[u01] Database is full.</t>';
      break;
    }
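
One way to avoid the $pre variable altogether is to let sprintf() produce the leading zero. This is only a sketch: listPath() is a hypothetical helper name, and './db/' is taken from the snippet above.

```php
<?php
// Hypothetical helper: builds the list file name from its number.
function listPath(int $i, string $dir = './db/'): string
{
    // %02x prints lowercase hex, left-padded with zeros to at least two digits,
    // so no separate $pre variable is needed.
    return $dir . sprintf('%02x', $i) . '.list';
}

echo listPath(1), "\n";   // ./db/01.list
echo listPath(16), "\n";  // ./db/10.list
echo listPath(255), "\n"; // ./db/ff.list
```

The path is computed in one place, so the existence check and the file creation can reuse the same value.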

Here is that piece:
$data.=str_repeat(' ', $size);
$data=substr($data, $size);

As I understand it, you take a string, append $size spaces to it, and then take the part of the result that starts $size bytes from the beginning. I see a lot of inputs for which this produces a file filled with spaces.
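
To illustrate the difference, here is a small sketch with made-up values ($size = 8 and the string 'abc' are only for the demonstration; fixedWidth() is a hypothetical helper, not part of the original script):

```php
<?php
$size = 8;      // illustrative record width
$data = 'abc';  // illustrative input

$padded = $data . str_repeat(' ', $size); // 'abc' followed by 8 spaces, 11 bytes

// One argument after the string is the OFFSET: everything before byte 8 is dropped.
$tail = substr($padded, $size);    // 3 spaces, 'abc' is lost

// Offset 0 plus a length keeps the data and cuts the padding to size.
$head = substr($padded, 0, $size); // 'abc' plus 5 spaces

// Hypothetical helper: truncate first, then pad to exactly $size bytes.
function fixedWidth(string $data, int $size): string
{
    return str_pad(substr($data, 0, $size), $size, ' ');
}

var_dump($tail === '   ');                            // bool(true)
var_dump($head === 'abc     ');                       // bool(true)
var_dump(fixedWidth('abcdefghij', 8) === 'abcdefgh'); // bool(true)
```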
It would be easier to comment on the code if you explained what it does, added comments to the murkier parts, and explained why your question has the Database tag.
---------------
UPD
Oh! Thanks for adding comments to the code, it's a little clearer now. I understand your question.
You want to build your data storage on a file system. Volumes will be around 3GB.
---------
I'm going to say something seditious, though it seems obvious: the file system is not the best option in terms of performance, and your code does no caching at all.
I would drop the idea of reinventing the wheel. The industry has already invented plenty of data stores for you: SQL, NoSQL, key-value stores, hash tables. You want to store something on disk?!
MySQL and Redis store data on disk too, but they also have caching and optimization mechanisms and load data into RAM.
Besides, your code:
- is too primitive and far from optimal. There is no transaction support and no explicit locking mechanism, which will show up as collisions under load.
- ignores very important details, for example how PHP works with the file system.
Performance will be mediocre and, more importantly, there will be collisions.
You will put unnecessary load on the disk.
The question should be posed not in gigabytes but in requests per second. The disk will choke.
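
For comparison, here is roughly what the same put/get pair could look like on top of an SQL store. This is a sketch under assumptions: it uses SQLite through PDO as a stand-in for the SQL option (the pdo_sqlite extension must be available), and the table name links and the functions putLink()/getLink() are made up for the illustration.

```php
<?php
// Assumption: pdo_sqlite is installed. An in-memory database keeps the sketch
// self-contained; a real setup would point the DSN at a file or another server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS links (
    id  INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL
)');

// Hypothetical counterpart of putdata(): stores a link and returns its index.
function putLink(PDO $db, string $url): int
{
    $stmt = $db->prepare('INSERT INTO links (url) VALUES (:url)');
    $stmt->execute([':url' => $url]);
    return (int) $db->lastInsertId();
}

// Hypothetical counterpart of getdata(): returns the link or null if the index is unknown.
function getLink(PDO $db, int $id): ?string
{
    $stmt = $db->prepare('SELECT url FROM links WHERE id = :id');
    $stmt->execute([':id' => $id]);
    $url = $stmt->fetchColumn();
    return $url === false ? null : $url;
}

$id = putLink($db, 'http://example.com/');
echo $id, ' => ', getLink($db, $id), "\n";
```

Locking, the lookup by primary key and caching are then the storage engine's job rather than the script's, and records no longer need padding to a fixed width.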

Cat Anton, 2015-10-05
@27cm

Even though I'm not invited, something seems to be wrong here:

if($i>hexdec('ff')){  // $i > 0xFF
    return false;

    // We never get here =(

    // And this is an undefined constant on top of that
    $GLOBALS[xreservederror].='<t>[u01] Database is full.</t>';
    break;
}

Unused variable:
If you write to and read from a file, the file does not become a database.
In general, this is shit code, because:
  • No comments
  • No formatting
  • $GLOBALS is used to store error messages, although exceptions exist for that
  • Fatal errors, warnings, ...
  • Code duplication is visible to the naked eye:
    if(!file_exists('./db/'.$pre.dechex($listindex).'.list')){
        // ...
    }elseif(filesize('./db/'.$pre.dechex($listindex).'.list')<($liststring*($size+1))){
        // ...
    }else{
        $listf=fopen('./db/'.$pre.dechex($listindex).'.list', 'r');
        fseek($listf, ($liststring*($size+1)));
        // ...
    }
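
The last two points could be addressed along these lines. This is a sketch only: readRecord() is a hypothetical function, the choice of exception classes is arbitrary, and it assumes 1-based line numbers and records of $size bytes followed by a newline.

```php
<?php
// Hypothetical rewrite of the lookup: the path is built once and
// errors are reported with exceptions instead of $GLOBALS.
function readRecord(string $dir, int $listindex, int $liststring, int $size): string
{
    $path = $dir . sprintf('%02x', $listindex) . '.list'; // built once, reused below

    clearstatcache(true, $path); // do not trust cached is_file()/filesize() results
    if (!is_file($path)) {
        throw new RuntimeException('[u02] Requested list does not exist.');
    }
    if (filesize($path) < $liststring * ($size + 1)) {
        throw new OutOfRangeException('[u03] Requested string does not exist.');
    }

    $f = fopen($path, 'rb');
    fseek($f, ($liststring - 1) * ($size + 1)); // records are $size bytes plus "\n"
    $record = fread($f, $size);
    fclose($f);

    return rtrim($record, ' '); // strip the padding spaces
}

try {
    echo readRecord('./db/', 1, 1, 128), "\n";
} catch (Exception $e) {
    echo 'Lookup failed: ', $e->getMessage(), "\n";
}
```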

What prevents you from generating 3 GB of data and measuring the time and memory yourself?
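
A minimal measuring harness might look like this. It is a sketch with assumptions: benchmark() is a hypothetical helper, the record count is kept small so the example runs quickly, and the generated URLs are dummy data. Scale $iterations up to approach the real volume.

```php
<?php
// Hypothetical harness: runs a callable $iterations times and reports
// wall-clock time and peak memory.
function benchmark(callable $fn, int $iterations): array
{
    $start = microtime(true);
    for ($i = 1; $i <= $iterations; $i++) {
        $fn($i);
    }
    return [
        'seconds'    => microtime(true) - $start,
        'peak_bytes' => memory_get_peak_usage(true),
    ];
}

// Example workload: append fixed-width records (128 bytes plus "\n") to a temp file.
$file = tempnam(sys_get_temp_dir(), 'bench');
$handle = fopen($file, 'ab');
$iterations = 100000;

$result = benchmark(function (int $i) use ($handle) {
    fwrite($handle, str_pad('http://example.com/' . $i, 128) . "\n");
}, $iterations);

fclose($handle);
clearstatcache(true, $file);
$bytes = filesize($file);
unlink($file);

printf(
    "%d records, %d bytes, %.3f s, peak memory %d bytes\n",
    $iterations, $bytes, $result['seconds'], $result['peak_bytes']
);
```

Replacing the closure with calls to putdata() and getdata() gives numbers for the script from the question on the actual hardware.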
