RAID
Amursky1988, 2020-11-03 00:36:34

File search in NextCloud?

A question about how search is organized in NextCloud 20.
The RAID array holds 700 GB of data, roughly 1,000,000 files. When I search with the built-in search by pasting a query into the field, the system loads one processor thread at 100% and the search takes 1 min 13 sec.
If I type the file name into the search field manually, the search takes 2 min 13 sec; all 4 threads are loaded at 100% for the first half minute, then the load drops to 50-70%.
The search behaves the same regardless of which storage folder it is started from. Say I am in the folder "\docs\photo\folder1", which contains the file "img_001.jpg", and I search for "img_001.jpg" from "folder1": the search finds it in about 1 min 13 sec (provided the file name was copied and pasted into the search field). Searching from the storage root, or from any other location, takes the same time. Naturally, a search this slow cannot be used comfortably.

And now the question: what can help reduce the time spent searching for a file?

  1. Changing the database type?
  2. Changing the storage file system?
  3. Installing the system on an SSD RAID 0?
  4. Will replacing the CPU help?
  5. Are there applications / extensions for NextCloud focused on search? For example, searching for a file recursively from a specific directory, or by date, size, or owner.

Configuration:
    CPU: i3 4160
    RAM: 4 GB DDR3
    Storage: 4x HDD 500 GB, RAID 10 (LSI MegaRAID 9260 controller), EXT4
    System: SSD 240 GB WD Green, EXT4
    NextCloud 20 (snap), MySQL 5.7.32, PHP 7.4.11, Ubuntu Server 20.04

3 answer(s)
Amursky1988, 2020-11-03
@Amursky1988

Solution found! After many tests and database changes, I finally found the cause of the slow search. The problem really was in the database; details below. I enabled the slow query log with SET GLOBAL slow_query_log=1; and tracked down the query that the web search generates:

SELECT `filecache`.`fileid`, `storage`, `path`, `path_hash`, `filecache`.`parent`, `name`,
       `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`,
       `permissions`, `checksum`, `metadata_etag`, `creation_time`, `upload_time`
FROM `oc_filecache` `filecache`
LEFT JOIN `oc_filecache_extended` `fe` ON `filecache`.`fileid` = `fe`.`fileid`
WHERE (`storage` = 2)
  AND (`name` COLLATE utf8mb4_general_ci LIKE '%namefile%');
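For anyone repeating this diagnosis, here is a minimal sketch of how the slow query log can be set up from the mysql console. The 1-second threshold is an assumption chosen for illustration, not a value from the post; the variable names are standard MySQL 5.7 system variables.

```sql
-- Enable the slow query log on the running server (not kept across a restart).
SET GLOBAL slow_query_log = 1;

-- Log statements that run longer than 1 second (assumed threshold, tune as needed).
-- A new GLOBAL value only applies to connections opened after the change.
SET GLOBAL long_query_time = 1;

-- Find out where the log goes: a file, the mysql.slow_log table, or both.
SHOW VARIABLES LIKE 'log_output';
SHOW VARIABLES LIKE 'slow_query_log_file';
```

After reconnecting and repeating the search in the web interface, the generated statement should appear in the log reported by the last two commands.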

Running the captured SELECT directly in the mysql console, I confirmed that it also takes about a minute there, which immediately ruled out the web interface as the culprit. Earlier I wrote that I had run a query from the DB console and it completed in a fraction of a second, but that was a completely different query, not the one the system generates for a web search, and that misled me.
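One likely reason the statement is expensive is the leading wildcard in LIKE '%namefile%': a B-tree index on the name column cannot be used to narrow down such a pattern, so the server has to compare it against every candidate row. The sketch below uses EXPLAIN to inspect the plan. The shortened column list is a simplification for illustration, and which indexes exist on oc_filecache is not shown in the post, so the actual plan may differ.

```sql
-- Simplified form of the captured statement (fewer columns, no JOIN).
EXPLAIN
SELECT fileid, name
FROM oc_filecache
WHERE storage = 2
  AND name COLLATE utf8mb4_general_ci LIKE '%namefile%';

-- For comparison: a prefix pattern without a leading wildcard,
-- which an index on the name column could in principle serve.
EXPLAIN
SELECT fileid, name
FROM oc_filecache
WHERE storage = 2
  AND name LIKE 'namefile%';
```

Comparing the "rows" and "key" columns of the two plans gives a rough idea of how many rows each form has to examine.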
After a little googling, I came across this parameter:
mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.01 sec)

mysql>

It shows how much memory is allocated for caching table data and indexes. 134217728 bytes is 128 MB. Logically, to allocate enough memory for data and indexes, you first need to know how much space that data takes up. To find out, run:
mysql> SELECT table_schema "nextcloud",
    -> Round(Sum(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"
    -> FROM   information_schema.tables
    -> GROUP  BY table_schema;
+--------------------+---------------+
| nextcloud          | DB Size in MB |
+--------------------+---------------+
| information_schema |           0.2 |
| mysql              |           0.8 |
| nextcloud          |         415.6 |
| performance_schema |           0.0 |
| sys                |           0.0 |
+--------------------+---------------+
5 rows in set (0.13 sec)
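To see which tables account for most of that size, the same information_schema view can be queried per table. This is a sketch only; it assumes the schema is named nextcloud, as in the output above, and the LIMIT of 10 is arbitrary.

```sql
-- Largest tables in the nextcloud schema, data and indexes shown separately.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM   information_schema.tables
WHERE  table_schema = 'nextcloud'
ORDER  BY data_length + index_length DESC
LIMIT  10;
```

With about a million files, oc_filecache would be expected near the top of the list, though the post does not show this breakdown.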

The nextcloud schema already takes up about 415 MB, so innodb_buffer_pool_size needs to be set larger than the database size. For example, let's give it 1 GB:
mysql> SET GLOBAL innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)

Check that the parameter has changed:
mysql> SELECT @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                1073741824 |
+---------------------------+
1 row in set (0.00 sec)
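In MySQL 5.7 the buffer pool is resized online, in chunks, and the requested size is adjusted to a multiple of innodb_buffer_pool_chunk_size times innodb_buffer_pool_instances. A small sketch for checking those values and confirming that the resize has finished:

```sql
-- Chunk size, instance count, and the effective pool size, all in MB.
SELECT @@innodb_buffer_pool_chunk_size / 1024 / 1024 AS chunk_mb,
       @@innodb_buffer_pool_instances                AS instances,
       @@innodb_buffer_pool_size / 1024 / 1024       AS pool_mb;

-- Progress or completion message for the online resize.
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';

-- Free pages left in the pool; a value near zero after warm-up
-- would suggest the pool is still too small for the working set.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';
```

With the default 128 MB chunk size, 1073741824 bytes is an exact multiple, so no adjustment would be expected here.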

Now go to the web interface, refresh the page, and try the search. Sure enough, everything is found very quickly.
All well and good, but after a server restart the change to innodb_buffer_pool_size will be lost, because the database configuration file cannot be edited in the snap version:
/snap/nextcloud/current/my.cnf  (this is the file, and it is read-only)

I do not know how to fix this yet; write if you know.
P.S. chmod doesn't work. As I understand it, nextcloud mounts some partition, together with this file, as read-only.
Earlier reasoning while searching for a solution:
Thanks, but as I understand it, this is a tool for searching text inside files. If its settings include scanning of file names, it may help; I will report back. Right now I am trying to run NextCloud on a PostgreSQL database to see how long the search takes with the standard tool; I will report back.
By the way, disabling hyper-threading on the CPU increased the search time by only 5 seconds :(
I don't yet know exactly what that means, but my guess is that switching to a more powerful CPU won't help much: even after losing two of the four threads, the search time barely changed. Although everything needs to be tested...

SpasiboMne, 2020-11-11
@SpasiboMne

Up to a certain point (a version change or misconfigured settings), everything worked instantly for me. I have even more files than you, and the search was instantaneous. After that point, though, all the symptoms are the same as yours.

ScriptKiddo, 2020-11-03
@ScriptKiddo

Most likely Nextcloud + Elasticsearch will suit you:
https://nextcloud.com/blog/nextcloud-11-introduces...
UPD:
Installation instructions: https://andalys.com/how-to-set-up-elastic-full -tex...
