Database
SkyNezu, 2019-02-27 12:40:50

Which database to choose for processing 300 million rows?

Good afternoon.
I have several text files, ranging from 5.5 million to 150 million lines each, about 300-350 million lines in total. The files are updated every day. Each line contains:
1. String data, no longer than 128 characters.
2. Date.
3. Date.
4. Comment. The length is not known.
Task: load these files into the database daily, compare the newly loaded data with the previous version, and mark the new lines. Every day, search the new lines for keywords / symbols / phrases (there are about 20 of them). On demand, search the entire data set for keywords.
Which database should I choose for these purposes?
- Free.
- With quick search.
- Administration that does not require specialized knowledge. The end user is not an IT specialist.
- The ability to quickly transfer from Linux to Windows, from the server platform to a regular working PC and vice versa.
So far the suggestions have been: PostgreSQL, Firebird, NoSQL, MySQL.
Please tell me which database suits the described tasks best, and why.
P.S. Please don't throw stones. I have described the task here exactly as it was given to me.

4 answers
Saboteur, 2019-02-27
@saboteur_kiev

NoSQL is mainly for data in "key" - "value" form, not for rows of four values, so I would rule out NoSQL right away.
Otherwise, any of them will do: you have no complex structure with lots of relations, and no logic inside the database itself.
The required functionality is quite simple, so performance will depend more on the hardware than on the DBMS: MySQL, MariaDB and PostgreSQL will all perform about the same. Beyond that, you can only experiment with the database type and the indexes.
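
To illustrate how simple the structure is, here is a minimal sketch of the daily "load and mark new lines" step. It is only one possible approach, not something taken from the question: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a server, and the table name records, the column names, the first_seen column and the hashing of the whole line into a key are all assumptions made for the example. The same idea (a unique key plus an insert that skips existing rows) should carry over to MySQL/MariaDB or PostgreSQL with their own syntax.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS records (
            line_hash  TEXT PRIMARY KEY,  -- hash of the whole line (assumption)
            value      TEXT NOT NULL,     -- string data, up to 128 characters
            date_a     TEXT,              -- first date
            date_b     TEXT,              -- second date
            comment    TEXT,              -- comment of unknown length
            first_seen TEXT NOT NULL      -- load date on which the line first appeared
        )
        """
    )
    # Lets "lines that appeared on day X" be found without a full scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_records_first_seen ON records(first_seen)"
    )
    return conn


def line_key(row):
    """Stable key for a line: SHA-256 over its four fields."""
    return hashlib.sha256("\x1f".join(row).encode("utf-8")).hexdigest()


def load_snapshot(conn, rows, load_date):
    """Insert today's rows; rows already in the table are skipped.

    Returns how many rows are marked as first seen on load_date.
    """
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO records VALUES (?, ?, ?, ?, ?, ?)",
            ((line_key(r), *r, load_date) for r in rows),
        )
    cur = conn.execute(
        "SELECT COUNT(*) FROM records WHERE first_seen = ?", (load_date,)
    )
    return cur.fetchone()[0]


# Tiny made-up data set: the second day repeats the first and adds one line.
day1 = [
    ("alpha", "2019-01-01", "2019-02-01", "first comment"),
    ("beta", "2019-01-05", "2019-02-03", "second comment"),
]
day2 = day1 + [("gamma", "2019-01-09", "2019-02-07", "third comment")]

conn = open_db()
new_on_day1 = load_snapshot(conn, day1, "2019-02-26")
new_on_day2 = load_snapshot(conn, day2, "2019-02-27")
```

With this layout the new lines of a given day are simply the rows whose first_seen equals that day. Note that the sketch does not handle lines that disappear from the files, since the question does not say what should happen to them.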

beduin01, 2019-02-27
@beduin01

MariaDB/MySQL will be enough. PostgreSQL is also an option, but it is a little harder to administer and configure.

Roman Mirilaczvili, 2019-02-27
@2ord

- Administration that does not require specialized knowledge. The end user is not an IT specialist.
- The ability to quickly transfer from Linux to Windows, from the server platform to a regular working PC and vice versa.

In my opinion, SQLite is unrivaled for ease of administration: the entire database lives in a single file.
With indexes, everything will be fast.
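
As a rough sketch of what this could look like (the file name lines.sqlite, the table records, its columns and the sample rows are all made up for illustration), the snippet below keeps everything in one SQLite file and searches either only the lines first seen on a given day or the whole table. One caveat worth stating: an ordinary index speeds up the filter on the first_seen column, but a substring search inside the text still has to read every candidate row, so the on-demand search over the full data set may need something like SQLite's FTS5 extension if your build includes it.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open (or create) the single-file database with a hypothetical layout."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " value TEXT NOT NULL, date_a TEXT, date_b TEXT,"
        " comment TEXT, first_seen TEXT NOT NULL)"
    )
    # Index used to narrow the daily search down to the new lines only.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_records_first_seen ON records(first_seen)"
    )
    return conn


def search(conn, keywords, first_seen=None):
    """Return {keyword: [(value, comment), ...]}.

    instr() does an exact, case-sensitive substring match, so keywords that
    contain symbols such as % or _ need no escaping (unlike LIKE).
    If first_seen is given, only lines first loaded on that day are searched.
    """
    hits = {}
    for kw in keywords:
        sql = (
            "SELECT value, comment FROM records "
            "WHERE (instr(value, ?) > 0 OR instr(comment, ?) > 0)"
        )
        params = [kw, kw]
        if first_seen is not None:
            sql += " AND first_seen = ?"
            params.append(first_seen)
        hits[kw] = conn.execute(sql, params).fetchall()
    return hits


# The whole database is this one file; copying it moves the data
# between machines.
db_path = os.path.join(tempfile.mkdtemp(), "lines.sqlite")
conn = open_db(db_path)
with conn:
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
        [
            ("alpha", "2019-01-01", "2019-02-01", "old payment note", "2019-02-26"),
            ("beta", "2019-01-05", "2019-02-03", "100% refund issued", "2019-02-27"),
            ("gamma", "2019-01-09", "2019-02-07", "payment pending", "2019-02-27"),
        ],
    )

new_hits = search(conn, ["payment", "100%"], first_seen="2019-02-27")  # daily run
all_hits = search(conn, ["payment"])  # on-demand search over everything
```

The daily run touches only the rows selected through the index, so its cost grows with the number of new lines rather than with the full 300 million.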

nrgian, 2019-05-12
@nrgian

300 million rows is nothing for a modern DBMS.
Take whichever one is most convenient for you personally.
