PHP
Ayk72, 2015-11-03 11:20:27

How to store so much data?

Good afternoon!
We were given the task of importing VK account data from 25 archives, each containing 14,000 XML files of about 310 KB apiece.
Loaded into MySQL this comes to 80 GB, but our hosting allows a database of at most 20 GB.
How can we solve this?
What if, instead of loading the profile data into MySQL, we store only an id and a link to the XML file, and on each request read the data directly from the XML instead of from MySQL? Is that a bad approach? Will it be slow? What would you do?
Thank you!


4 answer(s)
Max, 2015-11-03
@MaxDukov

Intuition tells me that a lot of junk is being loaded along with the useful data, and that if you import only what you actually need, the size will drop significantly.
Can I see a sample XML file? Without the personal data, of course.

Roman Mirilaczvili, 2015-11-03
@2ord

You can reduce the size of the database.
You could also try storing the data in MongoDB, if the hosting allows it.
You can also read the data straight from the XML files, but then you have to deserialize it on every request, i.e. parse the XML with library tools each time. Whether that is acceptable depends on the expected load on the server.
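The "id plus file path in MySQL, parse XML on demand" idea from the question can be sketched in a few lines (shown in Python for brevity; in PHP the equivalent would be SimpleXML). The element names here are assumptions, not the real VK dump format:

```python
# Minimal sketch: the DB stores only an id and a file path; the profile
# itself is parsed out of the XML file on every request.
import xml.etree.ElementTree as ET

def load_profile(xml_path: str) -> dict:
    # The whole document is re-parsed on each call -- this is the
    # per-request deserialization cost mentioned in the answer above.
    root = ET.parse(xml_path).getroot()
    return {
        "first_name": root.findtext("first_name", default=""),
        "last_name": root.findtext("last_name", default=""),
    }
```

At ~310 KB per file a single parse is cheap; the real cost is extra disk I/O and losing SQL filtering and indexing, since you can no longer query by profile fields without opening every file.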

Max, 2015-11-03
@WarStyle

Buy a VDS with the required amount of disk space, install a LAMP stack, and load it all in)

Denis, 2015-11-04
@prototype_denis

25 × 14,000 × 350 KB ≈ 122.5 GB (according to Google).
Available: 20 GB.
Look at the XML, which is almost certainly in UTF-8:
<first_name>Василий</first_name> (or something like that)
The actual data inside the markup is roughly half the total, i.e. 61.25 GB.
Remember that UTF-8 takes up to 4 bytes per character.
Switch to a single-byte encoding such as KOI8-R and the text shrinks by up to a factor of 4, giving about 15.3125 GB (an approximate figure).
Next, look at the database: convert some fields to proper types (dates, booleans, and other candidates) and you shave off roughly another 10 percent: 13.78125 GB.
Once the data is normalized, duplicate values will surface that can be moved into separate tables (shared resources, for example). Across 350,000 records that is about 5 percent more, leaving roughly 13.09 GB.
So the answer to your question is: "Go ahead and store it, just change the encoding in the database."
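The chain of estimates above is easy to re-run; the percentages are the answerer's assumptions, not measured values:

```python
# Reproducing the back-of-the-envelope estimate from the answer.
raw_gb = 25 * 14_000 * 350 / 1_000_000  # 350 KB per file, decimal GB
payload = raw_gb / 2         # data is ~half of the XML markup
single_byte = payload / 4    # single-byte encoding instead of UTF-8
typed = single_byte * 0.9    # proper column types save ~10%
deduped = typed * 0.95       # deduplicating shared values saves ~5%
print(raw_gb, deduped)       # 122.5 ... 13.0921875
```

One caveat: Cyrillic letters take 2 bytes in UTF-8, not 4, so the divide-by-4 step is optimistic; a single-byte encoding roughly halves the Cyrillic text rather than quartering it.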
