Need advice on table organization (MySQL, MyISAM)
About 50,000 records are added to the table per day (it currently holds about 10 million). Each record has 15 INT fields and one VARCHAR(32) field.
Now I need to add new fields holding a composite attribute. There are 7 attribute types, and each can take a value from 1 to 65535 (2 bytes). But a record can carry no more than 3 of them at once (more precisely, either 0, 1, or 3).
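These rules can be written as an input check that applies whichever layout is chosen. A minimal Python sketch, assuming attributes arrive as a dict mapping type to value and that types are numbered 1 to 7 (the numbering and the name `is_valid_attrs` are hypothetical):

```python
ATTR_TYPES = range(1, 8)         # assumption: types are numbered 1..7
MIN_VALUE, MAX_VALUE = 1, 65535  # fits in an unsigned 2-byte integer
ALLOWED_COUNTS = {0, 1, 3}       # a record carries 0, 1 or 3 attributes

def is_valid_attrs(attrs: dict[int, int]) -> bool:
    """Check a {type: value} mapping against the rules in the question."""
    if len(attrs) not in ALLOWED_COUNTS:
        return False
    # Dict keys are unique, so a type can never appear twice.
    return all(
        t in ATTR_TYPES and MIN_VALUE <= v <= MAX_VALUE
        for t, v in attrs.items()
    )
```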
Only about 5% of records will have attributes. Most queries will select records by the presence of (any) attributes or by their complete absence. The attributes themselves will be parsed outside the database (they are only output along with the record; there is no search for records by a specific attribute).
The options that come to mind:
1. The head-on approach. 7 UNSIGNED SMALLINT fields (one per attribute type). Large size and a complex selection condition (AT_1 > 0 OR AT_2 > 0 OR AT_3 > 0 ...), but easy to work with.
2. A small space saving. 3 TINYINT fields and 3 UNSIGNED SMALLINT fields (attribute ID and value pairs). Almost the same size, a slightly simpler condition (AT_VAL_1 > 0 OR AT_VAL_2 > 0 OR AT_VAL_3 > 0), but each attribute type needs an assigned ID, and parsing the data after fetching is more complex.
3. More savings, more confusion. 3 INT fields (the first byte is the type, the remaining bytes hold the value). It is hard to see any advantage over the previous option (except that there are fewer fields).
4. The simplest of all. One 14-byte BLOB field (7 types × 2 bytes). Only one field, the simplest condition (ATTR IS NULL), and no extra confusion (each attribute has its own fixed offset in the field, so there are no IDs to parse).
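To make options 3 and 4 concrete, here is a rough Python sketch of both encodings. Assumptions: "first byte" means the most significant byte of a 4-byte INT, types are numbered 1 to 7, and the BLOB holds seven little-endian unsigned 16-bit slots where 0 means "absent" (safe, since valid values start at 1). The function names are hypothetical.

```python
import struct
from typing import Optional

NUM_TYPES = 7  # attribute types 1..7 (assumed numbering)

# Option 3: one INT per attribute. The type goes in the most significant
# byte, the value in the low 16 bits; the byte in between stays zero.
def pack_int(attr_type: int, value: int) -> int:
    return (attr_type << 24) | value

def unpack_int(packed: int) -> tuple[int, int]:
    return packed >> 24, packed & 0xFFFF

# Option 4: seven little-endian unsigned 16-bit slots = 14 bytes.
# Slot i holds the value of type i + 1; 0 marks an absent attribute.
# A record without attributes is stored as NULL (None here).
def pack_blob(attrs: dict[int, int]) -> Optional[bytes]:
    if not attrs:
        return None
    slots = [0] * NUM_TYPES
    for attr_type, value in attrs.items():
        slots[attr_type - 1] = value
    return struct.pack("<7H", *slots)
```

For option 3, the largest packed value, 7 << 24 | 65535 = 117506047, still fits in a signed INT.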
What do you advise?
Firstly, IMHO, it is better to use CHAR instead of VARCHAR so that the table's ROW_FORMAT becomes FIXED (MyISAM uses the fixed format only when the table has no variable-length columns); this speeds up both reads and writes. As for your options, I think you should use the first one.
The complexity of the selection condition in the first option (and in the others, except the last) can be avoided by adding a boolean HAS_ATS field.
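A small sketch of the HAS_ATS idea, using Python's built-in sqlite3 as a stand-in for MySQL (so the DDL types differ; in MySQL the flag would typically be a TINYINT). Only the attribute columns of option 1 are modelled, and the table, column, and function names are hypothetical. The flag is derived when the row is written, so the read query tests one column instead of seven.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (attribute columns only).
conn = sqlite3.connect(":memory:")
cols = ", ".join(f"at_{i} INTEGER NOT NULL DEFAULT 0" for i in range(1, 8))
conn.execute(
    f"CREATE TABLE records (id INTEGER PRIMARY KEY, {cols}, "
    "has_ats INTEGER NOT NULL DEFAULT 0)"
)

def insert_record(conn: sqlite3.Connection, attrs: dict[int, int]) -> None:
    """attrs maps type (1..7) -> value; has_ats is computed on write."""
    values = [attrs.get(i, 0) for i in range(1, 8)]
    conn.execute(
        "INSERT INTO records (at_1, at_2, at_3, at_4, at_5, at_6, at_7, has_ats) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        values + [1 if attrs else 0],
    )

insert_record(conn, {})
insert_record(conn, {2: 100})
insert_record(conn, {1: 5, 3: 7, 6: 65535})
insert_record(conn, {})

# The long OR chain from option 1 versus the single flag comparison.
or_chain = " OR ".join(f"at_{i} > 0" for i in range(1, 8))
via_or_chain = conn.execute(
    f"SELECT id FROM records WHERE {or_chain} ORDER BY id"
).fetchall()
via_flag = conn.execute(
    "SELECT id FROM records WHERE has_ats = 1 ORDER BY id"
).fetchall()
```

Both queries return the same rows; the flag simply keeps the condition a single comparison no matter how many attribute columns exist.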
You yourself say that the last option is the simplest and has practically no drawbacks, so use it. By the way, there is probably no point in adding indexes here: in your situation they would only slow the queries down.
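Since the attributes are parsed outside the database anyway, decoding the BLOB on the client stays short. A sketch in Python, assuming the option 4 layout with each type at a fixed 2-byte offset (slot i holds type i + 1 as a little-endian unsigned value, 0 meaning absent); `decode_attrs` is a hypothetical name.

```python
import struct
from typing import Optional

NUM_TYPES = 7  # assumed: attribute types 1..7, one 2-byte slot each

def decode_attrs(blob: Optional[bytes]) -> dict[int, int]:
    """Turn the 14-byte ATTR value (or NULL) into {type: value}."""
    if blob is None:  # ATTR IS NULL: the record has no attributes
        return {}
    if len(blob) != 2 * NUM_TYPES:
        raise ValueError(f"expected {2 * NUM_TYPES} bytes, got {len(blob)}")
    slots = struct.unpack("<7H", blob)
    # Slot i holds type i + 1; a zero slot means the attribute is absent.
    return {i + 1: v for i, v in enumerate(slots) if v}
```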