MySQL
Hint, 2011-01-12 13:10:17

Need advice on table organization (MySQL, MyISAM)

About 50,000 records are added to the table per day (it now holds about 10 million). Each record contains 15 INT fields and one VARCHAR(32) field.

A new requirement has come up: add fields that hold a composite attribute. There are 7 attribute types, and each can take a value from 1 to 65535 (2 bytes). A record can hold at most 3 of them at the same time (more precisely, 0, 1, or 3).

Only about 5% of records will have attributes. Most queries will select records by whether they have any attributes at all or none. The attributes will be parsed outside the database: they are only displayed alongside the record, and records are never searched by a specific attribute.

What options come to mind:
1. Straightforward. 7 SMALLINT UNSIGNED fields (one per attribute type). Large size and a complex query (AT_1 > 0 OR AT_2 > 0 OR AT_3 > 0 ...), but easy to work with.
2. A small space saving. 3 TINYINT fields and 3 SMALLINT UNSIGNED fields (attribute ID and value pairs). Almost the same size and a slightly simpler query (AT_VAL_1 > 0 OR AT_VAL_2 > 0 OR AT_VAL_3 > 0), but each attribute type needs a specific ID, and the data is harder to parse after fetching.
3. More savings, more confusion. 3 INT fields (the first byte is the type, the remaining bytes are the value). It is hard to see any advantage over the previous option, other than having fewer fields.
4. As simple as it gets. One 14-byte BLOB field. Just one field, the simplest query (ATTR IS NULL), and no extra confusion (each attribute has its own constant offset in the field, so there is no ID to parse).
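
To make option 4 concrete, here is a minimal client-side sketch of the 14-byte layout: seven 2-byte slots, one per attribute type, at constant offsets. The function names, the 0-based type index, the little-endian byte order, and the use of 0 for "type absent" are all assumptions for illustration; the question does not specify them.

```python
import struct

NUM_TYPES = 7                         # seven attribute types, per the question
BLOB_FORMAT = "<" + "H" * NUM_TYPES   # 7 unsigned 16-bit slots = 14 bytes (byte order assumed)


def pack_attrs(attrs):
    """Pack {type_index: value} into a 14-byte blob, or None if there are no attributes.

    type_index runs 0..6 and value 1..65535; a slot of 0 means "type absent".
    """
    if not attrs:
        return None  # stored as SQL NULL, so ATTR IS NULL finds attribute-free records
    slots = [0] * NUM_TYPES
    for type_index, value in attrs.items():
        if not 0 <= type_index < NUM_TYPES:
            raise ValueError(f"unknown attribute type: {type_index}")
        if not 1 <= value <= 0xFFFF:
            raise ValueError(f"value out of range: {value}")
        slots[type_index] = value
    return struct.pack(BLOB_FORMAT, *slots)


def unpack_attrs(blob):
    """Inverse of pack_attrs: return {type_index: value} for the slots that are set."""
    if blob is None:
        return {}
    slots = struct.unpack(BLOB_FORMAT, blob)
    return {i: v for i, v in enumerate(slots) if v != 0}
```

Because every type sits at offset 2 * type_index, no attribute ID is stored or parsed, which is the advantage option 4 claims over options 2 and 3.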

What do you advise?

4 answers
L0NGMAN, 2011-01-12
@L0NGMAN

Firstly, IMHO, it is better to use CHAR instead of VARCHAR so that the table's ROW_FORMAT becomes FIXED; this speeds up reads and writes. As for your options, I think you should use the first one.

Vladimir Chernyshev, 2011-01-12
@VolCh

The query complexity of the first option (and of all the others except the last) can be avoided by adding a boolean field HAS_ATS.
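
A rough sketch of the HAS_ATS idea applied to option 1 follows. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for MySQL, so the column types differ from the real table; the table name entries and the helper names are hypothetical.

```python
import sqlite3

AT_COLUMNS = [f"AT_{i}" for i in range(1, 8)]  # AT_1 .. AT_7, one per attribute type


def make_db():
    """Create an in-memory table with the seven attribute columns plus HAS_ATS."""
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"{col} INTEGER NOT NULL DEFAULT 0" for col in AT_COLUMNS)
    conn.execute(
        f"CREATE TABLE entries (id INTEGER PRIMARY KEY, {column_defs}, "
        "HAS_ATS INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def insert_entry(conn, attrs):
    """Insert one entry; attrs maps column names such as 'AT_2' to values."""
    values = [attrs.get(col, 0) for col in AT_COLUMNS]
    has_ats = int(any(values))  # the flag is derived once, at write time
    columns = ", ".join(AT_COLUMNS + ["HAS_ATS"])
    placeholders = ", ".join("?" for _ in range(len(values) + 1))
    conn.execute(
        f"INSERT INTO entries ({columns}) VALUES ({placeholders})",
        values + [has_ats],
    )


def ids_with_attrs_flag(conn):
    """Select entries that have attributes using the single flag column."""
    rows = conn.execute("SELECT id FROM entries WHERE HAS_ATS = 1 ORDER BY id")
    return [row[0] for row in rows]


def ids_with_attrs_or_chain(conn):
    """The same selection written as the OR chain from option 1, for comparison."""
    condition = " OR ".join(f"{col} > 0" for col in AT_COLUMNS)
    rows = conn.execute(f"SELECT id FROM entries WHERE {condition} ORDER BY id")
    return [row[0] for row in rows]
```

Both functions return the same ids; the flag simply moves the seven-way check from every read to each write.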

TimTowdy, 2011-01-12
@TimTowdy

You write yourself that the last option is the simplest and has practically no drawbacks, so use it. By the way, adding indexes probably makes no sense here: in your situation they will only slow queries down.

apangin, 2011-01-13
@apangin

Another option: one field holding the combined attribute types (for example, a bit mask), plus 3 SMALLINT fields for the values. It saves space and keeps the query simple (ATTR_TYPES = 0), but parsing the attributes is more complex. Still, the BLOB approach wins!
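
One possible reading of this layout, sketched client-side: bit i of ATTR_TYPES marks that type i is present, and the three value fields hold the values in ascending type order. The bit assignment, the ordering rule, and the function names are assumptions; the answer does not spell them out.

```python
NUM_TYPES = 7   # seven attribute types, per the question
MAX_ATTRS = 3   # at most three attributes per record


def encode(attrs):
    """Turn {type_index: value} into (attr_types, val_1, val_2, val_3)."""
    if len(attrs) > MAX_ATTRS:
        raise ValueError("a record holds at most 3 attributes")
    mask = 0
    values = []
    for type_index in sorted(attrs):  # ascending type order defines the slot order
        if not 0 <= type_index < NUM_TYPES:
            raise ValueError(f"unknown attribute type: {type_index}")
        if not 1 <= attrs[type_index] <= 0xFFFF:
            raise ValueError(f"value out of range: {attrs[type_index]}")
        mask |= 1 << type_index
        values.append(attrs[type_index])
    values += [0] * (MAX_ATTRS - len(values))  # unused value fields stay 0
    return (mask, *values)


def decode(attr_types, *values):
    """Inverse of encode: pair each set bit with the next value field."""
    type_indexes = [i for i in range(NUM_TYPES) if (attr_types >> i) & 1]
    return dict(zip(type_indexes, values))
```

For example, encode({5: 300, 1: 7, 2: 9}) sets bits 1, 2 and 5, giving a mask of 38 followed by the values 7, 9, 300. The decoding step, which has to walk the mask to find which type each value belongs to, is the "more complex parsing" mentioned above.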
