MySQL
Cyapa, 2013-12-08 01:21:09

MySQL table architecture: strings or numbers

Gentlemen, I have a dilemma.
I am reorganizing a table in the database. The table has a column (hereinafter referred to as services) that stores the status of an object in each of several categories. A status is a number from 0 to 3. The number of categories can change (there are currently 11).
That is, for example:

Object 1:
Category 1 - Status 3
Category 2 - Status 1
Category 3 - Status 2
...
Object 2:
Category 1 - Status 1
Category 2 - Status 3
Category 3 - Status 2
...

Currently, the statuses are stored in a VARCHAR: a string of 11 characters, where the character at position N holds the status of category N.
...
(1, '31200000000'),
(2, '13200000000')
...

The status of a category has to be checked very often, and LIKE is used for the search.
The idea is to store these statuses in a BIGINT using the base-4 number system. In other words, each base-4 digit of the number stores the status of one category.
...
(1, 39), # 39 (base 10) = 213 (base 4)
(2, 45)  # 45 (base 10) = 231 (base 4)
...

The status is obtained with the formula:
(services mod 4^(service + 1)) div 4^service

Here service is the index of the category, counted from zero.
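The formula can be checked with a short Python sketch. The helper names encode_statuses and get_status are hypothetical and not part of the original schema; category 1 is assumed to occupy the least significant base-4 digit, which matches the 39 = 213 (base 4) example.

```python
def encode_statuses(statuses):
    """Pack a list of statuses (each 0..3) into one integer, base 4.

    statuses[0] belongs to category 1 and lands in the lowest digit
    (an assumption that matches the 39 and 45 examples).
    """
    value = 0
    for index, status in enumerate(statuses):
        if not 0 <= status <= 3:
            raise ValueError("status must be between 0 and 3")
        value += status * 4 ** index
    return value


def get_status(services, service):
    """Return the base-4 digit at zero-based position `service`."""
    return (services % 4 ** (service + 1)) // 4 ** service


object_1 = encode_statuses([3, 1, 2])  # categories 1..3 of Object 1
object_2 = encode_statuses([1, 3, 2])  # categories 1..3 of Object 2
print(object_1, object_2)                            # 39 45
print([get_status(object_1, i) for i in range(3)])   # [3, 1, 2]
```

With service = 3 the two powers are 256 and 64, which is exactly the (services MOD 256) DIV 64 expression used for the fourth category.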
I did a little test, on identical tables with 13,000 entries. Using the following queries:
/* Query 1 */ SELECT * FROM `test_b` WHERE `services` LIKE '___2%';
/* Query 2 */ SELECT * FROM `test_a` WHERE (`services` MOD 256) DIV 64 = 2;

Here are the results; each run executes the query 10,000 times:
Run   Query 1 (s)   Query 2 (s)
01    28.21         27.31
02    27.18         27.26
03    28.31         27.56
04    29.25         27.14
05    29.18         27.47
06    27.60         27.47
07    27.74         26.79
08    27.43         26.95
09    27.99         26.52
10    28.80         28.06

Judging by these tests, the BIGINT variant is about 3.2% faster. I assume the gap will only widen as the table grows.
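The 3.2% figure can be reproduced from the run times listed above. This is only a Python sketch of the arithmetic; the variable names are made up, and the saving is assumed to be measured relative to the total time of Query 1.

```python
# Run times in seconds, copied from the ten runs above.
query_1 = [28.21, 27.18, 28.31, 29.25, 29.18, 27.60, 27.74, 27.43, 27.99, 28.80]
query_2 = [27.31, 27.26, 27.56, 27.14, 27.47, 27.47, 26.79, 26.95, 26.52, 28.06]

total_1 = sum(query_1)
total_2 = sum(query_2)

# Relative saving of Query 2 (BIGINT) compared with Query 1 (VARCHAR + LIKE).
saving = (total_1 - total_2) / total_1 * 100

# Spread between the slowest and fastest run of Query 1, for comparison.
spread_1 = max(query_1) - min(query_1)
mean_gap = (total_1 - total_2) / len(query_1)

print(f"{total_1:.2f} s vs {total_2:.2f} s, saving {saving:.2f}%")
print(f"spread of Query 1: {spread_1:.2f} s, mean gap: {mean_gap:.2f} s")
```

The run-to-run spread of Query 1 alone (about 2 s) is larger than the mean gap between the two queries (about 0.9 s), so the difference may well be within measurement noise.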
Test methodology
Everything was measured with this PHP code:
$time_taken = microtime(true);
for ($i = 0; $i < 10000; $i++)
{
  mysql_query("Query"); // placeholder for Query 1 or Query 2
}
$time_taken = microtime(true) - $time_taken;


Also, for convenience, a stored function was written:
DROP FUNCTION IF EXISTS `GET_SERVICE_STATE`;
DELIMITER //
CREATE FUNCTION `GET_SERVICE_STATE`(`services` BIGINT, `service` BIGINT)
RETURNS TINYINT
BEGIN
  RETURN (`services` MOD (`service` * 4)) DIV `service`;
END //
DELIMITER ;

But the results of using it were dismal.
SELECT * FROM `test_a` WHERE GET_SERVICE_STATE(`services`, 64) = 2;
This runs ~20 times slower than the original query. I can't figure out whether that is because of the extra multiplication or because of the function call itself.
For myself, I found the following pros and cons of the BIGINT method compared with the VARCHAR method:
+ Saves 3 bytes per row in the table
+ A speed increase, although not a significant one
+ No need to change the table structure when the number of categories grows
- The number of categories is limited to 31
- The state of the categories is not human-readable in the database itself
So far, I'm leaning towards the BIGINT method. Please help me make the right choice, or suggest a more efficient way.
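The limit of 31 categories follows from the size of a signed 64-bit integer. A minimal Python sketch of that calculation (the function name max_categories is hypothetical):

```python
def max_categories(limit, base=4):
    """Count how many base-`base` digits can be stored without exceeding `limit`."""
    count = 0
    # With (count + 1) digits all set to the top status, the value is base**(count + 1) - 1.
    while base ** (count + 1) - 1 <= limit:
        count += 1
    return count


SIGNED_BIGINT_MAX = 2 ** 63 - 1    # upper bound of a signed 64-bit integer
UNSIGNED_BIGINT_MAX = 2 ** 64 - 1  # upper bound of an unsigned 64-bit integer

print(max_categories(SIGNED_BIGINT_MAX))    # 31
print(max_categories(UNSIGNED_BIGINT_MAX))  # 32
```

Declaring the column as BIGINT UNSIGNED would therefore leave room for one more category.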

5 answer(s)
Vladimir Polishchuk, 2013-12-08
@NorthDakota

I think your data structure is wrong.
I can suggest storing statuses in a separate table (cat_statuses).

+--------------------------------------+
|   id   |    cat_id    |    status    | 
+--------------------------------------+

where status is a TINYINT and cat_id is an INT.
Then just use RIGHT JOIN:
SELECT * FROM `cat_statuses` AS cs
RIGHT JOIN `test_b` AS tb ON (cs.cat_id = tb.services)
WHERE cs.cat_id = 4;

Yuri Yarosh, 2013-12-08
@d00mko

Don't you know second normal form?!
Sit down, you get an F!

Alexey Shein, 2013-12-08
@conf

As far as I understand, you have objects with many categories, which in turn have many statuses.
I would split this into 4 tables: objects, categories, statuses and a relationship table (object_id, category_id, status_id); all 3 fields, by the way, can be combined into a composite primary key.
Because there are few statuses, you can use TINYINT for their primary key.
There is no need to build a base-4 system; take pity on the brain of whoever has to maintain it later.
That way you can easily query the necessary tables with joins, and everything will work very quickly.
Also, running speed tests on 13,000 rows is not serious. Disable the MySQL query cache (or simply use SELECT SQL_NOCACHE <your query here>) and create at least a million rows (so that the table does not fit into memory); then the results will be more interesting. Stored procedures, by the way, run much more slowly than direct queries.
Read the book High Performance MySQL; it clears your head a lot.

Rsa97, 2013-12-08
@Rsa97

Suppose your table of objects has a field objectId, the identifier of the object. Let's create an additional table linked to the table of objects by the objectId field. It contains the service number serviceNum and its status serviceState.

CREATE TABLE `service_states` (
  `objectId` BIGINT NOT NULL DEFAULT '0',
  `serviceNum` INT NOT NULL DEFAULT '0',
  `serviceState` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`objectId`,`serviceNum`),
  KEY `byServiceState` (`serviceState`),
  KEY `byServiceNum` (`serviceNum`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

A query equivalent to yours, across all objects, would look like this:
SELECT * FROM `test_b` as tb
RIGHT JOIN `service_states` as ss USING(objectId)
WHERE ss.serviceNum = 3 AND ss.serviceState = 2;
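A table of this shape can be tried out with Python's built-in sqlite3 module. This is only a sketch under several assumptions: SQLite stands in for MySQL, the objects table and the sample rows are made up for illustration, the composite index is a guess rather than part of the answer, and a plain JOIN replaces RIGHT JOIN (the results match as long as every state row has a matching object).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (objectId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE service_states (
        objectId     INTEGER NOT NULL,
        serviceNum   INTEGER NOT NULL,
        serviceState INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (objectId, serviceNum)
    );
    -- Assumed composite index so that a (serviceNum, serviceState) lookup can use it.
    CREATE INDEX byServiceNumState ON service_states (serviceNum, serviceState);
""")

# Sample data mirroring Object 1 and Object 2 from the question.
conn.executemany("INSERT INTO objects VALUES (?, ?)",
                 [(1, "Object 1"), (2, "Object 2")])
conn.executemany("INSERT INTO service_states VALUES (?, ?, ?)",
                 [(1, 1, 3), (1, 2, 1), (1, 3, 2),
                  (2, 1, 1), (2, 2, 3), (2, 3, 2)])


def objects_with_state(connection, service_num, state):
    """Return (objectId, name) for objects whose given service has the given state."""
    return connection.execute(
        """
        SELECT o.objectId, o.name
        FROM objects AS o
        JOIN service_states AS ss ON ss.objectId = o.objectId
        WHERE ss.serviceNum = ? AND ss.serviceState = ?
        ORDER BY o.objectId
        """,
        (service_num, state),
    ).fetchall()


print(objects_with_state(conn, 3, 2))  # [(1, 'Object 1'), (2, 'Object 2')]
print(objects_with_state(conn, 1, 3))  # [(1, 'Object 1')]
```

Adding a twelfth category in this layout means inserting rows rather than altering a column.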

Alexey Rekhov, 2013-12-08
@Zorato

If you dislike the JOIN option that much, then try using the base-10 number system (somewhat redundant, but it leaves a margin for more statuses). The check is then reduced to computing the bounds on the PHP side:

$service = 3; // from 1 to 11
$status = 2; // from 1 to 4
$min = $status * pow(10, $service);
$max = ($status + 1) * pow(10, $service);

And a subsequent query to MySQL:
At the same time, for performance, it would be necessary to add an index on the service column:
P.S. Something may be wrong with the $min and $max formulas, but the whole idea is to use > and < (or BETWEEN) to check the desired digit.
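The doubt in the P.S. can be tested with a small Python sketch. The function names are hypothetical, and $service is taken here as a zero-based digit position, which is an assumption. The sketch suggests that a single [min, max) range identifies the digit only when all higher digits are zero, whereas the arithmetic check works at any position.

```python
def in_range(value, service, status):
    """Range test from the answer: min <= value < max."""
    low = status * 10 ** service
    high = (status + 1) * 10 ** service
    return low <= value < high


def digit_at(value, service):
    """Decimal digit of `value` at zero-based position `service`."""
    return (value // 10 ** service) % 10


value = 2300  # digits from position 0 upward: 0, 0, 3, 2

print(digit_at(value, 2))      # 3
print(in_range(value, 2, 3))   # False: the digit at position 3 is non-zero
print(in_range(300, 2, 3))     # True: no higher digits are set
```

If this holds, an index-backed BETWEEN would only be reliable for the most significant category; the other positions would still need the div/mod arithmetic.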
