Problem with understanding database design, help?


Database design

samorez999, 2022-01-03 17:10:44

I have a large table, let's call it "ads". I need to assign several "tags" to each ad.

Here's my line of thinking:
Create a separate "tags" table and list its primary keys in a "tags" column of the first table. But such a list violates 1NF.
Create a many-to-many "ads-tags" table, but then every ad insertion adds several rows to it, so it will grow large.
Simply store the tags as a string in a field.

How is it done in real projects?


3 answer(s)

Vasily Bannikov, 2022-01-03
@samorez999

For example, if one ad has 3 tags on average, then 1 million ads means 3 million rows in the "ad tags" table. How fast will selects be? Will I gain much if I break the rule and store the tags as a comma-separated string?

1. Suppose the ad-tag link table consists of two GUIDs.
Then one row takes about 32 bytes. 3*32*1000000 = 96 megabytes (metric)
2. Of course, selects will be slow if you don't add an index. An index on the ad id will take about the same amount of space, and selects will become faster. For filtering, you will also need an index in the opposite direction, on the tag id.
3. You will also be able to filter by tag, which, it seems to me, is one of the most important properties of tags.
4. If 96 megabytes scares you, use int64 or int32 keys: the table will then be 2 or 4 times smaller, respectively.
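
The arithmetic in points 1 and 4 can be reproduced with a short sketch. It counts only the raw key bytes and ignores per-row and page overhead, which a real DBMS adds; the function name `link_table_bytes` is made up for illustration.

```python
# Back-of-the-envelope size of the ad-tag link table (raw key bytes only).
ADS = 1_000_000      # number of ads, from the question
TAGS_PER_AD = 3      # average tags per ad, from the question

# Bytes per key for the key types mentioned in the answer.
KEY_SIZES = {"GUID": 16, "int64": 8, "int32": 4}


def link_table_bytes(key_size: int) -> int:
    """Each link row stores two keys: the ad id and the tag id."""
    return ADS * TAGS_PER_AD * 2 * key_size


for name, size in KEY_SIZES.items():
    megabytes = link_table_bytes(size) / 1_000_000  # metric megabytes
    print(f"{name}: {megabytes:.0f} MB")
# prints:
# GUID: 96 MB
# int64: 48 MB
# int32: 24 MB
```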

Will I gain much if I break the rule and store the tags as a comma-separated string?

As already said in the comments, you will be very disappointed.
Selecting the tags of a single ad will, of course, be very fast, but:
1. Suppose a tag is 6 letters in Russian, and strings are stored in UTF-8.
Then each ad takes 6*2*3+3+4=43 bytes. Accordingly, a million ads will take 43 metric megabytes.
2. There will be no indexes here, so filtering by tag will be very expensive.
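
To illustrate point 2, here is a minimal sketch using Python's built-in `sqlite3` module. The `ad` table, its sample rows, and both helper functions are hypothetical. It shows that filtering a comma-separated column comes down to pattern matching, which an ordinary index cannot help with because the pattern starts with a wildcard, and that a naive pattern also matches the wrong tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: tags are stored as one comma-separated string per ad.
conn.execute("CREATE TABLE ad (id INTEGER PRIMARY KEY, tags TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO ad (id, tags) VALUES (?, ?)",
    [(1, "car,sale"), (2, "cart,garden"), (3, "rent,car")],
)


def naive_filter(tag: str) -> list[int]:
    """Substring match: scans every row and also matches 'cart' for 'car'."""
    rows = conn.execute(
        "SELECT id FROM ad WHERE tags LIKE '%' || ? || '%' ORDER BY id", (tag,)
    )
    return [ad_id for (ad_id,) in rows]


def delimiter_aware_filter(tag: str) -> list[int]:
    """Wraps both sides in commas so only whole tags match; still a full scan."""
    rows = conn.execute(
        "SELECT id FROM ad "
        "WHERE ',' || tags || ',' LIKE '%,' || ? || ',%' ORDER BY id",
        (tag,),
    )
    return [ad_id for (ad_id,) in rows]


print(naive_filter("car"))            # [1, 2, 3]  (ad 2 is a false match)
print(delimiter_aware_filter("car"))  # [1, 3]
```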
PS: all estimates are rough guesses made from memory. In a real database the numbers will differ, but they will be of a similar order.
PPS: in case it's not entirely clear what I'm suggesting, here it is:

._________.             .______________.           .____________.
| post    |             | post_tag     |           | tag        |
|=========|             |==============|           |============|
| id: int |<------------| post_id: int |           | id: int    |
| ...     |             | tag_id: int  |---------->| name: text |
|_________|             |______________|           |____________|
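
The diagram can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table and column names follow the diagram, while the `title` column, the index name, the sample rows, and the helper functions are assumptions added for illustration. The composite primary key serves lookups by post id, and the extra index covers the opposite direction, from tag id to posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE post_tag (
    post_id INTEGER NOT NULL REFERENCES post(id),
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (post_id, tag_id)   -- also acts as the index by post id
);
-- Index in the opposite direction, used when filtering posts by tag.
CREATE INDEX post_tag_by_tag ON post_tag (tag_id, post_id);
""")

conn.executemany("INSERT INTO post (id, title) VALUES (?, ?)",
                 [(1, "Bike for sale"), (2, "Flat for rent"), (3, "Used car")])
conn.executemany("INSERT INTO tag (id, name) VALUES (?, ?)",
                 [(1, "sale"), (2, "rent"), (3, "transport")])
conn.executemany("INSERT INTO post_tag (post_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)])


def tags_of(post_id: int) -> list[str]:
    """All tag names attached to one post."""
    rows = conn.execute(
        "SELECT tag.name FROM post_tag "
        "JOIN tag ON tag.id = post_tag.tag_id "
        "WHERE post_tag.post_id = ? ORDER BY tag.name", (post_id,))
    return [name for (name,) in rows]


def posts_with(tag_name: str) -> list[str]:
    """Titles of all posts carrying the given tag."""
    rows = conn.execute(
        "SELECT post.title FROM tag "
        "JOIN post_tag ON post_tag.tag_id = tag.id "
        "JOIN post ON post.id = post_tag.post_id "
        "WHERE tag.name = ? ORDER BY post.id", (tag_name,))
    return [title for (title,) in rows]


print(tags_of(1))               # ['sale', 'transport']
print(posts_with("transport"))  # ['Bike for sale', 'Used car']
```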

Drno, 2022-01-03
@Drno

I'm not a database expert at all, but I would create a separate table for tags and list them there,
then add a field (or fields) to the ad table and mark the tags there: 1 means the tag is present, 0 means it is absent, and so on for each tag.

Konstantin Tsvetkov, 2022-01-03
@tsklab

Using a separate table for tags is the correct solution.
But it is not that clear-cut. Storing a list is also possible and has its advantages. For example, Habr uses a separate table, and there is no way to add your own clarifying tag to a question. On other Web 2.0 sites, you can add your own tag to the list. As for the fear of deleting and renaming: has a tag ever been deleted on Habr? How long ago was one last renamed?
Lists are supported by many DBMSs, and some have a dedicated data type for them.