How to properly store/select data in PostgreSQL?

Eugene Wolf, 2016-11-28 14:23:03

PostgreSQL

Good day, everyone!
This continues my previous questions:
[PostgreSQL] How to cast strings to INT and other data types?
How to cast an array of numbers from a VARCHAR field to INTEGER type in PostgreSQL?
The essence of the problem: there is a set of records (from 1 to 1000+) strictly tied to another record (a simple one-to-many relationship). The data is of mixed type: it can be strings (with all that entails) or numbers. For that reason it is stored as VARCHAR.
At the same time, depending on the situation, we need to work with this data both as strings and as numbers. That is, if the condition is "search by strings", we search everything at once, strings and numbers alike, as if they were all strings (which, formally, they are). If the condition is a range search, for example:
... WHERE n >= 10 AND n <= 100;
then we need to select only numbers and compare them accordingly.
The solutions I see:
Option 1. We store string data in a table for strings and numeric data in a table for numbers (and, apparently, fractional numbers will need a separate table of their own), and, depending on the search conditions, we select from both tables. There are minor problems here:
a) The data is fragmented.
b) The system has to detect the format of the input data in order to write it to the right table, and there is some probability of a wrong guess, because it is not a given that "333555" is an amount rather than a phone number or something else.
Option 2.1. We store all the data in one table as VARCHAR and pick out the numbers by an indirect sign, for example like this:

SELECT field1::integer FROM table1 WHERE field1 ~ E'^\\d+$' AND field1::integer > 3;

What bothers me in this variant is the regular expression... It is a very small one, but it is still a regular expression.
Option 2.2. We add a flag field (number -> true/false) that says whether the row holds a number or a string. String search then works as usual, and numeric search relies on the flag instead of the regular expression.
In this variant I don't like the extra entity and the extra logic, but we do get rid of the regular expression (albeit a very small one).
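A minimal sketch of Options 2.1 and 2.2 side by side, under assumptions that are not in the question: the columns parent_id and is_number, the index names, and the 1-to-9-digit limit (chosen so the value always fits into integer) are all hypothetical. One point worth noting is that PostgreSQL does not guarantee the order in which AND-ed WHERE conditions are evaluated, so the sketch guards the cast with CASE rather than relying on the regex being checked first.

```sql
-- Hypothetical schema mirroring the names in the question; parent_id and
-- is_number are assumptions made for this sketch.
-- The '\d' patterns assume standard_conforming_strings = on (the default).
CREATE TABLE table1 (
    id        bigserial PRIMARY KEY,
    parent_id bigint  NOT NULL,               -- the owning record (one-to-many)
    field1    varchar NOT NULL,
    is_number boolean NOT NULL DEFAULT false, -- option 2.2 flag, set by the application
    -- A row flagged as a number must actually look like one.
    -- 1 to 9 digits always fit into integer without overflow.
    CHECK (NOT is_number OR field1 ~ '^\d{1,9}$')
);

-- Option 2.1: the cast is wrapped in CASE so that it is only attempted
-- on rows that matched the regex.
SELECT field1::integer AS n
FROM table1
WHERE CASE WHEN field1 ~ '^\d{1,9}$' THEN field1::integer END BETWEEN 10 AND 100;

-- An expression index on the same CASE, so a range search can be answered
-- from the index instead of running the regex on every row.
CREATE INDEX table1_field1_regex_int_idx
    ON table1 ((CASE WHEN field1 ~ '^\d{1,9}$' THEN field1::integer END));

-- Option 2.2: the same range search driven by the flag.
SELECT field1::integer AS n
FROM table1
WHERE CASE WHEN is_number THEN field1::integer END BETWEEN 10 AND 100;

-- Matching expression index for the flag-based variant.
CREATE INDEX table1_field1_flag_int_idx
    ON table1 ((CASE WHEN is_number THEN field1::integer END));
```

Whether either index is actually chosen depends on the data and the planner, so EXPLAIN ANALYZE on representative data is the way to check. In this sketch the flag differs from the regex in one respect: the application can leave is_number false for a digit-only value such as a phone number.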
Please tell me which of the options is better, by how much, and why. What are the objective pros and cons of each approach? How much will simple regular expressions slow the system down, and how much better or worse are they than the variants with an additional field?
P.S. I understand that I could "just try it and see", but I have no idea how and why the database will behave in different situations, and I cannot simulate those situations on different hardware with different data volumes, so I wanted to hear the opinion of someone who understands how it works logically.

3 answers

sim3x, 2016-11-28
@sim3x

Third_normal_form
So, you don't know what set of fields you actually have,
you don't know which queries predominate,
you don't know the set of field types
...
and it's not clear what "better", "fast" and so on mean.

Denis Smirnov, 2016-11-29
@darthunix

It seems to me that all the problems come from storing completely different entities in one column. Until you separate them, you will suffer. And even regular expressions / flags / attempts to store numbers separately from strings will not solve the problem 100%: you yourself gave an example where it is unclear whether a string holds an amount or a phone number. Something is wrong in the storage schema itself. Only you know the subject area, so it's up to you to decide what goes where.
If these are dynamic attributes, it might be worth looking at jsonb with a GIN index. But you need to understand in more detail what kind of analytics will run over these fields and why the numbers matter.
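As a rough illustration of the jsonb suggestion, here is a sketch with a hypothetical table record_attrs and made-up attribute names (amount, phone, note); none of these come from the question. It assumes one jsonb document of attributes per parent record. Because jsonb keeps the JSON type of each value, "333555" stored as a string and 333555 stored as a number stay distinguishable.

```sql
-- Hypothetical table: one jsonb document of dynamic attributes per record.
CREATE TABLE record_attrs (
    record_id bigint PRIMARY KEY,
    attrs     jsonb NOT NULL DEFAULT '{}'::jsonb
);

-- GIN index supporting the containment (@>) and key-existence (?) operators.
CREATE INDEX record_attrs_gin_idx ON record_attrs USING gin (attrs);

INSERT INTO record_attrs (record_id, attrs) VALUES
    (1, '{"amount": 333555, "phone": "333555", "note": "paid"}');

-- Exact-match search on a string attribute; this form can use the GIN index.
SELECT record_id
FROM record_attrs
WHERE attrs @> '{"phone": "333555"}';

-- Range searches are not served by the GIN index. One option is a B-tree
-- expression index on the extracted value; jsonb_typeof keeps the cast
-- away from values that are not JSON numbers.
CREATE INDEX record_attrs_amount_idx
    ON record_attrs ((CASE WHEN jsonb_typeof(attrs -> 'amount') = 'number'
                           THEN (attrs ->> 'amount')::numeric END));

SELECT record_id
FROM record_attrs
WHERE CASE WHEN jsonb_typeof(attrs -> 'amount') = 'number'
           THEN (attrs ->> 'amount')::numeric END BETWEEN 10 AND 100;
```

The expression index only helps for attribute names known in advance, which is part of why the kind of analytics matters before choosing this route.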

Alexander Shelemetiev, 2016-12-01
@zoroda

I agree with Denis Smirnov. In your case, JSONB may well be the best solution. The downside is data denormalization. If you need referential integrity checks inside the JSONB data, the task becomes much more complicated. It can be solved, for example, by attaching triggers with checking functions.
If this control can be neglected, then I recommend taking a closer look at JSONB.
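For a sense of what such a trigger might look like, here is a sketch with hypothetical tables categories and items and a hypothetical category_id key inside the jsonb column. It only checks one direction (writes to items); deletes from categories and concurrent changes are not covered, which is part of the extra complexity mentioned above.

```sql
-- Hypothetical reference table and a table whose jsonb refers to it.
CREATE TABLE categories (
    id   bigint PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE items (
    id    bigserial PRIMARY KEY,
    attrs jsonb NOT NULL
);

-- Checking function: reject a row whose attrs carry a category_id
-- that does not exist in categories.
CREATE FUNCTION items_check_category() RETURNS trigger AS $$
BEGIN
    IF NEW.attrs ? 'category_id'
       AND NOT EXISTS (
           SELECT 1
           FROM categories c
           WHERE c.id = (NEW.attrs ->> 'category_id')::bigint
       )
    THEN
        RAISE EXCEPTION 'unknown category_id in attrs: %',
            NEW.attrs ->> 'category_id';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Run the check whenever attrs is inserted or updated.
CREATE TRIGGER items_check_category_trg
    BEFORE INSERT OR UPDATE OF attrs ON items
    FOR EACH ROW EXECUTE PROCEDURE items_check_category();
```

With this in place, inserting '{"category_id": 42}' into items fails unless a category with id 42 exists, while rows without a category_id key pass through unchecked.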