A
A
Alexander2015-07-24 10:50:50
MySQL
Alexander, 2015-07-24 10:50:50

Sets of objects in one request (insert/update)?

There is a shop. There are goods coming in a CSV file.
Now I'm iterating over the file line by line. string = item.
If the product is not in the database, then I add it. If there is, then I update the price.
As a result, I get a bunch of requests to add, update. Although, in fact, the data is static and can be stored in a list, and then massively fed into the database with one request. How to do this using the ORM dzhangi methods?
Goal:
Fewer database calls, faster file processing, less hardware load
Now:

file_url = './../file/items.csv'
with open(file_url, 'rb') as csvfile: # перебираем строки
    spamreader = csv.reader(csvfile, delimiter='|', quoting=csv.QUOTE_MINIMAL)
    for row in spamreader: # перебираем ячейки в строке
        id_item = row['item'].replace("'", "")
        item = Item.objects.filter(id_item=id_item)
        if not item:
            item = Item(id_item=id_item)
            item.name = row['name'].replace("'", "")
            ....
        else:
            item = item[0]
            item.price = row['price'].replace("'", "")
        item.save()

As you can see, running through the file, each product makes a request to the database to receive the product, save / update.
Part of my logic, as I see it
. For example, the check can be removed by pulling out a list of all products in advance
all_item = Item.objects.all().values_list('id_item', flat=True)

And then already create two lists where to save the goods beforehand, running through the file. Stupidly checking if there is a value in all_item
file_url = './../file/items.csv'
add_item = []
upd_item = []
with open(file_url, 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='|', quoting=csv.QUOTE_MINIMAL)
    for row in spamreader:
        id_item = row['id_item'].replace("'", "")
        if id_item in all_item: # Та самая проверка
            upd_item.append({'id_item': id_item, 'price': row['price'].replace("'", "")})
        else:
            add_item.append({'id_item': id_item, 'name': row['name'].replace("'", "")})
# Вот тут уже то самое добавление / обновление
# Item(add_item).save() or Item(upd_item).save()

I'm interested in exactly how to "correctly" collect these lists and how to feed them to the database with one / two requests. All methods of ORM dzhangi, and not with the help of direct queries to the database.
Z.Y. About 10-40k goods per day come to the store for addition / change. Sometimes, there are under 100k items when a new supplier is added. In the store for a couple of months in the database can hang under 1k goods.

Answer the question

In order to leave comments, you need to log in

4 answer(s)
S
sim3x, 2015-07-24
@syschel

Goal:
Fewer database calls, faster file processing, less hardware load
on the contrary - you need to transfer a maximum to a DB let itself
resolves In postgres there is a transaction mechanism. Roughly speaking, inside a transaction, a commit takes place only after your clear command.
What gives: the index is not rebuilt while you are inside the transaction, your data is not visible to requests outside the transaction (when you do a rollback and nothing happened)
You do https://docs.djangoproject.com/en/1.8/topics/ db/tr...
@transaction.atomic
def do_stuff():
    # This code executes inside a transaction.

and in the function you do get_or_create
Make a list of 40k positions is not worth it - work with each line-product separately
40k selects >>> a list of 40k Move
all checks and cleaning data from csv into separate functions so that your main function looks just like a set of calls
I strongly advise you to add to the model through ModelForm with validation - you never know what kind of csv you will be given
Do not use eval and / or the execution of any of these files
Clean the descriptions and titles from js and in general everything that you can accidentally show in the template without filtering aka {{ foo|safe }}
Put code in managed commands
use profiler and time ./manage.py do_stuff

S
some1else, 2015-07-30
@some1else

About csv parsing: it's embarrassing to pull replace manually - maybe you need to specify quotechar, or write your own Dialect?
As for create and update, jung now has a wonderful update_or_create .
About your decision - be sure to wrap all_item in the simplest set so that the entry is searched instantly, and not linearly from the number of products! In your case it's just ValuesQuerySet -> list!

R
Roman Kitaev, 2015-07-24
@deliro

transaction.

O
Oscar Django, 2015-07-24
@winordie

As I see it:
1) With one request you get goods from the database
2) You parse the price
3) You compare the lists obtained in steps 1 and 2
4) Based on them, you form lists for adding and updating
5) Using bulk_create or create, you create goods using update you update
6) ...
7) Profit
That is, in fact, as you wrote (if I understand correctly)

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question