Python
Johnem, 2020-08-19 17:39:07

How can this code be optimized?

There is some code (I'm not attaching the database itself because it weighs 666 MB). Its job is simple: it walks through the rows of the database looking for extremes, but the trouble is that it does so extremely slowly. Can it be optimized? Or would it be better to use pandas (for processing, and possibly for storing the records; in that case, how much larger would the file become?), or numpy structured arrays stored in .npy (if that's possible), or some kind of CSV? Any advice would be helpful! The code is below.

import sqlite3

connect = sqlite3.connect('exemple.db')
cursor = connect.cursor()
cursor.executescript("""
    CREATE TABLE IF NOT EXISTS historical_data (
        security_id TEXT,
        date_and_time TEXT,
        price REAL
    );
    CREATE TABLE IF NOT EXISTS extremes (
        security_id TEXT,
        date_and_time TEXT,
        price REAL
    );
    CREATE TABLE IF NOT EXISTS levels (
        security_id TEXT,
        price REAL,
        power TEXT
    );
    """)

security_id = "GAZP"
date_and_time = "2020-01-06 10:00"

cursor.execute("SELECT COUNT(*) FROM historical_data WHERE security_id = ? AND date_and_time LIKE ?", (security_id, f"%{date_and_time}%"))
for i in range(cursor.fetchall()[0][0]):
    cursor.execute("SELECT * FROM historical_data WHERE security_id = ? AND date_and_time LIKE ? ORDER BY date_and_time LIMIT ?, 2", (security_id, f"%{date_and_time}%", i))
    a = cursor.fetchall()
    if a[1][2] - a[0][2] == 0:
        print("EXTR!")
        print(a)
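
As a point of comparison for the pandas idea mentioned above, here is a minimal sketch (table and column names taken from the code; the filter values are only examples) that loads the slice once with read_sql_query and compares neighbouring prices in a vectorised way, so the per-row re-querying disappears:

import sqlite3
import pandas as pd

connect = sqlite3.connect('exemple.db')

# Load the relevant slice once instead of re-querying per row.
df = pd.read_sql_query(
    "SELECT security_id, date_and_time, price FROM historical_data "
    "WHERE security_id = ? AND date_and_time LIKE ? ORDER BY date_and_time",
    connect,
    params=("GAZP", "%2020-01-06 10:00%"))

# A pair of neighbouring rows with equal price marks an "extreme";
# keep both rows of every such pair.
same_as_next = df["price"].eq(df["price"].shift(-1))
extremes = df[same_as_next | same_as_next.shift(1, fill_value=False)]
print(extremes)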

2 answers
Ruslan., 2020-08-19
@Johnem

You can try to narrow the search by discarding prices that obviously cannot repeat, for example:

SELECT price
  FROM historical_data
 WHERE security_id = ?
   AND date_and_time LIKE ?
 GROUP BY price
HAVING COUNT(price) > 1

The output gives the prices that occur at least twice for a given security and date. Then, knowing the security, date and price, you can check whether those prices occur in consecutive rows or not.
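
A minimal sketch of how both steps could look in Python (identifiers reused from the question; the date pattern is only an example), assuming the consecutive-row check is done client-side:

import sqlite3

connect = sqlite3.connect('exemple.db')
cursor = connect.cursor()

security_id = "GAZP"
date_pattern = "%2020-01-06%"

# Step 1: candidate prices that occur at least twice for this security and date.
cursor.execute(
    "SELECT price FROM historical_data "
    "WHERE security_id = ? AND date_and_time LIKE ? "
    "GROUP BY price HAVING COUNT(price) > 1",
    (security_id, date_pattern))
candidates = {row[0] for row in cursor.fetchall()}

# Step 2: scan the ordered rows once and report neighbours with equal price,
# skipping any price that cannot repeat at all.
cursor.execute(
    "SELECT security_id, date_and_time, price FROM historical_data "
    "WHERE security_id = ? AND date_and_time LIKE ? "
    "ORDER BY date_and_time",
    (security_id, date_pattern))
prev = None
for row in cursor.fetchall():
    if prev is not None and row[2] in candidates and row[2] == prev[2]:
        print("EXTR!", prev, row)
    prev = row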

d-stream, 2020-08-19
@d-stream

I strongly suspect that a single pure SQL query that fills everything in at once will be orders of magnitude more efficient.
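
For example, a sketch of such a single statement (assuming SQLite 3.25+ for window functions; table and column names follow the question): LAG pairs each row with the previous price per security, and the matching rows are inserted straight into extremes.

import sqlite3

connect = sqlite3.connect('exemple.db')
cursor = connect.cursor()

# One statement: LAG (SQLite 3.25+) exposes the previous price for each row;
# rows whose price repeats the previous one go straight into `extremes`.
cursor.execute(
    """INSERT INTO extremes (security_id, date_and_time, price)
       SELECT security_id, date_and_time, price
         FROM (SELECT security_id,
                      date_and_time,
                      price,
                      LAG(price) OVER (PARTITION BY security_id
                                       ORDER BY date_and_time) AS prev_price
                 FROM historical_data
                WHERE security_id = ? AND date_and_time LIKE ?)
        WHERE price = prev_price""",
    ("GAZP", "%2020-01-06%"))
connect.commit()

Note that this keeps only the second row of each equal-price pair; if both rows are wanted, an analogous LEAD(price) comparison can be added to the same subquery.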
