Is there an equivalent to the shelve module that works like a list rather than a dictionary?
Greetings.
I have a list with a very large number of elements, so large that it doesn't fit in RAM. At that point speed is no longer critical, as long as the process doesn't die from an OOM error, so the data can be moved to disk and processed there.
The shelve module from the standard library is very good, but it works like a dictionary (that is, it does not preserve order) and does not accept an int as a key (keys must be strings), so simulating a list with it is not straightforward.
Are there any ready-made solutions that could help? I'm thinking about a wrapper around sqlite, but so far only hacky solutions come to mind.
By the way, by "like a list" I mean that it is iterable and has the append() and pop() methods.
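For reference, the interface described above (iteration, append() and pop()) could be sketched over the standard sqlite3 module roughly as follows. This is only an illustration under stated assumptions, not a ready-made library: the DiskList class, its items table, and the choice to pickle each value are all hypothetical.

```python
import pickle
import sqlite3


class DiskList:
    """Hypothetical list-like container that keeps its items in an SQLite database."""

    def __init__(self, path=":memory:"):
        # Pass a file path to keep the data on disk; ":memory:" is only for quick tests.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value BLOB)"
        )

    def append(self, item):
        # Assumption: items are picklable; each one is stored as a BLOB row.
        self._db.execute("INSERT INTO items (value) VALUES (?)", (pickle.dumps(item),))
        self._db.commit()

    def pop(self):
        # Remove and return the most recently appended item, like list.pop().
        row = self._db.execute(
            "SELECT id, value FROM items ORDER BY id DESC LIMIT 1"
        ).fetchone()
        if row is None:
            raise IndexError("pop from empty DiskList")
        self._db.execute("DELETE FROM items WHERE id = ?", (row[0],))
        self._db.commit()
        return pickle.loads(row[1])

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM items").fetchone()[0]

    def __iter__(self):
        # Rows are read from the cursor one at a time, in insertion order.
        for (blob,) in self._db.execute("SELECT value FROM items ORDER BY id"):
            yield pickle.loads(blob)
```

Committing on every append() keeps the sketch simple but is slow for bulk loads; batching several inserts into one transaction would be the obvious next step.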
It all depends on what exactly you store in this list, so I will give a few options.
1. numpy lets you create typed arrays. They are still stored in memory, but in a very compact form (often several times smaller than a regular Python list), and numpy provides very rich data processing capabilities.
2. pandas, built on numpy, can create structured arrays (dataframes) similar to database tables and provides advanced functionality for selecting and processing this data.
3. pytables lets you save numpy arrays or pandas dataframes to disk as HDF5 files, with quick access to the data and, again, convenient search and retrieval functionality.
Most likely, on your data volumes, pandas + HDF5 will be several times, or even tens of times, faster than any DBMS.
4. bcolz lets you compress data and store it not only in memory but also on disk.
Data operations are still very fast, sometimes even faster than with uncompressed lists.
In general, list-like data is much faster to process using vector operations in numpy and pandas. But if you still need loops, I also recommend taking a look at numba, which can speed up Python loops by tens to hundreds of times.
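The points about compact typed arrays and vector operations can be illustrated with a small sketch. It assumes the data is a million plain integers (a made-up example, not the asker's actual data), and the exact byte counts for the Python list depend on the interpreter version, so none are quoted here.

```python
import sys

import numpy as np

n = 1_000_000
as_list = list(range(n))
as_array = np.arange(n, dtype=np.int32)

# Rough size of the list: its pointer array plus one int object per element.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# The typed array stores 4 bytes per int32 element and nothing else per item.
array_bytes = as_array.nbytes

# Vectorized sum of squares: the loop runs inside numpy, not in Python.
# The values are widened to int64 first so the squares do not overflow int32.
vector_sum = int((as_array.astype(np.int64) ** 2).sum())

# The same computation as an ordinary Python-level loop, for comparison.
loop_sum = sum(x * x for x in as_list)
```

Both sums give the same result; the difference is in how much memory the containers take and in where the loop runs.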