Python
Joseph Goodman, 2020-06-28 09:45:47

How to optimize performance in Jupyter Notebook?

Good day.
I'm taking a machine learning course on Stepik and ran into a rather unpleasant problem. During data preparation, the final DataFrame has grown to an unprecedented size (screenshot attached).

Is there any way to optimize the use of computer resources? Otherwise it's impossible to work: when I run a cell, the computer just hangs. Even rendering the message shown in the screenshot nearly froze the machine.

The computer has an old Intel Core i3 and 8 GB of DDR3 RAM. Or do I really need to upgrade the hardware?


2 answers
Danil, 2020-06-28
@lolaevv

First of all, look at how the memory is being used:
df.memory_usage(deep=True)
And then optimize the type of each column (see the sketch after the list):

  • For categorical data:
    df['object'] = df['object'].astype('category')
  • user_id and days, for example, should be an int type
  • And you probably don't need float64 either: float32 is enough, and sometimes even float16
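Putting those three points together, here is a minimal sketch. The toy DataFrame and the column names user_id, days, score, and status are illustrative assumptions; the thread doesn't show the real schema:

import pandas as pd
import numpy as np

# Toy DataFrame standing in for the course data; replace with your own.
df = pd.DataFrame({
    'user_id': np.random.randint(0, 10_000, size=1_000_000),
    'days': np.random.randint(0, 365, size=1_000_000),
    'score': np.random.rand(1_000_000),              # float64 by default
    'status': np.random.choice(['passed', 'failed'], size=1_000_000),
})

print(df.memory_usage(deep=True).sum())              # total bytes before optimization

df['status'] = df['status'].astype('category')       # few distinct strings -> category
df['user_id'] = pd.to_numeric(df['user_id'], downcast='unsigned')
df['days'] = pd.to_numeric(df['days'], downcast='integer')
df['score'] = pd.to_numeric(df['score'], downcast='float')   # float64 -> float32

print(df.memory_usage(deep=True).sum())              # total bytes after: several times smaller

Note that downcast='float' stops at float32; going all the way to float16 is a manual df['score'].astype('float16') and only safe if you can tolerate the precision loss.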

Joseph Goodman, 2020-06-28
@lolaevv

Changed the type of one of the columns from float64 to int via df.step_id = df.step_id.astype(int), and memory usage dropped from 4 GB+ to 3.4 GB. True, the computer was completely unresponsive for the 15 minutes it took to convert the column type.
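If the data is loaded from a CSV (an assumption; the thread doesn't say where the DataFrame comes from), one way to avoid that long after-the-fact conversion is to declare the dtypes at load time, so pandas never builds the oversized float64/object columns in the first place. The file name and column list below are hypothetical:

import pandas as pd

# Hypothetical file and columns; adapt to the actual dataset.
df = pd.read_csv(
    'course_events.csv',
    dtype={
        'step_id': 'int32',    # instead of converting from float64 afterwards
        'user_id': 'uint32',
        'status': 'category',
    },
)

Keep in mind that read_csv can only parse a column as an integer dtype if it has no missing values. On an 8 GB machine it can also help to process the file in pieces with the chunksize parameter instead of loading everything at once.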
