How to prevent a script from executing if another copy of it is already running?

linux

Lelouch, 2020-04-13 16:13:39

I wrote a Python script that looks for CSV files in a certain folder. If it finds any, it transforms the data and loads it into MySQL, from where I can pull the data into BI and work with it.

Now I want to automate the process a bit: once the script is finalized, add it to crontab, so that when I drop files into the folder myself (or with some other script), I can be sure everything will be processed normally and the data will end up in the database.

But I don't really understand how exactly Python and Linux work with files. The files can be very large, and the script can run for tens of minutes. At the same time, I would rather not schedule it in crontab only once every few hours.

I plan to give the script this logic:
1. Look at how many files are in the folder and take the first one.
2. Split it into parts and load the first part into a pandas DataFrame.
3. Delete that part from the file (if there are no lines left, delete the file itself).
4. Perform the necessary manipulations on the data and send it to the database.
5. exit()

Then put the whole thing in crontab to run every minute.

The question is: will all this work fine, or do I need to do something differently? Are the following situations possible with this logic, and how is it best to avoid them?
1. Crontab launches the first copy of the script. While it is still writing the updated version of the file (with the part currently being processed removed), crontab launches a second copy, which picks up the incomplete file.
2. For some reason the server is under load and the script cannot process the expected volume within a minute. As a result, after a couple of hours I end up with about a hundred copies running simultaneously, and they all hang.

6 answer(s)

Adamos, 2020-04-13
@Lelouch

If the file /tmp/your_script.lock exists, exit.
Create the file /tmp/your_script.lock.
Do the job.
Delete the file /tmp/your_script.lock.
The catch: if your script crashes before reaching the last step, it will never run again. So for step 1 it is worth checking "if the file exists and is younger than 10 minutes", for example.
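
The steps above could be sketched in Python roughly as follows. This is only an illustration: the names `lock_is_fresh` and `run_once` are made up for the example, and the 10-minute limit is the value suggested in the answer, not a recommendation.

```python
import os
import time

LOCK_PATH = "/tmp/your_script.lock"  # path taken from the answer above
MAX_AGE_SECONDS = 10 * 60            # "younger than 10 minutes"


def lock_is_fresh(path, max_age):
    """Return True if the lock file exists and is younger than max_age seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except FileNotFoundError:
        return False
    return age < max_age


def run_once(job, path=LOCK_PATH, max_age=MAX_AGE_SECONDS):
    """Run job() unless a fresh lock file exists. Return True if job() ran."""
    if lock_is_fresh(path, max_age):
        return False
    # Create the lock file (or refresh a stale one left by a crashed run).
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    try:
        job()
    finally:
        # Remove the lock even if job() raised an exception.
        os.remove(path)
    return True
```

Note that there is a small window between the check and the creation of the file in which two copies could both pass the check, and a job that legitimately runs longer than the age limit would be treated as crashed.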

Sergey Gornostaev, 2020-04-13
@sergey-gornostaev

On startup, check for a pid file in /var/run. If the file exists, exit immediately. If not, create it, register an atexit handler to remove it, and do the main work.
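
A minimal sketch of that approach, assuming a hypothetical file name `your_script.pid`; the helper names are invented for the example. Writing to /var/run normally requires root, so a cron job running as an ordinary user would likely need a different directory.

```python
import atexit
import os
import sys

# Hypothetical file name; /var/run is the directory named in the answer.
PID_FILE = "/var/run/your_script.pid"


def remove_pid_file(path):
    """Delete the pid file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def acquire_pid_file(path=PID_FILE):
    """Return False if the pid file exists; otherwise create it and return True."""
    if os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    # Remove the file when the interpreter exits normally.
    atexit.register(remove_pid_file, path)
    return True


def main():
    if not acquire_pid_file():
        sys.exit(0)  # another copy appears to be running
    # ... the main work would go here ...
```

Keep in mind that atexit handlers do not run if the process is killed with SIGKILL or exits via `os._exit()`, so a stale file can still be left behind in those cases.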

Melkij, 2020-04-13
@melkij

flock -n path_to_lock_file command_that_starts_your_script

This is a variation of the other answers posted here, but if the script fails for some reason, it will still run the next time instead of waiting for you to delete a leftover lock file yourself.
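
The answer uses the `flock` command-line utility as a wrapper in crontab. For comparison, the same kind of lock can be taken from inside the script with `fcntl.flock` from the Python standard library. This is a sketch of that alternative, not what the answer itself proposes; the lock path and the `try_lock` name are assumptions.

```python
import fcntl

LOCK_PATH = "/tmp/your_script.lock"  # hypothetical path


def try_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already locked."""
    f = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting,
        # which corresponds to the -n flag of the flock utility.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f  # keep this object alive for as long as the lock is needed
```

The lock belongs to the open file, not to the file's existence on disk, so it disappears when the process exits or crashes, and a leftover lock file does not block the next run.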

Noname, 2020-04-13
@Ddeeeennn

Alternatively, open a file with an exclusive write handle. If the first copy of the script has already obtained it, the second one no longer can.
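
On Linux, simply opening a file for writing does not stop a second process from opening it too, so one possible reading of this suggestion is exclusive creation with `O_EXCL`. The sketch below shows that interpretation; the function name is invented for the example.

```python
import os


def create_exclusive(path):
    """Return a file descriptor if this call created the file, or None if it already exists."""
    try:
        # O_CREAT | O_EXCL makes "check and create" a single atomic step.
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
```

The copy that gets a descriptor would close it and delete the file when finished. As with any plain lock file, a crash leaves the file behind.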

Alexey Sundukov, 2020-04-13
@alekciy

The most reliable option is a mutex implemented with an OS IPC semaphore (on Linux).

Saboteur, 2020-04-13
@saboteur_kiev

The standard solution on Linux is to create a PID file containing the ID of the running process.
When the script starts, it checks the file and verifies that the process named in it is still running. If it is, the script exits so as not to interfere with the copy already running.
If it is not running, the first thing to do is create the PID file.
At the end of the script, delete the PID file.
You can look for a ready-made Python library and see how to work with it, for example:
import pidfile
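
Without relying on any particular library, the check described above could look something like this using only the standard library. The function names are hypothetical, and the sketch does not guard against the PID having been reused by an unrelated process.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the target
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def another_copy_running(pid_path):
    """Return True if the PID file names a process that is still alive."""
    try:
        with open(pid_path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or its contents are not a number
    return pid_is_running(pid)
```

If `another_copy_running()` returns False, the script would write its own `os.getpid()` to the PID file, do the work, and delete the file at the end.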