How can the script be optimized?

Mikhail Osher, 2012-06-27 15:29:24

Python

I needed to download a whole album from VK: 120+ photos, and I was too lazy to do it by hand. I installed a Chrome plugin, got a list of links to the photos, and put them in a text file.
My first thought was to write a PHP script to do everything for me. Then I remembered that I want to learn Python, so the choice of tool was quick. Drawing on the recently read Dive into Python and googling a couple of questions about the download itself, I wrote the following code.

# Imports
import urllib
import os

# Initialize downloader
web = urllib.URLopener()

# Path/files
cwd = os.getcwd()
urls = os.path.join(cwd, 'data.txt')

# Read the file
sock = open(urls)
data = [item.strip() for item in sock.readlines()]
sock.close()

# Download files
for url in data:
    # Get the filename
    basename = os.path.basename(url)
    
    # Destination..
    dest = os.path.join(cwd, 'temp', basename)

    # Process download
    web.retrieve(url, dest)

    # Print we are done
    print 'Done %s' % dest

The question is: what could be done better? Is there anywhere it could be simpler?
I can already see that specifying absolute paths to the files is probably unnecessary (or is it?), but that is a habit I carried over from PHP.

4 answers

avalak, 2012-06-27
@miraage

> The question is: what could be done better? Maybe somewhere it could be easier?
Use wget. Its -i option reads the list of URLs from a file:
-i, --input-file=FILE download URLs found in local or external FILE.

dronnix, 2012-06-27
@dronnix

As an exercise, you could add exception handling so that the script does not crash on the first broken link but reports the error to stderr and carries on.
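
A minimal sketch of that idea, assuming Python 3 (the question's code uses the Python 2 urllib.URLopener, which no longer exists there). The function name download_all and its fetch parameter are hypothetical, introduced only for illustration; fetch defaults to the standard urllib.request.urlretrieve and can be swapped out for testing.

```python
import os
import sys
from urllib.request import urlretrieve


def download_all(urls, dest_dir, fetch=urlretrieve):
    """Download every URL into dest_dir; return the URLs that failed.

    `fetch` is a hypothetical hook (not part of the original script) that
    takes (url, dest) and defaults to urllib.request.urlretrieve.
    """
    os.makedirs(dest_dir, exist_ok=True)
    failed = []
    for url in urls:
        dest = os.path.join(dest_dir, os.path.basename(url))
        try:
            fetch(url, dest)
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are subclasses of OSError;
            # ValueError covers malformed URLs such as a missing scheme.
            print('Failed %s: %s' % (url, exc), file=sys.stderr)
            failed.append(url)
        else:
            print('Done %s' % dest)
    return failed
```

Called as download_all(data, 'temp'), this keeps going past a broken link and returns the list of URLs that could not be fetched, so they can be retried or inspected afterwards.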

tgz, 2012-06-28
@tgz

If the goal is to practice programming, you could rewrite it to download in parallel.
In general, look into asyncore or greenlets.
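
One possible sketch of parallel downloading, assuming Python 3. Note that it swaps in a thread pool from the standard concurrent.futures module rather than asyncore or greenlets, simply because that needs no third-party packages. The names fetch_one and download_parallel, the fetch parameter, and the default of 8 workers are all hypothetical choices made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve


def fetch_one(url, dest_dir, fetch=urlretrieve):
    """Download a single URL; return (url, error), where error is None on success."""
    dest = os.path.join(dest_dir, os.path.basename(url))
    try:
        fetch(url, dest)
    except (OSError, ValueError) as exc:
        return url, exc
    return url, None


def download_parallel(urls, dest_dir, workers=8, fetch=urlretrieve):
    """Download URLs concurrently; return a list of (url, error) for failures."""
    os.makedirs(dest_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        results = list(pool.map(lambda u: fetch_one(u, dest_dir, fetch), urls))
    return [(url, err) for url, err in results if err is not None]
```

Threads suit this task because the work is dominated by waiting on the network; each worker handles its own errors, so one broken link does not stop the rest.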

LightSUN, 2012-06-27
@LightSUN

Any download manager (for example, Download Master) can take a list of files to download straight from the clipboard. DM specifically (at least in its current version) can also import a list of download addresses from a file.