Running concurrent file downloads? : Forums : PythonAnywhere

Running concurrent file downloads?

I'm writing a script that downloads a bunch of binary data files from AWS so I can start investigating the data in an IPython notebook. I'm trying to use concurrent.futures.ThreadPoolExecutor to download all the files concurrently, because there are ~200 of them and I want it to go fast. I've tested the non-concurrent parts of the code and they work just fine.

import requests
import os
from concurrent import futures

def download_bin(link):
    print('Beginning download of {}'.format(link))
    base, fname = os.path.split(link)
    new_fn = './bins/' + fname
    r = requests.get(link)
    with open(new_fn, 'wb+') as fh:
        fh.write(r.content)
    print('Done downloading {}'.format(link))

def download_files(links):
    print('\nDownloading binary data...')
    from pprint import pprint
    pprint(links)
    workers = min(20, len(links))
    with futures.ThreadPoolExecutor(workers) as executor:
        executor.map(download_bin, links)

if __name__ == "__main__":
    links = [...]
    download_files(links)
    print('done..?')

The program calls download_files. I see it print "Downloading binary data...", then I see it pprint all the links, but it never prints "Beginning download of X". Instead, the program runs to the end with no errors, as if the with block didn't exist. What's going on here?

edit1: added __name__ == "__main__" section to show how the code is run

edit2: added relevant imports

paularcoleo | 17 posts | May 11, 2017, 8:23 p.m.

Hi,

if you can use Python 3.5+ I would recommend using aiohttp instead of requests. It's blazingly fast and together with the new async/await syntax the code is actually readable/understandable. Here is a nice blog article about this: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html

Cheers, Oliver
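
To make the suggestion above concrete, here is a minimal sketch of what concurrent downloads with asyncio might look like. The function names (`fetch_bytes`, `download_all`, `main`) and the limit of 20 requests in flight are illustrative assumptions, not code from the linked article. `asyncio.run` needs Python 3.7+, and aiohttp is a third-party package that has to be installed separately.

```python
import asyncio


async def fetch_bytes(session, url, limiter):
    # `session` is assumed to behave like aiohttp.ClientSession.
    async with limiter:  # cap the number of requests in flight
        async with session.get(url) as resp:
            resp.raise_for_status()  # turn HTTP error statuses into exceptions
            return await resp.read()


async def download_all(session, urls, limit=20):
    # Hypothetical helper: fetch every URL, at most `limit` at a time.
    limiter = asyncio.Semaphore(limit)
    tasks = [fetch_bytes(session, url, limiter) for url in urls]
    # gather() returns the results in the same order as `urls`.
    return await asyncio.gather(*tasks)


async def main(urls):
    import aiohttp  # third-party; imported here so the helpers above stay stdlib-only

    async with aiohttp.ClientSession() as session:
        return await download_all(session, urls)
```

Something like `asyncio.run(main(links))` would then return the file contents as a list of bytes objects, which could be written to disk afterwards.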

deleted-user-1335304 | 6 posts | May 11, 2017, 8:54 p.m.

The problem was most likely caused by the script running under Python 2.7 instead of 3.4. Still, it seemed odd that it didn't give me any errors.

paularcoleo | 17 posts | May 11, 2017, 8:58 p.m.

That does sound odd. What happens if you do

print(executor.map(download_bin, links))

...? I'm wondering if the map is returning some kind of iterator that only triggers execution of the code when it's examined.
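
A small sketch of the behaviour being guessed at here, written for Python 3 and using a made-up `risky` function in place of the download. In Python 3, `executor.map` submits the calls immediately but hands results back through an iterator, so an exception raised inside a worker stays hidden until that iterator is consumed. Older releases of the `futures` backport for Python 2.7 may additionally have deferred the submission itself until the iterator was first advanced, which would match what is described in this thread; that part is an assumption, not something verified here.

```python
from concurrent.futures import ThreadPoolExecutor


def risky(n):
    # Stand-in for download_bin: fails for one particular input.
    if n == 2:
        raise ValueError("failed on {}".format(n))
    return n * 10


with ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(risky, [1, 2, 3])
# The with block has exited and no exception has surfaced yet.

collected = []
error = None
try:
    for value in results:  # results come back in input order
        collected.append(value)
except ValueError as exc:
    # The worker's exception is re-raised here, on consumption.
    error = exc

print(collected, repr(error))
```

If the results are never iterated, the `ValueError` is never seen, which is one way a script can finish "with no errors" even though a worker failed.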

giles | 12671 posts | PythonAnywhere staff | May 12, 2017, 12:53 p.m.

¯\_(ツ)_/¯

still no code execution

paularcoleo | 17 posts | May 12, 2017, 1:15 p.m.

No, that's not very helpful, is it ;-)

How about explicitly iterating over it, e.g.

[_ for _ in executor.map(download_bin, links)]

giles | 12671 posts | PythonAnywhere staff | May 12, 2017, 1:20 p.m.

That seemed to do it. Maybe in the 2.7 implementation you have to call it explicitly like that. Thanks for the help!
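
As an alternative to iterating over the `map` result, the calls can be scheduled explicitly with `submit` and collected with `as_completed`, which also makes per-link failures visible. This is only a sketch: `run_all` is a hypothetical helper name, and it assumes the items are hashable (such as the link strings used above).

```python
from concurrent import futures


def run_all(worker, items, max_workers=20):
    """Run worker(item) for every item; return (results, errors) dicts."""
    results, errors = {}, {}
    if not items:
        # ThreadPoolExecutor rejects a worker count of zero.
        return results, errors
    workers = min(max_workers, len(items))
    with futures.ThreadPoolExecutor(workers) as executor:
        # submit() schedules each call right away and returns a Future.
        pending = {executor.submit(worker, item): item for item in items}
        for future in futures.as_completed(pending):
            item = pending[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Record the failure instead of losing it silently.
                errors[item] = exc
    return results, errors
```

With the function from the original post this could be called as `results, errors = run_all(download_bin, links)`, after which `errors` lists any links that failed and why.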

paularcoleo | 17 posts | May 12, 2017, 1:25 p.m.

Excellent! Glad we could work it out.

giles | 12671 posts | PythonAnywhere staff | May 12, 2017, 1:38 p.m.