I'm writing a script that downloads a bunch of binary data files from AWS so I can start investigating the data in an IPython notebook. I'm trying to use concurrent.futures.ThreadPoolExecutor to download the files concurrently, because there are ~200 of them and I want it to go fast. I've tested the non-concurrent parts of the code and they work just fine.
import requests
import os
from concurrent import futures


def download_bin(link):
    print('Beginning download of {}'.format(link))
    base, fname = os.path.split(link)
    new_fn = './bins/' + fname
    r = requests.get(link)
    with open(new_fn, 'wb+') as fh:
        fh.write(r.content)
    print('Done downloading {}'.format(link))


def download_files(links):
    print('\nDownloading binary data...')
    from pprint import pprint
    pprint(links)
    workers = min(20, len(links))
    with futures.ThreadPoolExecutor(workers) as executor:
        executor.map(download_bin, links)


if __name__ == "__main__":
    links = [...]
    download_files(links)
    print('done..?')
The program calls download_files. I see it print "Downloading binary data..", then I see it pprint all the links, but it never says "Beginning download of X". Instead, the program continues until its end with no errors, as if the with block didn't exist. What's going on here?
edit1: added the __name__ == "__main__" section to show how the code is run
edit2: added relevant imports
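
One thing worth checking for the "no errors" part: executor.map returns a lazy iterator, and an exception raised inside a worker is only re-raised when that iterator is consumed. Since the code above never iterates over the result, a failing worker would go unnoticed. This does not by itself explain why the first print never shows up, so treat the following as a diagnostic sketch only. flaky_task is a hypothetical stand-in for download_bin, and both helper functions are made up for illustration.

```python
from concurrent import futures


def flaky_task(n):
    # Hypothetical stand-in for download_bin: fails for odd inputs.
    if n % 2:
        raise ValueError('task {} failed'.format(n))
    return n * 10


def map_without_consuming(items, max_workers=4):
    # Mirrors the pattern in the question: the iterator returned by
    # executor.map is discarded, so worker exceptions are never re-raised.
    with futures.ThreadPoolExecutor(max_workers) as executor:
        executor.map(flaky_task, items)
    return 'no error seen'


def run_and_collect(items, max_workers=4):
    # Submit each item individually and call result() on every future,
    # which re-raises whatever exception the worker hit.
    results, errors = {}, {}
    with futures.ThreadPoolExecutor(max_workers) as executor:
        future_to_item = {executor.submit(flaky_task, i): i for i in items}
        for fut in futures.as_completed(future_to_item):
            item = future_to_item[fut]
            try:
                results[item] = fut.result()
            except Exception as exc:
                errors[item] = exc
    return results, errors


if __name__ == "__main__":
    print(map_without_consuming([1, 3]))  # failures stay hidden
    results, errors = run_and_collect(range(4))
    print(results)
    print(errors)
```

Swapping download_bin in for flaky_task and printing the collected errors should show whether the workers are actually failing.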