@Cartroo, if I understand you correctly my app may have several instances sharing the same file system, but not memory.
Yes - all the instances running on PA share the same underlying filesystem, even if they're running on different servers. So, anything you store in a file will be available to all instances.
It's possible that some of your instances may be running as threads in the same process, in which case they will share memory. That means you could write your application to use in-memory storage and it would appear to work. However, at any later time some instances may end up in a different process, or on a completely different server - that's just the way things are when you're running in the cloud. So, you can't assume that you can share memory between instances, even if it sometimes seems to work.
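To make that concrete, here's a minimal sketch of the trap (the names are just illustrative, not anything from your code): a module-level cache looks fine under test, because every request happens to hit the same process, but each worker process gets its own private copy of it.

import time

# A module-level cache like this is private to each worker process.
_cache = {}

def get_cached(key, fetch):
    # Appears to work while all requests hit the same process, but an
    # instance running in a different process (or on another server)
    # has its own empty _cache and will silently re-fetch every time.
    if key not in _cache:
        _cache[key] = (fetch(key), time.time())
    return _cache[key][0]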
I see two options: defining an upper limit on the number of instances (no need to have that many instances), or somehow flagging the file downloaded first within the 4 hours and letting all the other instances use that file instead of hammering the FTP server. Do you agree?
Limiting the number of instances might help somewhat, but doesn't solve the underlying problem. Let's say you limit the instances to 1, so you can use in-memory storage. What will probably happen is that your instance will go idle for some time, and maybe the system will terminate that process and start it up somewhere else. At that point your memory storage has disappeared.
The second option you mention, storing the data on the filesystem, is definitely the best approach in this case, I think. It also shouldn't be too difficult - Python makes filesystem access quite easy. You could do something like this:
import fcntl
import os
import time
import urllib2

class FileWithLock(object):
    """Context manager which opens a file and holds an advisory lock on it."""

    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode
        self.fd = None

    def __enter__(self):
        if self.mode[0] == "w":
            # Open for append so the file isn't truncated before we hold
            # the lock - only once we have the exclusive lock is it safe
            # to empty the file ready for rewriting.
            self.fd = open(self.filename, "a")
            fcntl.lockf(self.fd, fcntl.LOCK_EX)
            self.fd.seek(0)
            self.fd.truncate()
        else:
            # Multiple readers can safely share the file, so a shared
            # lock is enough - it only excludes writers.
            self.fd = open(self.filename, self.mode)
            fcntl.lockf(self.fd, fcntl.LOCK_SH)
        return self.fd

    def __exit__(self, t, val, tb):
        fcntl.lockf(self.fd, fcntl.LOCK_UN)
        self.fd.close()
        self.fd = None
        return False

def get_data():
    data_file = os.path.expanduser("~/my_cache_file")
    data = None
    try:
        info = os.stat(data_file)
    except OSError:
        # The cache file doesn't exist yet.
        info = None
    if info is None or time.time() - info.st_mtime > 3600 * 4:
        # Cache file is missing or more than 4 hours old - refresh it.
        with FileWithLock(data_file, "w") as fd:
            data = urllib2.urlopen("http://www.example.com/data").read()
            fd.write(data)
    else:
        # Cache is still fresh - just read it back.
        with FileWithLock(data_file, "r") as fd:
            data = fd.read()
    return data
Now I'm not trying to say this code is brilliant (hard-coded URLs and filenames, no proper exception handling, etc.) - it's just an example to get you going. Note the use of file locking to make sure that concurrent requests can't corrupt the cached file; the writer only empties the file once it holds the exclusive lock, so readers never see a half-written cache. Some might say this is a little paranoid, but I don't like to assume that the Python write() call maps to an atomic underlying operation, especially on virtualised filesystems like on PA. If something's worth doing, it's worth doing properly!
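If you'd rather avoid locking altogether, another common trick is to write the new data to a temporary file and then rename() it over the cache file - on POSIX filesystems the rename is atomic, so readers always see either the old complete file or the new one. Whether that guarantee holds on every virtualised filesystem is a fair question, which is partly why I showed the locking version above. A rough sketch of that approach (the URL and filename are placeholders, as before):

import os
import tempfile
import urllib2

def refresh_cache(data_file, url):
    # Fetch first, so a failed download never clobbers the old cache.
    data = urllib2.urlopen(url).read()
    # The temporary file must be on the same filesystem as the target,
    # or the rename() below won't be atomic (and may fail outright).
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(data_file))
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    # Atomically swap the new file into place.
    os.rename(tmp_path, data_file)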
All that said, you might find it easier to look into httplib2, which is already installed on PA; or a combination of requests, already on PA, and requests-cache, which you'd need to install yourself. Either of these solutions will do transparent caching for you so you don't need to worry about the details.
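For example, with requests-cache installed, something along these lines should be all you need (I'm going from memory here, so do check the requests-cache docs for your installed version - in particular whether expire_after is in seconds):

import requests
import requests_cache

# Cache responses in a local SQLite file for 4 hours; since the cache
# file lives on the shared filesystem, all your instances can share it.
requests_cache.install_cache("demo_cache", expire_after=4 * 3600)

# This hits the network the first time, then serves from the cache
# until the entry expires.
response = requests.get("http://www.example.com/data")
data = response.content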
As an aside, since PA already has requests, perhaps it would be useful to install requests-cache as well?