
Web App Down

My app is down again. The server logs say "Harakiri" on worker 2, and when I reloaded the web app it said "bad file descriptor". Now it's not producing any log output on reload either, and every request is timing out. I have faced this after every maintenance, and every time it's a lock that gets stuck. I even tried deleting and then re-uploading the SQLite database files, and the issue persists. It's a paid application used in production 24/7.

Can someone look into this urgently, please?

It looks like we found the problem, and it should work now. Sorry about that.

Can you explain what the problem was? Is there something we can do to avoid this? We receive requests every second and can't afford such long downtime.

There is a sporadic issue where SQLite databases get locked and don't unlock until we take action on our side. Today we put some alerting in place so that we can identify issues like this quickly, and over the next few days we should be able to add a system that unlocks everything automatically within a minute or so of the issue occurring.
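On the application side, you can at least detect a stuck lock yourself rather than waiting for requests to time out. Below is a minimal sketch using Python's standard `sqlite3` module; the function name `probe_write_lock` and the timeout value are illustrative assumptions, not anything PythonAnywhere provides. It tries to take SQLite's write lock with `BEGIN IMMEDIATE` and reports whether that succeeded within the timeout. It cannot release a lock left dangling on the file server, and if the file server's lock manager stops responding entirely the underlying call may block rather than fail, so it is better run from a separate scheduled process than inside a web request.

```python
import sqlite3
import time


def probe_write_lock(db_path: str, timeout_s: float = 2.0) -> tuple[bool, float]:
    """Try to take (and immediately release) SQLite's write lock.

    Hypothetical helper. Returns (ok, seconds_taken); ok is False if SQLite
    reported "database is locked" after waiting up to timeout_s seconds.
    """
    start = time.monotonic()
    # isolation_level=None puts the connection in autocommit mode,
    # so we issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
    try:
        # BEGIN IMMEDIATE grabs the write (RESERVED) lock up front, so it
        # fails if another connection is already holding a write lock.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")  # release straight away; nothing is written
        return True, time.monotonic() - start
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return False, time.monotonic() - start
        raise  # some other problem (e.g. bad path): let it surface
    finally:
        conn.close()
```

A `False` result, or a probe that takes far longer than its timeout, is a reasonable trigger for an alert to you or to support.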

BTW, I would strongly recommend against using SQLite for a production service on PythonAnywhere. We use a networked file system so that your code can access your databases no matter where it is running on our system, and, for all its many strengths, SQLite does not handle networked filesystems well. Even once this locking issue is fixed, you'll get worse performance with SQLite, especially as your site scales in number of users and amount of data. MySQL or Postgres would be a better choice.

So, I have faced this same problem 4-5 times now, every time there is maintenance, and out of nowhere I am unable to do anything until I get a response from your side. I know you are using AWS, but I don't know what exactly is going wrong. Is there a bad practice in the app's code? Even if there is, shouldn't a reboot bring the data back from disk without the lock? Can you give me some more insight into the issue, and tell me whether anything can be fixed on either end to make the experience more reliable? This is production, and SQLite fits my use case perfectly, which is why I don't want to migrate.

We've already deployed a system that automatically reboots the locking service if it breaks, so we don't expect this issue to affect you in the future. Or rather, if it does occur, it will be fixed within a minute without either you or us having to do anything, so you're unlikely to notice any downtime.

But do we know why the locks aren't released?

Yes. The server that manages the locks on the file server doesn't handle abrupt disconnections well, so if one of our other servers has an issue or disconnects abruptly, it can leave locks "dangling" even after we have fixed the server in question.
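To make "dangling" concrete: a lock server that only records who holds a lock cannot tell a slow client from a dead one, and a client that disconnects abruptly never sends its release. A common remedy in distributed locking is to grant locks as time-limited leases that the holder must keep renewing (NFSv4, for example, uses leases for its lock state). The toy Python sketch below illustrates that general technique only; it is not PythonAnywhere's implementation, and `LeaseLockTable` and its methods are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseLockTable:
    """Toy in-memory lock table where every lock is a time-limited lease.

    A holder that vanishes without releasing (e.g. an abrupt disconnect)
    simply stops renewing, so its lock lapses after lease_s seconds
    instead of dangling until someone intervenes.
    """

    def __init__(self, lease_s: float, clock: Callable[[], float] = time.monotonic):
        self.lease_s = lease_s
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._locks: Dict[str, Lease] = {}

    def _current(self, resource: str) -> Optional[Lease]:
        lease = self._locks.get(resource)
        if lease is not None and lease.expires_at <= self.clock():
            del self._locks[resource]  # expired: treat the holder as gone
            return None
        return lease

    def acquire(self, resource: str, holder: str) -> bool:
        lease = self._current(resource)
        if lease is not None and lease.holder != holder:
            return False  # someone else holds a still-live lease
        self._locks[resource] = Lease(holder, self.clock() + self.lease_s)
        return True

    def renew(self, resource: str, holder: str) -> bool:
        lease = self._current(resource)
        if lease is None or lease.holder != holder:
            return False  # lease already lapsed or belongs to another holder
        lease.expires_at = self.clock() + self.lease_s
        return True

    def release(self, resource: str, holder: str) -> None:
        lease = self._current(resource)
        if lease is not None and lease.holder == holder:
            del self._locks[resource]


# Usage with a fake clock: web-1 takes the lock, then crashes without releasing.
now = [0.0]
table = LeaseLockTable(lease_s=30.0, clock=lambda: now[0])
table.acquire("app.db", "web-1")
now[0] = 31.0  # 31 seconds pass with no renewal from web-1
print(table.acquire("app.db", "web-2"))  # True: the stale lease has lapsed
```

The design trade-off is the lease length: too short and a slow but healthy client can lose its lock mid-write; too long and recovery after a crash takes that much longer.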

Does this mean that a problem on some other server might affect my server as well?

Yes, but the problem shouldn't be appearing anymore.