May I suggest CRIU (https://criu.org/) as a possible solution to the problem of long-running processes?
It's not exactly production-ready yet, but in theory it could solve all the problems related to long-running tasks: it would let PA move processes to other machines before a restart, and suspend idle processes until they are accessed again.
The latter would be helpful for giving people a way to run their own services (databases, task queues, etc.).
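To make the migration idea a bit more concrete, here is a rough sketch of the three steps involved: checkpoint on the old machine, copy the image files, restore on the new one. It only builds the command lines (a dry run) rather than executing anything. The PID, image directory and host name (`4242`, `/tmp/criu-images/4242`, `worker2`) are made-up examples, and it assumes the process is already detached from any terminal (otherwise `criu` needs `--shell-job`) and that the target machine has the same files and binaries available.

```python
import shlex
from typing import List


def dump_cmd(pid: int, images_dir: str) -> List[str]:
    # Checkpoint the process tree rooted at `pid` into `images_dir`.
    # Without --leave-running, criu stops the process once the dump is done.
    return ["criu", "dump", "-t", str(pid), "-D", images_dir]


def copy_cmd(images_dir: str, target_host: str) -> List[str]:
    # Copy the checkpoint images to the same path on the target machine.
    return ["rsync", "-a", images_dir + "/", f"{target_host}:{images_dir}/"]


def restore_cmd(images_dir: str, target_host: str) -> List[str]:
    # Restore on the target; --restore-detached lets the command return
    # once the restored process is running on its own.
    return ["ssh", target_host, "criu", "restore", "-D", images_dir,
            "--restore-detached"]


def migration_plan(pid: int, images_dir: str, target_host: str) -> List[List[str]]:
    # Hypothetical helper: the ordered steps for moving one process.
    return [
        dump_cmd(pid, images_dir),
        copy_cmd(images_dir, target_host),
        restore_cmd(images_dir, target_host),
    ]


if __name__ == "__main__":
    for cmd in migration_plan(4242, "/tmp/criu-images/4242", "worker2"):
        print(shlex.join(cmd))
```

Suspending an idle process would just be the first step on its own (dump, which stops the process by default), with the restore step run later on whatever machine the next request lands on.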