Starting Containers in Specific Order after Reboot?
I mean there are a billion ways to do things, and the “right way” is subjective and is affected by budgets and time and skill and status quo and the current drink special at the pub, but...
Shouldn’t your services be able to survive without their dependencies, or wait until they are ready?
I’m not at your scale, but if I were to get there, a problem like this would drive me nuts, and a solution that requires orchestrating the boot order would make me feel icky.
Fully agree. I went to the pub after I wrote this to unpack the problem. Normally, I would build a check into the image to see whether the remote host was available, but the problem occurs outside the containers, when all of them are starting (it's docker-compose looking for the internal 172.xx IP we expose). I'm certain I could put together some ridiculous 200-line bash script to capture all running containers before a reboot and then start them in order, bringing that first container up before anything else, but I thought I'd bring this to a larger group to see if others have encountered a similar situation. My sysadmin struggled through an hour-long outage today trying to diagnose what went wrong (odd, after a year of running this infrastructure), and I'm just the Director wanting to make their life easier.
I'd imagine, if using Linux: docker-compose up -d (name of container) && docker-compose up -d
The restart policy seems to respect only the linked container order, and links are deprecated. In any case, you should not rely on container start order, because the container start order doesn't necessarily imply the process start order inside the containers.
Our docker-compose dictates 'restart: always', so they do actually restart properly, with the exception of some of the containers that depend on another container starting up before they start. I will have to set up a test box with a hundred or so containers, start the tinc container in question before any of my other containers, and give it a handful of host reboots to see if tinc always starts first.
There is also external_links, which I've found works outside of the docker-compose file, whereas links (deprecated) only knows what's listed in that container stack's YAML. If it works, I may have answered my own question, and I will report back.
Another architectural option that comes to mind: you didn't mention the type of communications happening over the VPN, but it sounds like a good use case for some sort of store-and-forward message queue, instead of having containers connect to each other. Not the direct answer you're looking for, but something to consider.
I solved this problem with systemd some time ago.
Basically, you start each container as a systemd service and define the dependencies there. You also leave container restarting to systemd.
You can skip the Ansible part or adapt it to whatever automation you're using (or just generate the systemd unit files via a script).
There are probably better ways to do that right now, but this might be very straightforward from what you currently have in place.
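A rough sketch of the "generate the unit files via a script" idea, in case it helps. Everything specific here is an assumption for illustration: the container names tinc and app1, the container-NAME.service naming scheme, and the /usr/bin/docker path are not from the thread, and the containers are assumed to already exist so that "docker start" can be used.

```shell
#!/bin/sh
# Sketch: print a systemd unit that starts an existing Docker container,
# optionally ordered after another container's unit.
# Hypothetical naming scheme: container-<name>.service

# usage: make_unit <container-name> [<dependency-container-name>]
make_unit() {
    name=$1
    dep=${2:-}
    after="docker.service"
    requires="docker.service"
    if [ -n "$dep" ]; then
        # Order this unit after the dependency's unit and require it.
        after="$after container-$dep.service"
        requires="$requires container-$dep.service"
    fi
    cat <<EOF
[Unit]
Description=Docker container $name
After=$after
Requires=$requires

[Service]
Restart=always
ExecStart=/usr/bin/docker start -a $name
ExecStop=/usr/bin/docker stop $name

[Install]
WantedBy=multi-user.target
EOF
}

# Example (run as root; names are placeholders):
#   make_unit tinc      > /etc/systemd/system/container-tinc.service
#   make_unit app1 tinc > /etc/systemd/system/container-app1.service
#   systemctl daemon-reload
#   systemctl enable container-tinc.service container-app1.service
```

Two caveats with this sketch. If systemd does the restarting, the containers' own Docker restart policy should probably be turned off so the two don't fight each other. Also, After= only orders when the units are started; it doesn't prove the VPN inside the tinc container is actually ready.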
If using docker-compose dependencies to determine the proper startup order is not applicable, you can try a little script called 'wait-for' to abort startup if a certain host and port are not yet open.
Since you have your docker-compose restart policy set to 'restart: always', the container should exit and restart automatically, and it should only succeed in starting once your VPN container is up as well, because otherwise it would always run into a timeout when trying to connect to IP:PORT.
You can find a script for this purpose on GitHub: https://github.com/eficode/wait-for
Bear in mind that this script was most likely developed for servers that depend on a database container; at least, that's what I'm using it for. It has been working quite okay for me, but you might run into scaling issues, because many of your containers could end up in a boot loop.
This is probably not the best way to do what you're trying to achieve, but it is a possible solution.
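To make the idea concrete, here is a simplified stand-in for what such a script does. It is not the eficode/wait-for script itself, just a minimal sketch of "poll a host and port, give up after a timeout". It assumes a netcat build that supports -z; the function names, the address 172.17.0.2, the port 655, and my-service are placeholders.

```shell
#!/bin/sh
# Sketch of the "wait, then give up" idea. NOT the real wait-for script.

# Return 0 if a TCP connection to host:port succeeds.
# Assumes a netcat that supports -z (connect without sending data).
probe() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# usage: wait_for_port <host> <port> [<timeout-seconds>]
# Polls about once per second; returns 0 when the port opens,
# 1 if it is still closed after the timeout.
wait_for_port() {
    host=$1
    port=$2
    timeout=${3:-15}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if probe "$host" "$port"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example container entrypoint (address, port and command are placeholders):
#   wait_for_port 172.17.0.2 655 30 || exit 1
#   exec my-service
```

With 'restart: always', the non-zero exit on timeout is what makes Docker try the container again later. A longer timeout means fewer restarts, which may soften the boot-loop concern when a hundred or so containers are all waiting on the same VPN container.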
For clarity:
What happens when the VPN container dies and a new one is spun up?
Does this require all existing containers to restart as well? Can they cope with the outage, or is your VPN set up in a redundant way?