Howdy!
Today I'm debugging a super hard to track down issue when using Docker + Tailscale sidecar containers, and I'm stumped about how I could fix it.
I have an application running in a Docker container, with a Tailscale sidecar container running alongside it. The application is run with network_mode: service:tailscale, so it shares the same network stack as the sidecar container. This makes it easy to manage all of your container connections if you are running containers on multiple machines.
However, I've noticed something weird happening recently with one of my applications: when I start the service with docker compose up, 90% of the time the application gets stuck trying to connect to the database, and the other 10% of the time it throws a "Failed to initialize pool: This connection has been closed." error.
I was devastated; this is such a hard bug to debug. I did notice that running the container without network_mode: service:tailscale does work, but then it uses the host's own Tailscale, so I can't access the container's services from another Tailscale-connected machine by their hostname.
I ended up noticing that running docker compose up tailscale and THEN docker compose up servicenamehere DOES work, so it seems to be a race condition of some sort: the bug happens when the Tailscale service isn't fully started yet.
This also makes it clearer why this could be happening: this application is running on a server in Brazil, so it takes a bit longer to set up the Tailscale connection compared to my other apps running at OVHcloud Canada.
So yeah, now I'm lost:
I could attempt to implement a way to check if Tailscale is up and running before attempting to connect to anything in my apps, but that feels hacky. (Remember that sometimes the connection attempt just gets stuck, so it isn't as easy as "just retry if the connection fails lol")
Another solution would be to force Docker to not be able to use the host's Tailscale setup, so that attempting to connect to the database would fail if Tailscale is not up yet. However, I couldn't figure out a way to do this.
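For what it's worth, here is a rough sketch of what the first idea could look like, with a bit of the second one mixed in. It is not something I have running in production, and it makes a few assumptions: the sidecar runs Tailscale in kernel mode (so there is a Tailscale interface inside the shared network namespace, not userspace networking), the database is reached over a Tailscale IPv4 address, and the function names, host and port are all made up for illustration. The idea is to ask the kernel which local address it would use to reach the database and refuse to go further until that address is inside 100.64.0.0/10, the range Tailscale assigns IPv4 addresses from. Every TCP attempt has its own timeout, so a stuck connection can't hang forever.

```python
import ipaddress
import socket
import time

# Tailscale assigns IPv4 addresses from the CGNAT range 100.64.0.0/10.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside 100.64.0.0/10."""
    try:
        return ipaddress.ip_address(addr) in TAILNET_V4
    except ValueError:
        # Not a valid IP address literal at all.
        return False


def source_address_for(host: str, port: int) -> str:
    """Return the local IPv4 address the kernel would use to reach host.

    connect() on a UDP socket only picks a route; no packet is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((host, port))
        return sock.getsockname()[0]


def wait_for_tailnet(
    host: str,
    port: int,
    deadline_s: float = 60.0,
    interval_s: float = 1.0,
    attempt_timeout_s: float = 3.0,
) -> bool:
    """Poll until host:port is reachable from a Tailscale source address.

    Returns True on success and False once deadline_s has elapsed.
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            # Only try the real connection once traffic would leave
            # through Tailscale (assumption: kernel-mode Tailscale).
            if is_tailnet_address(source_address_for(host, port)):
                # Bounded attempt, so a stalled connect cannot block forever.
                with socket.create_connection((host, port), timeout=attempt_timeout_s):
                    return True
        except OSError:
            # No route yet, DNS failure, refused or timed out: retry.
            pass
        time.sleep(interval_s)
    return False
```

The app would call something like wait_for_tailnet("100.101.102.103", 5432) (placeholder address and port) before creating the connection pool, and exit with an error if it returns False. It still feels hacky, but it at least turns the "stuck forever" case into a clean failure.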
This post is mostly "rubber duck debugging" for me, and maybe it could also help anyone who is facing the same issue and can't figure out why it is happening (even tho I haven't found the solution for it... yet)