

The official Tailscale subreddit. Not routinely monitored by Tailscale employees. Please contact support via https://tailscale.com/contact/support if you need further help.
Weird Tailscale issue when using Tailscale within Docker containers
Howdy!
Today I'm debugging a super hard to track down issue when using Docker + Tailscale sidecar containers, and I'm stumped about how I could fix it.
I have an application running within a Docker container, I also have a Tailscale sidecar container running, and the application is run with network_mode: service:tailscale, so the application shares the same network stack as the sidecar container. This makes it easy to manage all of your container connections if you are running containers on multiple machines.
However, I've noticed something weird happening recently with one of my applications: when I start the service with docker compose up, 90% of the time the application gets stuck trying to connect to the database, and the other 10% of the time the application throws a "Failed to initialize pool: This connection has been closed." error.
I was devastated, this is such a hard bug to debug. I did notice that running the container without network_mode: service:tailscale does work, but then it is using the Tailscale from the host itself, so I can't access the container services from another Tailscale connected machine by their hostname.
I ended up noticing that running docker compose up tailscale and THEN docker compose up servicenamehere DOES work, so it seems like it is a race condition of some sort: when the Tailscale service isn't fully started up yet, this bug happens.
This also makes it clearer why this could be happening: this application is running on a server in Brazil, so it takes a bit longer to set up the Tailscale connection compared to my other apps running at OVHcloud Canada.
So yeah, now I'm lost:
- I could attempt to implement a way to check if Tailscale is up and running before attempting to connect to anything in my apps, but that feels hacky. (Remember that sometimes the connection attempt just gets stuck, so it isn't as easy as "just retry if the connection fails lol")
- Another solution would be to stop Docker from being able to use the host's Tailscale setup, so that attempting to connect to the database would fail if Tailscale is not up yet. However, I couldn't figure out a way to do this.
This post is mostly "rubber duck debugging" for me, and maybe it could also help anyone who is facing the same issue and can't figure out why it is happening (even tho I haven't found the solution for it... yet)
https://docs.docker.com/compose/startup-order/
Have you tried 1) creating a health check for the TS container, and 2) using depends_on with the service_healthy directive on the other container?
Hmm I need to try that out later, but I think that wouldn't work since Tailscale's container doesn't seem to have a health check set up https://github.com/tailscale/tailscale/blob/main/Dockerfile
But maybe I can add it myself https://github.com/peter-evans/docker-compose-healthcheck I wonder if tailscale status has a different exit code if TS is not connected yet
Comment deleted by user
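For anyone wanting to try the health check idea from the comments above, here is a rough Compose sketch of it. Almost everything in it is an assumption rather than something confirmed in this thread: the image tags, the TS_AUTHKEY variable, the volume name, and above all the probe itself, which simply greps the output of tailscale status --json for a BackendState of Running. Whether that state really means the tunnel is fully usable is an open question, so treat this as a starting point, not a verified fix.

```yaml
services:
  tailscale:
    image: tailscale/tailscale:latest        # assumed image tag
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}             # assumed: auth key supplied from the environment
      - TS_STATE_DIR=/var/lib/tailscale
    volumes:
      - tailscale-state:/var/lib/tailscale   # hypothetical volume name
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - NET_ADMIN
    healthcheck:
      # Assumed probe: counts as healthy once the status JSON reports "Running".
      # The exact JSON layout is not guaranteed, hence the loose pattern.
      test: ["CMD-SHELL", "tailscale status --json | grep -q 'BackendState.*Running'"]
      interval: 5s
      timeout: 5s
      retries: 30

  servicenamehere:
    image: example/app:latest                # hypothetical application image
    network_mode: service:tailscale
    depends_on:
      tailscale:
        condition: service_healthy           # wait for the probe above to pass

volumes:
  tailscale-state:
```

With something like this in place, a plain docker compose up should hold back servicenamehere until the tailscale container reports healthy, instead of needing two separate up commands.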
While that could work, I don't see anything that could help me check if Tailscale is 100% connected. Remember that, by default, the container ends up using the host's Tailscale network UNTIL the container's own Tailscale network finishes connecting.
And I think that's why it is breaking: the reason why the connection just hangs 90% of the time is that Tailscale ends up connecting in the middle of the database startup, which causes the packets to never be sent to the correct host, which causes the application to hang.
However, I'm not sure if it is possible to not use the system's Tailscale network, because if I bind /dev/net/tun:/dev/net/tun, then the container uses the host's Tailscale network until Tailscale finishes the tailscale up process.
If it was possible to not use it, then my issues would be fixed. Yeah, the app would fail to connect to the database because Tailscale isn't ready yet, but I can just retry the connection at the app level.
I ran into a different "issue" with Tailscale, where it takes about 120 seconds to establish the connection at boot. Docker has a HEALTHCHECK CMD instruction for Dockerfiles where it is possible to set a delay (the default is 0).
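The delay mentioned here is most likely the health check start period (--start-period in a Dockerfile HEALTHCHECK, start_period in Compose), which does default to 0s. A minimal Compose sketch of it, assuming a tailscale status based probe and a 120 second window; the probe command, image tag, and numbers are illustrative assumptions, not tested values:

```yaml
services:
  tailscale:
    image: tailscale/tailscale:latest   # assumed image tag
    healthcheck:
      # Assumed probe: passes once the status JSON reports "Running".
      test: ["CMD-SHELL", "tailscale status --json | grep -q 'BackendState.*Running'"]
      start_period: 120s                # grace window for the slow connection at boot
      interval: 10s
      timeout: 5s
      retries: 3
```

Note that the start period is a grace window rather than a sleep: probes still run during it, failures just don't count toward retries, and the first successful probe marks the container healthy.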