Weird Tailscale issue when using Tailscale within Docker containers

Help Needed

Howdy!

Today I'm debugging a super hard to track down issue when using Docker + Tailscale sidecar containers, and I'm stumped about how I could fix it.

I have an application running within a Docker container. I also have a Tailscale sidecar container running, and the application is run with network_mode: service:tailscale, so the application shares the same network stack as the sidecar container. This makes it easy to manage all of your container connections if you are running containers on multiple machines.
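For readers who haven't seen this pattern, here is a minimal sketch of what such a Compose file might look like. It is not the poster's actual file: the `app` service name, the `example/app` image, the `my-app` hostname, and the volume name are placeholders, and the auth key is assumed to come from the environment.

```yaml
# Sketch of a Tailscale sidecar setup; names marked "hypothetical" are placeholders.
services:
  tailscale:
    image: tailscale/tailscale:latest
    hostname: my-app                    # hypothetical hostname for the node
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}        # auth key supplied via the environment
      - TS_STATE_DIR=/var/lib/tailscale # persist node state across restarts
    volumes:
      - tailscale-state:/var/lib/tailscale
    devices:
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - NET_ADMIN

  app:
    image: example/app:latest           # hypothetical application image
    network_mode: service:tailscale     # share the sidecar's network stack

volumes:
  tailscale-state:
```

Because `app` has no network of its own, any service it exposes is reached through the sidecar's Tailscale address.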

However, I've noticed something weird happening recently with one of my applications: when I start the service with docker compose up, 90% of the time the application gets stuck trying to connect to the database, and 10% of the time the application throws a "Failed to initialize pool: This connection has been closed." error.

I was devastated; this is such a hard bug to debug. I did notice that running the container without network_mode: service:tailscale does work, but then it is using the Tailscale from the host itself, so I can't access the container's services from another Tailscale-connected machine by their hostname.

I ended up noticing that running docker compose up tailscale and THEN docker compose up servicenamehere DOES work, so it seems like a race condition of some sort: when the Tailscale service isn't fully started up yet, this bug happens.

This also makes it clearer why this could be happening: this application is running on a server in Brazil, so it takes a bit longer to set up the Tailscale connection compared to my other apps running at OVHcloud Canada.

So yeah, now I'm lost:

  • I could attempt to implement a way to check if Tailscale is up and running before attempting to connect to anything in my apps, but that feels hacky. (Remember that sometimes the connection attempt just gets stuck, so it isn't as easy as "just retry if the connection fails lol")

  • Another solution would be to force Docker to not be able to use the host's Tailscale setup, so that attempting to connect to the database would fail if Tailscale is not up yet. However, I couldn't figure out a way to do this.

This post is mostly "rubber duck debugging" for me, and maybe it could also help anyone who is facing the same issue and can't figure out why it is happening (even tho I haven't found the solution for it... yet)


https://docs.docker.com/compose/startup-order/

Have you tried 1) creating a health check for the TS container, and 2) using depends_on with the service_healthy condition on the other container?


Hmm, I need to try that out later, but I think that wouldn't work since Tailscale's container doesn't seem to have a health check set up: https://github.com/tailscale/tailscale/blob/main/Dockerfile

But maybe I can add it myself: https://github.com/peter-evans/docker-compose-healthcheck. I wonder if tailscale status has a different exit code if TS is not connected yet.
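A rough sketch of how the two suggestions could fit together in a Compose file. This assumes that `tailscale status` exits non-zero until the node is connected, which is exactly the open question above and should be verified before relying on it; the `app` service and `example/app` image are placeholders, and the interval and retry values are arbitrary.

```yaml
services:
  tailscale:
    image: tailscale/tailscale:latest
    healthcheck:
      # Assumption: `tailscale status` fails (non-zero exit) while the
      # node is not connected yet. Verify this for your Tailscale version.
      test: ["CMD", "tailscale", "status"]
      interval: 5s
      timeout: 5s
      retries: 30

  app:
    image: example/app:latest          # hypothetical application image
    network_mode: service:tailscale
    depends_on:
      tailscale:
        condition: service_healthy     # start app only once the check passes
```

With this in place, a plain `docker compose up` should hold back `app` until the sidecar reports healthy, which mirrors the manual "start tailscale first" workaround.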

[deleted]

Comment deleted by user

While that could work, I don't see anything that could help me check if Tailscale is 100% connected. Remember that, by default, the container ends up using the host's Tailscale network UNTIL the container's own Tailscale network connects.

And I think that's why it is breaking: the reason the connection just fails 90% of the time is that Tailscale ends up connecting in the middle of the database startup, which causes the packets to never be sent to the correct host, which in turn causes the application to hang.

However, I'm not sure if it is possible to not use the system's Tailscale network, because if I bind /dev/net/tun:/dev/net/tun, then the container uses the host's Tailscale network until Tailscale finishes the tailscale up process.

If it were possible to not use it, then my issues would be fixed. Yeah, the app would fail to connect to the database because Tailscale isn't ready yet, but I can just retry the connection at the app level.

While looking into a different "issue" with Tailscale, I discovered that it takes about 120 seconds to establish the connection at boot. Dockerfiles have the HEALTHCHECK CMD instruction, where it is possible to set a start delay (the default is 0).
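In a Compose file, the equivalent of that delay is the `start_period` key of `healthcheck`. A minimal sketch, assuming the roughly 120 second connection time mentioned above and reusing `tailscale status` as a stand-in readiness probe (its exit-code behavior is an assumption, and the other timing values are arbitrary):

```yaml
services:
  tailscale:
    image: tailscale/tailscale:latest
    healthcheck:
      test: ["CMD", "tailscale", "status"]  # assumed readiness probe
      start_period: 120s  # failed checks in this window don't count toward retries
      interval: 10s
      timeout: 5s
      retries: 6
```

A check that succeeds during the start period still marks the container healthy right away, so a generous `start_period` does not slow down machines where Tailscale connects quickly.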