r/btrfs

btrfs scrub does not finish for one device (but everything is scrubbed)

I ran my monthly btrfs scrub overnight on a RAID1 array across 3 disks of 8 TB each; it generally takes all night to run. I had to interrupt it briefly to copy over some files and then resumed it, as I often do.

This morning I checked the status and checked dmesg to see if anything crapped out. btrfs scrub status tells me it scrubbed just about everything and found no errors, but the dmesg output is strange:

[1321461.097501] BTRFS info (device sdb1): scrub: started on devid 1
[1321461.097912] BTRFS info (device sdb1): scrub: started on devid 2
[1321461.097915] BTRFS info (device sdb1): scrub: started on devid 3
[1357979.019433] BTRFS info (device sdb1): scrub: finished on devid 1 with status: 0
[1359053.862388] BTRFS info (device sdb1): scrub: finished on devid 2 with status: 0

And that's it. In other words: the scrub finished on devid 1 and 2, but not on 3. If I run ps a | grep scrub, it shows me the resume is still running:

 7544 pts/1    Sl    68:44 btrfs scrub resume /srv/dev-disk-by-label-d1/

The "running for" timestamp in btrfs scrub status no longer updates, so it seems to be finished... but there's this process still running and a missing finish status for devid 3.
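The manual comparison of "started" and "finished" lines above can be scripted. This is only a rough sketch: it assumes the kernel log lines look exactly like the ones quoted in the post, and `unfinished_scrub_devids` is a made-up helper name, not part of btrfs-progs.

```shell
# Sketch: list devids that logged "scrub: started" without a later "scrub: finished".
# Reads kernel log text on stdin. Assumes the message format quoted in the post.
unfinished_scrub_devids() {
    awk '
        /scrub: started on devid/ {
            sub(/.*scrub: started on devid /, "")   # keep only what follows the devid label
            split($0, a, " ")
            running[a[1]] = 1                       # mark this devid as in progress
        }
        /scrub: finished on devid/ {
            sub(/.*scrub: finished on devid /, "")
            split($0, a, " ")
            delete running[a[1]]                    # this devid has completed
        }
        END { for (d in running) print d }
    ' | sort -n
}
```

Usage would be something like `dmesg | unfinished_scrub_devids`; for the log in the post it should print only the devid that has no finish line yet.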

I've never seen this before. Does anyone know what could cause this and how to resolve it? I don't want to blindly kill the scrub and I'd prefer not having to run it again.

Can you post what kernel and btrfs progs version you're using along with the distro? I personally think this should be posted on the mailing list with all the details you have as this certainly does not seem to be the behavior you should be experiencing, even if a disk is failing.

Scrub does indeed spawn a thread for each disk, but it shouldn't indicate that the scrub is finished until every device is finished. I wonder if this behavior is reproducible.

If you've never mailed the list before, check out this for details: https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list

Post back the subject too for others to reference (and so I can personally follow it lol).
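The kernel, btrfs-progs and distro details asked for above can be collected in one go before writing to the list. A minimal sketch, assuming a Linux system with `/etc/os-release`; `report_env` is a hypothetical helper name.

```shell
# Sketch: print the environment details usually requested in a btrfs report.
report_env() {
    echo "kernel: $(uname -r)"
    if command -v btrfs >/dev/null 2>&1; then
        echo "btrfs-progs: $(btrfs --version)"
    else
        echo "btrfs-progs: not found in PATH"
    fi
    if [ -r /etc/os-release ]; then
        # PRETTY_NAME is the human-readable distro name; read it in a subshell
        echo "distro: $(. /etc/os-release && echo "${PRETTY_NAME:-unknown}")"
    else
        echo "distro: unknown (/etc/os-release not readable)"
    fi
}
```

Running `report_env` and pasting the output into the mail saves a round of follow-up questions.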

scrub status did not actually show it as finished and scrub status -d showed that it was still working on one of the disks. The issue here is that, for some reason I'm trying to figure out, the scrub of the last disk took many hours longer than the previous ones.

Maybe I did not phrase the initial post as well as I could have, so just for clarity: the scrub status output did NOT say it was finished, but it appeared to hang and its timestamp was no longer increasing. scrub status -d did show an increasing timestamp on disk 3. dmesg showed 2 disks finished. The last one eventually finished many hours later, and the overview then showed everything as finished.

Sorry if that was confusing in my original post.

What does a btrfs scrub status -d give you? Does it change over time?
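One way to answer "does it change over time" without eyeballing it is to take two snapshots a little apart and compare them. This is just a sketch: `output_changed` is a hypothetical helper, and it treats any difference in the output (including the elapsed time field) as a sign of activity.

```shell
# Sketch: run a command twice, INTERVAL seconds apart, and succeed if the output differs.
# Usage: output_changed INTERVAL command [args...]
output_changed() {
    interval=$1
    shift
    first=$("$@")        # first snapshot
    sleep "$interval"
    second=$("$@")       # second snapshot
    [ "$first" != "$second" ]
}
```

With the mount point from the original post, that could be used as:

    if output_changed 60 btrfs scrub status -d /srv/dev-disk-by-label-d1/; then echo "still changing"; else echo "no change in 60s"; fi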

Interesting, it DOES change over time for devid 3. I guess that means it's still running for that device? But that means it's been running for many hours longer than devid 1 and 2, which was never the case before.

Am I looking at a potential hardware disk failure here?

Along with scrubbing, checking SMART should be a regular thing. smartctl (from smartmontools) can show you if that drive is having issues.

After the scrub, it might be worth running a SMART self-test on it/them.
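For reference, a rough sketch of what that could look like with smartctl. `/dev/sdX` is a placeholder for the slow drive, `smart_check` and the `DRY_RUN` switch are made up for illustration, and the real commands will usually need root.

```shell
# Sketch: basic SMART checks for one drive.
# Set DRY_RUN=1 to print the commands instead of running them.
smart_check() {
    dev=$1
    # -H: overall health verdict, -a: full SMART report, -t long: start extended self-test
    for args in "-H" "-a" "-t long"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "smartctl $args $dev"
        else
            # $args is deliberately unquoted so "-t long" splits into two words
            smartctl $args "$dev"
        fi
    done
}
```

The long self-test runs in the background on the drive itself; once it has finished, `smartctl -l selftest /dev/sdX` shows the result.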

Thanks for pointing that out, but that's already something I'm doing regularly :)
