File System and Btrfs – Advanced Features

Btrfs maintenance

One of the special features of Btrfs is that all data is checksummed. Schedule a scrub job about once a week: it reads all data on an HDD/SSD/VM disk, recalculates the checksums and compares them with the stored ones. A mismatch can be an early sign that the drive is failing, so you can replace it in time. Additionally, if you use e.g. RAID1 or RAID10, Btrfs can automatically repair the bad copy from the good copy on another disk.
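The idea behind scrub can be shown with a small user-space toy. This is only a sketch: Btrfs verifies checksums per block inside the kernel, and cksum is just a stand-in here. The function names checksum_of and verify are made up for this example.

```shell
#!/bin/bash
# Toy illustration of the scrub idea: compare a file's current checksum
# with one recorded earlier. Not how Btrfs is implemented internally.
checksum_of() {
    # cksum prints "<crc> <size>" when reading stdin; keep only the CRC.
    cksum < "$1" | cut -d' ' -f1
}

verify() {
    # $1 = file, $2 = checksum recorded earlier
    if [ "$(checksum_of "$1")" = "$2" ]; then
        echo ok
    else
        echo mismatch
    fi
}

f=$(mktemp)
printf 'important data\n' > "$f"
stored=$(checksum_of "$f")         # record the checksum at write time
verify "$f" "$stored"              # data unchanged
printf 'important dat4\n' > "$f"   # simulate silent corruption
verify "$f" "$stored"              # data changed
rm -f "$f"
```

The first verify call prints ok and the second prints mismatch. With a second copy on another disk, as in RAID1, the bad copy could then be rewritten from the good one.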

Run check and balance from time to time as well, especially after adding or removing a drive in a Btrfs RAID1/10. Balance can run for a long time if you have multiple terabytes of data.

Example scrub systemd service:

$ sudo nano /usr/local/bin/rf-scrub.sh
#!/bin/bash
# -B keeps the scrub in the foreground, so the status below reports the finished run
btrfs scrub start -B / | systemd-cat
btrfs scrub status / | systemd-cat
$ sudo chmod 700 /usr/local/bin/rf-scrub.sh
$ sudo nano /etc/systemd/system/rf-scrub.service
[Unit]
Description=rf-scrub.service for Btrfs

[Service]   
Type=oneshot
ExecStart=/usr/local/bin/rf-scrub.sh
User=root  
Group=root
$ sudo nano /etc/systemd/system/rf-scrub.timer
[Unit]
Description=Run rf-scrub weekly, start immediately if the system was off

[Timer]
#OnCalendar=weekly
OnCalendar=Fri *-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
$ sudo systemctl enable rf-scrub.timer \
&& sudo systemctl restart timers.target && systemctl list-timers

IO Scheduler (Desktop)

The default IO schedulers on Arch Linux now support multi-queue (blk-mq). The boot parameter “scsi_mod.use_blk_mq=1” is no longer required. The scheduler setting for a specific disk (replace X) should look like this:

$ cat /sys/block/sdX/queue/scheduler
[mq-deadline] kyber bfq none

The active one for the disk is marked in brackets.
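If you want to read the active scheduler from a script, the bracketed entry can be extracted with sed. This is a minimal sketch; the helper name active_scheduler is made up for this example.

```shell
#!/bin/bash
# Print the active scheduler from a line such as "[mq-deadline] kyber bfq none".
active_scheduler() {
    # Keep only the text between the square brackets.
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

For the example line above, the function prints mq-deadline.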

For an SSD, mq-deadline is activated by default, and for a normal HDD, bfq. For USB sticks bfq should be enabled manually as well, and for an HDD with SMR use mq-deadline (see http://lkml.iu.edu/hypermail/linux/kernel/1810.0/03048.html; to avoid using the wrong scheduler, especially for SMR devices, and to improve USB performance, there might be patches for bfq in Linux > 4.21).

Bfq in low latency mode can greatly improve desktop interactivity and application startup times on NVMe devices, read https://www.phoronix.com/scan.php?page=article&item=linux-420-io&num=1.

bfq

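With bfq (https://wiki.archlinux.org/index.php/Improving_performance#Changing_I/O_scheduler) you can improve startup times. To check that bfq is the active scheduler, run:

cat /sys/block/sd*/queue/scheduler

While bfq is active, low latency mode is reported by /sys/block/sdX/queue/iosched/low_latency (1 means enabled).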

You can enable bfq via a udev rule on startup.

/etc/udev/rules.d/60-ioschedulers.rules
# set scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]|mmcblk[0-9]*|nvme[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
# set scheduler for rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
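To see whether the rule took effect, you can compare each disk's rotational flag with the scheduler it ended up with. The following is only a sketch that mirrors the mapping of the rule above; the helper name scheduler_for is made up for this example.

```shell
#!/bin/bash
# Map the kernel's rotational flag to the scheduler chosen by the udev rule:
# 0 (SSD, flash) -> mq-deadline, 1 (spinning disk) -> bfq.
scheduler_for() {
    case "$1" in
        0) echo mq-deadline ;;
        1) echo bfq ;;
        *) return 1 ;;
    esac
}

# Show the expected scheduler next to the kernel's scheduler line for each sd* disk.
for q in /sys/block/sd*/queue; do
    [ -r "$q/rotational" ] || continue
    want=$(scheduler_for "$(cat "$q/rotational")") || continue
    printf '%s: expected %s, scheduler line: %s\n' "$q" "$want" "$(cat "$q/scheduler")"
done
```

After editing the rules file, "sudo udevadm control --reload" followed by "sudo udevadm trigger" should apply it without a reboot.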

File System Feature Matrix

So far most of my machines run Btrfs: all data is checksummed, which helps to detect drive errors earlier, and in RAID configurations it can detect and correct bit rot.

                               Btrfs              GlusterFS            IPFS
Implementation Level           kernel             user space            P2P
Disk Layout              raw/gpt/dos/partition    partition             any
Additional FS required           -              ext4/xfs/btrfs          any
Disk Encryption¹                 -                    -                  -
Secure Network               not required            SSL³
Data Check Sum                   X                    X                  X
Local RAID                       X                    X                 any
Distributed RAID²                -                    X                 any
Heterogen. SSD/HDD aware RAID    -                    -                 any
RAID growth plus 1 disk          X           - (only by replica number) any
Bit Rot Correction               X                    X
De-duplication               service job              -                 auto
Master Server                not required         every host             No
Snapshots                        X                   100           no delete
Geo Replication                  -                    X                 auto
Performance                      ?                    ?                 P2P

¹ Only possible via LUKS (cryptsetup); native encryption may be added to Btrfs in the future.

² Distributing the data gives you higher availability.

³ Encryption is not enabled by default; either set it up on all hosts or make sure all devices on the network are trusted.

GlusterFS issues

GlusterFS is designed to work on a server infrastructure with a homogeneous disk storage setup. It doesn’t like different types of disks (SSD, HDD, NVMe, USB, …) as the slowest disk defines the write speed. The disk sizes should also be the same for paired disks if you use parity like in a RAID; combining a 1TB drive with a 4TB drive doesn’t work well automatically. If a node goes down, the data is also not automatically re-replicated to other nodes to restore the RAID level.

Tmpfs (usually not required)

If you want to compile a Linux kernel or e.g. Kodi and run out of temporary disk space during compilation, and you have plenty of RAM, or if you want to reduce writes to an SSD, create a tmp file system in RAM of at least 6G on Arch Linux. An /etc/fstab entry (8G in this example) looks like this:

tmpfs /tmp tmpfs defaults,noatime,size=8G,mode=1777 0 0
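After remounting or rebooting, you may want to confirm that /tmp really is a tmpfs. A minimal sketch, assuming a Linux system with /proc/mounts; the helper name is_tmpfs_mount is made up for this example.

```shell
#!/bin/bash
# Read lines in /proc/mounts format on stdin and succeed if the given
# mount point is a tmpfs. The last matching line wins, because later
# mounts shadow earlier ones.
is_tmpfs_mount() {
    awk -v mp="$1" '$2 == mp { fstype = $3 } END { exit fstype != "tmpfs" }'
}

if [ -r /proc/mounts ]; then
    if is_tmpfs_mount /tmp < /proc/mounts; then
        echo "/tmp is a tmpfs"
    else
        echo "/tmp is not a tmpfs"
    fi
fi
```

"df -h /tmp" then shows the size that was actually applied.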

Tools

HDD Alignment Check

$ blockdev --getalignoff /dev/sda1
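The blockdev check above reports the alignment offset in bytes, where 0 means aligned. As a complementary sketch, you can also look at the start sector of each partition in sysfs, which is given in 512-byte units. The 4096-byte boundary used here is an assumption that fits common 4K-sector drives, and the helper name starts_on_4k_boundary is made up for this example.

```shell
#!/bin/bash
# Succeed if a partition starting at the given 512-byte sector
# begins on a 4096-byte boundary.
starts_on_4k_boundary() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

# Check every partition of every sd* disk that sysfs knows about.
for s in /sys/block/sd*/sd*/start; do
    [ -r "$s" ] || continue
    if starts_on_4k_boundary "$(cat "$s")"; then
        echo "$s: aligned"
    else
        echo "$s: NOT aligned"
    fi
done
```

A partition starting at sector 2048 passes the check, while the old DOS default of sector 63 does not.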

Btrfs

Btrfs RAID Replace

If you replaced a failed device and the following command still shows errors, you have to reset the stats manually:

$ btrfs device stats /mountpoint
[/dev/mapper/mountpoint1].write_io_errs 0 
[/dev/mapper/mountpoint1].read_io_errs 0 
[/dev/mapper/mountpoint1].flush_io_errs 0 
[/dev/mapper/mountpoint1].corruption_errs 0 
[/dev/mapper/mountpoint1].generation_errs 0 
[/dev/mapper/mountpoint2].write_io_errs 75734255 
[/dev/mapper/mountpoint2].read_io_errs 396608381 
[/dev/mapper/mountpoint2].flush_io_errs 1827 
[/dev/mapper/mountpoint2].corruption_errs 0 
[/dev/mapper/mountpoint2].generation_errs 0
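With many devices the zero counters are just noise. The following sketch filters the output down to the counters that are not zero, assuming the two-column format shown above; the helper name nonzero_stats is made up for this example.

```shell
#!/bin/bash
# Read `btrfs device stats` output on stdin and print only the counters
# whose value is not zero, as "<counter> <value>".
nonzero_stats() {
    awk 'NF == 2 && $2 + 0 != 0 { print $1, $2 }'
}

# Usage (needs a mounted Btrfs file system):
#   btrfs device stats /mountpoint | nonzero_stats
```

For the output above this would leave the three non-zero counters of mountpoint2.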

If the replace worked without any issues and scrub doesn’t show errors, you can reset the stats with:

$ btrfs device stats -z /mountpoint

Then run scrub for the mountpoint and check the results of scrub and stats.

Btrfs repair

Repairing an unmounted Btrfs drive should always be the last option, when nothing else has worked. A backup of your data is required, as damaged data will be deleted. Do it only if you can afford to lose the data; some of it might not be recoverable, especially if you didn’t use a RAID. Read https://btrfs.wiki.kernel.org/index.php/Btrfsck.

Btrfs RAID

Currently only RAID1 and RAID10 are stable. To calculate the available space in a RAID you can use the calculator at http://carfax.org.uk/btrfs-usage/index.html.

Btrfs snapshots

If you want to run Btrfs snapshots before pacman updates, follow the guide https://wiki.archlinux.org/index.php/Snapper#Wrapping_pacman_transactions_in_snapshots
