Btrfs in Action on Ubuntu


Creating the initial volume (pool)

Btrfs uses different strategies to store data and filesystem metadata.

By default the behavior is:

  • metadata is replicated on all of the devices. If a single device is used, the metadata is duplicated inside that single device (useful in case of corruption or a bad sector: there is a higher chance that one of the two copies is clean). To tell btrfs to keep only a single copy of the metadata, pass -m single to mkfs.btrfs. Remember: dead metadata = dead volume with no chance of recovery.
  • data is spread amongst all of the devices (this means no redundancy; any data block left on a defective device will be inaccessible)

To create a Btrfs volume made of multiple devices with default options, use:

mkfs.btrfs /dev/loop0 /dev/loop1 /dev/loop2

To create a Btrfs volume made of a single device with a single copy of the metadata (dangerous!), use:

mkfs.btrfs -m single /dev/loop0

To create a Btrfs volume made of multiple devices with metadata spread amongst all of the devices, use:

mkfs.btrfs -m raid0 /dev/loop0 /dev/loop1 /dev/loop2

To create a Btrfs volume made of multiple devices, with metadata spread amongst all of the devices and data mirrored on all of the devices, use:

mkfs.btrfs -m raid0 -d raid1 /dev/loop0 /dev/loop1 /dev/loop2 

To create a fully redundant Btrfs volume (data and metadata mirrored amongst all of the devices), use:

mkfs.btrfs -d raid1 /dev/loop0 /dev/loop1 /dev/loop2

Technically you can use anything as a physical volume: you can have a volume composed of 2 local hard drives, 3 USB keys, 1 loopback device pointing to a file on an NFS share and 3 logical devices accessed through your SAN (you would be an idiot to do so, but you can, nevertheless). Mixing devices of different sizes can leave some capacity unusable with striped or mirrored profiles, but it works :-).
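
To experiment along without dedicating real disks, the loop devices used above can be backed by plain files (a minimal sketch; the paths and sizes are arbitrary, 1 GiB each here):

dd if=/dev/zero of=/tmp/btrfs-vol0.img bs=1M count=1024
dd if=/dev/zero of=/tmp/btrfs-vol1.img bs=1M count=1024
dd if=/dev/zero of=/tmp/btrfs-vol2.img bs=1M count=1024
losetup /dev/loop0 /tmp/btrfs-vol0.img
losetup /dev/loop1 /tmp/btrfs-vol1.img
losetup /dev/loop2 /tmp/btrfs-vol2.img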

Example:

A 320 GB HDD with 4 partitions (block devices): /dev/sdb1, /dev/sdb2, /dev/sdb3 and /dev/sdb4.

Create a RAID10 volume (RAID10 requires a minimum of 4 devices) with these 4 block devices:

mkfs.btrfs -d raid10 -m raid10 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
adding device /dev/sdb2 id 2
adding device /dev/sdb3 id 3
adding device /dev/sdb4 id 4
fs created label (null) on /dev/sdb1
 nodesize 4096 leafsize 4096 sectorsize 4096 size 298.09GB
Btrfs Btrfs v0.19

Checking the initial volume

To verify which devices a Btrfs volume is composed of, just use btrfs-show device (old style) or btrfs filesystem show device (new style). Specifying one of the devices is enough, since the metadata keeps track of which devices belong to the same volume; with no argument at all, every known Btrfs volume is listed.

btrfs filesystem show
Label: none uuid: 8f72dfe2-f8f8-48f1-9097-249a3b4910cd
 Total devices 4 FS bytes used 28.00KB
 devid 1 size 100.00GB used 2.03GB path /dev/sdb1
 devid 2 size 100.00GB used 2.01GB path /dev/sdb2
 devid 3 size 49.04GB used 2.01GB path /dev/sdb3
 devid 4 size 49.04GB used 2.01GB path /dev/sdb4
Btrfs Btrfs v0.19

The Btrfs wiki mentions that btrfs device scan should be performed first, so that the kernel knows about all of the member devices (typically after a reboot, before mounting).
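
For example (a sketch; with no arguments the command probes all block devices, or it can be pointed at specific ones):

btrfs device scan
btrfs device scan /dev/sdb1 /dev/sdb2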

Mounting the initial volume

Btrfs volumes can be mounted like any other filesystem. The cherry on top of the sundae is that the design of the Btrfs metadata makes it possible to mount the volume through any of its member devices. The following commands are equivalent:

mount -t btrfs /dev/sdb1 /btrfs
mount -t btrfs /dev/sdb2 /btrfs2
mount -t btrfs /dev/sdb3 /btrfs3
mount -t btrfs /dev/sdb4 /btrfs4  
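
For mounting at boot, the volume can appear in /etc/fstab under any one of its member devices; if nothing runs btrfs device scan early enough, the other members can be declared explicitly with the device= mount option (a hedged sketch using the devices from this example):

/dev/sdb1 /btrfs btrfs device=/dev/sdb1,device=/dev/sdb2,device=/dev/sdb3,device=/dev/sdb4 0 0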

Whichever physical device is used to mount the Btrfs volume, df -hT reports the same figures (in all cases the same size and the same available space):

df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda6 ext4 15G 4.8G 9.3G 34% /
udev devtmpfs 7.8G 12K 7.8G 1% /dev
tmpfs tmpfs 3.1G 6.7M 3.1G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 7.8G 380K 7.8G 1% /run/shm
/dev/sda4 ext2 262M 49M 199M 20% /boot
/dev/sda8 ext4 7.8G 661M 6.8G 9% /home
/dev/mapper/linux-data ext4 317G 293G 25G 93% /data
/dev/sdb1 btrfs 299G 40K 193G 1% /btrfs
/dev/sdb2 btrfs 299G 40K 193G 1% /btrfs2
/dev/sdb3 btrfs 299G 40K 193G 1% /btrfs3
/dev/sdb4 btrfs 299G 40K 193G 1% /btrfs4

The following command prints very useful information, such as the RAID profiles the volume was created with:

btrfs filesystem df /btrfs
Data, RAID10: total=2.00GB, used=0.00
Data: total=8.00MB, used=0.00
System, RAID10: total=16.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID10: total=2.00GB, used=20.00KB
Metadata: total=8.00MB, used=4.00KB

By the way, as you can see, the btrfs filesystem df command takes the mount point as its argument, not one of the physical devices.

Shrinking the volume

A common practice in system administration is to leave some head space instead of using the whole capacity of a storage pool (just in case). With btrfs one can easily shrink volumes. Let's shrink the volume by 10 GiB:

btrfs filesystem resize -10G /btrfs
Resize '/btrfs' of '-10G'
df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb1 btrfs 289G 52K 193G 1% /btrfs
/dev/sdb2 btrfs 289G 52K 193G 1% /btrfs2
/dev/sdb3 btrfs 289G 52K 193G 1% /btrfs3
/dev/sdb4 btrfs 289G 52K 193G 1% /btrfs4

And yes, it is an online resize: there is no need to unmount/shrink/mount. So no downtime! :-) However, a Btrfs volume requires a minimum size... if the shrink is too aggressive, the volume won't be resized:

btrfs filesystem resize -500G /btrfs 
Resize '/btrfs' of '-500G'
ERROR: unable to resize '/btrfs'
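
On a multi-device volume, resize applies to devid 1 unless told otherwise; a specific device can be targeted by prefixing the amount with its devid (a sketch; the devids are those reported by btrfs filesystem show):

btrfs filesystem resize 2:-5G /btrfs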

Growing the volume

This is the opposite operation: you can make a Btrfs volume grow to reach a particular size (e.g. 10 more gigabytes):

btrfs filesystem resize +10G /btrfs
Resize '/btrfs' of '+10G'
df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb1 btrfs 299G 52K 193G 1% /btrfs
/dev/sdb2 btrfs 299G 52K 193G 1% /btrfs2
/dev/sdb3 btrfs 299G 52K 193G 1% /btrfs3
/dev/sdb4 btrfs 299G 52K 193G 1% /btrfs4 

You can also take an "all you can eat" approach via the max option, meaning all of the possible space will be used for the volume:

btrfs filesystem resize max /btrfs 
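
Note that max also applies to a single devid at a time; to grow every member device to its maximum, loop over the devids (a sketch assuming devids 1 to 4 as in the example above):

for id in 1 2 3 4; do btrfs filesystem resize "$id":max /btrfs; done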

Adding a new device to Btrfs volume

Add a new device to the volume

btrfs device add /dev/sde /btrfs

Again, there is no need to unmount the volume first, as adding a device is an online operation. The operation is not finished yet, though: we MUST tell Btrfs to prepare the new device (i.e. rebalance/mirror the metadata and the data between all devices):

btrfs filesystem balance /btrfs

Check pool status using

btrfs filesystem show
btrfs filesystem df /btrfs
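
If the new device should also change the redundancy scheme (say, converting everything to RAID10), kernels 3.3 and later accept conversion filters on the balance command (a hedged sketch; this syntax is not available in the old btrfs-progs v0.19 shown earlier):

btrfs balance start -dconvert=raid10 -mconvert=raid10 /btrfs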

Removing a device from Btrfs volume

Removing a device from the BTRFS volume

btrfs device delete /dev/sde /btrfs

Check pool status

btrfs filesystem show
btrfs filesystem df /btrfs

NOTE: Here again, removing a device is totally dynamic and can be done as an online operation! When a device is removed, its content is transparently redistributed among the remaining devices.

Obvious points

  • DO NOT UNPLUG THE DEVICE BEFORE THE END OF THE OPERATION: DATA LOSS WILL RESULT!
  • If you used raid0 for either metadata or data when the Btrfs volume was created, you will end up with an unusable volume if one of the devices fails before being properly removed from the volume, as some stripes will be lost.

Once you add a new device to the Btrfs volume as a replacement for a removed one, you can clean up the references to the missing device:

btrfs device delete missing /btrfs
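
Putting it together for a dead disk: mount the volume in degraded mode, add the replacement, then drop the reference to the missing device (a sketch; /dev/sdf stands in for the new disk):

mount -o degraded /dev/sdb1 /btrfs
btrfs device add /dev/sdf /btrfs
btrfs device delete missing /btrfs
btrfs filesystem balance /btrfs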

Fast device replacement

New in Linux 3.8 Kernel

As a filesystem that spans multiple devices, Btrfs can remove a disk easily, whether you want to shrink your storage pool or the device is failing and you want to replace it:

btrfs device add new_disk mountpoint
btrfs device delete old_disk mountpoint

But the process is not as fast as it could be. Linux 3.8 added an explicit device replace operation, which is much faster:

btrfs replace start old_disk new_disk mountpoint

The copy usually takes place at 90% of the available platter speed if no additional disk I/O is ongoing during the copy operation. The operation runs on a live filesystem: it does not require unmounting or stopping active tasks, and it is safe to crash or lose power during the operation; the process will resume with the next mount. The status of the operation can be checked with btrfs replace status, and it can be cancelled with btrfs replace cancel. The userspace patches for the btrfs program can be found at git://btrfs.giantdisaster.de/git/btrfs-progs.
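
In terms of actual commands this looks as follows (a sketch; the device names are placeholders, and the old disk can also be given by its devid if it is no longer readable):

btrfs replace start /dev/sdd /dev/sde /btrfs
btrfs replace status /btrfs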

btrfsck

btrfsck - check a Btrfs filesystem

Build the latest btrfsck

git clone https://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
Cloning into 'btrfs-progs'...
remote: Counting objects: 2692, done.
remote: Compressing objects: 100% (894/894), done.
remote: Total 2692 (delta 2016), reused 2379 (delta 1794)
Receiving objects: 100% (2692/2692), 817.65 KiB | 192 KiB/s, done.
Resolving deltas: 100% (2016/2016), done.
cd btrfs-progs/
git checkout dangerdonteveruse
Switched to branch 'dangerdonteveruse'
make

Use btrfsck

btrfsck /dev/sdd1
found 733306880 bytes used err is 0
total csum bytes: 715052
total tree bytes: 1093632
total fs tree bytes: 65536
btree space waste bytes: 211112
file data blocks allocated: 732213248
 referenced 732213248
Btrfs Btrfs v0.19
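
btrfsck as built above only reports problems, it does not fix them. In more recent btrfs-progs the same checker is invoked as btrfs check, and a repair can be attempted with the --repair flag; treat it as a last resort and back up first (a hedged sketch using the newer command form):

btrfs check --repair /dev/sdd1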

References

http://www.funtoo.org/wiki/BTRFS_Fun

The Btrfs File System - Oracle Linux Administrator's Solutions Guide for Release 6

https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices

http://masoncoding.com/presentation/

https://oss.oracle.com/~mason/presentation/