ZFS Administration
ZFS Overview
- Pooled storage – no more volumes!
Up to 2^48 datasets per pool – filesystems, iSCSI targets, swap, etc.
Nothing to provision!
- Filesystems become administrative control points
Hierarchical, with inherited properties
- Per-dataset policy: snapshots, compression, backups, quotas, etc.
- Who's using all the space? du(1) takes forever, but df(1M) is instant
- Manage logically related filesystems as a group
- Inheritance makes large-scale administration a snap
Policy follows the data (mounts, shares, properties, etc.)
Delegated administration lets users manage their own data
ZFS filesystems are cheap – use a ton, it's OK, really!
- Online everything
Creating Pools and Filesystems
Create a mirrored pool named “tank”
zpool create tank mirror c2d0 c3d0
Create home directory filesystem, mounted at /export/home
zfs create tank/home
zfs set mountpoint=/export/home tank/home
Create home directories for several users
Note: automatically mounted at /export/home/{ahrens,bonwick,billm} thanks to inheritance
zfs create tank/home/ahrens
zfs create tank/home/bonwick
zfs create tank/home/billm
Add more space to the pool
zpool add tank mirror c4d0 c5d0
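The per-user zfs create commands above lend themselves to a loop. The following is a minimal sketch, assuming the parent filesystem (tank/home here) already exists. The function name create_homes and the RUN variable are illustrative and not part of ZFS. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: create one child filesystem per user under a parent.
# Prints the zfs commands by default; set RUN=1 to actually execute them.
create_homes() {
    parent=$1
    shift
    for user in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            zfs create "$parent/$user"
        else
            echo "zfs create $parent/$user"
        fi
    done
}

# Dry run: prints three "zfs create tank/home/<user>" lines.
create_homes tank/home ahrens bonwick billm
```

Because the mountpoint is inherited from tank/home, no per-user mountpoint setting should be needed.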
Setting Properties
Automatically NFS-export all home directories
zfs set sharenfs=rw tank/home
Turn on compression for everything in the pool
zfs set compression=on tank
Limit Eric to a quota of 10G
zfs set quota=10g tank/home/eschrock
Guarantee Tabriz a reservation of 20G
zfs set reservation=20g tank/home/tabriz
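Since most properties are inherited, it is often useful to see which values were set directly on a dataset and which came from a parent. The sketch below filters the tab-separated output of zfs get -H down to locally set properties. The function name local_only is hypothetical, and the input shown is simulated for illustration rather than captured from a real pool.

```shell
# Hypothetical filter: keep only properties that were set directly on a
# dataset (source "local"), dropping inherited and default values.
# Expects tab-separated lines in the shape printed by:
#   zfs get -H -r -o name,property,value,source compression,quota tank
local_only() {
    awk -F '\t' '$4 == "local" { print $1, $2 "=" $3 }'
}

# Simulated input for illustration, not captured from a real pool.
printf '%s\t%s\t%s\t%s\n' \
    tank               compression on  local \
    tank/home          compression on  'inherited from tank' \
    tank/home/eschrock quota       10G local |
    local_only
```

With the simulated input, only the tank compression line and the eschrock quota line are printed; the inherited entry is dropped.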
ZFS Snapshots
Read-only point-in-time copy of a filesystem
- Instantaneous creation, unlimited number
- No additional space used – blocks copied only when they change
- Accessible through .zfs/snapshot in root of each filesystem
Allows users to recover files without sysadmin intervention
Take a snapshot of Mark's home directory
zfs snapshot tank/home/marks@tuesday
Roll back to a previous snapshot
zfs rollback tank/home/perrin@monday
Take a look at Wednesday's version of foo.c
$ cat ~maybee/.zfs/snapshot/wednesday/foo.c
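Since snapshots appear as ordinary directories under .zfs/snapshot, recovering a file is just a copy. A minimal sketch follows; the function name restore_from_snapshot and its arguments are illustrative, and it assumes the caller passes the filesystem's mountpoint, a snapshot name, and a path relative to that mountpoint.

```shell
# Hypothetical helper: copy one file out of a snapshot back into the live
# filesystem. fs_root is the filesystem's mountpoint, snap the snapshot
# name, and rel the file's path relative to that mountpoint.
restore_from_snapshot() {
    fs_root=$1 snap=$2 rel=$3
    src="$fs_root/.zfs/snapshot/$snap/$rel"
    if [ ! -f "$src" ]; then
        echo "no such file in snapshot: $src" >&2
        return 1
    fi
    # -p preserves mode and timestamps of the snapshot copy.
    cp -p "$src" "$fs_root/$rel"
}

# Example call (paths are placeholders):
#   restore_from_snapshot /export/home/maybee wednesday foo.c
```

The copy overwrites the live file, so a cautious user might copy to a different name first and compare.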
ZFS Clones
Writable copy of a snapshot
- Instantaneous creation, unlimited number
Ideal for storing many private copies of mostly-shared data
- Software installations
- Source code repositories
- Diskless clients
- Zones
- Virtual machines
Create a clone of your OpenSolaris source code
zfs clone tank/solaris@monday tank/ws/lori/fix
ZFS Send / Receive
Powered by snapshots
- Full backup: any snapshot
- Incremental backup: any snapshot delta
- Very fast delta generation – cost proportional to data changed
So efficient it can drive remote replication
Generate a full backup
zfs send tank/fs@A >/backup/A
Generate an incremental backup
zfs send -i tank/fs@A tank/fs@B >/backup/B-A
Remote (over SSH) replication: send incremental once per minute
zfs send -i tank/fs@11:31 tank/fs@11:32 | ssh host zfs receive -d tank
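One replication cycle consists of taking a new snapshot and then shipping the delta from the previous one. The sketch below only prints the two commands for a given pair of snapshot names. The function name replicate_cmds is hypothetical, and it assumes the receiving pool has the same name as the sending pool, so that zfs receive -d recreates the filesystem under it.

```shell
# Hypothetical helper: print the commands for one replication cycle.
# fs is the filesystem, prev/cur are snapshot names, host is the receiver.
# Assumes the remote pool is named like the local one.
replicate_cmds() {
    fs=$1 prev=$2 cur=$3 host=$4
    pool=${fs%%/*}   # "tank/fs" -> "tank"
    echo "zfs snapshot $fs@$cur"
    echo "zfs send -i $fs@$prev $fs@$cur | ssh $host zfs receive -d $pool"
}

replicate_cmds tank/fs 11:31 11:32 host
```

A scheduler such as cron could run one cycle per minute, passing the previous run's snapshot name as prev.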
ZFS Data Migration
Host-neutral on-disk format
- Change server from x86 to SPARC, it just works
- Adaptive endianness: neither platform pays a tax
-- Writes always use native endianness, set bit in block pointer
-- Reads byteswap only if host endianness != block endianness
ZFS takes care of everything
- Forget about device paths, config files, /etc/vfstab, etc.
- ZFS will share/unshare, mount/unmount, etc. as necessary
Export pool from the old server
old# zpool export tank
Physically move disks and import pool to the new server
new# zpool import tank
Native CIFS (SMB) Support
NT-style ACLs
- Allow/deny with inheritance
True Windows SIDs – not just POSIX UID mapping
- Essential for proper Windows interaction
- Simplifies domain consolidation
Options to control:
- Case-insensitivity
- Non-blocking mandatory locks
- Unicode normalization
- Virus scanning
Simultaneous NFS and CIFS client access
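On Solaris, the options listed above correspond to dataset properties (casesensitivity, nbmand, normalization, vscan), and sharing is enabled with sharesmb. As a rough sketch, the helper below prints a zfs create command that sets them together. The function name smb_create_cmd and the dataset name tank/winshare are placeholders, and the chosen values are one plausible combination rather than a recommendation. Note that casesensitivity and normalization can only be set when the filesystem is created.

```shell
# Hypothetical helper: print a zfs create command for a filesystem meant
# to be shared with Windows clients. The dataset name comes from the caller.
# vscan=on only has an effect if a virus-scan service is configured.
smb_create_cmd() {
    echo "zfs create -o casesensitivity=mixed -o normalization=formD" \
         "-o nbmand=on -o vscan=on -o sharesmb=on $1"
}

smb_create_cmd tank/winshare
```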
ZFS and Zones (Virtualization)
Secure – Local zones cannot even see physical devices
Fast – snapshots and clones make zone creation instant
ZFS Root
Brings all the ZFS goodness to /
- Checksums, compression, replication, snapshots, clones
- Boot from any dataset
Patching becomes safe
- Take snapshot, apply patch... rollback if there's a problem
Live upgrade becomes fast
- Create clone (instant), upgrade, boot from clone
- No “extra partition”
Based on new Solaris boot architecture
- ZFS can easily create multiple boot environments
- GRUB can easily manage them
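The "take snapshot, apply patch, roll back if there's a problem" flow can be sketched as a small wrapper. The function name with_rollback and the ZFS_CMD variable are illustrative. On a live root filesystem the rollback step would more likely go through a boot environment, so treat this as an outline of the flow rather than a patching procedure.

```shell
# Hypothetical wrapper: snapshot a dataset, run a command, and roll back
# to the snapshot if the command fails. ZFS_CMD defaults to "zfs"; it is
# overridable only so the flow can be rehearsed without touching a pool.
with_rollback() {
    dataset=$1 snap=$2
    shift 2
    ${ZFS_CMD:-zfs} snapshot "$dataset@$snap" || return 1
    if "$@"; then
        return 0
    fi
    ${ZFS_CMD:-zfs} rollback "$dataset@$snap"
    return 1
}

# Rehearsal: echo stands in for zfs and "false" for a failing patch command.
# Unset ZFS_CMD to run against a real pool.
ZFS_CMD="echo zfs"
with_rollback rpool/ROOT/solaris prepatch false ||
    echo "patch failed, rolled back"
```

In the rehearsal, the wrapper prints the snapshot command, then the rollback command, then the failure message.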
ZFS Test Methodology
A product is only as good as its test suite
ZFS was designed to run in either user or kernel context
Nightly “ztest” program does all of the following in parallel:
- Read, write, create, and delete files and directories
- Create and destroy entire filesystems and storage pools
- Turn compression on and off (while filesystem is active)
- Change checksum algorithm (while filesystem is active)
- Add and remove devices (while pool is active)
- Change I/O caching and scheduling policies (while pool is active)
- Scribble random garbage on one side of live mirror to test self-healing data
- Force violent crashes to simulate power loss, then verify pool integrity
Probably more abuse in 20 seconds than you'd see in a lifetime
ZFS has been subjected to over a million forced, violent crashes without losing data integrity or leaking a single block
ZFS versions
Sample output from Solaris 11.1 x86_64
zfs
terry@solaris:/$ zfs get version
NAME                            PROPERTY  VALUE  SOURCE
rpool                           version   6      -
rpool/ROOT                      version   6      -
rpool/ROOT/solaris              version   6      -
rpool/ROOT/solaris@install      version   6      -
rpool/ROOT/solaris/var          version   6      -
rpool/ROOT/solaris/var@install  version   6      -
rpool/VARSHARE                  version   6      -
rpool/dump                      version   -      -
rpool/export                    version   6      -
rpool/export/home               version   6      -
rpool/export/home/terry         version   6      -
rpool/swap                      version   -      -
zfs list
terry@solaris:~$ zfs list
NAME                      USED   AVAIL  REFER  MOUNTPOINT
rpool                     6.15G  13.2G  4.90M  /rpool
rpool/ROOT                4.04G  13.2G  31K    legacy
rpool/ROOT/solaris        4.04G  13.2G  3.72G  /
rpool/ROOT/solaris/var    225M   13.2G  200M   /var
rpool/VARSHARE            45K    13.2G  45K    /var/share
rpool/dump                1.03G  13.2G  1.00G  -
rpool/export              37.0M  13.2G  32K    /export
rpool/export/home         37.0M  13.2G  32K    /export/home
rpool/export/home/terry   36.9M  13.2G  36.9M  /export/home/terry
rpool/swap                1.03G  13.2G  1.00G  -
zpool
terry@solaris:/$ zpool get version rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  version   34     default
Get all properties
terry@solaris:/$ zpool get all rpool
NAME   PROPERTY       VALUE               SOURCE
rpool  allocated      6.07G               -
rpool  altroot        -                   default
rpool  autoexpand     off                 default
rpool  autoreplace    off                 default
rpool  bootfs         rpool/ROOT/solaris  local
rpool  cachefile      -                   default
rpool  capacity       30%                 -
rpool  dedupditto     0                   default
rpool  dedupratio     1.00x               -
rpool  delegation     on                  default
rpool  failmode       wait                default
rpool  free           13.6G               -
rpool  guid           978222034842549205  -
rpool  health         ONLINE              -
rpool  listshares     off                 default
rpool  listsnapshots  off                 default
rpool  readonly       off                 -
rpool  size           19.6G               -
rpool  version        34                  default
zpool status
terry@solaris:/$ zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE   READ  WRITE  CKSUM
        rpool       ONLINE  0     0      0
          c7t0d0s1  ONLINE  0     0      0

errors: No known data errors
zpool list
terry@solaris:~$ zpool list
NAME   SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
rpool  19.6G  6.09G  13.5G  31%  1.00x  ONLINE  -
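For scripting, zpool list accepts -H (no header, tab-separated) and -o to select columns, which makes the capacity figure easy to check automatically. The sketch below flags pools at or above a given percentage. The function name pools_over is hypothetical, and the input is fed by hand here using the rpool figure from the sample output above.

```shell
# Hypothetical filter: report pools whose used capacity is at or above a
# threshold (percent). Reads tab-separated "name<TAB>capacity" lines, the
# shape produced by: zpool list -H -o name,capacity
pools_over() {
    awk -F '\t' -v limit="$1" '{
        cap = $2
        sub(/%$/, "", cap)           # "31%" -> "31"
        if (cap + 0 >= limit + 0) print $1 " is " $2 " full"
    }'
}

# Hand-fed input; on a live system:
#   zpool list -H -o name,capacity | pools_over 80
printf 'rpool\t31%%\n' | pools_over 30
# prints: rpool is 31% full
```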
ZFS Summary
End the Suffering • Free Your Mind
- Simple: Concisely expresses the user's intent
- Powerful: Pooled storage, snapshots, clones, compression, scrubbing, RAID-Z
- Safe: Detects and corrects silent data corruption
- Fast: Dynamic striping, intelligent prefetch, pipelined I/O
- Open: http://www.opensolaris.org/os/community/zfs
- Free
Sample output of df -h
Solaris 11.1 x86_64
terry@solaris:/$ df -h
Filesystem                      Size   Used  Available  Capacity  Mounted on
rpool/ROOT/solaris              19G    3.7G  13G        23%       /
/devices                        0K     0K    0K         0%        /devices
/dev                            0K     0K    0K         0%        /dev
ctfs                            0K     0K    0K         0%        /system/contract
proc                            0K     0K    0K         0%        /proc
mnttab                          0K     0K    0K         0%        /etc/mnttab
swap                            1002M  1.4M  1001M      1%        /system/volatile
objfs                           0K     0K    0K         0%        /system/object
sharefs                         0K     0K    0K         0%        /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1  17G    3.7G  13G        23%       /lib/libc.so.1
fd                              0K     0K    0K         0%        /dev/fd
rpool/ROOT/solaris/var          19G    200M  13G        2%        /var
swap                            1.1G   128M  1001M      12%       /tmp
rpool/VARSHARE                  19G    44K   13G        1%        /var/share
rpool/export                    19G    32K   13G        1%        /export
rpool/export/home               19G    32K   13G        1%        /export/home
rpool/export/home/terry         19G    18M   13G        1%        /export/home/terry
rpool                           19G    4.9M  13G        1%        /rpool
/export/home/terry              13G    18M   13G        1%        /home/terry
Solaris 10 update 8 x86_64
bash-3.00# df -h
Filesystem                      size  used  avail  capacity  Mounted on
rpool/ROOT/s10x_u8wos_08a       20G   5.3G  13G    30%       /
/devices                        0K    0K    0K     0%        /devices
ctfs                            0K    0K    0K     0%        /system/contract
proc                            0K    0K    0K     0%        /proc
mnttab                          0K    0K    0K     0%        /etc/mnttab
swap                            626M  380K  626M   1%        /etc/svc/volatile
objfs                           0K    0K    0K     0%        /system/object
sharefs                         0K    0K    0K     0%        /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1  18G   5.3G  13G    30%       /lib/libc.so.1
fd                              0K    0K    0K     0%        /dev/fd
swap                            629M  2.9M  626M   1%        /tmp
swap                            626M  36K   626M   1%        /var/run
rpool/export                    20G   41K   13G    1%        /export
rpool/export/home               20G   22M   13G    1%        /export/home
rpool                           20G   34K   13G    1%        /rpool
/hgfs                           16G   4.0M  16G    1%        /hgfs
/tmp/VMwareDnD                  0K    0K    0K     0%        /var/run/vmblock