
ZFS Administration

ZFS Overview

  • Pooled storage – no more volumes!
    Up to 2^48 datasets per pool – filesystems, iSCSI targets, swap, etc.
    Nothing to provision!
  • Filesystems become administrative control points
    Hierarchical, with inherited properties
    1. Per-dataset policy: snapshots, compression, backups, quotas, etc.
    2. Who's using all the space? du(1) takes forever, but df(1M) is instant
    3. Manage logically related filesystems as a group
    4. Inheritance makes large-scale administration a snap

Policy follows the data (mounts, shares, properties, etc.)
Delegated administration lets users manage their own data
ZFS filesystems are cheap – use a ton, it's OK, really!
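As a sketch of delegated administration (the username and permission set here are illustrative), a user can be granted the rights to manage their own filesystem with zfs allow:

```shell
# Let user ahrens take snapshots of and mount his own home directory
# (permission list and dataset name are illustrative)
zfs allow ahrens snapshot,mount tank/home/ahrens

# Show the delegations in effect on the dataset
zfs allow tank/home/ahrens
```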

  • Online everything

Creating Pools and Filesystems

Create a mirrored pool named “tank”

zpool create tank mirror c2d0 c3d0

Create home directory filesystem, mounted at /export/home

zfs create tank/home
zfs set mountpoint=/export/home tank/home

Create home directories for several users
Note: automatically mounted at /export/home/{ahrens,bonwick,billm} thanks to inheritance

zfs create tank/home/ahrens
zfs create tank/home/bonwick
zfs create tank/home/billm

Add more space to the pool

zpool add tank mirror c4d0 c5d0

Setting Properties

Automatically NFS-export all home directories

zfs set sharenfs=rw tank/home

Turn on compression for everything in the pool

zfs set compression=on tank

Limit Eric to a quota of 10G

zfs set quota=10g tank/home/eschrock

Guarantee Tabriz a reservation of 20G

zfs set reservation=20g tank/home/tabriz

ZFS Snapshots

Read-only point-in-time copy of a filesystem

  • Instantaneous creation, unlimited number
  • No additional space used – blocks copied only when they change
  • Accessible through .zfs/snapshot in root of each filesystem
    Allows users to recover files without sysadmin intervention

Take a snapshot of Mark's home directory

zfs snapshot tank/home/marks@tuesday

Roll back to a previous snapshot

zfs rollback tank/home/perrin@monday

Take a look at Wednesday's version of foo.c

$ cat ~maybee/.zfs/snapshot/wednesday/foo.c

ZFS Clones

Writable copy of a snapshot

  • Instantaneous creation, unlimited number

Ideal for storing many private copies of mostly-shared data

  • Software installations
  • Source code repositories
  • Diskless clients
  • Zones
  • Virtual machines

Create a clone of your OpenSolaris source code

zfs clone tank/solaris@monday tank/ws/lori/fix

ZFS Send / Receive

Powered by snapshots

  • Full backup: any snapshot
  • Incremental backup: any snapshot delta
  • Very fast delta generation – cost proportional to data changed

So efficient it can drive remote replication

Generate a full backup

zfs send tank/fs@A >/backup/A

Generate an incremental backup

zfs send -i tank/fs@A tank/fs@B >/backup/B-A
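The streams written above can be restored with zfs receive; a sketch, with the destination dataset name chosen for illustration:

```shell
# Restore the full backup into a new dataset
# (creates tank/fs-restored and its @A snapshot)
zfs receive tank/fs-restored < /backup/A

# Apply the incremental delta on top of snapshot @A
zfs receive tank/fs-restored < /backup/B-A
```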

Remote (over SSH) replication: send incremental once per minute

zfs send -i tank/fs@11:31 tank/fs@11:32 | ssh host zfs receive -d tank/fs

ZFS Data Migration

Host-neutral on-disk format

  • Change server from x86 to SPARC, it just works
  • Adaptive endianness: neither platform pays a tax
    -- Writes always use native endianness, set bit in block pointer
    -- Reads byteswap only if host endianness != block endianness

ZFS takes care of everything

  • Forget about device paths, config files, /etc/vfstab, etc.
  • ZFS will share/unshare, mount/unmount, etc. as necessary

Export pool from the old server

old# zpool export tank

Physically move disks and import pool to the new server

new# zpool import tank

Native CIFS (SMB) Support

NT-style ACLs

  • Allow/deny with inheritance

True Windows SIDs – not just POSIX UID mapping

  • Essential for proper Windows interaction
  • Simplifies domain consolidation

Options to control:

  • Case-insensitivity
  • Non-blocking mandatory locks
  • Unicode normalization
  • Virus scanning

Simultaneous NFS and CIFS client access
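The options above correspond to per-dataset properties set at creation time; a hedged example of creating a CIFS-friendly filesystem (the property values shown are typical choices, not requirements):

```shell
# Create a filesystem tuned for Windows clients:
# mixed case sensitivity, non-blocking mandatory locks,
# Unicode normalization, and virus scanning enabled
zfs create -o casesensitivity=mixed \
           -o nbmand=on \
           -o normalization=formD \
           -o vscan=on \
           tank/cifs
```

Note that casesensitivity and normalization can only be set when the filesystem is created; the exact property used to share over SMB varies by Solaris release.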

ZFS and Zones (Virtualization)

Secure – Local zones cannot even see physical devices

Fast – snapshots and clones make zone creation instant
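A dataset can be delegated to a zone, giving the zone administrator full ZFS control over that subtree without exposing physical devices; a sketch with zonecfg (zone and dataset names hypothetical):

```shell
# Delegate tank/zones/webzone to the zone "webzone"
zonecfg -z webzone
zonecfg:webzone> add dataset
zonecfg:webzone:dataset> set name=tank/zones/webzone
zonecfg:webzone:dataset> end
zonecfg:webzone> exit
```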

ZFS Root

Brings all the ZFS goodness to /

  • Checksums, compression, replication, snapshots, clones
  • Boot from any dataset

Patching becomes safe

  • Take snapshot, apply patch... rollback if there's a problem

Live upgrade becomes fast

  • Create clone (instant), upgrade, boot from clone
  • No “extra partition”

Based on new Solaris boot architecture

  • ZFS can easily create multiple boot environments
  • GRUB can easily manage them
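On Solaris 11, boot environments are managed with beadm, which uses ZFS clones under the hood; a sketch of the safe-patching workflow above (BE name hypothetical):

```shell
# Clone the active boot environment before patching (instant)
beadm create pre-patch

# If the patch goes wrong, activate the saved BE and reboot into it
beadm activate pre-patch
reboot

# List boot environments; flags show which is active now / on reboot
beadm list
```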

ZFS Test Methodology

A product is only as good as its test suite

ZFS was designed to run in either user or kernel context

Nightly “ztest” program does all of the following in parallel:

  • Read, write, create, and delete files and directories
  • Create and destroy entire filesystems and storage pools
  • Turn compression on and off (while filesystem is active)
  • Change checksum algorithm (while filesystem is active)
  • Add and remove devices (while pool is active)
  • Change I/O caching and scheduling policies (while pool is active)
  • Scribble random garbage on one side of live mirror to test self-healing data
  • Force violent crashes to simulate power loss, then verify pool integrity

Probably more abuse in 20 seconds than you'd see in a lifetime

ZFS has been subjected to over a million forced, violent crashes without losing data integrity or leaking a single block
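Administrators can run a similar integrity check on demand with a scrub, which walks every allocated block in the pool and verifies its checksum:

```shell
# Verify all data in the pool, repairing from
# redundant copies (mirror/RAID-Z) where possible
zpool scrub tank

# Check scrub progress and any errors found
zpool status tank
```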

ZFS versions

Sample output from Solaris 11.1 x86_64

zfs

terry@solaris:/$ zfs get version 
NAME                            PROPERTY  VALUE  SOURCE
rpool                           version   6      -
rpool/ROOT                      version   6      -
rpool/ROOT/solaris              version   6      -
rpool/ROOT/solaris@install      version   6      -
rpool/ROOT/solaris/var          version   6      -
rpool/ROOT/solaris/var@install  version   6      -
rpool/VARSHARE                  version   6      -
rpool/dump                      version   -      -
rpool/export                    version   6      -
rpool/export/home               version   6      -
rpool/export/home/terry         version   6      -
rpool/swap                      version   -      -

zfs list

terry@solaris:~$ zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
rpool                    6.15G  13.2G  4.90M  /rpool
rpool/ROOT               4.04G  13.2G    31K  legacy
rpool/ROOT/solaris       4.04G  13.2G  3.72G  /
rpool/ROOT/solaris/var    225M  13.2G   200M  /var
rpool/VARSHARE             45K  13.2G    45K  /var/share
rpool/dump               1.03G  13.2G  1.00G  -
rpool/export             37.0M  13.2G    32K  /export
rpool/export/home        37.0M  13.2G    32K  /export/home
rpool/export/home/terry  36.9M  13.2G  36.9M  /export/home/terry
rpool/swap               1.03G  13.2G  1.00G  -

zpool

terry@solaris:/$ zpool get version rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  version   34     default

Get all properties

terry@solaris:/$ zpool get all rpool
NAME   PROPERTY       VALUE               SOURCE
rpool  allocated      6.07G               -
rpool  altroot        -                   default
rpool  autoexpand     off                 default
rpool  autoreplace    off                 default
rpool  bootfs         rpool/ROOT/solaris  local
rpool  cachefile      -                   default
rpool  capacity       30%                 -
rpool  dedupditto     0                   default
rpool  dedupratio     1.00x               -
rpool  delegation     on                  default
rpool  failmode       wait                default
rpool  free           13.6G               -
rpool  guid           978222034842549205  -
rpool  health         ONLINE              -
rpool  listshares     off                 default
rpool  listsnapshots  off                 default
rpool  readonly       off                 -
rpool  size           19.6G               -
rpool  version        34                  default

zpool status

terry@solaris:/$ zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:
    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      c7t0d0s1  ONLINE       0     0     0
errors: No known data errors

zpool list

terry@solaris:~$ zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  19.6G  6.09G  13.5G  31%  1.00x  ONLINE  -

ZFS Summary

End the Suffering – Free Your Mind

  • Simple – Concisely expresses the user's intent
  • Powerful – Pooled storage, snapshots, clones, compression, scrubbing, RAID-Z
  • Safe – Detects and corrects silent data corruption
  • Fast – Dynamic striping, intelligent prefetch, pipelined I/O
  • Open – http://www.opensolaris.org/os/community/zfs
  • Free

Sample output of df -h

Solaris 11.1 x86_64

terry@solaris:/$ df -h
Filesystem             Size   Used  Available Capacity  Mounted on
rpool/ROOT/solaris      19G   3.7G        13G    23%    /
/devices                 0K     0K         0K     0%    /devices
/dev                     0K     0K         0K     0%    /dev
ctfs                     0K     0K         0K     0%    /system/contract
proc                     0K     0K         0K     0%    /proc
mnttab                   0K     0K         0K     0%    /etc/mnttab
swap                  1002M   1.4M      1001M     1%    /system/volatile
objfs                    0K     0K         0K     0%    /system/object
sharefs                  0K     0K         0K     0%    /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1
                        17G   3.7G        13G    23%    /lib/libc.so.1
fd                       0K     0K         0K     0%    /dev/fd
rpool/ROOT/solaris/var
                        19G   200M        13G     2%    /var
swap                   1.1G   128M      1001M    12%    /tmp
rpool/VARSHARE          19G    44K        13G     1%    /var/share
rpool/export            19G    32K        13G     1%    /export
rpool/export/home       19G    32K        13G     1%    /export/home
rpool/export/home/terry
                        19G    18M        13G     1%    /export/home/terry
rpool                   19G   4.9M        13G     1%    /rpool
/export/home/terry      13G    18M        13G     1%    /home/terry

Solaris 10 update 8 x86_64

bash-3.00# df -h
Filesystem             size   used  avail capacity  Mounted on
rpool/ROOT/s10x_u8wos_08a
                        20G   5.3G    13G    30%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   626M   380K   626M     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1
                        18G   5.3G    13G    30%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   629M   2.9M   626M     1%    /tmp
swap                   626M    36K   626M     1%    /var/run
rpool/export            20G    41K    13G     1%    /export
rpool/export/home       20G    22M    13G     1%    /export/home
rpool                   20G    34K    13G     1%    /rpool
/hgfs                   16G   4.0M    16G     1%    /hgfs
/tmp/VMwareDnD           0K     0K     0K     0%    /var/run/vmblock
bash-3.00#

Reference

Oracle Solaris 11.1 Administration: ZFS File Systems