Sunday, 26 June 2016

create, help, list, status, activate, deactivate : GlusterFS Snapshots CLI Part 1

After discussing what GlusterFS snapshots are, what their prerequisites are, and what goes on behind the creation of a snapshot, it's time we actually created one and familiarized ourselves with it.

To begin with, let's create a volume called test_vol.
# gluster volume create test_vol replica 3 VM1:/brick/brick-dirs/brick VM2:/brick/brick-dirs/brick VM3:/brick/brick-dirs/brick
volume create: test_vol: success: please start the volume to access data
#
# gluster volume start test_vol
volume start: test_vol: success
#
# gluster volume info test_vol

Volume Name: test_vol
Type: Replicate
Volume ID: 09e773c9-e846-4568-a12d-6efb1cecf8cf
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: VM1:/brick/brick-dirs/brick
Brick2: VM2:/brick/brick-dirs/brick
Brick3: VM3:/brick/brick-dirs/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
#
As you can see, we created a 1x3 replicate volume and started it. We are now primed to take a snapshot of this volume. But before we do so, let's add some data to the volume.
# mount -t glusterfs VM1:/test_vol /mnt/test-vol-mnt/
#
# cd /mnt/test-vol-mnt
#
# ls -lrt
total 0
# touch file1
# ls -lrt
total 0
-rw-r--r-- 1 root root 0 Jun 24 13:39 file1
#
So we have successfully mounted our volume and created (touched) a file called file1. Now we will take a snapshot of 'test_vol' and call it 'snap1'.
# gluster snapshot create snap1 test_vol
snapshot create: success: Snap snap1_GMT-2016.06.24-08.12.42 created successfully
#
That's weird, isn't it? I asked it to create a snapshot called snap1, and it created a snapshot called snap1_GMT-2016.06.24-08.12.42. What happened is that it actually created a snapshot called snap1 and appended the timestamp of its creation to the snap's name. This is the default naming convention of GlusterFS snapshots, and like everything else it is so for a couple of reasons.
  • This naming format is essential to support the Volume Shadow Copy Service (VSS) in GlusterFS volumes.
  • The reason for keeping it as the default naming convention is that it is more informative than just a name. Scrolling through a list of snapshots shows not only the thoughtful name you chose, but also the time each snapshot was created, which gives you more context and more clarity when deciding what to do with a given snapshot.
But if it still looks icky to you, as it does to a lot of people, you can choose not to have the timestamp appended by adding the no-timestamp option to the create command.
# gluster snapshot create snap1 test_vol no-timestamp
snapshot create: success: Snap snap1 created successfully
#
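
As an aside, the create command also takes an optional description, which is handy for recording why a snapshot was taken. Purely as an illustration (I am not actually creating this one here; the name and description text are made up), it would look like:
# gluster snapshot create snap_before_change test_vol no-timestamp description "state before config change"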
So there you go. Congratulations on creating your first GlusterFS snapshot. Now what do you do with it, or rather, what all can you do with it? Let's ask for some help.
# gluster snapshot help
snapshot activate <snapname> [force] - Activate snapshot volume.
snapshot clone <clonename> <snapname> - Snapshot Clone.
snapshot config [volname] ([snap-max-hard-limit <count>] [snap-max-soft-limit <percent>]) | ([auto-delete <enable|disable>])| ([activate-on-create <enable|disable>]) - Snapshot Config.
snapshot create <snapname> <volname> [no-timestamp] [description <description>] [force] - Snapshot Create.
snapshot deactivate <snapname> - Deactivate snapshot volume.
snapshot delete (all | snapname | volume <volname>) - Snapshot Delete.
snapshot help - display help for snapshot commands
snapshot info [(snapname | volume <volname>)] - Snapshot Info.
snapshot list [volname] - Snapshot List.
snapshot restore <snapname> - Snapshot Restore.
snapshot status [(snapname | volume <volname>)] - Snapshot Status.
#
Quite the buffet, isn't it? So let's first see what snapshots we have here. gluster snapshot list will do the trick for us.
# gluster snapshot list
snap1_GMT-2016.06.24-08.12.42
snap1
#

# gluster snapshot list test_vol
snap1_GMT-2016.06.24-08.12.42
snap1
#
The list command displays all the snapshots in the trusted storage pool. Adding a volume's name to the list command lists the snapshots of that particular volume only. As we have only one volume right now, both forms show the same result. The distinction becomes more useful when you have a few volumes, each with a number of snapshots.
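
If you want a little more detail than list gives you, the info command from the help output above prints, for each snapshot, things like the snap volume name, the creation time, the origin volume, and whether the snapshot is started or stopped. It can be invoked as follows (I am omitting the output here for brevity):
# gluster snapshot info snap1
#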

We have previously discussed that a GlusterFS snapshot is like a GlusterFS volume. Just like a regular volume, you can mount it, delete it, and even see its status. So let's see the status of our snapshots.
# gluster snapshot status

Snap Name : snap1_GMT-2016.06.24-08.12.42
Snap UUID : 26d1455d-1d58-4c39-9efa-822d9397088a

    Brick Path        :   VM1:/var/run/gluster/snaps/f4b2ae1fbf414c8383c3b198dd42e7d7/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   95.81
    LV Size           :   616.00m


    Brick Path        :   VM2:/var/run/gluster/snaps/f4b2ae1fbf414c8383c3b198dd42e7d7/brick2/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.45
    LV Size           :   616.00m


    Brick Path        :   VM3:/var/run/gluster/snaps/f4b2ae1fbf414c8383c3b198dd42e7d7/brick3/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.43
    LV Size           :   616.00m


Snap Name : snap1
Snap UUID : 73489d9b-c370-4687-8be9-fc094ee78d0a

    Brick Path        :   VM1:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   95.81
    LV Size           :   616.00m


    Brick Path        :   VM2:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick2/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.45
    LV Size           :   616.00m


    Brick Path        :   VM3:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick3/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.43
    LV Size           :   616.00m
As with the volume status command, the snapshot status command shows the status of all the bricks of all snapshots. Adding the snapname to the status command displays the status of only that particular snapshot.
# gluster snapshot status snap1

Snap Name : snap1
Snap UUID : 73489d9b-c370-4687-8be9-fc094ee78d0a

    Brick Path        :   VM1:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   95.81
    LV Size           :   616.00m


    Brick Path        :   VM2:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick2/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.45
    LV Size           :   616.00m


    Brick Path        :   VM3:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick3/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.43
    LV Size           :   616.00m
Similar to the snapshot list command, providing the volume name (as volume <volname>) instead of the snapname in the status command displays the status of all snapshots of that particular volume.
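Going by the help output above, that form looks like this (I am skipping the output, since it would mirror the per-snapshot listings we just saw):
# gluster snapshot status volume test_vol
#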
The status itself gives us a wealth of information about each snapshot brick, like the volume group, the data percentage, and the LV size. It also tells us whether the brick is running, and if it is, the PID of the brick process. Interestingly, we see that none of the bricks are running. This is the default behaviour of GlusterFS snapshots: a newly created snapshot is in the deactivated state (analogous to the Created/Stopped state of a GlusterFS volume), where none of its bricks are running. In order to start the snapshot brick processes we have to activate the snapshot.
# gluster snapshot activate snap1
Snapshot activate: snap1: Snap activated successfully
#
# gluster snapshot status snap1

Snap Name : snap1
Snap UUID : 73489d9b-c370-4687-8be9-fc094ee78d0a

    Brick Path        :   VM1:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   Yes
    Brick PID         :   29250
    Data Percentage   :   95.81
    LV Size           :   616.00m


    Brick Path        :   VM2:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick2/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   Yes
    Brick PID         :   12616
    Data Percentage   :   3.45
    LV Size           :   616.00m


    Brick Path        :   VM3:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick3/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   Yes
    Brick PID         :   3058
    Data Percentage   :   3.43
    LV Size           :   616.00m
After the snapshot is activated, we can see that the bricks are running, along with their respective PIDs. The snapshot can also be deactivated again using the deactivate command.
# gluster snapshot deactivate snap1
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
Snapshot deactivate: snap1: Snap deactivated successfully
#
# gluster snapshot status snap1

Snap Name : snap1
Snap UUID : 73489d9b-c370-4687-8be9-fc094ee78d0a

    Brick Path        :   VM1:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick1/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   95.81
    LV Size           :   616.00m


    Brick Path        :   VM2:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick2/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.45
    LV Size           :   616.00m


    Brick Path        :   VM3:/var/run/gluster/snaps/d5171e51e1ef407292ee4e24677385cb/brick3/brick
    Volume Group      :   snap_lvgrp
    Brick Running     :   No
    Brick PID         :   N/A
    Data Percentage   :   3.43
    LV Size           :   616.00m
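
One more thing worth pointing out before we wrap up: while a snapshot is activated, it can also be mounted read-only and browsed, much like a regular volume, using the /snaps/<snapname>/<origin-volname> path on any of the servers. A rough sketch (the mount point is arbitrary, and snap1 would of course need to be activated again first):
# gluster snapshot activate snap1
# mkdir -p /mnt/snap1-mnt
# mount -t glusterfs VM1:/snaps/snap1/test_vol /mnt/snap1-mnt
# ls /mnt/snap1-mnt
file1
#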
Up till now we have barely scratched the surface. There's delete, restore, config, and a whole lot more. We will be covering these in future posts.

Monday, 13 June 2016

GlusterFS Snapshots And Their Prerequisites

Long time, no see, huh!!! This post has been pending on my part for a while now, partly because I was busy and partly because I am that lazy. But it's a fairly important post, as it talks about snapshotting GlusterFS volumes. So what are these snapshots, and why are they so darn important? Let's find out...

Wikipedia says, 'a snapshot is the state of a system at a particular point in time'. In filesystems specifically, a snapshot is a 'backup' (a read-only copy of the data set frozen at a point in time). Obviously, it's not a full backup of the entire dataset, but it's a backup nonetheless, which makes it pretty important. Now moving on to GlusterFS snapshots. GlusterFS snapshots are point-in-time, read-only, crash-consistent copies of GlusterFS volumes. They are also online snapshots, and hence the volume and its data continue to be available to the clients while the snapshot is being taken.

GlusterFS snapshots are thinly-provisioned, LVM-based snapshots, and hence they have certain prerequisites. A quick look at the product documentation tells us what those prerequisites are. For a GlusterFS volume to be able to support snapshots, it needs to meet the following:
  • Each brick of the GlusterFS volume should be on an independent, thinly-provisioned LVM (a minimal example of such a setup follows this list).
  • A brick's LVM should not contain any data other than the brick's.
  • None of the bricks should be on a thick LVM.
  • The gluster version should be 3.6 or above (duh!!).
  • The volume should be started.
  • All brick processes must be up and running.
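
For reference, here is a minimal sketch of what the first prerequisite looks like in practice on one of the servers. The device name, sizes, volume group name and paths below are all illustrative, not prescriptive:
# pvcreate /dev/sdb
# vgcreate snap_lvgrp /dev/sdb
# lvcreate --size 1G --thin snap_lvgrp/thinpool
# lvcreate --virtualsize 1G --thin snap_lvgrp/thinpool --name brick_lv
# mkfs.xfs /dev/snap_lvgrp/brick_lv
# mkdir -p /brick/brick-dirs
# mount /dev/snap_lvgrp/brick_lv /brick/brick-dirs
# mkdir -p /brick/brick-dirs/brick
The brick directory (/brick/brick-dirs/brick here) then sits on its own thin LV, with nothing else on it, which is exactly what the snapshot machinery expects.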

Now that I have laid out the rules above, let me give you their origin story as well: how GlusterFS snapshots internally enable you to take a crash-consistent backup, using thinly-provisioned LVM, in a space-efficient manner. We start by having a look at a GlusterFS volume whose bricks are on independent, thinly-provisioned LVMs.


In the above diagram, we can see that the GlusterFS volume test_vol comprises two bricks, Brick1 and Brick2. Both bricks are mounted on independent, thinly-provisioned LVMs. When the volume is mounted, the client process maintains a connection to both bricks. That is as much of a summary of GlusterFS volumes as is needed for this post. A GlusterFS snapshot is also internally a GlusterFS volume, with the exception that it is a read-only volume, and it is treated differently than a regular volume in certain aspects.

When we take a snapshot (say snap1) of the GlusterFS volume test_vol, the following things happen in the background:
  • It is checked whether the volume is in the started state, and if so, whether all the brick processes are up and running.
  • At this point we barrier certain fops, in order to make the snapshot crash-consistent. What this means is that even though it is an online snapshot, certain write fops are barriered for the duration of the snapshot. Fops that are in flight when the barrier is initiated are allowed to complete, but the acknowledgement to the client is held back until the snapshot creation is complete. Barriering has a default time-out window of 2 minutes; if the snapshot is not complete within that window, the fops are unbarriered and that particular snapshot fails.
  • After successfully barriering fops on all brick processes, we proceed to take individual copy-on-write LVM snapshots of each brick. A copy-on-write LVM snapshot ensures a fast, space-efficient backup of the data currently on the brick. These LVM snapshots reside in the same LVM thinpool as the GlusterFS brick LVMs.
  • Once these LVM snapshots are taken, we carve bricks out of them and create a snapshot volume out of those bricks.
  • Once the snapshot creation is complete, we unbarrier the GlusterFS volume.

As can be seen in the above diagram, the snapshot creation process has created an LVM snapshot for each brick LVM, and these snapshots lie in the same thinpool as the original LVMs. We then carve bricks (Brick1" and Brick2") out of these snapshots, and create a snapshot volume called snap1.
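
On a brick server this is visible with plain LVM tooling as well: the brick LV and its snapshot share the same thin pool, and the snapshot lists the brick LV as its origin. A hypothetical lvs listing (the LV and pool names here are made up; the snapshot LV names Gluster actually creates are auto-generated identifiers) might look roughly like:
# lvs snap_lvgrp
  LV        VG         Attr       LSize Pool     Origin   Data%
  thinpool  snap_lvgrp twi-aotz-- 1.00g                    4.20
  brick_lv  snap_lvgrp Vwi-aotz-- 1.00g thinpool           3.45
  snap1_lv  snap_lvgrp Vwi-a-tz-- 1.00g thinpool brick_lv  3.45
#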

This snapshot, snap1, is a read-only snapshot volume which can be:
  • Restored to the original volume test_vol.
  • Mounted as a read-only volume and accessed.
  • Cloned to create a writable snapshot.
  • Accessed via User-Serviceable Snapshots.
All these functionalities will be discussed in future posts, starting with the command line tools to create, delete and restore GlusterFS snapshots.