Thursday, 13 December 2012

GlusterFS - The Product

I can only think of three reasons why you would be reading this:
  • You are a developer interested in contributing to GlusterFS: if this is where you belong, you are in the right place. View this space not as a guide but as a peer's journal, and familiarize yourself with GlusterFS as I continue to do the same.
  • You are an existing contributor to GlusterFS, and you are still reading this, wondering what I am rambling about: if you are one of the elites, please keep in mind that I too am still in the process of understanding the product, and the purpose of this post is only to write down my understanding of it as I continue to work on it. There will surely be times when the information I put down here is inadequate. In such cases, please leave a comment below and help make this forum more valuable.
  • I have become so incredibly famous that you googled me and landed here: if that is true, cheers to me :p If not, I guess it's pretty safe to assume that you belong to one of the two groups above.
Before you start working on and contributing to a product, the most essential thing is a fundamental overview of it. So in this post, we will go through a very brief introduction to GlusterFS, followed by a hands-on walkthrough of pulling in the source code, building and installing the product, and finally creating a volume.

So what is GlusterFS? Well, when we Google GlusterFS, this is what we find:
Wikipedia says "GlusterFS is a scale-out NAS file system." The gluster-community says "GlusterFS is a powerful network/cluster file-system written in user space which uses FUSE to hook itself with VFS layer. GlusterFS takes a layered approach to the file system, where features are added/removed as per the requirement. Though GlusterFS is a File System, it uses already tried and tested disk file systems like ext3, ext4, xfs, etc. to store the data. It can easily scale up to petabytes of storage which is available to user under a single mount point."

Kudos to you if you understood all of it. If not, let's start from the beginning. In the simplest terms, GlusterFS is a file system. When you hear "file system", you probably think of ext3, ext4 or xfs, right? Those are disk file systems: each one manages every file-system operation (fop) in its own way, and each has evolved over the years to become quite stable and mature. But GlusterFS is not a disk file system. Instead of reinventing the wheel, it uses one of these disk file systems (xfs is recommended) as the back-end file system that performs the actual file operations, while GlusterFS itself sits on top, in user space. So the obvious question now is: what does it do on top that the back-end file systems are not doing? The answer lies in the one word Wikipedia gave us: scale-out.

What does scale-out really mean? A traditional storage system typically has a single storage controller connected to a rack of disk drives. Its processing power is fixed and cannot grow as storage capacity is added. A scale-out storage system, on the other hand, consists of multiple modules, each with its own processing and storage capacity, so as the system grows, so does its processing power. GlusterFS scales to petabytes and beyond, and provides a clustered scale-out solution.

In GlusterFS, the storage you mount and use is a volume. A volume consists of one or more bricks, and a brick is a directory on a storage server (written as server:/path) that is assigned to a volume. Now that you have a picture of bricks and volumes (even a vague one is fine), try creating a volume of your own. Please go through this link to understand the developer's workflow. If you are not familiar with git, you can have a look at this.
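On the gluster command line, a brick is written as HOST:/path, with an absolute path on that host. As a sketch of that naming convention (check_brick is a hypothetical name of my own, and its rules are a simplification rather than the checks gluster itself performs), the Bash function below splits a brick spec into its host and path parts:

```shell
#!/usr/bin/env bash
# check_brick: hypothetical helper, not part of GlusterFS.
# Splits a brick spec of the form HOST:/absolute/path and prints its parts,
# returning 1 (with a message on stderr) if the spec looks malformed.
check_brick() {
  local spec=$1
  local host=${spec%%:*}   # text before the first colon
  local path=${spec#*:}    # text after the first colon
  if [[ $spec != *:* || -z $host ]]; then
    echo "invalid brick '$spec': expected HOST:/path" >&2
    return 1
  fi
  if [[ $path != /* ]]; then
    echo "invalid brick '$spec': path must be absolute" >&2
    return 1
  fi
  if [[ $path == *"~"* ]]; then
    # The shell does not expand ~ after "HOST:", so it would stay literal.
    echo "invalid brick '$spec': '~' is not expanded here, use a full path" >&2
    return 1
  fi
  printf 'host=%s path=%s\n' "$host" "$path"
}
```

For example, check_brick myhost:/export/node1 prints host=myhost path=/export/node1, while myhost:relative/dir is rejected.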

Now that you have gone through the developer's workflow, registered yourself at Gerrit, and added your machine's public key to Gerrit, let's pull in the source code:
git clone ssh://[username@] ~/Source/
Now you have a directory called Source in your home directory, which contains the source code. Before we build it, please make sure the following packages are installed:
yum install libtool autoconf automake flex bison openssl openssl-devel libibverbs-devel readline-devel libxml2-devel libacl-devel sqlite-devel python-devel userspace-rcu-devel dbench nfs-utils yajl attr psmisc
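Package names differ between distributions, so a quick pre-flight check can save you a failed build. The sketch below (missing_tools is a hypothetical name, not something shipped with GlusterFS) only checks that the command-line build tools from the list are on your PATH; it cannot tell whether the -devel header packages are installed.

```shell
#!/usr/bin/env bash
# missing_tools: hypothetical pre-flight check, not part of GlusterFS.
# Prints each named command that cannot be found on PATH and
# returns 1 if any of them is missing, 0 otherwise.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running missing_tools libtool autoconf automake flex bison prints nothing when all five tools are present, and otherwise lists the ones you still need to install.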
Once these libraries are installed, you can build GlusterFS.
# ./
# ./configure
# make
# make install
To verify which version of GlusterFS is installed:
# gluster --version
Congratulations, you have now built GlusterFS from source and installed it. Let's create a volume. Make sure your machine has a hostname assigned; this avoids hassles with dynamic IPs. For now, you don't need actual servers to create a volume.
Create two directories, node1 and node2, in your home directory to be used as bricks. The command to create a new volume is:
gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport <tcp | rdma>] NEW-BRICK ...
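For example (replace Hostname and username with your own; brick paths must be absolute, because the shell does not expand ~ after "Hostname:"):
# gluster volume create test-vol Hostname:/home/username/node1 Hostname:/home/username/node2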
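If you would rather not type the brick paths by hand, the hypothetical helper below (print_create_cmd, and its BASE and HOST overrides, are my own names, not part of GlusterFS) creates the brick directories and prints, without running, a matching gluster volume create command that uses absolute paths. It assumes the bricks live under your home directory, as in the steps above.

```shell
#!/usr/bin/env bash
# print_create_cmd: hypothetical helper, not part of GlusterFS.
# Usage: print_create_cmd VOLNAME DIR...
# Creates each brick directory under BASE (default: $HOME) and prints the
# "gluster volume create" command for them; it does NOT run gluster.
# HOST defaults to this machine's hostname.
print_create_cmd() {
  local volname=$1; shift
  local base=${BASE:-$HOME}
  local host=${HOST:-$(hostname)}
  local bricks=() dir
  for dir in "$@"; do
    mkdir -p "$base/$dir"           # make sure the brick directory exists
    bricks+=("$host:$base/$dir")    # absolute path, no literal '~'
  done
  echo "gluster volume create $volname ${bricks[*]}"
}
```

Running print_create_cmd test-vol node1 node2 prints the command for the two bricks above, which you can then review and run as root.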
Start the volume:
# gluster volume start test-vol
To display the volume information:
# gluster volume info
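The optional stripe and replica arguments in the create syntax constrain how many bricks you can pass: the brick count has to be a multiple of the stripe count times the replica count (treating an omitted count as 1), and with equally sized bricks, replication divides the usable capacity by the replica count, while striping only splits files across bricks. The hypothetical helper below (check_layout is my own name) just does that arithmetic; it is a sketch of the rule, not a substitute for the checks gluster volume create performs.

```shell
#!/usr/bin/env bash
# check_layout: hypothetical helper, not part of GlusterFS.
# Usage: check_layout BRICKS BRICK_SIZE_GB [STRIPE] [REPLICA]
# Fails if BRICKS is not a multiple of STRIPE*REPLICA; otherwise prints the
# usable capacity in GB, assuming every brick has the same size.
check_layout() {
  local bricks=$1 size=$2 stripe=${3:-1} replica=${4:-1}
  local group=$(( stripe * replica ))
  if (( bricks % group != 0 )); then
    echo "$bricks bricks is not a multiple of stripe*replica ($group)" >&2
    return 1
  fi
  # Each file is stored REPLICA times; striping does not cost capacity.
  echo $(( bricks * size / replica ))
}
```

For the two-brick test-vol above (no stripe or replica), check_layout 2 100 prints 200, whereas asking for replica 2 with three bricks (check_layout 3 100 1 2) fails.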


Unknown said...

Well, there is yet another reason why someone would end up here: trying to understand how the filesystem works, along with the replication, duplication, implication behind each way of working with GlusterFS.

Gluster's documentation is not brilliant when it comes to discussing exactly how many bricks you need, the formula (if there is one) to calculate required volumes per setup, etc. Red Hat's docs are not much better (Red Hat Storage 2.0, or Red Hat Storage Server 2.0).

It happens to all new(ish) technology I guess.

But, hey, GlusterFS works like a charm !

Now THAT is brilliant !

