Parallel Filesystem: another review

Once in a lifetime, you might want storage so big, big, big that you never have to delete anything again. I don't know about others, but I really want one. A cluster is a good way to make that dream come true. There are many solutions out there. I will not talk about commercial products, since I don't have the money to buy one, so this post focuses only on open source solutions.

One you have probably heard of is pvfs and pvfs2. pvfs2 is very powerful and stable as long as all storage nodes are functional. In other words, you may lose all your data if one of them crashes, especially the metadata node. I once lost all my data in just a second. Sadly. That's why I had to find another solution.

The next one is Gfarm. Gfarm is a very promising parallel filesystem for large-scale storage over the Internet. In particular, Gfarm lets me perform many mid-level and low-level operations to control performance myself. As a result, I am not limited to the RAID0-style layout that pvfs offered; I can get other RAID-like layouts, depending on my tuning and configuration. The only single point of failure left is the metadata node. Fortunately, Gfarm can run on top of PostgreSQL, so I can back it up easily and actively.
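To make the RAID0 versus "other RAID" point concrete, here is a toy Python sketch of why pure striping loses everything when one node dies while replication does not. It is only a model under my own assumptions: the function names, the 4-byte chunk size, and the node numbering are made up for illustration, and none of this is the pvfs or Gfarm API or their real on-disk layout.

```python
def stripe(data, nodes, chunk=4):
    """RAID0-style: deal fixed-size chunks round-robin across the nodes."""
    layout = [[] for _ in range(nodes)]
    for i in range(0, len(data), chunk):
        layout[(i // chunk) % nodes].append(data[i:i + chunk])
    return layout


def replicate(data, nodes):
    """RAID1-style: every node keeps a full copy of the file."""
    return [data for _ in range(nodes)]


def read_striped(layout, failed):
    """Reassemble the file, or return None if a failed node held any chunk."""
    for node, chunks in enumerate(layout):
        if node in failed and chunks:
            return None  # a missing stripe makes the whole file unreadable
    out = []
    depth = max(len(chunks) for chunks in layout)
    for row in range(depth):
        for chunks in layout:
            if row < len(chunks):
                out.append(chunks[row])
    return b"".join(out)


def read_replicated(copies, failed):
    """Return the first surviving copy, or None if every node has failed."""
    for node, copy in enumerate(copies):
        if node not in failed:
            return copy
    return None
```

With three nodes, read_striped(stripe(data, 3), {1}) gives back nothing as soon as node 1 held a chunk, while read_replicated(replicate(data, 3), {0, 1}) still returns the file from node 2. The price of replication is, of course, three times the disk space.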

The only problem with Gfarm is that it was not originally designed to be a full-featured filesystem like pvfs. I have to access files in Gfarm through its APIs. To solve this problem, Gfarm provides a syscall hook library and GfarmFS via FUSE. The syscall hook is a good idea: it uses the LD_PRELOAD technique to hook file operations under /gfarm. However, I ran into problems using this technique with an i386 executable on an x86_64 kernel. I tried both an i386 preload and an x86_64 preload, with no luck at all. That's why GfarmFS is needed. It provides a wrapper to mount Gfarm via a userspace handler. Unfortunately, I ran into permission problems. Maybe I don't understand FUSE well enough to use it properly. Anyway, it looks promising.
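One thing worth checking when LD_PRELOAD misbehaves on a mixed i386/x86_64 box is whether the preloaded library has the same ELF class (32-bit or 64-bit) as the executable, because the dynamic linker will not load a library of the wrong class into a process. The Python sketch below reads the EI_CLASS byte of the ELF header to compare the two. The function names are my own, it only looks at the 32/64-bit class and nothing else, and I am not claiming a mismatch was the cause of my problem, since I tried both preloads.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class_from_header(header):
    """Return 32 or 64 from the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]  # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError("unknown ELF class: %d" % ei_class)


def elf_class(path):
    """Read just enough of the file at `path` to determine its ELF class."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))


def preload_matches(exe_path, lib_path):
    """True if the executable and the preload library share an ELF class."""
    return elf_class(exe_path) == elf_class(lib_path)
```

Call preload_matches() with the path of the program and the path of the hook library you pass in LD_PRELOAD. If it returns False, the library is being skipped before any hooking can happen.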
