#1 2009-08-18 13:15:08

davvil
Member
Registered: 2008-05-06
Posts: 165

Parallel filesystem

In my department we are considering installing a parallel filesystem for our cluster of machines. Does anyone have experience with this, or recommendations?

What we have: RAID servers for home directories and work data, and a cluster of around 200 multi-processor machines booted via PXE, each with its own local disks. We do scientific computations (speech recognition, machine translation, image recognition) which can get quite disk I/O intensive, with parallel jobs and several users accessing the same data concurrently. This puts too much load on the RAID servers, so until now we have copied the relevant data to local disks more or less manually, and the jobs read it from their own local disk or from the local disk of another node in the cluster (in an attempt to distribute the load). The local disks of each node are mounted on the other nodes via an automounter. The whole process is error-prone (the automounter in particular is far from reliable) and certainly not the best solution.

What we want: a parallel filesystem that simplifies the whole process. It should:
- Allow efficient concurrent access to the data by several jobs
- Be fault tolerant: if one node is down, the data should already have been replicated on another node (probably also a requirement for efficient parallel access)
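
To make the two requirements above concrete, here is a minimal Python sketch of replica placement and failover reads. It is only a toy model under stated assumptions, not how GlusterFS, HDFS or any other real filesystem works: the function names, the hash-based placement and the node names are all hypothetical.

```python
import hashlib


def place_replicas(filename, nodes, copies=2):
    """Pick `copies` distinct nodes for a file, deterministically from its name.

    Hypothetical placement rule: hash the name to a starting node, then take
    the following nodes in list order.
    """
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    start = int(hashlib.md5(filename.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


def pick_reader(filename, nodes, down=(), copies=2):
    """Return the first replica holder that is not in `down`."""
    for node in place_replicas(filename, nodes, copies):
        if node not in down:
            return node
    raise RuntimeError("all replicas of %s are unavailable" % filename)


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    cluster = ["node%02d" % i for i in range(5)]
    holders = place_replicas("corpus.bin", cluster)
    print("replicas on:", holders)
    # With the first holder down, reads fall back to the second replica.
    print("read from:", pick_reader("corpus.bin", cluster, down={holders[0]}))
```

Because different files hash to different starting nodes, reads spread across the cluster, and a single node failure still leaves one readable copy of every file.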

GlusterFS or HDFS seem to go in the direction of what we want to accomplish. Has anyone tried them? Does anyone have experience with this kind of system? Easy administration is a plus, but not necessarily a requirement.

PS: I am not one of the department's sysadmins, so I do not know the exact architecture of the system. I'm mainly asking for pointers to pass on to the sysadmins. These forums are always a good place to find relevant information ;)

#2 2009-08-18 22:45:36

fukawi2
Ex-Administratorino
From: .vic.au
Registered: 2007-09-28
Posts: 6,222

Re: Parallel filesystem

DRBD seems good, but I haven't actually used it yet so can't vouch for it:
http://www.drbd.org/

Correct me if I'm wrong, but wouldn't an enterprise grade SAN be appropriate here...?

#3 2009-08-20 09:43:48

davvil
Member
Registered: 2008-05-06
Posts: 165

Re: Parallel filesystem

Thanks for the suggestion! But from what I see on their webpage, DRBD handles the replication of the data, but not necessarily efficient parallel access. We will look into it in more detail, but I am not sure it would fit us.

Regarding the SAN: I had not come across this term before. It certainly looks like we could use one, but if I understand it correctly, we would need additional hardware and a bigger reorganization of our infrastructure. I do not think this would be realistic at the moment.

#4 2009-08-20 11:06:54

fukawi2
Ex-Administratorino
From: .vic.au
Registered: 2007-09-28
Posts: 6,222

Re: Parallel filesystem

davvil wrote:

Thanks for the suggestion! But from what I see on their webpage, DRBD handles the replication of the data, but not necessarily efficient parallel access. We will look into it in more detail, but I am not sure it would fit us.

I guess if it can replicate across your clients, then each client can read from its own local copy? I'm not sure if it supports that kind of thing though; I'm just throwing ideas out there.

davvil wrote:

Regarding the SAN: I had not come across this term before. It certainly looks like we could use one, but if I understand it correctly, we would need additional hardware and a bigger reorganization of our infrastructure. I do not think this would be realistic at the moment.

Yes -- things can only be pushed so far before you reach the limit and need to upgrade to get what you need. And there's never a way to explain this to non-IT people who refuse to let us upgrade ;)

#5 2009-08-20 11:42:56

thisllub
Member
From: Northern NSW Australia
Registered: 2007-12-28
Posts: 231

Re: Parallel filesystem

DRBD / Heartbeat is easy and works pretty well.
The only downside is that it won't build on Arch, and I haven't tried to find out why.

I have NFS set up on it. I have two machines, e.g. x.x.x.2 and x.x.x.3, connected via DRBD, and on the network the pair presents itself as x.x.x.1, which is an alias for x.x.x.2. If a problem occurs with x.x.x.2, x.x.x.3 takes over and presents itself as x.x.x.1.
The syncing is fast, especially if you use a dedicated pair of network cards between the two machines.
I would definitely look at it.

You can use pretty much any filesystem on top of it, and LVM works both on top of it and underneath it.
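
For readers who want to picture the failover described above, here is a minimal Python sketch. It is a toy simulation only, not DRBD or Heartbeat configuration: the FailoverPair class and its methods are hypothetical, and the addresses are the same placeholders used in this post.

```python
class FailoverPair:
    """Toy model of two replicated nodes sharing one service address."""

    def __init__(self, primary, secondary, service_ip):
        self.nodes = [primary, secondary]
        self.alive = {primary: True, secondary: True}
        self.service_ip = service_ip  # the address clients connect to
        self.active = primary         # the node currently answering for it

    def fail(self, node):
        """Mark a node as failed; if it was active, the standby takes over."""
        self.alive[node] = False
        if node == self.active:
            standby = [n for n in self.nodes if self.alive[n]]
            if not standby:
                raise RuntimeError("no healthy node left to take over")
            self.active = standby[0]

    def resolve(self):
        """Return the node that currently serves the shared address."""
        return self.active


if __name__ == "__main__":
    pair = FailoverPair("x.x.x.2", "x.x.x.3", service_ip="x.x.x.1")
    print(pair.service_ip, "is served by", pair.resolve())
    pair.fail("x.x.x.2")
    print(pair.service_ip, "is served by", pair.resolve())
```

The point of the design is that clients only ever know the shared address, so they do not need to be reconfigured when the active node changes.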
