General Question

grumpyfish's avatar

Raid 1 vs. Raid 5?

Asked by grumpyfish (6594 points ) February 2nd, 2010

I’m going to be building a little 750GB raid for photo archives in my existing linux server.

My two options seem to be (2) 750G drives mirrored (RAID 1), or (4) 250G drives in a striped configuration (RAID 5). The price difference is about $60 (more expensive to go with the (4) striped drives). Speed won’t really be an issue, as the network is a bottleneck more than the server read/write speed.

Suggestions on which to do? Other options?


17 Answers

njnyjobs's avatar

It depends on the I/O traffic between clients and server. If you have more than average traffic, it’s best to have a Raid 5; otherwise the extra disk capacity on a Raid 1 will help you accommodate more picture files.

grumpyfish's avatar

Thanks! There’s only one client, and it’s not on a fast network, so HDD speed won’t be an issue. Sounds like one vote for Raid 1 =)

markyy's avatar

A photo server for just one person sounds a little over the top. If it were me, I would constantly find new purposes for the server other than just photo archiving. One of those might include something that will benefit from the increased speed in RAID 5, like downloading/storing some video files. I always like to prepare for the future, and unfortunately that means I always feel the need to shell out that little bit of extra cash.

So Raid5 just in case, Raid1 if you have the restraint to stick with photo archiving :)

robmandu's avatar

May I ask your requirements for this little project?

RAID is intended to provide improved availability, redundancy, and performance… all factors which you imply are not needed here for your single server.

Are you attempting to create some kind of backup strategy here? If so, let’s talk about that because RAID is not the solution you seek then.

grumpyfish's avatar

@robmandu Excellent!

Here’s the requirements:
-> Store approximately 750GB of data locally, LAN availability, internet connected
-> The data need to be protected against data loss

There are several different things being done with this data:
1. Long term storage of raw camera files for future reference
2. Access for processing into finished tiffs (which could be done from a local copy)
3. Long term storage & access of the finished images

Just to add—I can probably back up the finished images (and bulk processed raw files) onto Zenfolio.

grumpyfish's avatar

Added thought #2: Probably going to be a new server rather than the existing one.

tekn0lust's avatar

At this level of IO you will not see any appreciable difference in performance between the two RAID arrays. You only really see a performance difference under high sustained local IO. You will see more difference in the heat and noise generated and the energy consumed by 4 disks versus 2 than you will in performance.

Either solution will protect you against one drive failure in the array. They will not protect you if the array is stolen, flooded, power surged or if data is deleted accidentally. For this you need an external backup of some kind.
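To make the comparison concrete, here is a minimal sketch of the usable-capacity arithmetic for the two layouts discussed in this thread. It assumes identical drives and the textbook definitions of RAID 1 and RAID 5; the function name `usable_capacity_gb` is made up for illustration and is not part of any RAID tool.

```python
def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Usable capacity of a simple array of identical drives (illustrative helper)."""
    if level == 1:
        # RAID 1 (mirror): every drive holds a full copy of the data.
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_size_gb
    if level == 5:
        # RAID 5 (striping with parity): one drive's worth of space holds parity.
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drive_count - 1) * drive_size_gb
    raise ValueError("only RAID 1 and RAID 5 are covered in this sketch")


# The two options from the question.
print("2 x 750G mirrored:", usable_capacity_gb(1, 2, 750), "GB usable")
print("4 x 250G RAID 5:  ", usable_capacity_gb(5, 4, 250), "GB usable")
```

Both options come out at 750GB usable and both survive a single drive failure, so the choice here comes down to cost, drive count, heat, and noise rather than capacity.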

Since you are adding this to an existing LINUX server are you adding the drives internal to the server or as an external enclosure? If internal do you have the wattage available from your existing power supply to power 4 more drives? If external what connectivity will it have back to the server? USB/Firewire?

robmandu's avatar

For reliability, availability, and decent performance, have you considered a drobo instead? You throw whatever disks you have handy into the thing and It Just Works™.

Jamie Zawinski has some excellent, no-nonsense instructions on a smart and safe backup strategy.

Re: jwz’s backup strategy, you don’t need multiple drobos. Just one for your main workhorse and then you can likely use normal USB/Firewire enclosure single disk drives for the backup copies (as long as they meet/exceed the total drobo storage).

grumpyfish's avatar

I did look at a drobo, and I’ve heard great things about it, but it’s out of my price range (my budget for this is no more than $200).

We have a similar scheme to jwz’s at work: a local server sits next to the file server, and we use rsync to ship the data out to (2) geographically isolated locations. Usually shipping around 100–200MB on a 200G filesystem per day. Very very important since the main office is in downtown San Francisco. We use RAID5 for performance and reliability on the main server, and just straight discs for the rest.

Here’s the configuration I’m currently considering:
(2) 750G discs mirrored (internal drives in a new miniITX box)

Offsite backup will be handled in the cloud, so to speak.

jerv's avatar

0.75TB in the cloud…
Maybe it’s just that I am a little paranoid especially after the issues at T-mobile, but I’m not convinced that that is a really good option yet unless you have another backup elsewhere that you actually have control over.

While it’s true that with mirrored drives, it pretty much requires destroying the system (fire, flooding…) to lose data, I think it would be easier to just have an offline external drive stashed somewhere offsite than to trust the clouds at this point, especially if you are security/privacy conscious.

OTOH, maybe that is covered by “so to speak”.

robmandu's avatar

How’s your upload rate? Mine’s only a fraction of download speed and hence, it took me over a month to upload 70GB for backup purposes.

Think yours is fast enough for several hundred GB?

jerv's avatar

@robmandu I hadn’t even thought of that aspect of it since remote backup like that isn’t something I would do for other reasons anyways, but you raise a valid point. Even transferring 20GB over my home LAN took too damn long, and that was over fifty times faster than my DSL connection. Maybe worthwhile if you have the highest-tier FIOS or similar speed (like many corporations that earn more in a second than most people earn in their lives), but not exactly feasible if you have more than a DVD worth of data unless the application is also “in the cloud” and thus your ‘net connection only has to worry about displaying the output.
At that point, you are basically using a dumb terminal anyways and don’t even really need a full-on computer!

grumpyfish's avatar

@jerv still need to get the data into the cloud…

@robmandu Very good points =)

I actually had to run so didn’t finish the thought:
– The finished images and possibly rough (e.g., bulk autoprocessed) transfers of the raw files can be uploaded to zenfolio (I have unlimited hosting there) as long as I can get the images under 12MB each.
– The other thought on backing up raw files is to put a hard drive on some server somewhere and rsync the files onto it. 8GB of raw files will take a long time on this link, but well under a week to transfer.

In this case I’m not worried about security as much as I would be otherwise—the finished images are going to be mostly released as creative commons, but it’s a very good thing to think about.

robmandu's avatar

Okay, let’s see. Where are we then?

You don’t really need any kind of RAID setup… but it sounds like you want one. Of the various configurations suggested, it sounds like RAID1 is your first choice for relative simplicity, fewer drives, and lowest cost. (If lowest cost is the primary factor above all others, then consider dropping RAID1 altogether. It’s just doubling your disk cost and not providing any worthwhile performance or reliability improvement).

Then I think we’ve crossed over into backup strategies a bit as well. A recommended best practice, as explained by jwz, is to employ both local and remote backups. For remote backups, we’re somewhat concerned about the throughput rate if using network bandwidth to move files. You estimated you could copy 8GB in “well under a week”. Let’s estimate you can schlep about 2GB of data over the wire per day. And let’s also figure that you might need to someday back up a sizable portion of one of those 750GB drives… guess oh, about 300GB worth. You’d be looking at 150 days (running 24/7 without interruption) to accomplish the task. That’s not a realistically supportable strategy.
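The 150-day figure is easy to check. Here is a rough sketch of the arithmetic, using the 2GB/day and 300GB guesses from this post; the helper names are made up for illustration, and 1GB is treated as 10^9 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_transfer(total_gb, gb_per_day):
    """Days of uninterrupted uploading needed to move total_gb."""
    return total_gb / gb_per_day


def sustained_kbit_per_s(gb_per_day):
    """Average link rate implied by moving gb_per_day, in kbit/s (1 GB = 1e9 bytes)."""
    return gb_per_day * 1e9 * 8 / SECONDS_PER_DAY / 1000


print(days_to_transfer(300, 2))         # 150.0
print(days_to_transfer(750, 2))         # a full 750GB drive at the same rate
print(round(sustained_kbit_per_s(2)))   # the uplink rate that 2GB/day implies
```

Plugging in a different daily rate shows how quickly the picture changes with a better uplink.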

If getting a better uplink isn’t possible (or financially responsible), then I suggest you clone the content of the 750GB drive on a regular basis… maybe once a month or every other week. Like for the local backup, you could use rsync for that task as well. And then keep that clone somewhere off site in case lightning strikes, or a flood hits, or some other unexpected event removes your server and primary disks from service.
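For the periodic clone, something along these lines could work. This is only a sketch: the paths `/srv/photos` and `/mnt/offsite-clone` are placeholders, and the helper `build_rsync_command` is hypothetical. The rsync options used (`-a` for archive mode, `--delete` to drop files that no longer exist on the source, `--dry-run` to preview) are standard ones. It defaults to a dry run so nothing is changed until you have looked at the output.

```python
import shlex


def build_rsync_command(source, destination, dry_run=True):
    """Build an rsync command that mirrors source into destination (illustrative)."""
    command = ["rsync", "-a", "--delete"]
    if dry_run:
        # Preview what would be copied or deleted without touching the clone.
        command.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    command += [source.rstrip("/") + "/", destination]
    return command


# Placeholder paths; substitute the real photo directory and the mounted clone drive.
cmd = build_rsync_command("/srv/photos", "/mnt/offsite-clone")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True) after importing subprocess.
```

One caveat with `--delete`: a file deleted by accident on the server also disappears from the clone at the next sync, so it may be worth leaving that option out if the clone is the only other copy.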

robmandu's avatar

fyi:
———————

My backup scenario for my wife’s MacBook Pro with over 60,000 family photos:

LOCAL BACKUP
• Apple Time Capsule (802.11n wi-fi router + builtin 500GB hard drive)
• Mac OS X Time Machine keeps hourly, daily, weekly, monthly snapshots of all files
• Time Capsule hard drive space is practically 100% utilized with older items auto-deleting as newer ones get loaded.
• Instantly accessible, can be used to recover from accidental deletions, edits, whatever.

REMOTE BACKUP
• Cheap USB 2.0 external hard drive that resides in my desk drawer at the office (away from home) 99% of the time.
• Shirt Pocket software’s SuperDuper! backup utility (essentially, a beautiful GUI on top of rsync).
• Every couple of weeks connect the external drive to the MBP and run SuperDuper! to create a bootable backup copy of the main hard drive.
• Return the external drive to my office until needed next time.

grumpyfish's avatar

@robmandu ding I think you’ve hit the nail on the head, and I think I’ve been approaching this wrong. There’s just too much data to ship out on a weekly basis.

So:
(2) 750G external drives, one lives in a safe deposit box up the street (I work from home, so there’s no office to leave the drive in). Sync them weekly or so.

Safe deposit box is $35/yr and is right next to an excellent coffee joint. I solve the bandwidth problem with a sneakernet.

jerv's avatar

@grumpyfish Never underestimate the throughput of Sneakernet!
I can’t think of any other way to get more than ~10GB of data in/out in under an hour, at least not with any reasonably priced connection, and you can’t get much more reasonably priced than the connections already built into your system (USB, SATA, Firewire…) that are inherently faster than just about any residential ‘net connection.
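A back-of-the-envelope sketch of that throughput claim, for anyone curious. The 30-minute walk to the safe deposit box is an assumed figure, not something stated in this thread, and the calculation ignores the time spent copying files onto the drive first. The function name is made up for illustration.

```python
def effective_mbit_per_s(payload_gb, trip_minutes):
    """Effective throughput of physically carrying payload_gb, in Mbit/s (1 GB = 1e9 bytes)."""
    bits = payload_gb * 1e9 * 8
    return bits / (trip_minutes * 60) / 1e6


# The "~10GB in under an hour" benchmark mentioned above.
print(round(effective_mbit_per_s(10, 60), 1), "Mbit/s")

# A full 750GB drive carried on an assumed 30-minute trip.
print(round(effective_mbit_per_s(750, 30)), "Mbit/s")
```

Even with the copy time added back in, the drive-in-a-pocket approach stays well ahead of a residential uplink for this amount of data.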
