When I started having problems with my PC last November and thought I had lost all of my data, I knew I had to come up with a better backup plan. I really don't have that much data that's irreplaceable: my photos and some documents or code that I've written. And I wouldn't want to try to rip all of my CDs again, but I could certainly do it. I had been making backups of my photos on CD whenever a folder would get to about 700 megs. But when I got my LC1 and my gig SD card I started using a hell of a lot more space. After just one day of shooting I could have more than a CD's worth of pictures and I'd have to break it up into multiple folders of about 700 megs each. Needless to say it was a pain in the ass and I didn't do it very often. What I needed was more space on another computer. My home Linux server didn't work at first because it had a 4 gig HD, 3 gigs of which were used by the OS. I soon upgraded the disk to 80 gigs, which was the biggest drive I could get for the same cost as a night out at a restaurant. That was my temporary solution.
The real solution, of course, was a RAID file server. My friend Ryan built a 1 terabyte RAID file server a few years ago. He had eight drives and they kept on dying because his case couldn't dissipate the heat that they were generating. He also said that it used a lot of electricity. That was way out of my price range a few years ago, but when I bought an 80 gig drive recently I realized that storage was really cheap now and I could easily build a server myself. But what do I really need? Having about a hundred gigs of RAIDed storage would be cool; I could use it to back up the photos from my main computer. But then you look at the prices of drives and you talk to your friends and they all have huge RAID fileservers and they rip their DVDs to them. And then one of your friends builds a two terabyte fileserver. And then you say to yourself: "Man, I'm such a pussy if I don't have that kind of capacity at my house". Ok, I'm sorta kidding, I don't really think I'm a pussy, but it would be so much more fun to have a lot more space. Think about how cool it would be to set up MythTV with that kind of disk space.
So, I've been doing a lot of research (research is half of the fun). Should I go with hardware or software RAID? Should I do RAID-5 or one of the other RAID levels? Should I run it on my regular Linux server or should it run on a separate box? Should I use my old PC or buy a new rackmount case? Should I go with IDE or SATA? Should I buy a lot of drives or fewer? Should I try to get as much space as possible or should I just get whatever's cheapest and upgrade in two years when disks four times larger cost half as much?
The first thing I did was find out how RAID works. In particular I wanted to know how RAID-5 works, because it seemed like magic that you could give up just one disk and gain redundancy. Magic math? No, it's actually really simple. Let's say that you have 4 disks and you want to store the numbers 5, 12, and 8 (pretend that those numbers are part of your favorite digital photo from your last vacation). Then on the fourth disk you put the sum of the numbers on the previous disks (they actually use the XOR operator but it's easier to understand with addition):
| Disk 1 | Disk 2 | Disk 3 | Disk 4 (parity disk) |
| 5 | 12 | 8 | 25 |
Then, let's say that Disk 3 fails and you lost all of your data there:
| Disk 1 | Disk 2 | Disk 3 | Disk 4 (parity disk) |
| 5 | 12 | | 25 |
You know that the only number that could possibly be on disk 3 could be 8 because those three numbers have to add up to 25. Pretty clever, no? When a disk fails you just pop in a new one and your RAID software reconstructs the lost data.
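Since the real thing uses XOR rather than addition, here is a minimal Python sketch of the same example done with XOR. The function name `parity` is made up for illustration, and actual RAID software works on whole blocks of bytes rather than single numbers, so treat this as a toy version of the idea and not how any particular RAID implementation is written.

```python
from functools import reduce
from operator import xor


def parity(values):
    """XOR all the values together (hypothetical helper for this example)."""
    return reduce(xor, values, 0)


# The three numbers stored on disks 1, 2 and 3 in the example above.
data = [5, 12, 8]

# What would be written to the parity disk if XOR is used instead of a sum.
p = parity(data)

# Disk 3 dies. XOR-ing the surviving data with the parity value
# gives back the number that was lost.
survivors = [5, 12]
rebuilt = parity(survivors + [p])

print("parity value:", p)
print("rebuilt disk 3:", rebuilt)
```

The nice property of XOR is that the same operation both builds the parity and recovers the missing value, no matter which single disk is the one that failed.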
There are different reasons you would want to use RAID and they are discussed in the research paper that first described RAID. The reason I want RAID is for redundancy, but I'm not totally paranoid about losing two disks so I'm pretty sure I'm gonna go with RAID-5. Well, I take that back, I'm extremely paranoid about losing my data, but that leads me to want to have it on multiple computers too. I'd rather run RAID-5 on one computer and have a completely separate copy of the data on another than a completely redundant copy on one computer.
Now, I think the biggest decision to make is whether I want hardware RAID or software RAID. Hardware RAID would make me feel cool because that's what we use on our servers at work, and having a setup at home like a multi-billion dollar business is cool (I'm a geek, I know). Hardware RAID would also make it much easier to run with a lot of disks. The cards usually support 8 drives right out of the box. And if I wanted to have more than two drives I would need to buy at least one more adapter card anyways. And if I wanted to go with SATA I would probably need to buy at least two SATA cards anyways. Um, lemme just check that assumption out. Hey, I was wrong, my motherboard already supports two SATA drives. Hmmmm.... Well, let's see. I don't want to go with a cheap hardware RAID card. I've heard really bad things about them, like people losing their data because the card couldn't keep up. Most of the discussions I've read debating the merits of hardware vs software RAID were written several years ago when people had slower processors and little RAM. I'm absolutely sure my P4 2.4 GHz with a gig of RAM is gonna outperform any cheap RAID controller doing all those XOR operations. And good hardware RAID cards cost like $400-$500. Hell no! I'm not going to spend that much on my first RAID server. Also, software RAID is cool because you will always be able to recover the data. If you choose hardware RAID you can't be sure you'll be able to get the same RAID card that you had. I'm pretty sure that each company writes stuff out to disk in their own format. With software RAID I'll always be able to get my hands on a Linux distribution.
I want to stick with as few drives as possible because I don't want 8 drives running in my closet. My guess is that they would be pretty loud, but more importantly I would feel really really bad about wasting electricity on 8 drives that are spinning 24 hours a day. Three drives on RAID-5 work, but with that I'm giving up a third of my raw capacity to parity (one drive of parity for every two drives of data). With 4 drives I only give up a quarter of my capacity, which is easier to live with. I would say that 4 or 5 drives is probably the sweet spot for me.
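To see how the trade-off changes with the number of drives, here is a small Python sketch. It assumes equal-sized drives and that RAID-5 costs exactly one drive's worth of space for parity; the function name `raid5_usable` and the 250 gig drive size are just picked for illustration. It prints the parity cost two ways: as a share of the raw capacity you paid for, and relative to the usable space you end up with.

```python
def raid5_usable(drives, size_gigs):
    """Usable gigs for a RAID-5 array of equal-sized drives.

    Hypothetical helper: assumes one drive's worth of space goes to parity.
    """
    if drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drives - 1) * size_gigs


for n in (3, 4, 5, 8):
    raw = n * 250
    usable = raid5_usable(n, 250)
    share_of_raw = (raw - usable) / raw        # parity as a share of what you bought
    vs_usable = (raw - usable) / usable        # parity relative to what you can use
    print(f"{n} drives: {usable} of {raw} gigs usable, "
          f"parity is {share_of_raw:.0%} of raw, {vs_usable:.0%} on top of usable")
```

Either way you count it, the parity cost shrinks as drives are added, which is the pull toward 4 or 5 drives instead of 3.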
The second biggest decision is which drives I'm gonna buy. I'm probably gonna buy them from NewEgg, cuz that company kicks ass. I've never had a problem with them and I don't have to pay sales tax. Also, I'm probably gonna stick with OEM drives since I can get them cheaper and I hate dealing with those stupid rebates that you almost always need to deal with when you buy a consumer drive at a reasonable price. But I've never been to Fry's, so I'll at least make a trip there to scope out the situation. I keep on hearing that that's the place to pick up cheap drives. As of February 4, 2005 here are the prices (with shipping) for SATA drives on NewEgg:
| Price | Capacity | Cost per gig |
| $344 | 400 gig | 86 cents/gig |
| $200 | 300 gig | 67 cents/gig |
| $140 | 250 gig | 56 cents/gig <-- sweet spot |
| $120 | 200 gig | 60 cents/gig |
| $98 | 160 gig | 62 cents/gig |
| $86 | 120 gig | 72 cents/gig |
And for $25 I can supposedly get another SATA card to support two more drives. So for my cheapest SATA setup I can get three 120 gig drives, and with the new SATA card it comes out to $283, which would provide 240 gigs of effective space. The most cost effective drives are the 250 gig drives, and with the SATA card and three drives it would be $445, which would provide 500 gigs (half a terabyte!) of effective space. For $585 I could get four of them and 750 gigs. It would be really nice to get a terabyte of space since it's such a psychological barrier, but I don't know what I would do with that kind of space. [at future date insert comments making fun of how stupid this statement sounded, e.g. 640k is more RAM than anyone could possibly need]
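The arithmetic behind those totals can be sketched in a few lines of Python. The prices are the SATA prices from the table above; everything else is an assumption spelled out in the code: two SATA ports on the motherboard, two ports per $25 add-on card, and RAID-5 giving up one drive's worth of space. The names `build_cost` and `usable_gigs` are made up for this example.

```python
# NewEgg SATA prices from the table above: capacity in gigs -> dollars.
SATA_PRICES = {400: 344, 300: 200, 250: 140, 200: 120, 160: 98, 120: 86}

CARD_PRICE = 25      # add-on SATA card (assumed price from the text)
ONBOARD_PORTS = 2    # SATA ports already on the motherboard
PORTS_PER_CARD = 2   # drives supported by each add-on card


def build_cost(drives, size_gigs):
    """Dollars for the drives plus however many add-on cards they need."""
    extra_drives = max(0, drives - ONBOARD_PORTS)
    cards = -(-extra_drives // PORTS_PER_CARD)  # ceiling division
    return drives * SATA_PRICES[size_gigs] + cards * CARD_PRICE


def usable_gigs(drives, size_gigs):
    """RAID-5 effective space: one drive's worth goes to parity."""
    return (drives - 1) * size_gigs


# The drive size with the lowest cost per gig.
sweet_spot = min(SATA_PRICES, key=lambda size: SATA_PRICES[size] / size)
print("cheapest per gig:", sweet_spot, "gig drives")

for drives, size in [(3, 120), (3, 250), (4, 250)]:
    print(f"{drives} x {size} gig: ${build_cost(drives, size)} "
          f"for {usable_gigs(drives, size)} gigs of effective space")
```

Changing the drive count or size in the list at the bottom makes it easy to try other combinations without redoing the math by hand.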
Now, with IDE I can make it even cheaper. I already have a 120 gig IDE drive that's not being used. And for this discussion I'm assuming that I can't mix SATA and IDE in RAID, but I probably can with the software RAID.
| Price | Capacity | Cost per gig |
| $147 | 250 gig | 59 cents/gig |
| $110 | 200 gig | 55 cents/gig |
| $84 | 160 gig | 53 cents/gig <-- sweet spot |
| $80 | 120 gig | 67 cents/gig |
Theoretically I could build a 240 gig server for only $180: two new 120 gig IDE drives alongside the one I already have, assuming 20 bucks for a second IDE controller. That would get me up and running pretty quickly. And if I can run both SATA and IDE on the RAID, I could get two 120 gig SATA drives for $172 and not even need to buy a new controller, since my current IDE drive will still be running off of the motherboard's IDE controller. Then I wouldn't need to feel like I'm wasting money on drives that run on an interface that won't be used in a couple of years. The most cost effective IDE drives are the 160 gigs, but I'd need at least three since I can't use my current 120 gig drive. I'm not even going to calculate the price because I can see that it's not going to be much different than the SATA drives.
So, what am I going to do? I don't know yet. I just recently got two new computers (a Mac Mini and an iMac) so I have stuff to play with. I think I'm gonna wait a little while until I need something new to play with.