Creating Filesystem Backups with 'rsync'

by Brian Wilson

Sometimes the simple, cheap solution wins out over cool technology.

At my ISP, we still use a tape backup system for long-term backups, but we also have two identical disk drives in each server. A RAID-1 mirror would be the obvious way to get the data onto both drives to protect against failures. But what's more common in your experience -- a hard drive failure or an important file deleted by accident?

Instead of using RAID-1, I use a Perl script called "synchro" to synchronize the drive pairs each night. In this article, I will present the reasons I decided to do it this way, and share my script with you.

RAID technology

RAID can increase performance, but only under the right conditions. For best results, more than two drives and SCSI controllers are usually the way to go. In my case, we have EIDE controllers. EIDE requires that the CPU do a lot of the work in data transfer, so the CPU becomes a bottleneck. In my tests of Linux software RAID-1 with EIDE drives, the performance hit was more than we could live with, so software RAID-1 is not really an option for us.

RAID-0 (striping) can increase available space, but does not provide increased reliability. With RAID-0 (and RAID-4 and -5 for that matter), data is striped across multiple drives to combine several physical partitions into one larger logical one. I use Linux software RAID-0 on two 40-GB drives to create a large filesystem to hold our NNTP news cache. In this application, reliability is not an issue because it's only a cache, so even losing the entire drive pair would only make reading news slower. Performance for the cache is not an issue either, because the total number of people accessing the news server simultaneously is never high. RAID-4 and -5 offer redundancy but require even more CPU time to implement in software.

RAID can increase reliability. With the redundant configurations provided by RAID levels greater than 0, data is spread across multiple drives so that a single drive failure does not result in data loss. I've used hardware RAID controllers in the past. I'd love to use something like a Vortex SCSI-RAID controller, but our ISP operates on a small budget. I have found that being able to run down to a local discount store for replacement parts is far more practical than keeping emergency spares on hand for exotic things like hardware RAID controllers.

The complexity of a RAID setup (hardware or software) also makes more demands on the ISP staff; complicated systems can be very nerve-wracking when the phones are ringing because there is no server running to pick up the modem lines!

Project goals and requirements

My server has survived two drive failures. Both times, Linux started emitting warning messages days before the drives failed, so we were ready with tape backups and replacement parts. Drives can fail suddenly, but they often give you lots of warnings. This reduces our need to have a RAID-1 mirror. Just keep an eye on those log messages!

By far, our most common problem has not been hardware failure. It has been human error. Files are deleted or incorrectly modified, both by our own staff and by clients, and need to be restored quickly. In this case, a RAID system will not help. A "delete" command will instantly and efficiently remove the file from both drives in a mirror. You are still left with spinning the backup tapes, which can take hours.

I try to use revision control (RCS or CVS) for all system files. This allows backing out changes as long as everyone is consistent about checking in changes. Things still sometimes slip past us and this usually does not help with clients' files.

So my goals are to keep a backup filesystem online at all times to replace files that are accidentally modified or deleted, and to have a complete drive available to deal with less common hardware failures.

A slow mirror

The solution that I came up with for my server was to replicate data to the second drive once a day. It's like a RAID-1 mirror that takes a day to copy the files.

This approach is not perfect. With RAID-1, the files on the recovery drive would always be up to date; with this system they can be up to a day old, which is as good as a daily tape backup. It also does not help when a deleted or changed file goes unnoticed for more than a day -- by then the change has been copied across and the old version has disappeared from the secondary drive. Just be aware this little script is a supplement to a good tape backup scheme, not a replacement for it.
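
To make the once-a-day copy concrete, here is a minimal sketch in Python of how a nightly rsync run might be put together. It is not the "synchro" script, which is written in Perl; the function names and the mount points /home and /mirror/home are assumptions made up for illustration. The `--delete` option is what makes the second drive behave like a slow mirror, and it is also why a file deleted by mistake is gone from the copy after the next run.

```python
import subprocess


def build_rsync_command(source, target, dry_run=False):
    """Return an rsync argument list that makes target mirror source."""
    cmd = [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, owners, links
        "-x",        # stay on one filesystem; do not cross mount points
        "--delete",  # remove files from target that no longer exist in source
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing it
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", target.rstrip("/") + "/"]
    return cmd


def mirror(source, target, dry_run=False):
    """Run rsync and return its exit status (0 means success)."""
    return subprocess.run(build_rsync_command(source, target, dry_run)).returncode


if __name__ == "__main__":
    # Hypothetical mount points: /home on the first drive,
    # /mirror/home on the second drive.
    print(" ".join(build_rsync_command("/home", "/mirror/home", dry_run=True)))
```

Run once a night from cron, something like `mirror("/home", "/mirror/home")` would give the day-old copy described above. Trying it first with `dry_run=True` is a cheap way to see what `--delete` is about to remove.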
