Why your bioinformatics lab should be using ZFS

by Jeff Wintersinger
July 18, 2014


ZFS is a wonderful filesystem released nine years ago by Sun Microsystems; it has since undergone a number of improvements and been ported to Linux. ZFS does a better job of keeping your data intact than any other filesystem in common use. This matters because today's hard drives suffer from both visible errors (“unrecoverable read errors”) and invisible ones (“bit rot”). ZFS also yields substantial space savings through transparent LZ4 compression, which is unusually effective on the highly compressible data common in bioinformatics. And because less physical data must be read from disk, read performance improves as well.

Bioinformatics labs want to store great masses of data cheaply. ZFS lets you store these masses on consumer-class (read: cheap) hardware without sacrificing performance or reliability. ZFS (like its still-in-development cousin, Btrfs) is not merely an iterative improvement on the traditional filesystems in wide use, such as NTFS, ext4, and HFS+. So significant are its improvements that I consider ZFS a fundamental advance in filesystems, one from which small and mid-sized bioinformatics labs can benefit substantially.

Why hard drives are evil

Hard drives pain me. They are the most unreliable component in most computers, and yet also the one whose failure can most easily destroy our data. Traditionally, system administrators have overcome this problem by using various levels of RAID (such as RAID 1, RAID 5, RAID 6, or RAID 10), but this approach has become untenable. Robin Harris warned in 2007 that RAID 5 would become impossible to sustain by 2009, a topic he returned to in 2013. The problem stems from ever-expanding drive capacities combined with drive reliability that has remained constant. While the largest drives held 1 TB in 2007, we now have 4 TB drives available for CDN $170, with 6 TB enterprise drives already on the market and 10 TB drives to follow in the near future. We have not, however, seen a corresponding increase in drive reliability: 4 TB consumer-class drives today still sport the same rate of 1 unrecoverable read error per 10^14 bits read as drives one-quarter their size did seven years ago.

The consequence of this constant error rate is that you’re much more likely to see your RAID array explode, with your data disappearing in a glorious fireball. A 4 TB drive holds 3.2 × 10^13 bits, so at 1 error per 10^14 read bits, reading the entire drive yields an expected 0.32 unrecoverable read errors, or roughly a 27% chance of encountering at least one. Now, suppose you have a five-disk RAID 5 array, with one disk’s worth of capacity devoted to parity. After a single disk fails and is replaced, rebuilding the array requires reading all four surviving disks (16 TB), which raises the expected error count to 1.28 and the chance of at least one unrecoverable read error to roughly 72%. Your rebuild is thus more likely than not to hit such an error, causing the entirety of the array to be lost. (The probability is even greater when you consider that, at least for multiple drives purchased from the same manufacturing batch, drive failures are not independent events: one failed drive means the others are more likely to fail soon as well.) Clearly, in this scenario, the only viable solution is to dedicate two disks to parity by configuring a six-disk RAID 6 array instead, which will substantially decrease the probability of a catastrophic failure during rebuild.
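The arithmetic above can be checked with a short Python sketch. It assumes, as the spec-sheet figure implies, that every bit read fails independently with probability 10^-14, and it uses decimal terabytes (10^12 bytes), as drive makers do. Both are simplifications; the correlated failures mentioned above make real-world odds worse.

```python
import math

URE_PER_BIT = 1e-14  # spec-sheet rate: 1 unrecoverable read error per 10^14 bits


def bits_in(terabytes: float) -> float:
    """Bits in the given decimal capacity (1 TB = 10^12 bytes)."""
    return terabytes * 1e12 * 8


def expected_ures(terabytes: float, rate: float = URE_PER_BIT) -> float:
    """Expected number of unrecoverable read errors when reading `terabytes`."""
    return bits_in(terabytes) * rate


def p_any_ure(terabytes: float, rate: float = URE_PER_BIT) -> float:
    """P(at least one URE) = 1 - (1 - rate)**bits, assuming independent bits.

    log1p/expm1 keep the result accurate despite the tiny per-bit rate.
    """
    return -math.expm1(bits_in(terabytes) * math.log1p(-rate))


if __name__ == "__main__":
    # One 4 TB drive, then a RAID 5 rebuild reading four surviving 4 TB disks.
    for label, tb in [("single 4 TB drive", 4), ("RAID 5 rebuild, 4 x 4 TB", 16)]:
        print(f"{label}: expected UREs {expected_ures(tb):.2f}, "
              f"P(>=1 URE) {p_any_ure(tb):.1%}")
```

Running it prints about 27.4% for the single drive and 72.2% for the rebuild. Note that the expected error count (0.32 for one drive) is not itself a probability; the chance of at least one error approaches, but never exceeds, 100% as the amount read grows.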

However, RAID falters not only because of these errors. The figure of 1 error per 10^14 read bits refers only to unrecoverable read errors. Though the interface a hard drive presents to the operating system is fully in keeping with the logical and orderly existence of a digital computer, where bits march one after the other as computations complete, our data is in reality stored in the chaotic atomic world, which is far less pleasingly precise. Drives implement several corrective layers to overcome errors, but when these fail, bad data bubbles up from the drive. If the drive realizes this, it notifies the operating system of an unrecoverable read error; sometimes, however, the corrective layers fail without the drive noticing, and it hands corrupt data to the operating system. Thus, even if your RAID array appears to be humming along happily, it may be silently reading and writing corrupt data. Traditional backup strategies, effective though they are against catastrophic array failure, are worthless in the face of silent corruption: the bad bits will be written into your backups, overwriting any previously good copies. You will not realize this until you attempt to use the data stored on your array, find it corrupt, and turn to your equally compromised backups. Nor is bit rot merely theoretical: one OS X user found that, of 15,264 photos consuming 105 GB that he had stored between 2006 and 2011, 28 had become silently corrupted by 2014.

All of the above should be of critical concern for any bioinformatics lab storing substantial volumes of data–which is to say, (almost) all bioinformatics labs. Bit rot is particularly nefarious. At the best of times, bioinformatics feels like a teetering tower on the verge of collapse, with a perilous array of hacked-together Perl and shell scripts spanning the gap from data to publication. If the integrity of data on which these analyses are built is called into question, the whole edifice becomes more tenuous yet. Labs are storing terabytes of data, often left to rot on its array for years at a time before it is again analyzed, by either the original lab or another. At best, such corrupt data will cause analyses to fail outright; at worst, the analyses will seemingly succeed, leading to invalid conclusions.

Why ZFS is awesome

Both unrecoverable read errors and silent bit rot can be overcome with modern filesystems like ZFS and Btrfs. Btrfs, alas, is not yet stable enough for production use. Once stable, however, it will offer several advantages over ZFS, the most prominent of which is a license permitting inclusion in the kernel source tree. ZFS, though more mature than Btrfs, is released under the CDDL, which is incompatible with the Linux kernel’s GPL, so it must be distributed as an out-of-tree module, making installation somewhat more arduous. Despite this difficulty, ZFS on Linux is amply mature for use in bioinformatics, thanks to the wonderful efforts Lawrence Livermore National Laboratory has spent porting it to Linux over the last several years.

A multi-disk ZFS array provides the same reliability against complete drive failure and unrecoverable read errors as traditional RAID. ZFS mirrors are equivalent to RAID 1, in that each data drive is accompanied by at least one redundant copy, while ZFS raidz1, raidz2, and raidz3 devote one, two, and three drives’ worth of capacity to parity, respectively, across an arbitrary number of data disks, making raidz1 and raidz2 analogous to RAID 5 and RAID 6.
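To make the trade-offs between these layouts concrete, here is a small Python sketch that estimates usable capacity and failure tolerance for each one and assembles the corresponding `zpool create` argument list. The pool name `tank` and the device paths are placeholders, and the capacity figures ignore ZFS metadata, padding, and reserved space, so treat them as upper bounds. The function only builds the command; it does not run it.

```python
PARITY_DISKS = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_summary(layout: str, n_disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (approximate usable TB, number of disk failures tolerated)."""
    if layout == "mirror":
        if n_disks < 2:
            raise ValueError("a mirror needs at least two disks")
        # Every disk in an n-way mirror holds a full copy of the data.
        return disk_tb, n_disks - 1
    parity = PARITY_DISKS[layout]
    if n_disks <= parity:
        raise ValueError(f"{layout} needs more than {parity} disks")
    # Parity consumes `parity` disks' worth of capacity; the rest holds data.
    return (n_disks - parity) * disk_tb, parity


def zpool_create_argv(pool: str, layout: str, devices: list[str]) -> list[str]:
    """Build (but do not run) a `zpool create` command for a single vdev."""
    return ["zpool", "create", pool, layout, *devices]


if __name__ == "__main__":
    disks = [f"/dev/disk/by-id/PLACEHOLDER-{i}" for i in range(6)]
    print(vdev_summary("raidz2", 6, 4.0))  # six 4 TB disks, two for parity
    print(" ".join(zpool_create_argv("tank", "raidz2", disks)))
```

For the six-disk, 4 TB example this gives roughly 16 TB usable with two tolerated failures, the raidz2 counterpart of the RAID 6 array suggested earlier.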

ZFS’s greatest advantage, however, is that it also overcomes silent bit rot. Because ZFS checksums every block it stores, metadata as well as file data, and verifies those checksums whenever it reads a block, it knows when data has been corrupted. So long as you’re using some form of redundancy, ZFS can then automatically reconstruct the correct data from a mirror copy or parity and rewrite the damaged block. This feature’s significance is difficult to overstate: it moves us from a world in which data gradually rots over time, both in backups and on live systems, to one in which we can be almost certain that our data’s integrity remains intact.
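The self-healing behaviour can be illustrated with a toy model in Python. This is not how ZFS is implemented: ZFS stores each block’s checksum in its parent block pointer and defaults to the fletcher4 algorithm, whereas the hypothetical `ToyMirror` class below keeps SHA-256 digests in a dictionary and models a two-way mirror as two dictionaries. It shows only the core idea: verify on read, fall back to a good copy, and repair the bad one.

```python
import hashlib


class ToyMirror:
    """Two-way mirror with per-block checksums (a teaching model, not ZFS)."""

    def __init__(self) -> None:
        self.disks: list[dict[int, bytes]] = [{}, {}]  # block number -> contents
        self.checksums: dict[int, bytes] = {}          # block number -> digest

    @staticmethod
    def _digest(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def write(self, blkno: int, data: bytes) -> None:
        # Record the checksum separately from the data, then write every copy.
        self.checksums[blkno] = self._digest(data)
        for disk in self.disks:
            disk[blkno] = data

    def read(self, blkno: int) -> bytes:
        expected = self.checksums[blkno]
        good = [d[blkno] for d in self.disks if self._digest(d[blkno]) == expected]
        if not good:
            raise OSError(f"block {blkno}: every copy fails its checksum")
        # Self-heal: overwrite any copy whose checksum does not match.
        for disk in self.disks:
            if self._digest(disk[blkno]) != expected:
                disk[blkno] = good[0]
        return good[0]


if __name__ == "__main__":
    pool = ToyMirror()
    pool.write(0, b"ACGTACGTACGT")
    pool.disks[0][0] = b"ACGTACGTACGA"  # simulate silent bit rot on one disk
    print(pool.read(0))                  # the good copy is returned...
    print(pool.disks[0][0])              # ...and the rotten copy is repaired
```

With only a single disk, the same checksums would still detect the corruption; there would simply be no good copy to repair it from.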

ZFS realizes two other substantial advances over traditional filesystems that are relevant to bioinformatics labs:

  • By merging the traditionally separate storage layers (redundancy via Linux’s md RAID driver, volume management via LVM, and the filesystem itself, such as ext4) into a single entity, ZFS is easier to configure and manage. This integration brings other improvements too. For example, when rebuilding the array after a disk failure (a process ZFS calls resilvering), ZFS need read only the blocks that actually hold data, rather than every block on the disk as filesystem-agnostic RAID layers must. This reduces rebuild time, decreasing the chance that a second crippling error will occur before the rebuild is complete.

  • ZFS supports transparent, automatic compression of stored data via LZ4. LZ4 is extremely CPU-cheap, meaning that you pay little performance price for having all stored data automatically compressed, particularly on the many-core systems common in bioinformatics. The (small) amount of time spent decompressing stored files is often outweighed by the increase in read speed that comes from having to read fewer physical bytes from the disk, meaning that compression provides an overall performance increase. Moreover, genomic data tends to be highly compressible, leading to significant space reductions and corresponding performance gains when reading. Our home directories, where all our work is stored, achieve a compression ratio of 1.72x, meaning we’re storing 2.21 TB of data but consuming only 1.3 TB of space.
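Before migrating data, you might want a rough sense of how compressible it is. The Python sketch below compresses a file in fixed-size records, mirroring the fact that ZFS compresses each record independently (128 KiB is the default `recordsize`), and keeps a record uncompressed when compression does not shrink it; ZFS’s own cut-off differs, so this is approximate. Because LZ4 is not in Python’s standard library, it uses `zlib` at its fastest level as a stand-in. zlib generally compresses harder but more slowly than LZ4, so its ratios will not match what `zfs get compressratio` reports on a real pool.

```python
import sys
import zlib

RECORD_SIZE = 128 * 1024  # ZFS's default recordsize


def estimate_ratio(data: bytes, record_size: int = RECORD_SIZE) -> float:
    """Logical size / estimated physical size, compressing record by record."""
    if not data:
        return 1.0
    physical = 0
    for start in range(0, len(data), record_size):
        record = data[start:start + record_size]
        compressed = zlib.compress(record, 1)  # level 1: fastest zlib setting
        # Store whichever is smaller, so incompressible records cost nothing extra.
        physical += min(len(compressed), len(record))
    return len(data) / physical


if __name__ == "__main__":
    # Usage: python estimate_ratio.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            data = fh.read()  # whole file in memory: fine for a sample, not a genome
        print(f"{path}: estimated compression ratio {estimate_ratio(data):.2f}x")
```

The ratio is logical size divided by physical size, as in the figures above (2.21 TB / 1.72 ≈ 1.3 TB). Files that are already compressed, such as gzipped FASTQ or BAM, will show a ratio near 1.0x.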

Of course, ZFS comes with some caveats.

  • Running your root filesystem on ZFS is possible, but both difficult and unsupported. When I looked into it, the chance of a borked update leaving the system unbootable seemed unacceptably high. This may remain true for some time, since ZFS’s restrictive licensing keeps it out of base Linux installations. For now, running root on a traditional RAID array is best, with ZFS handling your space-intensive duties (such as storing /home).

  • ZFS is extremely memory hungry. Moreover, running it on systems without ECC RAM is not recommended, as memory errors supposedly have a greater chance of causing ZFS to corrupt itself than traditional filesystems. Both points are likely not problematic for servers, however, which usually have great gobs of ECC RAM. Apple apparently abandoned their efforts to replace HFS+ with ZFS on OS X because of these issues.
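Much of that memory appetite comes from ZFS’s adaptive replacement cache (ARC). On ZFS on Linux, the cache’s current size and ceiling are exposed in `/proc/spl/kstat/zfs/arcstats` (the `size` and `c_max` fields), and the ceiling can be lowered through the `zfs_arc_max` module parameter if the cache crowds out your analyses. The parser below is a sketch that assumes the usual kstat layout of two header lines followed by `name type data` rows; check it against the file on your own system.

```python
from pathlib import Path

ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")  # present when the zfs module is loaded


def parse_arcstats(text: str) -> dict[str, int]:
    """Parse kstat text: two header lines, then 'name type data' rows."""
    stats: dict[str, int] = {}
    for line in text.splitlines()[2:]:
        fields = line.split()
        # Keep only well-formed rows with a non-negative integer value.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary_gib(stats: dict[str, int]) -> tuple[float, float]:
    """Return (current ARC size, ARC ceiling) in GiB."""
    gib = 1024 ** 3
    return stats["size"] / gib, stats["c_max"] / gib


if __name__ == "__main__" and ARCSTATS.exists():
    size, ceiling = arc_summary_gib(parse_arcstats(ARCSTATS.read_text()))
    print(f"ARC: {size:.1f} GiB in use, ceiling {ceiling:.1f} GiB")
```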

With that said, for bioinformatics workloads, ZFS’s benefits massively outweigh its drawbacks.

Installing ZFS

Notes on both the method I initially used for installing ZFS on Ubuntu 14.04 Trusty Tahr and the manner in which I recovered from a failing drive are available. These last notes proved regrettably necessary–despite subjecting both of our newly purchased Seagate 4 TB drives to multi-day stress tests via badblocks before putting them into production, one drive exhibited failure signs only two weeks into its tenure. Though it didn’t fail outright, it developed eight bad sectors, and threw an unrecoverable read error (which, of course, ZFS handled with aplomb). As such, we replaced the drive and rebuilt the array, and everything has been running smoothly since.
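Catching a failing drive early is what makes a recovery like ours routine. A minimal monitoring sketch, suitable for calling from cron, might run `zpool status -x`, which prints `all pools are healthy` when nothing is wrong and otherwise describes only the pools with problems. The helper names here are hypothetical, and the exact wording of `zpool status` output can vary between ZFS versions, so treat the string match as an assumption to verify on your system. Swapping out the drive itself is done with `zpool replace`, after which `zpool status` shows the resilver’s progress.

```python
import subprocess
import sys

HEALTHY = "all pools are healthy"  # `zpool status -x` output when nothing is wrong


def pools_healthy(status_output: str) -> bool:
    """True if `zpool status -x` reported no problems."""
    return status_output.strip() == HEALTHY


def check_pools() -> int:
    """Run `zpool status -x`; print any problems and return a shell exit code."""
    try:
        result = subprocess.run(
            ["zpool", "status", "-x"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        print("zpool not found; is ZFS installed?", file=sys.stderr)
        return 2
    if result.returncode == 0 and pools_healthy(result.stdout):
        return 0
    # A degraded pool, checksum errors, or an unexpected message all deserve a
    # human's attention; cron typically mails whatever a job prints.
    print(result.stdout or result.stderr, file=sys.stderr)
    return 1
```

A cron-invoked script could simply call `sys.exit(check_pools())`, so that any non-zero exit, together with the printed status, draws attention before a second drive goes the way of the first.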

Altogether, I heartily endorse ZFS. It’s fast, reliable, and space efficient. In concert, these qualities make it downright sexy. Most small to mid-sized bioinformatics labs will benefit from its use.
