Recommended disk configuration?

Added by Xypriz . about 1 year ago

Hello,

I've got a Supermicro system with the following hardware configuration:

8x 2TB 7200rpm standard HDD on a 3Ware 9650SE-8LPML (set up in single disk mode).

2x 512 GB SSD connected to motherboard

1x 1 TB 7200rpm enterprise HDD connected to motherboard

NexentaStor Community is installed.

I've set up a single raidz2 volume with 7x 2TB disks, 1x 2TB hot spare and 2x 512 GB SSDs as mirrored cache. There is no log drive configured at the moment. As I'm in the process of building/testing the system, I can change the configuration without having to worry about data on it.

I would like to know if this is a good setup in your opinion. Is a log drive really required? Will it give a performance boost when I already have 2 SSDs mirrored as cache?

The system will be used with VMware ESXi 5. I don't know yet whether I'm going to use iSCSI or NFS. Maybe this depends on the chosen configuration too?

Can someone give me some recommendations on this and explain why?

Thanks for your help!


Replies

RE: Recommended disk configuration? - Added by Jason Litka about 1 year ago

There is no benefit to mirrored cache drives.

How much memory is in this system? What kind of read/write ratio do you expect? What size writes?

RE: Recommended disk configuration? - Added by Xypriz . about 1 year ago

OK, so what you're basically saying is that I'd be better off assigning one SSD as cache and one as log?

There is 16 GB of memory in the system. The read/write ratio is about 80/20, I guess (VMware, with about 25 VMs on it). What exactly do you mean by write size?

RE: Recommended disk configuration? - Added by Linda Kateley about 1 year ago

The log device is a write cache (for synchronous writes), which is good to have mirrored. The cache device is a read cache: if it breaks, ZFS just goes to the HDDs for the data, so it doesn't need to be mirrored.

A log device will improve performance for CIFS or NFS, but not much for iSCSI.
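As a rough sketch of how that split maps onto `zpool add`: the snippet below only builds and prints the commands, assuming a pool named `tank` and hypothetical Solaris-style device names (`c2t0d0`, `c2t1d0`); substitute the real ones. The `log mirror` and `cache` keywords are standard `zpool add` syntax; the function names are made up for illustration.

```python
import shlex

# Hypothetical pool and device names; replace with the ones on your system.
POOL = "tank"


def add_log(pool: str, *devs: str, mirror: bool = False) -> list[str]:
    """Command that adds a log (ZIL/SLOG) vdev, optionally as a mirror."""
    return ["zpool", "add", pool, "log", *(["mirror"] if mirror else []), *devs]


def add_cache(pool: str, *devs: str) -> list[str]:
    """Command that adds cache (L2ARC) devices. They are listed individually,
    not mirrored: losing one only means reads go back to the pool disks."""
    return ["zpool", "add", pool, "cache", *devs]


if __name__ == "__main__":
    # Option 1: both SSDs as a mirrored log (the safe choice for the log).
    print(shlex.join(add_log(POOL, "c2t0d0", "c2t1d0", mirror=True)))
    # Option 2: one SSD as log, one as cache (the split discussed in this thread).
    print(shlex.join(add_log(POOL, "c2t0d0")))
    print(shlex.join(add_cache(POOL, "c2t1d0")))
    # Nothing is executed here; pass a list to subprocess.run(cmd, check=True) to apply it.
```

With only two SSDs you have to pick one of the two options; four SSDs would allow a mirrored log plus cache.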

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

A log device should also improve performance for VMware over iSCSI. How many write IOPS are you looking to achieve?

If you are going to add a log device, I would keep your single 7-disk raidz2 + HS setup and then put your 2 SSDs in as the cache devices. If you are not going to use a log device, I would set up your system as 4x2 mirrors (and add another HS) or 3x2 mirrors (with 2 HS), and again use your 2 SSDs as the read cache.

RE: Recommended disk configuration? - Added by Xypriz . about 1 year ago

Performance and data redundancy are key in this setup.

About the IOPS: I don't know exactly yet. As much as possible ;-). I don't have a baseline for that yet. What do you think should be possible with this setup?

Tonight I'll connect the Nexenta machine to my VMware hosts so I can do some testing. At the moment I have 8x 2TB in raidz2 (without hot spare), 1 SSD as cache and 1 SSD as log. I'm still thinking about the best setup for this hardware. I'm exactly at the 18 TB limit of the Community Edition, so there's no room for another disk. If you have any suggestions, please let me know.

Jeff, is it a good idea to split it up into two raidz vdevs? I'm coming from the RAID world, and there I learned one basic rule: the more disks in one RAID set, the better its performance/IOPS (generally speaking). Does this rule also apply in a ZFS-based environment? Or is it all about caching there, or maybe both?

Thanks!

RE: Recommended disk configuration? - Added by Dan Swartzendruber about 1 year ago

Performance (more specifically IOPS) is driven by the number of vdevs, not disks per se (one notable exception is that you can get much better random read performance with n-way mirrors...)

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

For the ZFS world, these are the rules of thumb for disks as I've learned them:

  • Each vdev (group of disks) will only have the write performance of a single disk (this can be skewed by caching)
  • Inside each vdev reads will be distributed across all disks (here more disks is always faster)
  • As disks grow larger than 500-750GB, consider adding additional mirror/parity disks to prevent data loss while a potential rebuild is going on
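Why bigger disks call for more redundancy comes down to how long the rebuild window stays open. A minimal best-case sketch, assuming a replacement disk is written at a constant sustained rate (the 100 and 30 MB/s figures are hypothetical, not measured). ZFS only resilvers allocated blocks, so a partly full pool can finish sooner, while random I/O and a busy pool can make it take much longer.

```python
def min_rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Time to write one full replacement disk at a constant rate (a best case).

    capacity_tb is in decimal terabytes (as drive vendors count); rate is in MB/s.
    """
    seconds = capacity_tb * 1e12 / (rate_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for size_tb in (0.5, 0.75, 2.0):
        for rate in (100, 30):  # hypothetical rates: idle pool vs. pool busy serving VMs
            hours = min_rebuild_hours(size_tb, rate)
            print(f"{size_tb:>4} TB @ {rate:>3} MB/s: {hours:5.1f} h")
```

Every hour in that window is an hour in which one more failure (or two, with single parity) loses the vdev, which is the argument for extra parity or mirror disks.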

If I recall correctly (Linda or someone else may need to correct me), the Community Edition counts the data that is actually referenced rather than the raw size of the pool (the raw pool size is how Enterprise calculates it), so I think you'll be under the limit no matter how you lay out your disks. You can compare the two values with zpool list (SIZE) vs zfs list (REFER). I'm hoping v4 moves to this model (or something even better) for Enterprise, as it's quite a penalty to have to pay for better redundancy on your disks...
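To put the two numbers side by side, a rough sketch that shells out to `zpool list -H -o size` and `zfs list -H -o referenced` for one pool (the pool name `tank` is hypothetical) and converts the human-readable sizes to bytes. Which of the two figures the license actually meters is exactly what's uncertain above; this only shows the gap. Note that `referenced` on the pool's root dataset does not include child datasets, while `used` does.

```python
import re
import subprocess

# Binary multipliers, as used by the zpool/zfs human-readable output.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3,
          "T": 1024**4, "P": 1024**5, "E": 1024**6}


def parse_size(text: str) -> int:
    """Convert a value such as '14.5T' or '512K' to bytes."""
    m = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)([KMGTPE]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = m.groups()
    return round(float(number) * _UNITS[suffix])


def first_value(cmd: list[str]) -> int:
    """Run a zpool/zfs command in scripted mode (-H) and parse its first line."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_size(out.splitlines()[0])


def pool_vs_referenced(pool: str) -> tuple[int, int]:
    """(raw pool SIZE, REFER of the root dataset) in bytes."""
    size = first_value(["zpool", "list", "-H", "-o", "size", pool])
    referenced = first_value(["zfs", "list", "-H", "-o", "referenced", pool])
    return size, referenced


if __name__ == "__main__":
    size, refer = pool_vs_referenced("tank")  # hypothetical pool name
    print(f"pool SIZE {size / 1024**4:.2f} TiB vs REFER {refer / 1024**4:.2f} TiB")
```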

RE: Recommended disk configuration? - Added by Dan Swartzendruber about 1 year ago

Jeff, I assume you meant for raidz*?

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

Not really.

Think about it this way. In RAID1, ZFS requires data to be written successfully to both (or however many mirrors it has) disks before it can move on to the next block of data; this effectively limits write performance to that of a single disk. Similarly, for raidz* the data and parity must be written successfully to all disks before it's allowed to unblock and write the next block. In both cases a single vdev has the write performance of a single disk, so to scale write performance you have to increase the number of vdevs.

For reads you have the following cases. In RAID1, ZFS (to my knowledge) will send the request to any drive that is not busy. At a QD of 1 this means you would only get the read performance of a single disk, but since most of these pools are going to see higher queue depths, as long as QD >= the number of mirrors the reads will effectively be round-robined across all disks. RaidZ* has to distribute parity evenly across all drives, so there's inherent load sharing built into it and it won't need a larger QD to scale up reads. This leads to the conclusion (assuming I've not botched some info somewhere) that a 4-way mirror at QD > 4 should have the same read throughput as a 4-disk raidz1 pool. The caveat is that the 4-way mirror could theoretically have better random performance, since each disk head can be positioned independently, whereas in the raidz pool all 4 disks just finished looking for a single piece of data and now all 4 have to go hunt for the next piece. This is where the exception you stated above comes from.

So, following this all the way through: to increase writes you always need to increase the number of vdevs. To increase reads, increase either the number of vdevs or the depth (number of disks) inside a vdev.
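The rules of thumb above can be turned into a back-of-the-envelope estimator. This is a sketch of the thread's model, not a benchmark: the 75 IOPS per 7200 rpm SATA disk is an assumed figure, and ARC/L2ARC/ZIL caching and sequential throughput are ignored entirely.

```python
DISK_IOPS = 75  # assumption: rough random IOPS of one 7200 rpm SATA disk


def estimate_random_iops(vdevs: int, width: int, kind: str,
                         disk_iops: float = DISK_IOPS, queue_depth: int = 1):
    """Rule-of-thumb random (read, write) IOPS for a pool, per the model in this thread.

    Writes: each vdev behaves like a single disk, whatever its width.
    Reads:  a mirror vdev can serve up to min(queue_depth, width) requests at once;
            a raidz vdev behaves like a single disk for small random reads.
    """
    if kind == "mirror":
        per_vdev_read = min(queue_depth, width) * disk_iops
    elif kind == "raidz":
        per_vdev_read = disk_iops
    else:
        raise ValueError(f"unknown vdev kind: {kind!r}")
    return vdevs * per_vdev_read, vdevs * disk_iops


if __name__ == "__main__":
    for label, args in [("1x8 raidz2", (1, 8, "raidz")),
                        ("2x3 raidz1", (2, 3, "raidz")),
                        ("4x2 mirror", (4, 2, "mirror"))]:
        reads, writes = estimate_random_iops(*args, queue_depth=8)
        print(f"{label}: ~{reads} read / ~{writes} write IOPS at QD 8")
```

The point it makes is the one above: widening a vdev adds capacity, but only adding vdevs multiplies random write IOPS.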

RE: Recommended disk configuration? - Added by Dan Swartzendruber about 1 year ago

Sorry, I misread. For some reason, I thought you said 'writes', not 'reads' :)

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

No worries Dan, thankfully it seems most of us check our egos at the door on this forum to help each other.

I also thought I might want to share the math I've gleaned from various places about why you need more parity/mirror devices since I've not seen it around here.

For SATA disks the MTBF is usually between 300k and 1M hours. For example, the cheapest 7.2k 2TB drive on Newegg right now has an AFR of <1%. This works out to an MTBF of roughly 870k hours (only valid for the first couple of years).

So, using the MTTDL formulas for a 2-disk RAID1 with a 72-hour rebuild time (2TB takes a long time to read and write, so this rebuild time may not be accurate) gives an MTTDL of ~5.2B hours, or an AFR of .000167% for that vdev, i.e. 1 in ~600k. Now let's use more realistic numbers: an MTBF of 150k hours and 168 hours (a week) for a rebuild gives an AFR of .013%, so you only have to be 1 in 7,600 to lose the vdev... Using the same values with a 3-disk RAID1 (3-way mirror) you get an AFR of .000044%, or 1 in ~2.2 million. I'd call that a relatively safe bet for my data.

Let's do the same exercise for a 5-disk RaidZ, i.e. 4 data disks (there are other posts on why you'd want 2^n data disks). Using the ideal MTBF (870k hours) with a 168-hour rebuild, you get an AFR of .00389%, or 1 in ~26k. Even that number is a little shaky for trusting my data, especially if you use more realistic numbers: you get a .13% AFR, or a 1 in 765 chance of losing data in your first year of using that array. So let's move to a 6-disk RaidZ2 (again 4 data disks) with those numbers to get an AFR of .000879%, or 1 in ~114k.
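The AFR-to-MTBF step can be checked with a couple of lines, assuming a constant failure rate (exponential lifetimes), which is the usual simplification behind these figures. For small AFRs it is close to the simple shortcut MTBF ≈ 8760 / AFR (876k hours at 1%).

```python
import math

HOURS_PER_YEAR = 8760


def mtbf_from_afr(afr: float) -> float:
    """MTBF in hours, assuming a constant failure rate: AFR = 1 - exp(-8760 / MTBF)."""
    return -HOURS_PER_YEAR / math.log1p(-afr)


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Inverse of mtbf_from_afr under the same constant-failure-rate assumption."""
    return -math.expm1(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    print(f"AFR 1%      -> MTBF {mtbf_from_afr(0.01):,.0f} h")
    print(f"MTBF 150k h -> AFR {afr_from_mtbf(150_000):.2%}")
```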
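The figures above can be reproduced with the simple MTTDL model (no unrecoverable read errors) in the general form MTTDL = MTBF^(p+1) / (N(N-1)...(N-p) · MTTR^p), where N is the number of disks in the vdev and p the number of failures it survives; as far as I can tell this matches the formulas in R. Elling's posts, but treat it as a sketch. The "AFR" is approximated the same way as in the post, as 8760 / MTTDL.

```python
import math

HOURS_PER_YEAR = 8760


def mttdl_hours(n_disks: int, parity: int, mtbf: float, mttr: float) -> float:
    """Simple MTTDL for one vdev.

    parity = failures the vdev survives: 2-way mirror / raidz1 = 1,
    3-way mirror / raidz2 = 2, raidz3 = 3.
    """
    denom = math.prod(n_disks - i for i in range(parity + 1)) * mttr ** parity
    return mtbf ** (parity + 1) / denom


def annual_loss_probability(mttdl: float) -> float:
    """Approximate yearly chance of losing the vdev (valid when MTTDL >> 1 year)."""
    return HOURS_PER_YEAR / mttdl


SCENARIOS = [
    # (description, disks in vdev, parity, MTBF hours, rebuild hours)
    ("2-disk mirror, ideal", 2, 1, 870_000, 72),
    ("2-disk mirror, realistic", 2, 1, 150_000, 168),
    ("3-disk mirror, realistic", 3, 2, 150_000, 168),
    ("5-disk raidz1, ideal MTBF", 5, 1, 870_000, 168),
    ("5-disk raidz1, realistic", 5, 1, 150_000, 168),
    ("6-disk raidz2, realistic", 6, 2, 150_000, 168),
]

if __name__ == "__main__":
    for name, n, p, mtbf, mttr in SCENARIOS:
        mttdl = mttdl_hours(n, p, mtbf, mttr)
        prob = annual_loss_probability(mttdl)
        print(f"{name:28} MTTDL {mttdl:10.3g} h  AFR {prob * 100:.6f}%  (1 in {1 / prob:,.0f})")
```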

If you want full math I can put that in another post with steps shown.

Sources:

1 R. Elling

2 ZDNet

RE: Recommended disk configuration? - Added by Xypriz . about 1 year ago

Thanks for the info Jeff! Very nice to read this background info regarding ZFS!

Back to Nexenta and my config: how would you configure it while keeping this hardware (without adding/removing HDDs/SSDs)? I do understand that the more vdevs, the better the write performance. However, I've got two SSDs in this case, which can speed up both reads and writes (please correct me if I'm wrong). Is it safe to put 1 SSD as cache and 1 SSD as log device for a raidz2 pool of 7 or 8 disks (with or without a hot spare)? The machine itself has redundant power and is connected to a UPS.

RE: Recommended disk configuration? - Added by Xypriz . about 1 year ago

BTW: Sorry guys if you got spammed by mail from the Nexenta forum; I've edited my previous post a few times ;-)

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

With that number of drives you're in an awkward position for drive layouts. You normally want a power-of-2 number of data drives (2, 4, 8, etc.) in a RaidZ* layout. Given the long rebuild times, I'd go with either 2x3 RaidZ + 2HS, 1x6 RaidZ2 + 2HS, or 3x2 RAID1 + 2HS. In all cases, yes, I'd split the SSDs into separate read and write cache devices.

If you're willing to defy all the previous ZFS "truths", you could set up a single 1x7 RaidZ2 + HS and test whether it performs anywhere near what you want. If it gives you the performance you want, you're good to go; otherwise try one of the configs above.

RE: Recommended disk configuration? - Added by Dan Swartzendruber about 1 year ago

Good points. In his setup, I'm wondering if he's not better off just going with an 8 disk raidz2. The performance impact of not having the power of 2 for data drives is supposed to be fairly minimal, and an 8-disk raidz2 gives him 6 usable disks of storage with 2-disk failure protection. Heck, if he's paranoid, go for an 8-disk raidz3 :)

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

Yeah, if you completely trust that you'll be alerted to a failed drive and can get a replacement in short order, not running with a hot spare is probably acceptable, but I've just gotten too paranoid about not having a drive in the system ready to rebuild immediately.

RE: Recommended disk configuration? - Added by Dan Swartzendruber about 1 year ago

Good point. OTOH, I would argue you are still better off putting as many drives in the raidz* pool as possible, and any leftovers as HS. e.g. better off having a 7-drive raidz3 and 1 HS, since the extra drive is actually participating in the pool, so you eliminate the window where you have one less drive in the pool during resilvering...

RE: Recommended disk configuration? - Added by Jeff Gibson about 1 year ago

Yeah, a 7-drive raidz3 would fit all the criteria with 4 data disks (very good suggestion), but you've effectively got the same number of data disks as a 2x3 raidz1 + 2HS. I'll concede you'll probably get higher sequential reads, but I think the 2-vdev pool will perform more consistently for you. Although at this point you're almost at the same place as just using all your disks in RAID1, since that's the same number of data disks, if you don't mind not having a hot spare (wait, wasn't I just preaching about the need for hot spares...)

RE: Recommended disk configuration? - Added by Dan Swartzendruber about 1 year ago

Yeah, to be clearer, I was trying to do apples/apples (e.g. one vdev only), so comparing 7-raidz3+1HS with 6-raidz2+2HS.
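To line up the 8-disk layouts discussed in this thread, a small sketch that counts vdevs, data disks and approximate usable capacity for the 8x 2TB drives, and checks the power-of-2 data-disk rule of thumb. The TB figures ignore raidz padding and metadata overhead, so treat them as upper bounds; the `Layout` class is just an illustrative helper.

```python
from dataclasses import dataclass

DISK_TB = 2.0  # the 8x 2TB drives in this build


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int        # top-level vdevs in the pool
    width: int        # disks per vdev
    parity: int       # failures each vdev survives: n-way mirror = n-1, raidzN = N
    spares: int = 0   # hot spares

    @property
    def data_disks(self) -> int:
        return self.vdevs * (self.width - self.parity)

    @property
    def total_disks(self) -> int:
        return self.vdevs * self.width + self.spares

    @property
    def pow2_data(self) -> bool:
        """Power-of-2 data disks per vdev (the raidz rule of thumb; moot for mirrors)."""
        per_vdev = self.width - self.parity
        return per_vdev > 0 and (per_vdev & (per_vdev - 1)) == 0


LAYOUTS = [
    Layout("2x3 raidz1 + 2 HS", 2, 3, 1, 2),
    Layout("1x6 raidz2 + 2 HS", 1, 6, 2, 2),
    Layout("3x2 mirror + 2 HS", 3, 2, 1, 2),
    Layout("4x2 mirror", 4, 2, 1),
    Layout("1x7 raidz2 + 1 HS", 1, 7, 2, 1),
    Layout("1x8 raidz2", 1, 8, 2),
    Layout("1x8 raidz3", 1, 8, 3),
    Layout("1x7 raidz3 + 1 HS", 1, 7, 3, 1),
]

if __name__ == "__main__":
    for l in LAYOUTS:
        print(f"{l.name:20} vdevs={l.vdevs} data disks={l.data_disks} "
              f"(~{l.data_disks * DISK_TB:.0f} TB) survives {l.parity}/vdev "
              f"pow2={'yes' if l.pow2_data else 'no'}")
```

It makes the trade-off above concrete: several layouts land on 4 data disks, and they differ mainly in vdev count (write IOPS), failures survived per vdev, and hot spares.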