Poor write performance
Added by Knut Paulsen about 1 year ago
I have been running version 3.0.4 for a bit over a year without much load. Lately I have transferred a few more VMs, so the load has increased. Read performance has been good and still is, but write performance is terrible!
The setup is a Dell R515 with 2x quad-core Opteron, 16 GB RAM, and 2x 146 GB 10k disks in RAID 1 (PERC H200) for the OS. For storage I have 6x 600 GB 15k Seagate disks in raidz1. The H200 does nothing more than act as an HBA for the raidz1 disks. Dedup is disabled, and I have tried compression both enabled and disabled, as well as Sys_zfs_nocacheflush both enabled and disabled.
The ARC is usually around 10-11 GB:
Min / Current / Max ARC Size: 1.87 GB / 11.16 GB / 14.98 GB
Cache Hits / Misses: 91.60% / 8.40%
Demand Data Cache Hits / Misses: 72.23% / 8.68%
Demand Metadata Cache Hits / Misses: 21.86% / 13.70%
Prefetch Data Cache Hits / Misses: 2.99% / 76.79%
Prefetch Metadata Cache Hits / Misses: 2.92% / 0.83%
I can see that the prefetch hit rate is abysmal, so I have tried to disable prefetch, but it isn't easy to see whether it is actually disabled. (I added one new line to /etc/system: set zfs:zfs_prefetch_disable = 1.)
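One way to at least confirm what the file asks for is to parse it. The following is a minimal sketch, assuming the usual /etc/system conventions (comment lines start with an asterisk, tunables use "set module:name = value") and assuming the tunable is spelled zfs_prefetch_disable with underscores. The helper name and the sample text are made up for illustration. Note that /etc/system is only read at boot, so a line added there does nothing until the appliance is restarted; this sketch checks the file, not the running kernel.

```python
import re

# Matches lines such as "set zfs:zfs_prefetch_disable = 1".
_SET_LINE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)\s*$")


def parse_system_tunables(text):
    """Return {(module, name): value} for every 'set' line in /etc/system text."""
    tunables = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (which start with an asterisk).
        if not stripped or stripped.startswith("*"):
            continue
        match = _SET_LINE.match(stripped)
        if match:
            module, name, value = match.groups()
            tunables[(module, name)] = value  # value is kept as a string
    return tunables


# Hypothetical file contents, for illustration only.
sample = """\
* tuning added while chasing poor prefetch hit rates
set zfs:zfs_prefetch_disable = 1
"""

settings = parse_system_tunables(sample)
print(settings.get(("zfs", "zfs_prefetch_disable")))
```

On a real box you would pass in the contents of /etc/system instead of the sample string.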
I am running ESXi 4 with the vmdk files on NFS. The block size of the raidz1 volume is set to 128K.
I have done some performance testing inside an otherwise idle Windows Server 2008 guest with Anvil's Storage Utilities. Sequential 4MB read performance is 108 MBps (there is only a 1 Gbps link from the NexentaStor). For 4K with a queue depth of 4 I get 12 370 IOPS, and with a queue depth of 16 I get 26 241 IOPS. Looking in the Nexenta logs, I can see that most of the data comes from ARC and not from the disks.
The writes are a different story though... 4K gets 202 IOPS, 4K at a queue depth of 4 gets 425 IOPS, and 4K at a queue depth of 16 gets 1258 IOPS. This is with Sys_zfs_nocacheflush set to YES, and the plain 4K test comes in at less than 1 MBps.
When Sys_zfs_nocacheflush is set to NO, write performance is 125 IOPS for 4K, 246 IOPS for 4K QD4, and 868 IOPS for 4K QD16.
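To put those write numbers in perspective, it can help to turn them into time per write. This is a rough sketch, not a measurement: it only applies Little's law (requests in flight = IOPS x latency) to the figures quoted above, and the function name is made up. For reference, a 15K rpm disk takes 4 ms per revolution, so several milliseconds per 4K write suggests each write is waiting on the spindles.

```python
def ms_per_op(iops, queue_depth=1):
    """Average latency in milliseconds implied by an IOPS figure.

    With queue_depth requests in flight, each request takes
    queue_depth / iops seconds on average (Little's law).
    """
    return 1000.0 * queue_depth / iops


# 4K write results quoted in the post, keyed by queue depth.
results = {
    "nocacheflush=YES": {1: 202, 4: 425, 16: 1258},
    "nocacheflush=NO": {1: 125, 4: 246, 16: 868},
}

for setting, by_depth in results.items():
    for depth, iops in by_depth.items():
        print(f"{setting} QD{depth}: {ms_per_op(iops, depth):.2f} ms per write")
```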
I do expect more than that with 6 15K drives.
Do any of you have some good recommendations?
Replies
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
I'm betting this is due to ESXi sync access on writes. To test this, set sync=disabled on the pool and retest.
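For anyone scripting the test, here is a minimal sketch that wraps the "zfs set sync=..." command. The dataset name tank/vmstore is hypothetical, the helper names are made up, and the sketch defaults to a dry run so nothing changes unless you ask for it.

```python
import subprocess

# Values accepted by the ZFS "sync" property.
VALID_SYNC_MODES = ("standard", "always", "disabled")


def build_sync_command(dataset, mode):
    """Build the argv list for 'zfs set sync=<mode> <dataset>'."""
    if mode not in VALID_SYNC_MODES:
        raise ValueError(f"unknown sync mode: {mode!r}")
    return ["zfs", "set", f"sync={mode}", dataset]


def set_sync(dataset, mode, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    command = build_sync_command(dataset, mode)
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# "tank/vmstore" is a hypothetical dataset name.
set_sync("tank/vmstore", "disabled")    # for the test run
set_sync("tank/vmstore", "standard")    # restore the default afterwards
```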
RE: Poor write performance - Added by Knut Paulsen about 1 year ago
Wow, thank you!
The tests now give me:
4K: 3475 IOPS / 13.5 MBps
4K QD4: 12536 IOPS / 49 MBps
4K QD16: 26341 IOPS / 103 MBps
This is way better, and much more in the range I was hoping for. I will also enable L2ARC now.
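As a sanity check on those numbers, IOPS times block size should reproduce the reported throughput. A small sketch, assuming the benchmark's "4K" means 4096 bytes and its MBps figures are really MiB/s (that assumption is what makes the numbers line up):

```python
def throughput_mib_s(iops, block_bytes=4096):
    """Throughput in MiB/s for a given IOPS figure and block size."""
    return iops * block_bytes / (1024 * 1024)


# 4K write IOPS reported after setting sync=disabled.
for label, iops in [("4K", 3475), ("4K QD4", 12536), ("4K QD16", 26341)]:
    print(f"{label}: {throughput_mib_s(iops):.1f} MiB/s")
```

The QD16 result sits close to what a single 1 Gbps link can carry (about 119 MiB/s before protocol overhead), so at that point the network rather than the pool is probably the limit.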
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
This is not recommended in general for production; it was only meant to prove or disprove the theory. Even though ZFS itself won't be corrupted by a crash, an ESXi guest's filesystem might be, depending on what was being written and when. Check the forums here for ZIL/SSD discussions, etc.
RE: Poor write performance - Added by Knut Paulsen about 1 year ago
What are the problems with sync set to disabled? A power loss that causes write operations to be lost because they are still in RAM?
I am fairly well protected with UPS and shutdowns of the servers.
I would really like to have a dedicated ZIL device, but unfortunately it is not going to happen this year. Good SLC-based SSDs are still way too expensive.
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
Not just that. Consider that ESXi presents a virtual disk to a guest OS. The guest might have a journaling filesystem (like NTFS or ext3 or whatever) that does certain writes in a certain order for consistency. Those writes go to the virtual disk, which ESXi then writes to the file on the NFS datastore. The ZFS appliance then lies about the writes being complete, so if you crash in that window, the guest OS can in theory suffer from filesystem corruption (not just lose writes to a file). If you are on a UPS, you can take the chance. I do, but then my ESXi datastore is snapshotted every night and my ZFS box is on a UPS...
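The window described above can be shown with a toy model. This is a deliberately simplified sketch and not how ZFS or ESXi are implemented: ToyStore, its methods, and the block numbers are all invented for illustration. It only shows the difference between acknowledging a write after it is on stable storage and acknowledging it while it is still in RAM.

```python
class ToyStore:
    """Toy storage server: acknowledges writes, persists them on flush."""

    def __init__(self, honour_sync):
        self.honour_sync = honour_sync
        self.pending = {}   # acknowledged, but still only in RAM
        self.durable = {}   # on stable storage

    def write(self, block, data):
        if self.honour_sync:
            self.durable[block] = data   # persisted before the ack
        else:
            self.pending[block] = data   # ack first, persist later
        return True                      # the acknowledgement

    def flush(self):
        """Stand-in for the periodic commit of buffered writes."""
        self.durable.update(self.pending)
        self.pending.clear()

    def crash(self):
        self.pending.clear()             # RAM contents are gone


def lost_after_crash(honour_sync):
    """Blocks the guest believes are safe but the server no longer has."""
    store = ToyStore(honour_sync)
    acked = {}
    # The guest writes file data first, then the journal commit record.
    for block, data in [(10, "file data"), (1, "journal commit")]:
        if store.write(block, data):
            acked[block] = data
    store.crash()                        # crash before any flush happens
    return sorted(b for b in acked if b not in store.durable)


print(lost_after_crash(honour_sync=True))
print(lost_after_crash(honour_sync=False))
```

With sync honoured nothing acknowledged is lost; with it disabled, the guest has been told both writes are safe when neither is.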
RE: Poor write performance - Added by Knut Paulsen about 1 year ago
Thank you for the good explanation. The Nexenta is well backed by two UPSes, and I take snapshots every day on the ZFS appliance. I also back up every hour inside the guests that do local writes. So I am staying with sync set to disabled.
Do you have any recommendations for a well-priced SSD for ZIL usage?
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
Not really. Like you, it is not in my budget (this is basically a home server).
RE: Poor write performance - Added by David Bond about 1 year ago
To improve performance you may also want to change that 128KB block size. You will want it to match your VM OS block size; for Windows the default is 4KB. This reduces the number of I/Os needed for a write (if the block isn't in ARC/L2ARC): with 128KB blocks you need to read the 128KB block, modify the 4KB, and then write it to its new location, instead of just writing the block to its new location.
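The read-modify-write cost described here is easy to put numbers on. A back-of-the-envelope sketch, assuming the write fits inside a single record and ignoring compression, raidz parity, and metadata (the function name is made up):

```python
def rmw_cost(record_bytes, write_bytes, cached=False):
    """Bytes read and written when a small write lands in one record.

    Assumes write_bytes <= record_bytes and that the write does not
    straddle a record boundary.
    """
    if write_bytes > record_bytes:
        raise ValueError("this sketch only covers writes within one record")
    partial = write_bytes < record_bytes
    # A partial overwrite needs the old record, unless it is already cached.
    read = record_bytes if (partial and not cached) else 0
    written = record_bytes  # the whole record goes to its new location
    return {"read": read, "written": written,
            "amplification": written / write_bytes}


KIB = 1024
for record in (128 * KIB, 8 * KIB, 4 * KIB):
    print(record // KIB, "K record:", rmw_cost(record, 4 * KIB))
```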
RE: Poor write performance - Added by Knut Paulsen about 1 year ago
Is it possible to change the block size on a running system without worrying about what is going to happen? Or do I have to recreate the folder?
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
I thought that was only applicable to iSCSI? For NFS, the host (nexenta) is doing the writes...
RE: Poor write performance - Added by David Bond about 1 year ago
It's the same. You have a structured file stored on a file system, and that structure depends on the VM's own file system: 4KB, 8KB, etc. blocks. These blocks are read from and written to the file stored on ZFS in those block sizes, so if your VM writes to one of its 4KB blocks, it writes that to the file on ZFS, which is holding that, say, 200GB file in 128KB blocks. When you write that 4KB block in the VM, you then have to update that 128KB block on ZFS. With iSCSI you have an additional alignment to take into account: the VMFS for VMware. See Nexenta's best practice guide for NFS and ESXi: http://info.nexenta.com/rs/nexenta/images/doc3.15000-nxs-v0.0-000004-AUsingNexentaStorNFSwithESXi_5.pdf
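To see which records a guest write lands in, you can map the byte range onto fixed-size records. A small sketch with a made-up helper name; the offsets are arbitrary examples, and the misaligned case is only there to show why alignment matters once the record size gets close to the guest block size.

```python
def records_touched(offset, length, recordsize):
    """Indices of the fixed-size records covered by a write into one file."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // recordsize
    last = (offset + length - 1) // recordsize
    return list(range(first, last + 1))


KIB = 1024

# An aligned 4K guest write at byte offset 160K inside the vmdk.
print(records_touched(160 * KIB, 4 * KIB, 128 * KIB))  # one 128K record
print(records_touched(160 * KIB, 4 * KIB, 8 * KIB))    # one 8K record

# The same 4K write shifted by 6K straddles two 8K records.
print(records_touched(6 * KIB, 4 * KIB, 8 * KIB))
```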
RE: Poor write performance - Added by David Bond about 1 year ago
http://info.nexenta.com/rs/nexenta/images/doc_3.1_5000-nxs-v0.0-000004-A_Using_NexentaStorNFS_with_ESXi_5.pdf
RE: Poor write performance - Added by David Bond about 1 year ago
Sorry, this one is the best practice guide, but the other one also provides the required info and benchmarks.
Page 14:
http://info.nexenta.com/rs/nexenta/images/5000-nxs-v0.0-000002-A_nxstor_vmware_best_practices.pdf
RE: Poor write performance - Added by Linda Kateley about 1 year ago
I would also try upping the queue depth to approximately 30, maybe more with the 15k drives. You know you have gone too high if you start seeing CPU spikes, because the CPU has to manage the queue when it is too deep.
RE: Poor write performance - Added by Knut Paulsen about 1 year ago
Linda, I presume you are talking about Sys_zfs_vdev_max_pending? Should I try setting that to 30?
And what about changing the block size? Can that be done on a running system, and can I still access the files that are stored with the 128K block size?
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
David, this is interesting. Reading that doc, it does in fact recommend 8K-16K for NFS folder record size. I wonder why the default is 128K? I guess the smaller recordsize is specific to vmware usage?
RE: Poor write performance - Added by David Bond about 1 year ago
128KB is good for general file access: it will use smaller blocks for small files and big blocks for big ones, improving I/O and throughput. But for structured files such as databases and VMs it is not, because they are large files, which makes their block size 128KB. With structured files you need to align the file system blocks with the internal structure to get the best I/O performance.
RE: Poor write performance - Added by Linda Kateley about 1 year ago
I asked the writers of the paper how they determined the 8-16k block sizes. They found the 8k was the sweet spot for latency, but 16k was close and also had the best throughput for vmotion.
The larger 128k block would work well for vSphere writes, as I believe the default block size for that is 1MB (I didn't double-check this).
The best thing to do for block size is to run your workload, figure out your average I/O size, and set it accordingly.
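As a rough illustration of "figure out your average I/O size", here is a sketch that averages a list of observed I/O sizes and picks the closest power-of-two record size. The sample trace is invented, the 512-byte to 128K range is an assumption about what the recordsize property accepts on this release, and rounding to the nearest power of two is just one simple heuristic rather than anything the paper's authors described.

```python
def average_io_bytes(sizes):
    """Mean I/O size in bytes for a list of observed request sizes."""
    if not sizes:
        raise ValueError("need at least one observed I/O size")
    return sum(sizes) / len(sizes)


def nearest_recordsize(avg_bytes, smallest=512, largest=128 * 1024):
    """Power of two between smallest and largest that is closest to avg_bytes."""
    candidates = []
    size = smallest
    while size <= largest:
        candidates.append(size)
        size *= 2
    return min(candidates, key=lambda s: abs(s - avg_bytes))


# Hypothetical trace: mostly 4K and 8K requests with one 64K request.
trace = [4096] * 6 + [8192] * 3 + [65536]

avg = average_io_bytes(trace)
print(f"average I/O size: {avg:.0f} bytes")
print(f"suggested recordsize: {nearest_recordsize(avg)} bytes")
```

An average can be skewed by a few large requests, so looking at the distribution as well is probably worthwhile.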
RE: Poor write performance - Added by David Bond about 1 year ago
There are two block sizes for VMware. VMFS uses 1, 2, 4 or 8MB for the unified blocks, with sub-blocks broken up into 64KB prior to ESXi 5. ESXi 5 uses 8KB sub-blocks, and ESXi also supports 1KB blocks by storing the data in the metadata block. But that only applies if you are using VMFS, i.e. block storage. NFS has only the vmdk files, which are broken up into the file system blocks of the VM OS (from what I understand).
RE: Poor write performance - Added by Linda Kateley about 1 year ago
That would explain why the 8k blocks are the sweet spot.
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
This is all getting very confusing. I am using NFS for my ESXi datastore. Should I be using the default 128KB, 8/16KB or does it not matter?
RE: Poor write performance - Added by Linda Kateley about 1 year ago
So for NFS use 8k, for iSCSI use 128k.
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
This is really confusing now. I seem to recall the defaults with nexentastor are the other way around? Or am I misremembering?
RE: Poor write performance - Added by David Bond about 1 year ago
My understanding is that you would use 8KB for both; the VMFS file block allocation size doesn't matter. Reads and writes to the VMDK, whether over NFS or to block storage via VMFS, are what the OS requests. VMware can coalesce reads and writes, so the block size read or written can be bigger than what the OS actually requested.
See: http://myvirtualcloud.net/?p=988 also read the comments by Chad Sakac (EMC), giving examples of the read/write stack from OS through vmware to the storage.
Of course this may be inaccurate. In my attempts to improve performance since we implemented virtualisation, it has taken a lot of searching and reading to get information. VMware doesn't really provide concrete info on how the VMFS blocks affect reads and writes; all I have read on their site is a post saying that performance isn't affected.
RE: Poor write performance - Added by Linda Kateley about 1 year ago
I also just posted a guide on using NFS with VMware in the links forum. Maybe that will be helpful?
RE: Poor write performance - Added by Dan Swartzendruber about 1 year ago
Yes, thanks! I am in the process of storage vmotioning all the VMs off the datastore and then back to it after changing the recordsize to 8K.
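For reference, the property change itself can be scripted along these lines. This is a minimal sketch: the dataset name tank/vmstore is hypothetical, the helper is made up, and the 512-byte to 128K power-of-two check is an assumption about the limits on this release. As far as I know, a new recordsize only applies to data written after the change, which is why moving the VMs off and back (so the vmdk files get rewritten) is needed for existing files to pick it up.

```python
import subprocess


def build_recordsize_command(dataset, size_bytes):
    """Build the argv list for 'zfs set recordsize=<bytes> <dataset>'."""
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    if not is_power_of_two or not 512 <= size_bytes <= 128 * 1024:
        raise ValueError(f"unsupported recordsize: {size_bytes}")
    return ["zfs", "set", f"recordsize={size_bytes}", dataset]


def set_recordsize(dataset, size_bytes, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    command = build_recordsize_command(dataset, size_bytes)
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# "tank/vmstore" is a hypothetical dataset name; 8192 bytes = 8K.
set_recordsize("tank/vmstore", 8192)
```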