NFS Read performance degradation

Added by Denis Besic about 1 year ago

Hi guys

I'm hoping that you can help me. We have an issue with NFS read performance. If we hammer the server with a high I/O load, NFS read performance degrades to 3-7 MB/s, while write performance is unaffected and stays at 280-300 MB/s. While reads are performing normally, we consistently get around 950 MB/s from cache. When performance degrades, the disks are not being utilized; reads are still being served from cache.

I just can't figure out what the issue is. The box is in a testing phase right now, but it should go into production within 2 weeks if the issue is resolved. We can "clear" the issue by stopping and starting the NFS server, after which performance is back for a while. We are using ESXi 5 hosts to access the volume over NFS. All NFS settings are at their defaults; we haven't touched them yet.

Attached is an illustration of what happens while cloning a VM locally from and to "view-vol": first the problem arises, then I clear it by restarting the NFS server.

We are running a single volume configured as follows, on Nexenta v3.1.2 (fully updated):

- Data: 3 mirror groups of 2 x 600 GB disks, for a total of 1.8 TB of storage
- Log: 3 mirror groups of 2 x 50 GB IBM enterprise-class SSDs
- L2ARC cache: 2 x 50 GB IBM enterprise-class SSDs

The appliance has 64GB of RAM.

Regards,

Denis

nexenta.jpg (99.8 KB)


Replies

RE: NFS Read performance degradation - Added by Jeff Gibson about 1 year ago

How are you testing NFS (just vMotions?)? Does the time to "degrade" appear to be the same after restarting the appliance? Can you track the amount of data read before it degrades, to try to correlate it with a memory or disk limit? Have you changed the NFS version in Nexenta? (I don't use NFS, but I've read that ESXi uses version 3 and might be getting confused by Nexenta's v4 default.) Are you using jumbo frames? If so, you might try without them.
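One way to track "the amount of data read before it degrades" is to record periodic samples of cumulative bytes read during a test and find the first interval where throughput drops below a threshold. This is a minimal sketch, assuming you already have (timestamp, cumulative bytes) samples from whatever tool you use; the function name and the sample format are hypothetical, and 1 MB is taken as 1,000,000 bytes.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample format: (seconds since test start, cumulative bytes read)
Sample = Tuple[float, int]


def bytes_read_before_degrade(samples: Sequence[Sample], threshold_mb_s: float) -> Optional[int]:
    """Return the cumulative bytes read at the start of the first interval
    whose throughput is below threshold_mb_s, or None if it never drops."""
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("sample timestamps must be strictly increasing")
        # Per-interval throughput in MB/s (1 MB = 1,000,000 bytes here)
        rate_mb_s = (b1 - b0) / elapsed / 1_000_000
        if rate_mb_s < threshold_mb_s:
            return b0
    return None


if __name__ == "__main__":
    # Illustrative numbers only: ~300 MB/s, then a drop to ~5 MB/s
    demo = [(0, 0), (10, 3_000_000_000), (20, 6_000_000_000), (30, 6_050_000_000)]
    print(bytes_read_before_degrade(demo, threshold_mb_s=50))
```

If the returned byte count is roughly the same across runs, that points toward a size-related limit; if it varies widely, a time-based or load-based cause is more likely.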

SSH into the ESXi host and use esxtop to look at the v, d, and u screens to see if you can spot any high-latency issues that would point back to the network configuration.

RE: NFS Read performance degradation - Added by Denis Besic about 1 year ago

Hi Jeff

I'm testing both directly through the VMware hypervisor and by cloning from within the VMs themselves. The time to degrade is the same: as soon as we push high I/O, the appliance dies after about 5 minutes. Yes, we've tracked the data read, and it's somewhere around 20 GB, well within the ARC. I've tried both v3 and v4 on the Nexenta without any noticeable improvement, and I've also tried with and without jumbo frames. I suspect it's something in the NFSD process that I'm just unaware of.

I'll try esxtop. I don't think it's a network issue, but it's worth excluding that option. We're running on a 10G Nexus switch right now with no other load.

RE: NFS Read performance degradation - Added by Jeff Gibson about 1 year ago

You might double-check the network bandwidth between Nexenta and ESXi using iperf. I'd run the test for twice as long as it takes to reach the degraded state, just to be sure.

While you're running your test, watch esxtop: switch to the "u" screen, expand the device with the "e" command, type in the device name (I believe it will start with {NFS}), and then watch the DAVG/cmd column to see the average command latency. (Mine was spiking to >300 ms when I was having MTU issues.)
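If you note down DAVG/cmd values over the course of a test, a quick summary makes spikes easier to spot than eyeballing the live screen. This is a minimal sketch, assuming the values have already been collected as a list of milliseconds; the function name is hypothetical, and the 300 ms default is just the spike level mentioned above, not a VMware-defined threshold.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_davg(samples_ms: Sequence[float], spike_ms: float = 300.0) -> Dict[str, float]:
    """Summarize DAVG/cmd latency samples (in ms): mean, max, and how many
    samples exceed spike_ms. The default threshold is illustrative only."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return {
        "mean_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "spikes": sum(1 for s in samples_ms if s > spike_ms),
    }


if __name__ == "__main__":
    # Illustrative values only
    print(summarize_davg([2.0, 3.0, 350.0, 4.0]))
```

A single large spike can hide in the mean, which is why the sketch reports the maximum and the spike count separately.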

RE: NFS Read performance degradation - Added by Denis Besic about 1 year ago

I just ran the iperf tests.

It appears the network is fine (and stable) no matter how many iterations I throw at it. No noticeable spikes in DAVG/cmd either. I can't provoke the error with iperf, only with NFS.

I've also tested CIFS, and it appears unaffected by this. Even when NFS reports reads of 3-7 MB/s, the CIFS service delivers the expected read performance.

I'm not sure where to look, but I believe this "problem" is isolated to the NFS server service on the Nexenta. However, I'm not sure how to debug it on the appliance.

RE: NFS Read performance degradation - Added by Denis Besic about 1 year ago

And thanks for your help and tips in trying to solve this issue Jeff.

RE: NFS Read performance degradation - Added by Jeff Gibson about 1 year ago

You're welcome, but I think you're about to reach the end of what I can offer, since I don't use NFS. One last clarification: when you ran the iperf test, did you use bidirectional (dual) or tradeoff mode, or did you only test one direction?

RE: NFS Read performance degradation - Added by Denis Besic about 1 year ago

It would be one direction, I think. I only ran it against the Nexenta appliance with 100 iterations.

RE: NFS Read performance degradation - Added by Jeff Gibson about 1 year ago

Try running it in dual mode (sending traffic in both directions at once), or swap which side is the server for another test. I found I had a problem in only one direction, but it still affected performance when I tried to use the system.
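The suggestions in this thread (run for twice the time it takes to degrade, then repeat in dual and tradeoff modes) map onto iperf2 client flags: `-c` (connect to server), `-t` (duration in seconds), `-d` (dualtest, both directions at once) and `-r` (tradeoff, one direction then the other). This is a minimal sketch that only builds the command lines; it assumes iperf2 is installed on the client and `iperf -s` is running on the other end, and the host name `nexenta` and the helper name are hypothetical. iperf3 uses different options (for example `-R` for reverse mode).

```python
from typing import List


def iperf2_client_commands(server: str, degrade_after_s: int) -> List[List[str]]:
    """Build iperf2 client command lines that run for twice the observed
    time-to-degrade: one-direction, dualtest, and tradeoff variants."""
    duration = 2 * degrade_after_s
    base = ["iperf", "-c", server, "-t", str(duration)]
    return [
        base,             # one direction: client -> server only
        base + ["-d"],    # dualtest: both directions simultaneously
        base + ["-r"],    # tradeoff: client -> server, then server -> client
    ]


if __name__ == "__main__":
    # About 5 minutes (300 s) to degrade, per the thread, gives 600 s runs
    for cmd in iperf2_client_commands("nexenta", 300):
        print(" ".join(cmd))
```

Swapping which machine runs `iperf -s` and repeating the same three runs covers the "switch which is the server" test as well.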