Nexentastor 4.0 and Requests for Infiniband
Added by JaeHoon Choi 11 months ago
VMware vSphere 5.0 will not support IB SRP. (I heard this from Mellanox support. They said the vSphere 5.0 OFED driver will support IPoIB connected mode only.)
Question 1: Will NS4 support an embedded IB subnet manager?
Question 2: NS3.1.x supports IB connected mode (MTU=65520). Will you add an IB configuration menu to the NS4 GUI?
Question 3: Will NS4 support the iSER protocol?
Question 4: The vSphere 5 pricing policy isn't attractive to SMBs like us. I can't understand the vSphere 5 vRAM entitlement, so we are going to move to Hyper-V 3 on Windows Server.
But NexentaStor CAN'T support a UPS, and it has no SMB PRICING policy.
Do you have any plans?
Replies
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Linda Kateley 11 months ago
We haven't defined specific features around IB for 4.0 yet, but I can put in your requests. I have seen iSER listed as basic IB support.
We are also looking at supporting all kinds of virtualization. We have a lot of investment in open tools like OpenStack and CloudStack.
We do have a UPS plugin available. If it doesn't work correctly, let me know and I will file a bug.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by FREDY . 11 months ago
Linda, can you also add a request for IPMP configuration in NMV? A while ago I heard someone say that it would be available in 3.1.2.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Elon Bjorin 11 months ago
I second Choi's post; IPoIB on Nexenta would be nice to have.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Linda Kateley 11 months ago
IPoIB looks like it is in.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 11 months ago
Hi, yes, I got the same message from Mellanox re: no SRP support for ESXi 5. I'm very disappointed. InfiniBand is DEAD. R.I.P.!!
I've tried IPoIB and frankly it's awful! On 4 x 20Gb links I can barely get 500-600MB/s, even with multipath enabled. I've got 4 QDR switches and 2 x 2-port QDR uplinks, and performance is still seriously bad. I even set the MTU to 4096; no difference at all.
I had SRP working on QuantaStor from OSNEXUS, but it's not much better: on ESXi 4 there are lots of SRP errors, and you can't go beyond 1TB LUNs. The performance was pretty good, though, at 1700MB/s to the SAN. But only with XFS or beta BTRFS.
SO INFINIBAND R.I.P.!
ESXI 5 BETA Driver - Added by bruce mckenzie 11 months ago
Ohh.
I got the beta driver, and can confirm that it only supports IPoIB.
2. It doesn't work on the ESXi 5 U1 that I have!
No SRP support.
I guess a BSD or open-source project using OFED drivers will be the only way SRP gets any exposure to the cloud via VMware. Xen doesn't support SRP, though I've heard CentOS can be installed there.
Ubuntu 12.04 does support SRP, but Ubuntu's cloud doesn't.
Oh well... no point in using InfiniBand anymore now that QLogic has 16Gb FC SANs.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by FREDY . 11 months ago
Guys, there's not much point in using InfiniBand at all, really, since 10Gb networks are available. I don't think anyone here has a requirement (or a storage array big enough) to transfer files at speeds like 20 or 40 Gb/s. Yes, 10Gb Ethernet might cost slightly more, but it's not a major difference, and if you have money to spend on InfiniBand you have it for 10Gb as well.
People tend to think that the bigger their pipe is, the more it will peak on every click of the mouse. What really matters for certain types of applications is latency, and I don't think 10Gb suffers from latency problems or differs much from InfiniBand.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Linda Kateley 11 months ago
The key difference between IB and Ethernet is latency, which is closer to fiber latency than to Ethernet latency.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 11 months ago
Hi, I don't agree about Ethernet vs IB. IB was created for ultra-low-latency communication in supercomputer clusters. I will actually be lucky enough to use it to build a 200-node cluster in a year or so (similar in design to the British weather service supercomputer, which uses IB), with 56Gb FDR using RDMA for cluster messaging, which is seriously fast. Ethernet is claimed to do as well, but then why do most supercomputers use InfiniBand as their interconnect fabric? I easily get 0.12-0.45ms ping times over IB QDR.
After all, Cisco and QLogic each bought an IB company (Cisco bought Topspin; I'm not sure which one QLogic bought, but they have since sold that business on to Intel). So there are obviously lessons to be learned from IB in terms of super-fast messaging. SRP, RDMA and Ethernet are just protocols that run over IB. The more complex the protocol, the more CPU cycles it uses and the less data it moves. Ethernet over IB is slow because of the work the CPU has to do converting between protocols.
I can achieve 1700MB/s SAN performance (close to saturating a 20Gb/s link) on IB using SRP/RDMA, but it falls down when it comes to vendor support from any of the cloud bare-metal hypervisor providers (VMware, Xen, etc.). I have had to do custom builds on CentOS and Ubuntu to get it working well, and throw thousands of dollars and man-hours at it.
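To sanity-check a figure like this, the sketch below converts a measured MB/s rate to Gb/s and compares it with a link's usable data rate. It assumes decimal units (1 MB = 10^6 bytes) and the 8b/10b line encoding used by SDR/DDR/QDR InfiniBand, so a 20Gb/s (4x DDR) link carries at most 16Gb/s of data; the function names are illustrative only.

```python
# Back-of-envelope link utilisation check (illustrative sketch).
# Assumes decimal units (1 MB = 1e6 bytes) and 8b/10b encoding, as used by
# SDR/DDR/QDR InfiniBand: 8 data bits are carried per 10 signalling bits.

ENCODING_EFFICIENCY_8B10B = 8 / 10


def mbytes_to_gbits(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


def link_utilisation(mb_per_s: float, signalling_gbps: float,
                     efficiency: float = ENCODING_EFFICIENCY_8B10B) -> float:
    """Fraction of the link's usable data rate consumed by mb_per_s."""
    usable_gbps = signalling_gbps * efficiency
    return mbytes_to_gbits(mb_per_s) / usable_gbps


if __name__ == "__main__":
    # 1700 MB/s over one 20 Gb/s (4x DDR) link, as quoted in the post.
    print(f"{mbytes_to_gbits(1700):.1f} Gb/s")             # 13.6 Gb/s
    print(f"{link_utilisation(1700, 20):.0%} of 16 Gb/s")  # 85% of 16 Gb/s
```

So 1700MB/s is about 13.6Gb/s of payload, roughly 85% of what a 4x DDR link can carry after encoding overhead.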
I actually believe IB was a technology before its time. When IB came out, no SAN could actually saturate it, and it's still very hard to do today: you'd need a SAN that can achieve 2000-5000MB/s. I can now build a SAN from commodity hardware that approaches 2000-3000MB/s, so if hardware support from vendors were there, we could get what we need: SSD performance in a commodity storage SAN. Couple that with ZFS, and we can move forward with a better class of cloud and push prices down.
In that light, I will become a member of OFED and start a Mellanox 40Gb-56Gb QDR/FDR driver project for VMware ESXi 5.0 U1. Anyone interested in contributing?
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 11 months ago
ESXi 4 supports IPoIB DM (datagram mode, based on OFED 1.4.1), which delivers about half of the link's throughput. ESXi 5 will support IPoIB CM (connected mode, based on OFED 1.5 and above), which delivers full throughput like the SRP protocol (but uses more processor time converting between IP and IB).
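On a Linux host running the upstream ipoib driver (an assumption: ESXi and NexentaStor expose this differently), the current mode and MTU can be read from sysfs. `/sys/class/net/<iface>/mode` holds `datagram` or `connected`; connected mode allows an MTU up to 65520, while datagram mode is limited to the IB link MTU minus the 4-byte IPoIB header (2044 for a 2048-byte IB MTU, 4092 for 4096). The helper names and the interface name `ib0` are hypothetical.

```python
# Sketch: report IPoIB mode and MTU from Linux sysfs (illustrative).
# Assumes the upstream Linux ipoib driver, which exposes
# /sys/class/net/<iface>/mode ("datagram" or "connected") alongside the
# standard /sys/class/net/<iface>/mtu attribute.
from pathlib import Path

IPOIB_HEADER_BYTES = 4          # IPoIB encapsulation header
CONNECTED_MODE_MAX_MTU = 65520  # largest MTU allowed in connected mode


def read_ipoib_settings(iface: str,
                        sysfs_root: str = "/sys/class/net") -> tuple[str, int]:
    """Return (mode, mtu) for an IPoIB interface such as "ib0"."""
    base = Path(sysfs_root) / iface
    mode = (base / "mode").read_text().strip()
    mtu = int((base / "mtu").read_text().strip())
    return mode, mtu


def max_ipoib_mtu(mode: str, ib_link_mtu: int = 2048) -> int:
    """Largest IPoIB MTU for the given mode and IB link MTU."""
    if mode == "connected":
        return CONNECTED_MODE_MAX_MTU
    if mode == "datagram":
        return ib_link_mtu - IPOIB_HEADER_BYTES
    raise ValueError(f"unknown IPoIB mode: {mode!r}")
```

For example, `read_ipoib_settings("ib0")` returning `("datagram", 2044)` would indicate a host still in datagram mode.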
Also, the ESXi 4.x SRP initiator has many bugs: it connects to each IB port rather than to the IOC.
But the Windows Server 2008 R2 SRP initiator connects to the SRP target via the IB IOC!
The NexentaStor IB SRP target has a bug, too!
If you use a dual-port IB HCA on NexentaStor, the physical link on the first HCA port is unstable!
I think that's a bug in the OpenSolaris SRP target.
I tested the Windows Server 2008 R2 SRP initiator against the NexentaStor 3.1.2 SRP target.
IB_SRP_Target_Hyper-V_on_WS2k8R2_4GB.jpg (150.3 KB): tested on a Windows Server 2008 R2 Hyper-V VM.
IB_SRP_Target_Hyper-V_on_WS2k8R2.jpg (157.3 KB): also tested on a Windows Server 2008 R2 Hyper-V VM.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 11 months ago
Hi JaeHoon Choi, do you have a spec of the server (number of HDDs, etc.) and any specifics on how you set up IB? I don't see your email address in your contact details.
Cheers, Bruce McKenzie
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 11 months ago
My other forum article briefly describes my test systems. I have 3 servers with 2-way Xeon L5530 processors and 96GB of memory in my home lab. :)
And I've finally upgraded to ConnectX-2 40Gb IB HCAs, too.
But Mellanox launched a very simple, basic OFED driver for vSphere 5. There aren't any VPI functions or native IB protocols; only IPoIB exists, which is the biggest problem. Mellanox support said the vSphere OFED driver would support IPoIB CM mode, but they launched a driver that only supports IPoIB DM mode. CM mode's performance is very similar to SRPT's.
But they can't support it now.
Today is very disappointing for me...
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 9 months ago
We also use IB SRP (unfortunately, in a production environment) and waited a looong time for Mellanox to bring out the ESXi 5 drivers :(
There's no point in using IPoIB, as there is already a similar alternative in 10GbE. It's not only the bandwidth but especially the latency where IB comes into its own. Our storage servers are fitted with SSDs, which benefit a lot from this lower latency. As virtual environments become more common AND SSDs drop into the price range of regular 15k SAS disks, using a network technology that can accommodate all this speed becomes more important.
We got IB SRP working with NexentaStor and ESXi 4.1, but we are not able to upgrade until Mellanox releases drivers that support ESXi 5. The lack of support from Mellanox stands in the way of InfiniBand's future, but as long as 10GbE "low latency" switches cost roughly 18-20k, there's still a market, I guess. Although it looks like Mellanox disagrees....
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Peter Valadez 9 months ago
bruce mckenzie wrote:
I easily get 0.12 - 0.45ms in ping times over IB QDR.
I have no experience with IB, but I do have a couple of Dell 8024F 10GbE switches connected to a NexentaStor server, and I can get ping times of less than 0.1ms. Maybe this isn't the best latency comparison, but check out a quick ping test I just did:
[root@xenserver ~]# ping 172.16.10.1
PING 172.16.10.1 (172.16.10.1) 56(84) bytes of data.
64 bytes from 172.16.10.1: icmp_seq=1 ttl=255 time=0.161 ms
64 bytes from 172.16.10.1: icmp_seq=2 ttl=255 time=0.100 ms
64 bytes from 172.16.10.1: icmp_seq=3 ttl=255 time=0.096 ms
64 bytes from 172.16.10.1: icmp_seq=4 ttl=255 time=0.078 ms
64 bytes from 172.16.10.1: icmp_seq=5 ttl=255 time=0.170 ms
64 bytes from 172.16.10.1: icmp_seq=6 ttl=255 time=0.108 ms
64 bytes from 172.16.10.1: icmp_seq=7 ttl=255 time=0.108 ms
64 bytes from 172.16.10.1: icmp_seq=8 ttl=255 time=0.101 ms
And by the way, those Dell switches cost 8k brand new from Dell.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by David Bond 9 months ago
I have the same switches (4 of them) and do get good pings (though they can always be better), but I'm not sure how reliable they are: 2 of them died on us after a month of use, but the others and their replacements have been running for around a year now without problems.
As for price, you can get them from Dell for a lot less if you press them a little. We got all 4 of ours for a little under £10K (approx. $16K), with around 50 x 5m twinaxial cables.
I have been wondering whether InfiniBand does provide better latency, as we need to improve ours on our SAN; we're not getting the IOPS we need from a single thread at the moment. Does anyone have actual data on InfiniBand latency with NexentaStor?
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
Hi, ping times using IPoIB are about 0.019ms to 0.035ms.
So in that light, here are my IPoIB pings from ESXi 4 to an SRP (www.quantastor.com) SAN: about 90 times faster than the Dell.
PING 100.5.21.50 (100.5.21.50) 56(84) bytes of data.
64 bytes from 100.5.21.50: icmp_req=1 ttl=64 time=0.035 ms
64 bytes from 100.5.21.50: icmp_req=2 ttl=64 time=0.023 ms
64 bytes from 100.5.21.50: icmp_req=3 ttl=64 time=0.024 ms
64 bytes from 100.5.21.50: icmp_req=4 ttl=64 time=0.022 ms
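To compare pasted ping outputs like these two, the short sketch below pulls the `time=... ms` values out of the text and reports min/avg/max. Run on the two samples posted in this thread, it gives averages of roughly 0.115 ms (10GbE) and 0.026 ms (IPoIB), a ratio of about 4.4 for these particular samples. The regex is an assumption that fits both outputs here; other ping builds may format replies differently.

```python
# Sketch: summarise round-trip times from pasted `ping` output.
# The regex looks for "time=<number> ms" on each reply line, which matches
# both outputs quoted in this thread; other ping builds may differ.
import re
from statistics import mean

RTT_PATTERN = re.compile(r"time=([\d.]+) ms")


def rtts_ms(ping_output: str) -> list[float]:
    """Extract every round-trip time, in milliseconds."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def summarise(ping_output: str) -> dict[str, float]:
    """Return min/avg/max RTT in ms, similar to ping's summary line."""
    samples = rtts_ms(ping_output)
    if not samples:
        raise ValueError("no 'time=... ms' values found")
    return {"min": min(samples), "avg": mean(samples), "max": max(samples)}


# Reply lines copied from the two ping outputs posted in this thread.
ETHERNET_10GBE = """\
64 bytes from 172.16.10.1: icmp_seq=1 ttl=255 time=0.161 ms
64 bytes from 172.16.10.1: icmp_seq=2 ttl=255 time=0.100 ms
64 bytes from 172.16.10.1: icmp_seq=3 ttl=255 time=0.096 ms
64 bytes from 172.16.10.1: icmp_seq=4 ttl=255 time=0.078 ms
64 bytes from 172.16.10.1: icmp_seq=5 ttl=255 time=0.170 ms
64 bytes from 172.16.10.1: icmp_seq=6 ttl=255 time=0.108 ms
64 bytes from 172.16.10.1: icmp_seq=7 ttl=255 time=0.108 ms
64 bytes from 172.16.10.1: icmp_seq=8 ttl=255 time=0.101 ms
"""

IPOIB = """\
64 bytes from 100.5.21.50: icmp_req=1 ttl=64 time=0.035 ms
64 bytes from 100.5.21.50: icmp_req=2 ttl=64 time=0.023 ms
64 bytes from 100.5.21.50: icmp_req=3 ttl=64 time=0.024 ms
64 bytes from 100.5.21.50: icmp_req=4 ttl=64 time=0.022 ms
"""

if __name__ == "__main__":
    eth, ib = summarise(ETHERNET_10GBE), summarise(IPOIB)
    print(f"10GbE avg {eth['avg']:.3f} ms, IPoIB avg {ib['avg']:.3f} ms")
    print(f"ratio {eth['avg'] / ib['avg']:.1f}")
```

ICMP ping over IP is only a rough proxy for the storage latency an SRP or RDMA path would see.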
The real speed of IB comes from the other protocols, like RDMA or SRP; IPoIB is quite slow on InfiniBand. HPC uses the messaging protocols. Check out this (VMware CTO: http://cto.vmware.com/rdma-on-vsphere-status-and-future-directions/ and http://labs.vmware.com/download/160/ ).
See the JPG: SRP (ESXi 4.1) on the left vs IPoIB iSCSI (ESXi 5) on the right.
You can see on page 7 that RDMA pings are very fast: not milliseconds but microseconds (1ms = 1000 microseconds), so average pings are on the order of 100 times faster, depending on message size. Given that supercomputer messaging uses small packets of 4k-8k, latencies are around 4-8 microseconds. For SRP the average size is 128K, depending on the block size format, so pings are very fast.
Because of the low latency, I find my IOPS at any block size are much higher than on an 8Gb FC SAN.
IO test @ 32K 100% read: I get 1049MB/s @ 33590 IOPS
@ 64K 50%R/50%W: I get 1533MB/s @ 24672 IOPS
So at the average drive I/O size of 128K, I get 1340MB/s @ 10722 IOPS (50% R/W random IO gets the same performance)
I just happened to have 8Gb FC on that node, as I was comparing them. 8Gb/s Fibre to the same target, i.e. @ 128K 50%R/50%W: I get 784MB/s @ 8673 IOPS
So my conclusion is that SRP on 20Gb/s InfiniBand is 2x faster than 8Gb/s FC, and 40Gb/s is 4x faster than 8Gb/s FC.
- All tests done using a single path.
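The IOPS and MB/s pairs above are linked by throughput = IOPS × block size. A minimal sketch, assuming binary units (block size in KiB, throughput in MiB/s), reproduces the 32K and 128K figures to within rounding; the function names are illustrative.

```python
# Sketch: relate IOPS, block size and throughput (illustrative).
# Assumes binary units: block size in KiB, throughput in MiB/s
# (1 MiB = 1024 KiB).

def throughput_mib_s(iops: float, block_kib: float) -> float:
    """Throughput implied by an IOPS rate at a fixed block size."""
    return iops * block_kib / 1024


def iops_for(throughput_mib: float, block_kib: float) -> float:
    """IOPS needed to sustain a throughput at a fixed block size."""
    return throughput_mib * 1024 / block_kib


if __name__ == "__main__":
    for iops, block in [(33590, 32), (10722, 128)]:
        rate = throughput_mib_s(iops, block)
        print(f"{iops} IOPS @ {block}K -> {rate:.1f} MB/s")
```

The same relation explains why IOPS fall as block size grows while MB/s rises: larger blocks move more data per operation.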
Check out this thread: http://communities.vmware.com/message/2092828#2092828
ib_iscsi40g_test_pcie-biosupdate.jpg (109.3 KB)
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 9 months ago
David,
We use Mellanox ConnectX-2 IB adapters along with QLogic 12300 IB switches. The 18-port licensed 12300 will cost you a little over $3k, with an option to license the other 18 ports (because it's a 36-port switch). However, Intel bought QLogic's IB division, so you won't find these switches in QLogic's portfolio anymore. Intel has little information about its IB products on its website yet, but you can find them under Intel Truescale InfiniBand: http://www.intel.com/content/www/us/en/infiniband/truescale-edge-and-director-switches.html
The big advantage of IB SRP is that you can rule out any TCP/IP communication with your storage. This not only reduces your latency to storage but also offloads your CPU.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 9 months ago
Thanks Bruce, I was looking for that paper before I posted my message, but I couldn't find it anymore :) IPoIB is indeed quite useless, to be honest. InfiniBand has so much potential, but you really need SRP or iSER.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
Yaaaaa, it's pretty hard to find! ;-) ;-)
I suspect an SRP driver will appear, but it will be a long time off... :-(
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
iSER? Nahh, everyone has given up on that... I have a Voltaire iSER switch... it's rubbish.
SRP and RDMA are the way to go!!
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
As you can see from my screenshots, with the 1.8 IPoIB driver for ESXi 5 using iSCSI, I can barely get 8Gb/s fiber-like performance...
Anyway, I'm about to build a RAID 5 array with 6 x 500MB/s SSDs on an LSI 9265 (with CacheCade 2) and will post the speed here! I expect 2500MB/s.
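The 2500MB/s expectation matches a common back-of-envelope rule for parity RAID: with full-stripe sequential I/O, roughly one disk's worth (RAID 5) or two disks' worth (RAID 6) of each stripe is parity, so n disks stream about (n - parity) disks of data. This is a sketch under that assumption only; controller, cache (e.g. CacheCade) and bus limits are ignored, and the function name is hypothetical.

```python
# Rough parity-RAID streaming estimate (illustrative, not a benchmark).
# Assumption: in full-stripe sequential I/O, `parity_disks` disks' worth of
# each stripe holds parity (1 for RAID 5, 2 for RAID 6), so the array
# streams about (disks - parity_disks) disks' worth of data.
# Controller, cache and bus limits are deliberately ignored.

def parity_raid_streaming_mb_s(disks: int, per_disk_mb_s: float,
                               parity_disks: int = 1) -> float:
    if disks < parity_disks + 2:
        raise ValueError("not enough disks for this RAID level")
    return (disks - parity_disks) * per_disk_mb_s


if __name__ == "__main__":
    print(parity_raid_streaming_mb_s(6, 500))                   # RAID 5 SSDs
    print(parity_raid_streaming_mb_s(12, 200, parity_disks=2))  # RAID 6 HDDs
```

Treat the result as an upper bound; real arrays often land below it once the controller becomes the bottleneck.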
Cheers :-)
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
Ohh, forgot to mention: the IOMeter response time for all those tests is 1.219-3 ms.... Pretty quick, faster than my C7000 blades on an 8Gb/s FC switch! If I enable IPoIB iSCSI, it jumps to 10-20ms.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 9 months ago
SRP and RDMA are indeed the way to go! I seriously don't get where the lack of support for these protocols comes from. Is it just too new? Even though IB has been around for at least 10 years already...
Here's another VMware document which also compares CPU offloading when using RDMA: http://communities.vmware.com/docs/DOC-18796
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
Well, most went broke!!! :-( QLogic's IB business has been bought by Intel, no doubt to copy it to create a PCIe-4 I/O bus! hehehe! I had the idea of using IB FDR 56Gb/s to build a new CPU die with IB on-die and IB on the chipset die, to support 8 Xeons. That would really FLY! Couple that with 16 GPUs and IB to tie the bus together. I'm sure Intel have thought of that. A dual-port 56Gb/s bus would yield about 50GT/s!!! compared to PCIe 3's 8GT/s.
All the others, like Sun and www.Xsigo.com, have been gobbled up by Oracle (and were too expensive anyway).
Frankly, from my research the simple reason seems to be the lack of high-performance SANs and of SRP integration into them. Now we have SANs using commodity hardware that can push 2500MB/s with PCIe 2, and soon LSI will be pushing 3000MB/s+ with new 12Gb/s SAS on PCIe 3, with SSDs giving even more speed when coupled with CacheCade 2.
The IB subnet manager isn't that hard to operate. I'm a Snr Systems Engineer (Big Blue), and it took me a few months of playing with it to see its potential!
So IB was 10 years before its time.... That's my 2c worth!
My current cloud (open-source cloud.com) can push 1800MB/s on all blades! VROOM! I'm just about to push it out as the 2nd IB cloud in AU. :-) Cheaper, FASTER, BETTER! www.v365.com.au coming soon.
cheers!
RE: Nexentastor 4.0 and Requests for Infiniband - Added by FREDY . 9 months ago
Guys, seriously, does 0.0xxx ms make any real difference, or is it just numbers? With normal Gigabit Ethernet I get around 0.2 or 0.3 ms, sometimes 0.1xx ms. How would 0.0xxx differ from this? From what I see, a few ms might make a difference for storage, but not microseconds. If it's a difference of 10 or 15 ms, yeah, I would agree it's significant, but 0.1 or even 0.2 ms, who cares? That is just the ping time; you obviously need to add the disk seek time and other things not accounted for here, which are common to any kind of technology.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
Hi Freddy, yes, it's a small time/measurement. But for things like bus messaging it makes all the difference. Ethernet pings aren't really what concerns us.
So try it. You'll see a large impact on non-IP traffic. It's no good (average at best) for iSCSI. Well, it's OK if 8Gb/s is all you want, but there's too much latency as it goes RDMA>IPoIB>iSCSI>IPoIB>RDMA.
The new Mellanox congestion control works well for Ethernet traffic, so those woes are gone. Prices have come down. I can still build an SRP SAN & IP network for half the cost of 10GbE Ethernet for C7000 blades. And I've not SEEN ANY 10GbE iSCSI SANs work faster than 8-8.5Gb/s; you may as well get FC, as that's about the same price as 10GbE switches.
I use HP BL2x220 G7 dual-motherboard blades, which have only 1 mezzanine card available, so IB is perfect for that. Much higher density per chassis: 32 servers.
And yes, I'm seeing a difference over Ethernet, as do most others with IB. Using IPoIB I get 5-8Gb/s (VM to VM) with only 5-10% CPU time per VM or physical server, which is MASSIVE!! That's less CPU time than, say, 10Gb/s Ethernet at 50%. Especially important for cloud/shared CPU resources.
Most Ethernet chips from Cisco are actually based on 10Gb IB anyway, as they bought Topspin a few years back, and Nexus was what came out of the lab. That is why Mellanox can so easily switch over to Ethernet.
As for disk seek times, I have the new 3-4ms 1TB WD VelociRaptors, which easily get 200MB/s in IO tests, faster than most SAS drives and some SSDs, I may add. And 12 of them in a RAID 5 or 6 array yields massive speed using an LSI 9265-8i.
Anyway, I can disagree if I want to, because I see it working much faster than Ethernet. Spew! LOL. Who cares! This ain't a competition!
RE: Nexentastor 4.0 and Requests for Infiniband - Added by David Bond 9 months ago
FREDY . wrote:
Guys, seriously, does 0.0xxx ms make any real difference, or is it just numbers? With normal Gigabit Ethernet I get around 0.2 or 0.3 ms, sometimes 0.1xx ms. How would 0.0xxx differ from this? From what I see, a few ms might make a difference for storage, but not microseconds. If it's a difference of 10 or 15 ms, yeah, I would agree it's significant, but 0.1 or even 0.2 ms, who cares? That is just the ping time; you obviously need to add the disk seek time and other things not accounted for here, which are common to any kind of technology.
Yep, it would be fine for spinning disks (only a few IOPS lost), but when you have drives with less than 26 microseconds of latency (Zeus RAM), it makes a difference (1000s of IOPS lost).
With zero latency, a Zeus RAM should in theory be able to do around 38,000 IOPS (in a quick test on the head a while back it got around 28,000), single-threaded, sync write. If you add the latency of Ethernet, say 0.1ms, you now have to add 0.2ms (round trip) to each write request, which means a maximum of 5,000 IOPS for single-threaded writes. That's not taking the latency of the disk into account, but there's no real need, as 5,000 IOPS is far lower than what the disk can actually do.
Now if InfiniBand can provide 20 microseconds of latency, then you can get up to 25,000 IOPS; if we now take the latency of the disk into account, it's around 16,000 IOPS.
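The arithmetic in the last two paragraphs can be written as IOPS ≈ 1 / (2 × one-way network latency + device latency) for a single outstanding synchronous write. The sketch below, which assumes exactly that model and uses a hypothetical function name, reproduces the 38,000, 5,000 and 25,000 figures; adding the 26 µs device time to a 20 µs fabric gives about 15,000, in the same ballpark as the estimate above.

```python
# Sketch: latency-bound ceiling for single-threaded synchronous writes.
# Assumption (the post's model): each write waits for one network round
# trip (2 x one-way latency) plus the device's own service time, and only
# one write is outstanding at a time.

US = 1e-6  # one microsecond, in seconds


def max_sync_iops(one_way_network_s: float, device_s: float = 0.0) -> float:
    """Upper bound on IOPS when each I/O must finish before the next starts."""
    per_io_s = 2 * one_way_network_s + device_s
    if per_io_s <= 0:
        raise ValueError("total latency must be positive")
    return 1.0 / per_io_s


if __name__ == "__main__":
    cases = {
        "device only (26 us)": max_sync_iops(0, 26 * US),         # ~38,462
        "Ethernet, 100 us one-way": max_sync_iops(100 * US),      # ~5,000
        "InfiniBand, 20 us one-way": max_sync_iops(20 * US),      # ~25,000
        "InfiniBand + device": max_sync_iops(20 * US, 26 * US),   # ~15,152
    }
    for label, iops in cases.items():
        print(f"{label}: {iops:,.0f} IOPS")
```

With more outstanding I/Os (higher queue depth) the ceiling rises, which is why this bound matters mainly for single-threaded sync workloads like the one described.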
RE: Nexentastor 4.0 and Requests for Infiniband - Added by FREDY . 9 months ago
I would say that 10Gig Ethernet is the way to go for the majority of cases, not InfiniBand. InfiniBand is more for certain use cases, while 10Gig Ethernet in general performs quite satisfactorily without all the hassle over InfiniBand driver support that we are discussing here.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 9 months ago
Fredy,
SSD storage is getting more common as prices fall rapidly. A Crucial M4 SSD is almost the same price as any 15k SAS model of about the same capacity. Most cloud providers already use SSD caching, and you see more and more cloud providers offering virtual private servers based on SSD storage. 10GbE is not the way to go when you want to take full advantage of SSD capabilities. Not to mention that SSD is still a somewhat immature technology, and it wouldn't surprise me if consumer disks have an average latency of 0.01ms or lower within the next one or two years.
I agree 10GbE is probably enough for most cases right now, but InfiniBand is growing: http://news.yahoo.com/analyst-report-confirms-infiniband-growth-high-performance-computing-160037088.html
RE: Nexentastor 4.0 and Requests for Infiniband - Added by bruce mckenzie 9 months ago
Hmm, perhaps Ethernet works, but not as you expand horizontally, or even very well vertically. InfiniBand core switches have 10 times more capacity than Nexus.
I've seen many instances where clients just throw 10GbE into complex clouds with many server blades, expecting it to cope, but it doesn't. Ethernet latency quickly compounds and gets as bad as 100-1200ms on VMware with many guests and hosts. A low-latency, high-speed fabric better than 8Gb Fibre Channel is essential for scaling new cloud infrastructures.
There are too many clouds where I hear complaints of Ethernet-based iSCSI SANs delivering miserable performance, as low as 2MB/s per VM guest. That's simply bad.
What's the point of delivering, say, 100Mb to a web server only to have its shared storage fail because someone is vMotioning their storage on the storage VLAN and hogging all the bandwidth?
Also, why do you think Oracle has so many InfiniBand SQL server solutions...
Here's something to think about: on an Ethernet cluster, if I begin moving files on 10 guests and then try to move more data on more hosts, they choke. Do the same on an InfiniBand network and the hosts do not choke; the bandwidth is shared equally, i.e. with 1800MB/s, each of 10 hosts easily gets 180MB/s. If you keep copying on more hosts, it keeps going. That simply doesn't happen on Ethernet. And yes, I've actually performed those tests. I can vMotion a 40GB partition in less than 60 seconds on InfiniBand.
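The figures in this paragraph follow from simple division: an 1800MB/s aggregate split evenly across 10 hosts is 180MB/s each, and moving 40GB in under 60 seconds implies a sustained rate of at least about 667MB/s. A small sketch, assuming decimal units and the post's own premise of an even split (real fabrics may not divide bandwidth this evenly); the function names are hypothetical.

```python
# Sketch: equal-share bandwidth and transfer-time arithmetic.
# Assumptions: decimal units (1 GB = 1000 MB) and a fabric that divides its
# aggregate throughput evenly between active hosts, as described in the post.

def fair_share_mb_s(aggregate_mb_s: float, hosts: int) -> float:
    """Per-host rate when the aggregate is split evenly."""
    if hosts < 1:
        raise ValueError("need at least one host")
    return aggregate_mb_s / hosts


def transfer_seconds(size_gb: float, rate_mb_s: float) -> float:
    """Time to move size_gb at a sustained rate_mb_s."""
    return size_gb * 1000 / rate_mb_s


def required_rate_mb_s(size_gb: float, seconds: float) -> float:
    """Sustained rate needed to move size_gb within the given time."""
    return size_gb * 1000 / seconds


if __name__ == "__main__":
    print(f"{fair_share_mb_s(1800, 10):.0f} MB/s per host")
    print(f"{required_rate_mb_s(40, 60):.0f} MB/s to move 40GB in 60 s")
```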
So there is a lot more to say: Ethernet can't cope, as it's evident that as you scale up it falls flat, so you then spend more than an InfiniBand network would cost for less performance.
Also, clients expect 10GbE to be 10 times faster, but it is not.
So InfiniBand, in my opinion, is very important for future growth, especially for highly dense platforms such as the ones I design.
Especially for my clients at IBM.
As a senior engineer, it's my role to know what is better than Ethernet or Fibre, and I do think InfiniBand has its place in HPC clouds, etc. Using InfiniBand I can easily scale a cluster to hundreds of nodes in one fabric.
Yes, there are issues with drivers, but that's Mellanox's problem to address. VMware seem to be trying to sort out a new driver, and with that report it looks good for InfiniBand. It's very frustrating, because it works well. Too many naysayers in the world.
If we can get an InfiniBand SRP driver in the meantime, then sweet. VMware are just realising the value of InfiniBand, so I guess we will have to wait :( The value for vMotion alone is worth it.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Peter Valadez 9 months ago
Bruce, I'm glad you expanded on this and showed that InfiniBand latency is actually lower than 10GbE's. InfiniBand does sound like it would be worthwhile if you're able to get the driver support issues figured out. It especially makes sense in a cloud hosting setup if InfiniBand eliminates CPU usage as much as you are saying; for some stupid reason XenServer loves to crank up the CPU cores when doing storage transfers over Ethernet. I thought this was just XenServer, but maybe it's inherent in the amount of CPU overhead used by Ethernet. And I know this thread is about VMware, but for what it's worth, it doesn't look like XenServer supports InfiniBand at all.
Still, I think 10GbE should work fine for us right now, as we won't need a single thread using that many IOPS. Our IO should be spread across many VMs and hosts, and we've stacked our switches and created aggregate links in NexentaStor, so the heads should technically have 20Gbps of bandwidth available. Also, like David, we shopped around and actually only paid 3k for the Dell 8024Fs.
Thanks for the tips; we will definitely keep InfiniBand in mind for future upgrades. It's nice to know that at least NexentaStor 4 will support IPoIB!
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Elon Bjorin 9 months ago
Peter Valadez wrote:
Bruce, I'm glad you expanded on this and showed that InfiniBand latency is actually lower than 10GbE's. InfiniBand does sound like it would be worthwhile if you're able to get the driver support issues figured out. It especially makes sense in a cloud hosting setup if InfiniBand eliminates CPU usage as much as you are saying; for some stupid reason XenServer loves to crank up the CPU cores when doing storage transfers over Ethernet. I thought this was just XenServer, but maybe it's inherent in the amount of CPU overhead used by Ethernet. And I know this thread is about VMware, but for what it's worth, it doesn't look like XenServer supports InfiniBand at all.
Still, I think 10GbE should work fine for us right now, as we won't need a single thread using that many IOPS. Our IO should be spread across many VMs and hosts, and we've stacked our switches and created aggregate links in NexentaStor, so the heads should technically have 20Gbps of bandwidth available. Also, like David, we shopped around and actually only paid 3k for the Dell 8024Fs.
Thanks for the tips; we will definitely keep InfiniBand in mind for future upgrades. It's nice to know that at least NexentaStor 4 will support IPoIB!
I've run IPoIB on XenServer since 3.0; it didn't want to speak iSER or SRP, but IPoIB worked nicely. Just download the OFED drivers and compile them with the XenServer DDK (you don't have to be a superhacker; OFED is nice enough to include a GUI for building your driver). I'm quite saddened by the lack of support for InfiniBand on vSphere 5. I'm currently building a new 32-host cluster and have to base it on 10GbE instead of the 20Gbit SRP I have running for the ESXi 4 servers. I hope Mellanox does release a better driver, but they are kind of sketchy when it comes to releasing drivers and supporting older hardware.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Peter Valadez 9 months ago
Thanks Elon,
In your experience with XenServer, do you see lower CPU utilization when using InfiniBand?
I've recently started to look at SAS switching as an option for shared storage, and its performance and price points look very attractive. Check out this presentation by SNIA: http://www.snia.org/sites/default/education/tutorials/2011/spring/storman/GibbonsTerry_Shareable_Storage_With_Switched_SASv2.pdf
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 8 months ago
Pretty good news, I guess. I've just spoken to a Mellanox rep at VMworld who told me support for SRP will be launched later this year. Nexenta is also at VMworld; they told us Nexenta is officially going to support InfiniBand SRP, and maybe even iSER, in Nexenta 4, probably being released in the coming months.
It seems our prayers have been heard :D I'm really excited about Mellanox and Nexenta finally taking some great steps toward decent support for IB!
RE: Nexentastor 4.0 and Requests for Infiniband - Added by FREDY . 8 months ago
I wouldn't believe either statement until I see it there and supported, given their history. It's very easy for a sales guy to say "It will be there 'later this year'" just to put a smile on a potential customer's face.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by simon townsend 8 months ago
There's apparently a solid beta from Mellanox running SRP... digging for more info... it would be good to have formal confirmation from Nexenta...
RE: Nexentastor 4.0 and Requests for Infiniband - Added by simon townsend 8 months ago
Just to be clear: a solid beta for VMware 5.x.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Derek Glover 8 months ago
I don't have dates available, but it is being worked on.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Matt Breitbach 8 months ago
It will be interesting to see if RDMA makes it in. Server 2012 sounds like it's got great support for RDMA, would love to see RDMA support working in Nexenta for connections to Win2012 clients.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 8 months ago
Simon, the Mellanox rep @ VMworld stated that there is in fact already a beta of the SRP driver, but it wasn't stable (at least, that's what they told me). However, there should be a stable beta pretty soon, with a GA release targeted for the end of this year. Maybe the stable beta has already been released; I don't know the details, and VMworld has already come to an end :(
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Steve Radich 8 months ago
We've been running InfiniBand for about a year with no issues on NexentaCore / Illumian systems, but we see VERY poor interrupt behaviour causing the slow performance. The cards seem to use FAR too many interrupts.
As to price: used 10Gb InfiniBand equipment is almost given away on eBay all day, yet 10Gb Ethernet is still pricey. The faster InfiniBand generations are of course worth getting, but even if you argue that 10Gb is fast enough, the costs aren't comparable: at 10Gb, InfiniBand has a MAJOR pricing advantage.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Jon Schillinger 6 months ago
I just received this response from Mellanox:
Regarding your questions: The next release of InfiniBand OFED Driver for VMware ESX Server 5.1 will include SRP support. The next release of InfiniBand OFED Driver for VMware ESX Server 5.1 is currently scheduled for December 2012. Please note that this schedule may change.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by eMiz0r . 4 months ago
A rep told me the new Mellanox SRP driver for ESXi 5.1 will be released sometime next week.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Lars Pisanec 4 months ago
eMiz0r . wrote:
A rep told me the new Mellanox SRP driver for ESXi 5.1 will be released sometime next week.
Good to know and hopefully true, and a stable release. Can't wait to try it. Now if only Mellanox and Nexenta would support NFS over RDMA ;)
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Jon Schillinger 3 months ago
Time keeps on slipping, slipping, slipping into the future...
Regarding your question: Mellanox OFED Driver for VMware ESXi Server 5.1 release is currently scheduled for the end of February 2013. Please note that this schedule may change.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Lars Pisanec 3 months ago
Jon Schillinger wrote:
Time keeps on slipping, slipping, slipping into the future...
Regarding your question: Mellanox OFED Driver for VMware ESXi Server 5.1 release is currently scheduled for the end of February 2013. Please note that this schedule may change.
Sigh, that kind of puts a damper on my plans to build a new storage array.
Ah well, I gave Hyper-V a look. Getting drivers for Windows is that much easier. But when I compare Hyper-V to VMware... VMware wins hands down. It's much more usable and easier to work with. So I'll wait a bit longer for a usable driver from Mellanox.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 3 months ago
Mellanox is ready to launch the SRP driver now.
The new Mellanox OFED driver 1.8.1.0 is ready and its readme has been published.
However, the driver download is still blocked.
I think they will unblock the new driver tomorrow...
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 3 months ago
I have now installed Mellanox OFED 1.8.1.0 in my test lab! I've attached a picture showing the results with 40Gb QDR HCAs and switches.
And one more thing:
I have a final request about the InfiniBand solution on Nexenta.
Nexenta supports the IPoIB protocol, but it can't manage IB partitions.
If I create 4 partitions on a 2-port QDR HCA, Nexenta NMS shows 8 ibd interfaces.
If you want to support IPoIB properly, NMS needs a GUI menu that can configure IB partitions and IPoIB interface names (dladm show-ib, show-phys, etc.).
Do you have any plans?
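To make the interface count concrete: each IPoIB link corresponds to one (HCA port, partition) pair, so 2 ports × 4 partitions gives 8 links. A small Python sketch; the `ibd<N>` numbering, the P_Key values and the `ipoib_links` helper are all hypothetical, and illumos may number or name the links differently:

```python
from itertools import product


def ipoib_links(ports, pkeys):
    """Pair every HCA port with every partition key; each pair is one IPoIB link.

    The ibd<N> names are hypothetical and only illustrate the count.
    """
    return [
        (f"ibd{i}", port, pkey)
        for i, (port, pkey) in enumerate(product(ports, pkeys))
    ]


ports = [1, 2]                            # one 2-port QDR HCA
pkeys = [0xFFFF, 0x8001, 0x8002, 0x8003]  # 4 partitions (hypothetical P_Keys)
links = ipoib_links(ports, pkeys)
print(len(links))
for name, port, pkey in links:
    print(f"{name}: port {port}, pkey 0x{pkey:04x}")
```

The count grows multiplicatively, which is why a GUI that only lists raw ibd interfaces quickly becomes hard to manage once partitions are in use.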
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Bee Gee 2 months ago
Bump for info on this.
Mellanox is all over the vmware HCL: http://www.vmware.com/resources/compatibility/search.php?deviceCategory=io&partner=55&releases=171&page=1&display_interval=20&sortColumn=Partner&sortOrder=Asc
JaeHoon, the numbers you posted from CrystalDiskMark are amazing! Is that on 10Gb InfiniBand? Your other post mentions that you are currently using 10Gb InfiniBand. What does your environment look like now, since the other post is over a year old?
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Anthony Glidic 2 months ago
Hello,
Can you tell me which InfiniBand card you used with your Nexenta?
I want to make some test too but they have nothing on the HCL for the moment.
Thanks
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Lars Pisanec 2 months ago
Mellanox ConnectX-2 should work (http://www.mellanox.com/page/productsdyn?productfamily=61&mtag=connectx2vpi). ConnectX-3 does not.
At least that is what they told me when I asked.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Anthony Glidic 2 months ago
Thanks. Do you know about the original ConnectX? We can find some cards really cheap on eBay.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Lars Pisanec 2 months ago
The cards are older and supported under Solaris 10, so I'll go out on a limb and say they are supported under Nexenta as well, by the hermon driver.
I did read some comments saying that older IB cards need onboard memory to work under Solaris-based OSes, but I can't remember whether that applies to these cards as well.
For a definitive answer you should ask Nexenta, try it yourself or wait until they revise the HCL.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Lars Pisanec 2 months ago
Well, I did some testing with Nexenta 3.1.3.5 and InfiniBand. Cards: Mellanox ConnectX-2; ESXi 5.1 with Mellanox driver 1.8.1.0, which supports SRP.
First impression: you have to manually activate SRP on the command line (expert mode). No biggie, I have done far worse ;) Second impression: you create normal iSCSI targets on Nexenta and it just works on the ESXi side of things; pretty easy. Third impression: damn fast compared to IPoIB or 10GbE.
Too bad that Nexenta crashes in seconds if you have VAAI enabled on ESXi. After disabling it, it worked fine (so far).
Some benchmarks with a fresh Windows Server 2008 R2 install, using Iometer with the settings from http://vmktree.org/iometer/
SERVER TYPE: Dell R710
CPU TYPE / NUMBER: 2 × Xeon E5620 2.4GHz
HOST TYPE: VMware ESXi 5.1.0
STORAGE TYPE / DISK NUMBER / RAID LEVEL: whitebox Nexenta system, 22 × 1TB Toshiba NL-SAS, mirrored-stripe zpool config, connected with InfiniBand QDR (Mellanox ConnectX-2)
I re-tested with compression disabled and a test size larger than the ARC, for more realistic numbers:
| TEST NAME | Avg Resp. Time (ms) | Avg IOs/sec | Avg MB/sec | % CPU load |
| Max Throughput-100%Read | 1.11 | 49724 | 1553 | 4% |
| RealLife-60%Rand-65%Read | 36.20 | 1613 | 12 | 3% |
| Max Throughput-50%Read | 1.04 | 51132 | 1597 | 3% |
| Random-8k-70%Read | 40.30 | 1441 | 11 | 3% |
The ARC is just too good, it seems. The test size was 768GB, and Nexenta only has 200GB for ARC (and no L2ARC).
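As a sanity check on the re-test table, the MB/sec column should equal IOPS × transfer size. A minimal Python sketch, assuming the vmktree.org profile uses 32 KB transfers for the two Max Throughput tests and 8 KB for the other two, and that the table's "MB" means MiB (1,048,576 bytes); under those assumptions every row matches. It also bounds how much of the 768GB test file a 200GB ARC could hold if access were spread uniformly:

```python
# Transfer sizes per test: an assumption based on the vmktree.org Iometer profile.
BLOCK_KIB = {
    "Max Throughput-100%Read": 32,
    "RealLife-60%Rand-65%Read": 8,
    "Max Throughput-50%Read": 32,
    "Random-8k-70%Read": 8,
}

# (test name, avg IOs/sec, reported MB/sec) from the re-test table.
RETEST_ROWS = [
    ("Max Throughput-100%Read", 49724, 1553),
    ("RealLife-60%Rand-65%Read", 1613, 12),
    ("Max Throughput-50%Read", 51132, 1597),
    ("Random-8k-70%Read", 1441, 11),
]


def mib_per_sec(iops, block_kib):
    """Throughput implied by IOPS x transfer size, in MiB/s, truncated like the table."""
    return iops * block_kib // 1024


def max_cache_fraction(cache_gb, working_set_gb):
    """Upper bound on the share of a uniformly accessed working set that fits in cache."""
    return min(1.0, cache_gb / working_set_gb)


for name, iops, reported in RETEST_ROWS:
    computed = mib_per_sec(iops, BLOCK_KIB[name])
    print(f"{name}: computed {computed} MB/s, reported {reported} MB/s")

print(f"ARC can hold at most {max_cache_fraction(200, 768):.0%} of the test file")
```

With only about a quarter of the file cacheable, the small-block random rows are dominated by the spindles, which fits the much lower numbers in those rows.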
And some tests for maximum throughput, to get near the max of the infiniband connection:
| TEST NAME | Avg Resp. Time (ms) | Avg IOs/sec | Avg MB/sec | % CPU load |
| 128k 100% random read | 3.45 | 17098 | 2137 | 34% |
| 128k 100% random write | 22.21 | 2654 | 331 | 5% |
| 128k 100% sequential read | 3.33 | 17731 | 2216 | 34% |
| 128k 100% sequential write | 2.65 | 22040 | 2755 | 3% |
| 1M 100% random read | 26.62 | 2249 | 2249 | 2% |
| 1M 100% random write | 150.84 | 399 | 399 | 3% |
| 1M 100% sequential read | 25.37 | 2350 | 2350 | 4% |
| 1M 100% sequential write | 198.17 | 302 | 302 | 5% |
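The response-time and IOPS columns can be cross-checked with Little's law (average I/Os in flight = IOPS × average response time). A short Python sketch using the rows above; every test works out to roughly 58 to 60 outstanding I/Os, which suggests the Iometer profile holds a fixed queue depth (the exact configured depth is not stated in the post):

```python
# (test name, avg response time in ms, avg IOs/sec) from the throughput table.
THROUGHPUT_ROWS = [
    ("128k 100% random read", 3.45, 17098),
    ("128k 100% random write", 22.21, 2654),
    ("128k 100% sequential read", 3.33, 17731),
    ("128k 100% sequential write", 2.65, 22040),
    ("1M 100% random read", 26.62, 2249),
    ("1M 100% random write", 150.84, 399),
    ("1M 100% sequential read", 25.37, 2350),
    ("1M 100% sequential write", 198.17, 302),
]


def outstanding_ios(resp_ms, iops):
    """Little's law: mean I/Os in flight = completion rate x mean time per I/O."""
    return iops * resp_ms / 1000.0


for name, resp_ms, iops in THROUGHPUT_ROWS:
    print(f"{name}: ~{outstanding_ios(resp_ms, iops):.1f} I/Os in flight")
```

Because the queue depth stays roughly constant, a higher response time in a row translates directly into proportionally fewer IOPS, which is what the write rows show.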
Impressive numbers, at least for me. My conclusion is pretty simple: InfiniBand is a real contender for storage connectivity, and in my case it is a lot cheaper than 10 Gigabit Ethernet or Fibre Channel.
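To put the throughput numbers in context: QDR 4x InfiniBand signals 4 lanes at 10 Gbaud with 8b/10b encoding, leaving 32 Gb/s of data, versus 10 Gb/s for 10GbE. A rough Python sketch comparing those line rates with the best row from the table (128k sequential write, 2755 MB/sec, treated as MiB/s since 22040 × 128 KiB is exactly 2755 MiB); it ignores InfiniBand, SRP and PCIe protocol overhead, so the real ceilings are lower:

```python
def link_data_rate_gbps(lanes, gbaud_per_lane, payload_bits, coded_bits):
    """Usable data rate after line encoding (8b/10b keeps 8 of every 10 bits)."""
    return lanes * gbaud_per_lane * payload_bits / coded_bits


def mib_s_to_gbps(mib_s):
    """Convert MiB/s (1,048,576-byte megabytes) to decimal Gb/s."""
    return mib_s * 2**20 * 8 / 1e9


qdr_4x = link_data_rate_gbps(4, 10.0, 8, 10)  # QDR 4x: 4 lanes x 10 Gbaud, 8b/10b
ten_gbe = 10.0                                # 10GbE data rate
best_seen = mib_s_to_gbps(2755)               # 128k sequential write row

print(f"QDR 4x data rate: {qdr_4x:.0f} Gb/s")
print(f"10GbE data rate:  {ten_gbe:.0f} Gb/s")
print(f"Best observed:    {best_seen:.1f} Gb/s ({best_seen / ten_gbe:.1f}x 10GbE)")
```

The best observed row already moves more than twice what a single 10GbE link could carry, while still leaving headroom below the QDR data rate.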
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Carl Brunning 2 months ago
Nice to hear you got Mellanox working on Nexenta 3.1.3.5. For us slow people, could you tell us how you did it?
I'd like to have a go, but I'm not sure how to do it.
Thanks
RE: Nexentastor 4.0 and Requests for Infiniband - Added by Lars Pisanec 2 months ago
SSH to your Nexenta box as root, then run:
option expert_mode=1
!svcadm enable -r ibsrp/target
done.
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 2 months ago
All four of my systems have the configuration below.
2 × Intel Xeon L5520
96GB DDR3 registered ECC memory
SuperMicro X8DTN+-FLR
IBM M1015 SAS2 HBA with LSI 9211-8i IT firmware (to be replaced by an original LSI SAS2 HBA)
Mellanox ConnectX MHQH19-XTC QDR HCA with firmware 2.9.100 and FlexBoot 3.4.000
SuperMicro SC826E2 (test only) with a dual-channel SAS expander (to be replaced by an E16 expander)
10 × 7.2k RPM 1TB SATA HDDs in a RAID 1+0 ZFS configuration
2 × Intel 160GB SSDs for L2ARC
Tyco QDR copper cables
Mellanox (formerly Voltaire) 4036 36-port QDR switch with SM
I also found that NexentaStor's VAAI has a critical bug.
For example, while a VAAI full clone is running, the storage disks almost lock up and other VMs see very poor performance. Once the full clone completes, the InfiniBand SRP target recovers its original performance (latency and throughput).
I have dropped NexentaStor from my personal lab and now use OmniOS.
You can check the VAAI support status of an illumos-based ZFS OS with the following command on the vSphere console:
esxcli storage core device vaai status get
naa.600144f0b18cc100000051455c080009
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: supported
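When checking many LUNs, the `esxcli storage core device vaai status get` output can be summarized with a small script. A Python sketch that assumes the layout shown above (an unindented device ID line followed by indented `Field: value` lines; the forum flattened the original indentation, so treat that layout as an assumption). The helper names are hypothetical:

```python
SAMPLE = """\
naa.600144f0b18cc100000051455c080009
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: supported
"""


def parse_vaai_status(text):
    """Map each device ID to a dict of its 'Field: value' pairs.

    Assumes device IDs start at column 0 and their fields are indented.
    """
    devices = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue                      # skip blank separator lines
        if not raw[0].isspace():
            current = raw.strip()         # a new device block starts
            devices[current] = {}
        elif current is not None:
            key, _, value = raw.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def supported_primitives(fields):
    """List the VAAI primitives (ATS, Clone, Zero, Delete) reported as supported."""
    return [
        key.replace(" Status", "")
        for key, value in fields.items()
        if key.endswith(" Status") and value == "supported"
    ]


for device, fields in parse_vaai_status(SAMPLE).items():
    print(device, "->", supported_primitives(fields))
```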
NexentaStor's VAAI full clone has a performance locking bug!
I need all the VAAI features, but most illumos-based ZFS OSes can't support them all. Even so, I'm happy with my InfiniBand configuration for now.
VAAI ATS: Without VAAI ATS, vSphere only supports the old SCSI-2 LUN reservation locking. But QDR InfiniBand has very low latency and full offloading, so this isn't a problem for me! (Oracle's ZFS Appliance also supports only Zero and Delete!)
VAAI full clone: A full clone completed in a short time thanks to InfiniBand QDR (real 32Gb/s) throughput and latency! (If illumos-based ZFS OSes supported VAAI full clone, VM template deployment would finish even faster!)
VAAI Zero: When testing MSCS in a vSphere environment, the VAAI Zero feature is very useful for creating block-zeroed VMDKs. illumos-based ZFS OSes support it now!
VAAI Delete: You can reclaim unused space on a thin-provisioned volume with the following command on the vSphere console:
vmkfstools -y <% of free space to unmap>
For example, to reclaim 60% of the free space, change into the datastore directory first and then run the command:
cd /vmfs/volumes/<your datastore name>
vmkfstools -y 60
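A back-of-the-envelope sketch of what the percentage means, assuming that `vmkfstools -y` on ESXi 5.x temporarily allocates a file covering that share of the datastore's free space, unmaps its blocks and then deletes it, so that space is unavailable while the command runs. The 500 GiB free-space figure and the `unmap_balloon_bytes` helper are hypothetical:

```python
GIB = 2**30


def unmap_balloon_bytes(free_bytes, percent):
    """Space temporarily consumed while reclaiming `percent` of the free space.

    Assumes vmkfstools -y allocates a temporary file of this size,
    unmaps its blocks, then deletes it.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be greater than 0 and less than 100")
    return free_bytes * percent // 100


free = 500 * GIB                          # hypothetical datastore free space
balloon = unmap_balloon_bytes(free, 60)
print(f"temporarily allocated: {balloon // GIB} GiB")
print(f"free while it runs:    {(free - balloon) // GIB} GiB")
```

The trade-off is that a higher percentage reclaims more space in one pass but leaves less free space for running VMs until the command finishes.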
That's all~:)
PS. I'm also testing the NexentaStor 4.0 beta (milestone). It's very unstable... :( If you enable IBStorage, NMS crashes and you can submit the log to support.
The attachments below are from my final ZFS QDR InfiniBand storage test:
Hypervisor: vSphere 5.1 evaluation
ZFS storage: OmniOS ZFS, latest stable
Interface: Mellanox ConnectX MHQH19-XTC QDR HCA
Switch: Mellanox 4036 36-port QDR switch with SM
Maximum read-only throughput in my environment was 4.2GB/s! :)
Conclusion: I'm very disappointed by some bugs in NexentaStor, so I have dropped it from my personal lab. I had been considering it as our company's next-generation storage platform, but NexentaStor is not it!
The same goes for the NexentaStor commercial edition... :(
QDR_final_test.JPG (127.3 KB)
QDR-iometer-4k_Read.JPG (137.7 KB)
QDR-iometer-4k_Write.JPG (143.2 KB)
QDR-iometer-32k_Read.JPG (130.9 KB)
QDR-iometer-32k_Write.JPG (130.7 KB)
QDR_HCA_-_Over_the_16Gb_FC-HBA..JPG (106.6 KB)
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 2 months ago
Bee Gee wrote:
Bump for info on this.
Mellanox is all over the vmware HCL: http://www.vmware.com/resources/compatibility/search.php?deviceCategory=io&partner=55&releases=171&page=1&display_interval=20&sortColumn=Partner&sortOrder=Asc
JaeHoon, the numbers you posted from CrystalDiskMark are amazing! Is that on 10Gb InfiniBand? Your other post mentions that you are currently using 10Gb InfiniBand. What does your environment look like now, since the other post is over a year old?
No. That's an SRP target... :)
RE: Nexentastor 4.0 and Requests for Infiniband - Added by JaeHoon Choi 2 months ago
Anthony Glidic wrote:
Hello,
Can you tell me which InfiniBand card you used with your Nexenta?
I want to make some test too but they have nothing on the HCL for the moment.
Thanks
NexentaStor- and illumos-compatible HCAs are listed below:
InfiniHost III Ex with local memory only (very old, no firmware updates available, so unstable; vSphere 4.x support only!)
ConnectX and above (ConnectX-2, ConnectX-3)
Have a good time...:)