InfiniBand or 10 Gb/s Ethernet for GPU cloud?


#1

I am running a studio and we are working on a small GPU renderfarm of 10× GTX 780. Last week I tried a demo of GPUBox (http://www.renegatt.com/products.php) and this solution suits us better than our present renderfarm, but our network (1 Gb/s) seems not to be enough to deal with 10 GPUs…

I was thinking about getting 10 Gb Ethernet, but I saw that the developers of GPUBox recommend InfiniBand. Does anyone have experience with setting up an InfiniBand network? Would it be very expensive to connect 5 PCs with InfiniBand? I can’t find much information about it…

I’d greatly appreciate any advice!

PS
And since I finally signed up and this is my first post here, I would like to say ‘hi’ :slight_smile:


#2

Ethernet is a general-purpose networking stack that relies on the operating system to handle most of the work. That is good for application developers who don’t want to reinvent the wheel, but it also adds latency and uses more resources, which can become a showstopper for high-performance computing applications that need to share a lot of data very quickly.

InfiniBand can be driven directly at the application level, including remote direct memory access (RDMA) of other systems, which bypasses the operating system’s networking stack entirely. That dramatically reduces latency. In terms of throughput, the current generation of InfiniBand (FDR, 56 Gb/s) is faster than 10GbE, but the lower latency is the biggie.
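If you want to see what “bypassing the OS” means in practice, here is a minimal sketch using the verbs API (libibverbs). It just opens the first HCA it finds and registers a buffer for RDMA, which is the step that lets the adapter read and write application memory directly without the kernel touching the data path. The buffer size and the thin error handling are only for illustration; a real application would also create queue pairs and exchange keys with the remote side.

```c
/* Minimal libibverbs sketch: open an HCA and register a buffer for RDMA.
 * Build with: gcc rdma_sketch.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Register ordinary application memory with the adapter. After this,
     * the HCA can DMA straight into/out of 'buf' -- the kernel's network
     * stack is not involved in the data path at all. */
    size_t len = 4 * 1024 * 1024;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
            IBV_ACCESS_LOCAL_WRITE |
            IBV_ACCESS_REMOTE_READ |
            IBV_ACCESS_REMOTE_WRITE);

    printf("using %s, registered %zu bytes (rkey=0x%x)\n",
           ibv_get_device_name(devs[0]), len, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

If you’d rather not write any code, the perftest tools that ship with most IB stacks (ib_write_lat, ib_send_bw) are the easy way to measure the latency and bandwidth difference on your own hardware.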

A 10GbE adapter will run about $400; an FDR InfiniBand adapter will be about double that and up. A 12-port 10GbE switch will run about $1,500, and a 12-port FDR InfiniBand switch about $5,000 (plus more expensive cabling). On copper, 10GbE is good up to 100 meters and FDR InfiniBand up to 30 meters, so that might be a consideration if the artists will be a long way from the machines.

I’d ask the vendor for a direct comparison between the two networks for their product: how much of a difference does it make in real-world production? Then weigh that against the cost difference.


#3

Thank you so much, that is very helpful!

The distance does not really matter, because we are going to work on virtual machines. I emailed the guys from Renegatt and they told me that they have native InfiniBand support and it can make a difference.

The price of InfiniBand is indeed pretty high, but I’m thinking about getting a second-hand switch and adapters and buying only new cabling. Btw, what would be wrong with this adapter for $300? 40 Gb/s would be more than enough for me; 56 Gb/s is not really necessary.


#4

Before considering InfiniBand, which is great but also a considerable investment and not exactly trivial when it comes to logistics, I would look into tiering your storage and managing data movement, and look at what you are transferring and when.

A constant trickle of traffic that keeps storage close to the rendering metal ready to go tends to be faster than moving things over just in time, and for that 10GbE is plenty for just five artists: at roughly 1 GB/s of real-world throughput, even a 100 GB working set can be pre-staged in a couple of minutes.

How much data you transfer, where you keep it, and how frequently it is used and needs to be moved back matters more to laying out your networking than the raw performance of the link alone.

Networking is like moving stock between warehouses: having things shipped at night to a shop close to the user is often better than having a central warehouse 500 km away from everybody and delivering groceries piecemeal with a Ferrari.


#5

I’m not sure that InfiniBand can be used in such a way with virtual machines; virtualization may complicate or obstruct direct memory access of remote hosts (defeating the point of InfiniBand). By virtual machines, do you mean that the workstation is elsewhere and the user has a thin client, or that the users quite literally share a machine and each is using a virtual machine? It might be possible to use InfiniBand with virtual machines, but I’d investigate it thoroughly before making a purchase.

I wouldn’t rely on second-hand switches and adapters. They also have only one adapter at that price. What’s the point of spending all that money on these resources if a single point of failure without a warranty or support can take it all down, like a failed power supply on a switch?


#6

I want to use virtual machines set up on my server, connected to the PCs with the GPUs over IB (or 10GbE, but I’m almost sure I’m going the InfiniBand way). I found a short chapter about “InfiniBand in VM” in the documentation of the software I want to use (GPUBox), and it says that InfiniBand can be used in virtual machines thanks to “SR-IOV”:

The latest hardware and software supports the InfiniBand virtualization. (…)
In order to use the SR-IOV, the hardware and software must support VT-d. Additionally, BIOS and the device driver must support SR-IOV.

As far as I can see, every adapter I am considering supports SR-IOV, so there should not be a problem with that.
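For reference, here is a rough sanity check I put together (my own sketch, not something from the GPUBox docs): it lists the HCAs under /sys/class/infiniband on the host and reports whether each one’s PCI device exposes the standard sriov_totalvfs attribute, which only shows up when the driver and firmware advertise SR-IOV.

```c
/* Rough host-side check: does each InfiniBand HCA expose SR-IOV? */
#include <stdio.h>
#include <dirent.h>

int main(void)
{
    const char *base = "/sys/class/infiniband";
    DIR *d = opendir(base);
    if (!d) {
        fprintf(stderr, "no InfiniBand devices found -- is the driver loaded?\n");
        return 1;
    }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;

        /* 'device' is a symlink to the underlying PCI device directory;
         * sriov_totalvfs is a standard PCI attribute that only exists
         * when SR-IOV is supported. */
        char path[512];
        snprintf(path, sizeof(path), "%s/%s/device/sriov_totalvfs",
                 base, e->d_name);

        FILE *f = fopen(path, "r");
        if (!f) {
            printf("%s: sriov_totalvfs not present (SR-IOV not exposed)\n",
                   e->d_name);
            continue;
        }

        int total = 0;
        if (fscanf(f, "%d", &total) == 1)
            printf("%s: up to %d virtual functions\n", e->d_name, total);
        fclose(f);
    }
    closedir(d);
    return 0;
}
```

Even when the adapter passes this check, VT-d has to be enabled in the BIOS, the IOMMU has to be turned on in the kernel (e.g. intel_iommu=on), and the driver has to be configured to actually create the virtual functions, so I’m going to test it on one node before buying everything.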

Prices of second-hand switches and adapters are very tempting, but after all I think you might be right about that. However, I’m going to buy something less expensive than 56 Gb/s. Even if I manage to expand my “GPU cloud” according to my plans over the next 1–2 years, I think ~40 Gb/s will do more than fine in my case.