Iray Performance Benchmarks and Hardware Setup

Benchmark Results

Want to know which hardware performs best with RealityServer and Iray rendering? Below you can see a comparison of various NVIDIA GPU models and their relative Iray rendering performance. We will continue to add benchmark results here as we test more hardware. The numbers represent megapixel iterations per second, so larger numbers are better. There are plenty of Helpful Hints further down on getting the most out of your hardware with Iray, but if you just want the numbers, here they are…


Multiple GPU Scaling Efficiency

Many people wonder how effective multiple GPUs are with Iray. In short, they are extremely effective. It is commonly thought that multiple GPUs in a single machine do not provide a significant benefit, partly because of experience with other technologies, such as games that use SLI. Iray does not need or use SLI at all. Instead it simply uses all available CUDA devices in the system unless told otherwise. As you can see from the chart below, scaling efficiency with multiple GPUs is incredibly good.

Even with 8 GPUs in a single machine, scaling efficiency is still around 95%. Our tests indicate this is largely independent of the cards used, provided they are all the same model. Iray also supports different types of GPUs in one system and will try to balance the load between the cards as well as possible. Because you can add many GPUs to a single system, GPU performance scales much more cost-effectively than CPU performance: while 8-CPU machines are available, they are significantly more expensive than 8-GPU machines, and even then you would need at least 16 of the latest CPUs to reach the performance of 8 of the fastest GPUs.
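To make the percentage concrete: we read scaling efficiency as the measured multi-GPU score divided by N times the single-GPU score, so 100% would be perfectly linear scaling. The sketch below assumes that definition, and the scores in it are hypothetical placeholders rather than values from our chart.

```python
def scaling_efficiency(single_gpu_score: float, multi_gpu_score: float, gpu_count: int) -> float:
    """Return measured multi-GPU throughput as a fraction of ideal linear scaling."""
    if single_gpu_score <= 0 or gpu_count < 1:
        raise ValueError("single_gpu_score must be positive and gpu_count at least 1")
    # Ideal (perfectly linear) scaling: N identical GPUs do N times the work of one.
    ideal_score = single_gpu_score * gpu_count
    return multi_gpu_score / ideal_score


# Hypothetical scores in megapixel iterations per second, not measured results.
one_gpu = 10.0
eight_gpus = 76.0
print(f"{scaling_efficiency(one_gpu, eight_gpus, 8):.0%}")  # prints 95%
```

With these made-up numbers, eight GPUs deliver 7.6 times the throughput of one, which is 95% of the ideal 8 times.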

Amazon EC2

Many customers ask us about performance on Amazon EC2, both on GPU instances and CPU instances. To give you a better idea of what to expect when running Iray on Amazon EC2, we have also run our benchmarks on various instance types. As you can see from the results below, the GPU instances provide significantly higher performance than any CPU-only instance.

Additionally, since Amazon publishes pricing for its instances, we can easily calculate price/performance ratios when running on Amazon EC2. Here are the price/performance ratios for the instance types tested above. Larger numbers are better.

So not only do GPU instances provide better absolute performance, they are also significantly better value for money than CPU instances. As such, the only real reason to utilise CPU instances is when the Amazon GPUs do not have sufficient memory for your scenes.
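A price/performance ratio of the kind discussed above can be computed as the benchmark score divided by the hourly instance price. The sketch below assumes that definition; the instance labels, scores and prices are hypothetical and are not real EC2 instance types or current Amazon pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceResult:
    name: str              # hypothetical label, not a real EC2 instance type
    score: float           # benchmark result, megapixel iterations per second
    price_per_hour: float  # assumed on-demand price in USD per hour

    @property
    def price_performance(self) -> float:
        """Benchmark score per dollar of hourly cost; larger is better."""
        return self.score / self.price_per_hour


def rank_by_value(results: List[InstanceResult]) -> List[InstanceResult]:
    """Sort instances from best to worst price/performance ratio."""
    return sorted(results, key=lambda r: r.price_performance, reverse=True)


# Illustrative numbers only; substitute your own scores and current Amazon pricing.
results = [
    InstanceResult("example-cpu-instance", score=1.2, price_per_hour=2.40),
    InstanceResult("example-gpu-instance", score=6.0, price_per_hour=2.00),
]
for r in rank_by_value(results):
    print(f"{r.name}: {r.price_performance:.2f}")
```

Dividing by the hourly price means two instances with the same score are ranked purely by cost, which is the comparison that matters when you pay per hour.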

Testing Methodology

All benchmarks have been performed under Linux (usually CentOS 6.4) with the latest available NVIDIA drivers (319.60 at the time of writing). Some tests were performed on older drivers where administrative access to the machines was not available. Where we had full control over the environment, the following setup was used.

Configuration Item        Value
Operating System          CentOS Linux 6.4
Linux Kernel Version      2.6.32
NVIDIA Driver Version     319.60
ECC Mode                  Off
Iray CPU Rendering        Disabled
Iray Version              Iray 2013 SP1 build 194100.10868
CPU                       Intel Core i7 4770
Memory                    8GB DDR3-1600
Chipset                   Intel Z87
PCIe                      v3.0, full 16 lanes
Image Resolution          1920×1044
Iteration Count           125

Iray Benchmark Scene - Model by Evermotion

To make sure we are testing raw Iray performance, we have developed a stand-alone benchmark tool based on Iray. Our tool renders a fixed scene multiple times and averages the results for consistency. So that the results are meaningful for real-world use, we use a non-trivial test scene which gives the GPUs plenty of work to do. The image above is a fully converged version of our test scene.
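With the settings in the table above, each benchmark render covers 1920 × 1044 = 2,004,480 pixels for 125 iterations, or 250.56 megapixel iterations. The sketch below shows how a megapixel iterations per second figure follows from that and a render time. It is not our benchmark tool's code: averaging the per-run scores is our assumption about how repeated runs are combined, and the timings are hypothetical.

```python
from statistics import mean
from typing import Sequence

WIDTH, HEIGHT = 1920, 1044  # image resolution from the configuration table
ITERATIONS = 125            # iteration count from the configuration table


def megapixel_iterations_per_second(width: int, height: int, iterations: int, seconds: float) -> float:
    """Pixels x iterations, in millions, divided by wall-clock render time."""
    if seconds <= 0:
        raise ValueError("render time must be positive")
    return width * height * iterations / 1e6 / seconds


def benchmark_score(run_times: Sequence[float]) -> float:
    """Average the per-run scores of repeated renders of the same scene (assumed method)."""
    return mean(megapixel_iterations_per_second(WIDTH, HEIGHT, ITERATIONS, t) for t in run_times)


# Hypothetical timings in seconds for three renders of the benchmark scene.
print(f"{benchmark_score([20.1, 19.9, 20.0]):.2f}")
```

A 20-second render at these settings would score about 12.5; halving the render time doubles the score, which is why larger numbers are better.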

Helpful Hints

ECC

Iray does not make use of the ECC functionality available on NVIDIA professional cards such as Tesla or Quadro. In fact, enabling ECC on such cards can degrade Iray performance (by around 2-3% in our tests), so wherever possible you should disable ECC on these cards. GeForce consumer cards do not offer ECC, so if you are using one you don't need to change anything. Under Windows you can disable ECC from the NVIDIA Control Panel, which you can open by right-clicking on your Desktop and selecting the NVIDIA Control Panel menu item.

There you can easily enable or disable ECC on your cards. Under Linux the easiest way to change the ECC mode is with the nvidia-smi command line tool (it can also be done from the NVIDIA settings GUI, but we like command lines here).

nvidia-smi -e 0

This will disable ECC on all GPUs. A reboot is required for the change to take effect; however, the setting is persistent, so you only need to reboot when you change it. The nvidia-smi tool is also available under Windows if you want to use it, but it is not usually on the path. By default you can find it in:

C:\Program Files\NVIDIA Corporation\NVSMI
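If you manage several machines, you may prefer to disable ECC only on the GPUs where it is actually enabled. A minimal Python sketch, assuming a driver recent enough to support nvidia-smi's --query-gpu and --format=csv options and that the current ECC mode is reported as Enabled, Disabled or [N/A] (check nvidia-smi --help-query-gpu on your system). The helper names are hypothetical, and the sketch only builds the commands; run them yourself with administrator/root rights and reboot afterwards.

```python
import subprocess
from typing import List

# nvidia-smi query: GPU index, name and current ECC mode as CSV without a header row.
QUERY_ECC = ["nvidia-smi", "--query-gpu=index,name,ecc.mode.current", "--format=csv,noheader"]


def query_ecc_status() -> str:
    """Run nvidia-smi and return its CSV output (requires the NVIDIA driver)."""
    return subprocess.run(QUERY_ECC, capture_output=True, text=True, check=True).stdout


def gpus_with_ecc_enabled(csv_text: str) -> List[int]:
    """Return indices of GPUs whose current ECC mode is reported as Enabled."""
    enabled = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        # First field is the GPU index, last is the ECC mode; GeForce cards report [N/A].
        if fields[-1] == "Enabled":
            enabled.append(int(fields[0]))
    return enabled


def disable_ecc_commands(indices: List[int]) -> List[List[str]]:
    """Build one 'nvidia-smi -i <index> -e 0' command per GPU (admin/root, then reboot)."""
    return [["nvidia-smi", "-i", str(i), "-e", "0"] for i in indices]
```

On a machine with the driver installed, disable_ecc_commands(gpus_with_ecc_enabled(query_ecc_status())) lists the commands to run; the plain nvidia-smi -e 0 shown above remains the simplest option when you want ECC off everywhere.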

Another common related question is whether Iray uses double precision floating point calculations on the GPU. It does not, so when looking at published GPU performance figures you can ignore the double precision floating point numbers, as these have no impact on Iray.

Windows Driver Model

Under Windows there are two so-called driver models available for cards using the NVIDIA driver. The driver model is only configurable on Tesla and Quadro hardware; however, if you are using such hardware it is important to optimise this setting, particularly if you plan to make heavy use of Iray. In our tests, WDDM (Windows Display Driver Model) incurs around a 5% performance penalty (more if the card is heavily used for other things while Iray is running). Wherever possible you will want to switch the card to TCC (Tesla Compute Cluster) mode. WDDM is the default for Quadro cards, while TCC is the default for Tesla cards. You need WDDM for any card you connect a display to or run OpenGL/DirectX applications on; CUDA applications, however, will happily find cards in TCC mode.

To change the driver model you need to use the nvidia-smi command line program. Driver models only apply under Windows, where nvidia-smi can be found in the directory mentioned in the ECC section above. Start a command prompt (open the Start menu, type cmd and press Enter) and enter the following commands:

c:
cd \Program Files\NVIDIA Corporation\NVSMI
nvidia-smi -g 1 -dm 1

The above will change the driver model of the GPU with ID 1 to TCC mode (-dm 1 selects TCC, -dm 0 selects WDDM). If you leave off the -g 1 option, it will change the driver model for all GPUs. You would only want this if you are using on-board graphics and don't need any OpenGL/DirectX acceleration. If you don't know the ID of your GPU, just run nvidia-smi with no arguments and it will display a table like this:

+------------------------------------------------------+
| NVIDIA-SMI 5.320.78   Driver Version: 320.78         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 5000        WDDM  | 0000:05:00.0      On |                  Off |
| 30%   61C    P0    N/A /  N/A |     2526MB /  2559MB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          TCC  | 0000:06:00.0     Off |                  Off |
| 30%   30C    P8    16W / 225W |       13MB /  5119MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

The ID of the GPU is given in the first column. In this case we have already changed the Tesla K20c card to TCC mode with the above command. Like the ECC change, this change requires you to reboot your system.

Windows Remote Desktop

Wondering why your favourite CUDA application, like Iray, can't see your GPUs when you log in to your Windows machine with Remote Desktop? This is due to a limitation in Windows itself. When using Remote Desktop, GPUs using the WDDM driver model become inaccessible and are replaced by a non-accelerated Remote Desktop display driver. To access your GPU hardware through Remote Desktop you need to place your card in TCC mode (see above); it will then become visible. Alternatively, you can use a different remote access tool that supports 3D hardware, such as Splashtop.

If you want the technical details, search for Session 0 Isolation, the technical term for this limitation that Microsoft imposes on devices using WDDM. The same restriction also applies to Windows services, so if you want to run your application as a service, you will run into the same limitation.
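The rule above can be expressed as a small check that warns users when they connect over Remote Desktop. This is a sketch under two assumptions: Windows normally sets the SESSIONNAME environment variable to Console for a local login and RDP-Tcp#<n> for a Remote Desktop session, and the driver model of each GPU is already known (for example from the nvidia-smi table above). It will not detect the Windows service case, and the helper names are hypothetical.

```python
import os
from typing import List, Mapping, Sequence, Tuple


def is_remote_desktop_session(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristic: SESSIONNAME is usually 'Console' locally and 'RDP-Tcp#<n>' over RDP."""
    return env.get("SESSIONNAME", "").upper().startswith("RDP-")


def gpus_visible_to_cuda(gpus: Sequence[Tuple[int, str]], remote: bool) -> List[int]:
    """Indices of usable GPUs; over Remote Desktop only TCC cards remain visible."""
    return [index for index, driver_model in gpus if not remote or driver_model == "TCC"]


if is_remote_desktop_session():
    print("Remote Desktop session detected: only GPUs in TCC mode will be visible to CUDA.")
```

For the example machine above (Quadro 5000 in WDDM, Tesla K20c in TCC), only the Tesla K20c would remain usable over Remote Desktop.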
