Iray Performance Benchmarks and Hardware Setup

Benchmark Results

These are updated benchmark results for Iray 2016.3. We have started retesting our previous benchmarks with this updated version of Iray, as it brings significant performance improvements on newer architectures (Pascal in particular) over the previous generation. We have not yet re-tested all of the same cards but are working to test as many as possible. We are no longer testing both Iray Photoreal and Iray Interactive modes, since we have found the relative performance across different cards to be very consistent. As such, Iray Photoreal results also give a very good indication of the expected performance deltas with Iray Interactive.

This update also includes results from updated cloud offerings, including Nimbix, Amazon EC2, Microsoft Azure and Google Compute Engine. If you know of any other on-demand GPU cloud providers you would like to see benchmarked, please contact us. Since cloud providers publish pricing, we have also been able to give some indication of the cost effectiveness of each offering.

Bare Metal Performance

Iray Photoreal

These results are for Iray Photoreal. We use the batch scheduling mode to ensure maximal use of the GPU resources. For more details see the testing methodology below the results.





Cloud Provider Performance

An increasing number of cloud providers are offering on-demand GPU resources. This is great news for users of Iray-enabled applications, so to give you an idea of what performance to expect relative to bare-metal hardware you might find in your own computers, we have run all of the tests for you. Cloud providers often bundle reasonably powerful CPU configurations with their GPU resources, so we have tested each offering both with and without the CPU enabled.

Iray Photoreal




Price / Performance

Since cloud providers publish pricing data we have also been able to create a chart of the price / performance ratio for each of the offerings. The results below show how much performance you get per dollar spent for each configuration (higher is better). Keep in mind that if your application requires a specific response time or performance level, you should not look at this data alone: it does not indicate how fast a configuration is, only how good the value for money is in computing a given amount of work. You also need to consider how much GPU memory your application requires.
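The ratio charted here can be computed directly from a benchmark score and an hourly price. The following sketch shows the calculation; the instance names, scores and prices are illustrative placeholders, not our measured results.

```python
# Illustrative price/performance calculation. The offerings and numbers
# below are hypothetical examples, not the benchmark data from the charts.

def price_performance(iterations_per_second, hourly_cost_usd):
    """Benchmark performance obtained per dollar spent per hour."""
    return iterations_per_second / hourly_cost_usd

offerings = {
    "provider-a-1gpu": (2.5, 0.90),   # (iterations/s, USD per hour)
    "provider-b-4gpu": (9.0, 4.30),
}

# Rank offerings by value for money, best first.
for name, (perf, cost) in sorted(
        offerings.items(),
        key=lambda kv: price_performance(*kv[1]),
        reverse=True):
    print(f"{name}: {price_performance(perf, cost):.2f} iterations/s per $/hour")
```

Note that a high ratio says nothing about absolute speed, which is why the bare-metal and cloud performance charts should be read alongside this one.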





Note that for AWS we omitted the very old cg1.2xlarge instance type, since it is no longer fully supported by AWS and we were unable to get our test to run on it. When analysing price for each provider, we used the price from their primary advertised region where multiple regions are available. Most providers with multiple regions vary their pricing per region to account for local costs. All pricing is in USD.

For Google Compute Engine, rather than choosing a pre-configured machine type consisting of a CPU and GPU, you attach GPU types to any of the supported machine types. This means you can run larger numbers of GPUs on machines with much less CPU resource than usual. For this reason we have excluded CPU results for Google Compute Engine in the tests above. For the price/performance chart we used machine types that we felt were appropriate for the number of GPUs selected. You could of course try with fewer CPU resources; your mileage may vary. Also note that at the time of writing, GPUs on Google Compute Engine were still in beta.

DGX-1 and Quadro VCA

The DGX-1 and Quadro VCA are appliance offerings which have been specifically tuned to run demanding GPU-accelerated applications. In addition to very high performance, the Quadro VCA offers Iray IQ mode, allowing large numbers of Quadro VCA appliances to be interconnected for extreme performance. We didn't have access to large quantities of Quadro VCA appliances for our tests; however, below are the results for a single Quadro VCA, as well as a single DGX-1, to give you a feeling for the performance. We put these in their own section because including them in the main bare-metal graph scales the single cards too small!

Iray Photoreal




Testing Methodology

All benchmarks have been performed under Linux (usually CentOS 7.2) with the latest available NVIDIA drivers (as of writing 375.20). Some tests were performed on older drivers where administrative access to the machines was not available. Where we have full control over the environment the following setup was utilised.

Configuration Item Value
Operating System CentOS Linux 7.2.1511
Linux Kernel Version 3.10.0-327.36.3
NVIDIA Driver Version 375.20
ECC Mode Off
CPU Disabled
Iray Version Iray 2016.3 build 278300.4305
CPU Intel Core i5 6500
Memory 16GB DDR4-2133
Chipset Intel H170
PCIe v3.0 Full 16 Lanes
Image Resolution 1920×1044
Iteration Count 125

Iray Benchmark Scene - Model by Evermotion

In order to ensure we are testing raw Iray performance we have developed a stand-alone benchmark tool based on Iray. Our tool renders a fixed scene multiple times and averages the results to ensure consistency. To ensure the results mean something for real-world use we utilise a non-trivial test scene, ensuring the GPUs have plenty of work to do. The image above is a fully converged version of our test scene.
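The run-several-times-and-average approach above can be sketched as follows. This is a simplified illustration of the harness structure, not our actual benchmark tool: render_scene() is a placeholder standing in for the real Iray render call, which is not shown.

```python
import statistics
import time

def render_scene():
    """Placeholder for a full 125-iteration Iray Photoreal render.

    In the real tool this invokes Iray on the fixed test scene; here it
    simply sleeps so the harness structure can be demonstrated.
    """
    time.sleep(0.01)

def benchmark(runs=5):
    """Time each complete render and return the mean wall-clock duration.

    Averaging several full renders smooths out run-to-run variation and
    gives a more consistent result than a single timing.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        render_scene()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

print(f"mean render time: {benchmark():.3f} s")
```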

Note that these benchmarks were not performed in a way that allows them to be compared to the previous series of benchmarks migenius conducted, which is why we are retesting even the older cards where possible. This is due to changes in both the scene data (to move to MDL) and Iray itself. New Iray versions often change the relationship between iteration count and quality, which can affect our absolute measurements. However, all relative measurements between cards within the benchmark are valid.
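Because absolute numbers shift between Iray versions, the comparable quantity is each card's score relative to a common baseline measured with the same Iray build and scene. A minimal sketch of that normalization, using hypothetical card names and render times:

```python
# Hypothetical relative-performance normalization. Card names and render
# times are illustrative only, not measured results.

render_seconds = {
    "baseline-card": 120.0,
    "faster-card": 60.0,
    "slower-card": 180.0,
}

# A card that finishes the same fixed workload in half the time scores 2.0.
baseline = render_seconds["baseline-card"]
relative = {card: baseline / seconds for card, seconds in render_seconds.items()}

for card, score in relative.items():
    print(f"{card}: {score:.2f}x baseline")
```

These ratios remain meaningful across the charts in this article, even though they cannot be compared with ratios from our earlier benchmark series.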

Get in Touch