These are updated benchmark results for Iray 2016.3. We have started re-running our previous benchmarks with this version of Iray because it brings significant improvements on newer architectures (Pascal in particular) compared with previous-generation architectures. We have not yet re-tested every card from the earlier results, but we are working to test as many as possible. We are no longer testing both Iray Photoreal and Iray Interactive modes, because we have found the relative performance across different cards to be very consistent between the two. As a result, the Iray Photoreal results also give a good indication of the expected performance differences between cards under Iray Interactive.
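One way to check the claim that relative performance carries over between modes is to normalise each mode's scores to a common baseline card and compare the ratios. The sketch below assumes each card has a single throughput score per mode; the card names and numbers are hypothetical placeholders, not measured results, and this is not necessarily how the comparison above was made.

```python
"""Sketch: compare relative card performance across two Iray modes.

Assumption: each card has one throughput score per mode (e.g. iterations
per second). Card names and numbers below are hypothetical, not results.
"""


def relative_to_baseline(scores: dict[str, float], baseline: str) -> dict[str, float]:
    """Divide every card's score by the baseline card's score."""
    base = scores[baseline]
    return {card: score / base for card, score in scores.items()}


def max_ratio_difference(
    mode_a: dict[str, float], mode_b: dict[str, float], baseline: str
) -> float:
    """Largest gap between the two modes' relative scores over shared cards.

    A small value means the cards keep a similar ranking and spacing in both
    modes, so one mode's results predict the other's performance deltas.
    """
    rel_a = relative_to_baseline(mode_a, baseline)
    rel_b = relative_to_baseline(mode_b, baseline)
    shared = rel_a.keys() & rel_b.keys()
    return max(abs(rel_a[c] - rel_b[c]) for c in shared)


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not measured results.
    photoreal = {"card_a": 10.0, "card_b": 20.0, "card_c": 30.0}
    interactive = {"card_a": 50.0, "card_b": 100.0, "card_c": 160.0}
    print(max_ratio_difference(photoreal, interactive, "card_a"))
```

The absolute scores differ between modes; only the normalised ratios are compared, which is what makes one mode usable as a proxy for the other.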
This release also includes updated results from cloud providers, including Nimbix, Amazon EC2 and Microsoft Azure. We will benchmark Google's new GPU offering as soon as it is released. If you know of any other on-demand GPU cloud providers and would like to see them benchmarked, please contact us. Because cloud providers publish their pricing, we have also been able to give some indication of the cost-effectiveness of each offering.
These results are for Iray Photoreal. We use the batch scheduling mode to ensure maximal use of the GPU resources. For more details see the testing methodology below the results.
An increasing number of cloud providers are offering on-demand GPU resources. This is great news for users of Iray-enabled applications. To give you an idea of the performance to expect relative to the bare-metal hardware you might have in your own computers, we have run all of the tests for you. Cloud instances often come with reasonably powerful CPUs, so we tested each instance type both with and without the CPU enabled for rendering.
Using the prices the providers publish, we have also created a chart of the price/performance ratio for each offering. The results below show how much performance you get per dollar spent on each configuration (higher is better). Keep in mind that if your application requires a specific response time or performance level, you should not rely on this data alone: it shows only how good the value for money is when computing a given amount of work, not how fast a configuration is. You also need to consider how much GPU memory your application requires.
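The distinction between value for money and speed can be made concrete with a little arithmetic. The sketch below assumes performance is a throughput in render iterations per second and price is an on-demand rate in USD per hour; those units, the instance names and the numbers are hypothetical, and the exact metric behind the chart may differ.

```python
"""Sketch: performance per dollar versus time to finish a job.

Assumptions (not taken from the benchmark itself): performance is a
throughput in render iterations per second, and price is an on-demand
rate in USD per hour. Instance names and numbers below are hypothetical.
"""

from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass
class Offering:
    name: str
    iterations_per_second: float
    usd_per_hour: float


def iterations_per_dollar(offering: Offering) -> float:
    """Work completed per USD spent (higher is better value)."""
    return offering.iterations_per_second * SECONDS_PER_HOUR / offering.usd_per_hour


def hours_for_job(offering: Offering, total_iterations: float) -> float:
    """Wall-clock hours for a fixed amount of work (lower is faster)."""
    return total_iterations / offering.iterations_per_second / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Hypothetical offerings: the cheaper one is better value but slower.
    offerings = [
        Offering("small_instance", iterations_per_second=10.0, usd_per_hour=0.50),
        Offering("large_instance", iterations_per_second=40.0, usd_per_hour=4.00),
    ]
    for o in sorted(offerings, key=iterations_per_dollar, reverse=True):
        print(f"{o.name}: {iterations_per_dollar(o):.0f} iterations/USD, "
              f"{hours_for_job(o, 360_000):.2f} h for 360k iterations")
```

With these made-up numbers the small instance delivers twice the work per dollar, yet takes four times as long on the same job, which is why price/performance alone cannot tell you whether a configuration meets a response-time requirement. GPU memory is not modelled here at all.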
For AWS, note that we omitted the very old cg1.4xlarge instance type, since it is no longer fully supported by AWS and we were unable to get our test to run on it at all. Where a provider offers multiple regions, we used the price from its primary advertised region; most such providers vary their pricing by region to account for local costs. All prices are in USD.
The Quadro VCA is an appliance that has been specifically tuned to run Iray. In addition to very high performance, it offers Iray IQ mode, which allows large numbers of Quadro VCA appliances to be interconnected for extreme performance. We did not have access to large numbers of Quadro VCA appliances for our tests; however, the results below for a single Quadro VCA give you a feel for its performance. We have put it in its own section because including it in the main bare-metal chart shrinks the single-card results too much to compare!
All benchmarks were run under Linux (usually CentOS 7.2) with the latest NVIDIA driver available at the time of writing (375.20). Some tests used older drivers where we did not have administrative access to the machines. Where we had full control over the environment, we used the following setup:
| Component | Configuration |
|---|---|
| Operating System | CentOS Linux 7.2.1511 |
| Linux Kernel Version | 3.10.0-327.36.3 |
| NVIDIA Driver Version | 375.20 |
| Iray Version | Iray 2016.3 build 278300.4305 |
| CPU | Intel Core i5 6500 |
| PCIe | v3.0, full 16 lanes |
To make sure we are testing raw Iray performance, we developed a stand-alone benchmark tool based on Iray. The tool renders a fixed scene multiple times and averages the results for consistency. So that the results are meaningful for real-world use, we use a non-trivial test scene that gives the GPUs plenty of work to do. The image above is a fully converged render of our test scene.
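The repeat-and-average approach can be sketched as a small timing harness. Here `render_once` is a hypothetical placeholder for whatever call renders the fixed test scene (it is not an Iray API), and the optional warm-up run and the standard-deviation check are additions of this sketch rather than details stated above.

```python
"""Sketch: time a fixed render several times and average the results.

Assumption: `render_once` stands in for whatever call renders the fixed
test scene; it is a hypothetical placeholder, not an Iray API.
"""

import statistics
import time
from typing import Callable


def benchmark(
    render_once: Callable[[], None], runs: int = 5, warmup: int = 1
) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of render times in seconds.

    Warm-up runs (an assumption of this sketch) are discarded so one-off
    start-up costs such as scene loading do not skew the average.
    """
    if runs < 2:
        raise ValueError("need at least two timed runs to estimate spread")
    for _ in range(warmup):
        render_once()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        render_once()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


if __name__ == "__main__":
    # Placeholder CPU workload standing in for a real render of the test scene.
    def fake_render() -> None:
        sum(i * i for i in range(200_000))

    mean_s, stdev_s = benchmark(fake_render, runs=5)
    print(f"mean {mean_s:.4f} s, stdev {stdev_s:.4f} s "
          f"({100 * stdev_s / mean_s:.1f}% of mean)")
```

Reporting the spread alongside the mean makes it easy to see whether the averaged result is actually consistent; a large standard deviation relative to the mean suggests more runs are needed.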