{"id":2360,"date":"2019-05-28T06:00:17","date_gmt":"2019-05-28T06:00:17","guid":{"rendered":"https:\/\/www.migenius.com\/?p=2360"},"modified":"2019-08-19T00:32:50","modified_gmt":"2019-08-19T00:32:50","slug":"rtx-performance-explained","status":"publish","type":"post","link":"https:\/\/www.migenius.com\/articles\/rtx-performance-explained","title":{"rendered":"RTX Performance Explained"},"content":{"rendered":"\n

NVIDIA RTX technology was announced late last year and has gathered a lot of coverage in the press. Many software vendors have been scrambling to implement support for it since then and there has been a lot of speculation about what is possible with RTX. Now that Iray RTX is finally about to be part of RealityServer we can talk about what RTX means for our customers and where it will be most beneficial for you. <\/p>\n\n\n\n\n\n\n\n

\"RTX\"<\/figure>\n\n\n\n

TL;DR<\/h3>\n\n\n\n

Iray RTX speed-up is highly scene dependent but can be substantial. If your scene has low geometric complexity then you are likely to see only a small improvement. Larger scenes can see a speed-up of about 2x, while extremely complex scenes can even see a 3x speed-up.<\/p>\n\n\n\n

What is RTX?<\/h3>\n\n\n\n
\n
\n

RTX is both software and hardware. The key enabling innovation introduced with RTX hardware is a new type of accelerator unit within the GPU called an RT Core. These cores are dedicated purely to performing ray-tracing operations and can do so significantly faster than using traditional general purpose GPU compute. Performance will depend on how many RT Cores your card has. The Quadro RTX 6000<\/a> for example has 72 RT Cores.<\/p>\n<\/div>\n\n\n\n

\n
\"Quadro<\/figure>\n\n\n<\/div>\n<\/div>\n\n\n\n

Alongside the new hardware, NVIDIA has introduced various APIs and SDKs which enable software developers to access these new RT Cores. For example, in the gaming world RTX hardware is accessed through the Microsoft DirectX Raytracing (DXR) API, while production rendering tools such as Iray use OptiX<\/a>.<\/p>\n\n\n\n

Rendering software must be modified to take advantage of the new software APIs and SDKs in order to access the hardware. With RTX hardware and the latest RealityServer release, the portion of rendering work performed by Iray that involves ray intersection and traversal of acceleration structures (see below) can be offloaded to the new RT Core hardware, greatly speeding up that part of the rendering computation.<\/p>\n\n\n\n

Ray Intersection and Acceleration Structures<\/h3>\n\n\n\n

Ray intersection is the work of determining whether a ray (just think of it as a straight line) crosses through a given primitive (e.g., a triangle). We won’t cover exactly how path-tracers like Iray work but Disney have a great video Practical Guide to Path Tracing<\/a> which gives you a good idea of the basics. You’ll quickly see that ray intersection is key to making this work.<\/p>\n\n\n\n

While the mathematics involved in checking if a ray intersects a primitive is relatively simple (at least for a triangle), scenes today can easily contain millions or even hundreds of millions of primitives. To make matters worse, for typical scenes you also need to perform these checks for millions of rays. That’s millions of primitives times millions of rays, a whole lot of computation.<\/p>\n\n\n\n
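To give a feel for how simple a single test is, here is a minimal sketch of one common ray/triangle test, the Möller-Trumbore algorithm, in plain Python. This is only an illustration: the function names are hypothetical, and it is not how Iray or the RT Cores implement intersection.

```python
# Illustrative sketch only: Möller-Trumbore ray/triangle test in plain Python.
# The helper names (sub, dot, cross, ray_hits_triangle) are hypothetical and
# are not part of Iray, OptiX or RealityServer.

EPSILON = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_hits_triangle(origin, direction, v0, v1, v2):
    '''Return the distance t along the ray to the hit point, or None on a miss.'''
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < EPSILON:
        return None  # ray is parallel to the plane of the triangle
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > EPSILON else None  # ignore hits behind the ray origin


# A ray pointing straight down at a triangle lying in the z = 0 plane.
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = ray_hits_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
```

Each call is only a handful of multiplications and additions, but testing a million rays against a million triangles this way would mean 10^12 calls, which is why the brute-force approach does not scale.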

Naively checking for intersections with all primitives doesn’t cut it; you’d be waiting years for your images. To speed things up, ray-tracers almost always use an acceleration structure<\/a>. This uses some pre-computation to split the scene up into a hierarchy of primitives that can be tested rapidly to eliminate large numbers of those primitives from consideration.<\/p>\n\n\n\n

\n
\n

As a very simple example, imagine you have a scene with a million primitives to test, distributed fairly evenly. If you cut the scene into two groups, you can first test whether a ray intersects the volume of one of the groups and, if it does not, you can immediately exclude half of the primitives. By nesting structures like this you can progressively test until you reach the primitive that is intersected.<\/p>\n\n\n\n

While this is a massively over-simplified example and there is a lot of subtlety and nuance to implementing a highly optimised system for this, the basic principle remains the same. Devise a cheap test that can eliminate as many primitives from consideration as possible. RT Core hardware accelerates the query of acceleration structures and the ray intersection calculations making the whole process significantly faster. <\/p>\n<\/div>\n\n\n\n
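The halving idea described above can be sketched as a tiny bounding volume hierarchy in Python. This is a deliberately simplified illustration under stated assumptions: primitives are small axis-aligned boxes standing in for triangles, the tree is built with a plain median split, and every name is hypothetical. Production builders, and whatever Iray and the RT Cores do internally, are far more sophisticated.

```python
# Illustrative sketch only: a tiny bounding volume hierarchy (BVH).
# Primitives are axis-aligned boxes given as (min_corner, max_corner) tuples,
# standing in for triangles. All names here are hypothetical.


def ray_hits_box(origin, direction, box):
    '''Slab test: does the ray (t >= 0) pass through the axis-aligned box?'''
    t_near, t_far = 0.0, float('inf')
    lo, hi = box
    for axis in range(3):
        if direction[axis] == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if origin[axis] < lo[axis] or origin[axis] > hi[axis]:
                return False
            continue
        t0 = (lo[axis] - origin[axis]) / direction[axis]
        t1 = (hi[axis] - origin[axis]) / direction[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def union(boxes):
    '''Smallest box enclosing all the given boxes.'''
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return (lo, hi)


def build(prims, leaf_size=2):
    '''Recursively split the primitives in half along the widest axis.'''
    bounds = union(prims)
    if len(prims) <= leaf_size:
        return {'bounds': bounds, 'prims': prims}
    axis = max(range(3), key=lambda i: bounds[1][i] - bounds[0][i])
    ordered = sorted(prims, key=lambda b: b[0][axis] + b[1][axis])
    mid = len(ordered) // 2
    return {
        'bounds': bounds,
        'children': [build(ordered[:mid], leaf_size), build(ordered[mid:], leaf_size)],
    }


def traverse(node, origin, direction, stats):
    '''Return the primitives hit, counting primitive tests in stats['tests'].'''
    if not ray_hits_box(origin, direction, node['bounds']):
        return []  # the whole subtree is skipped with a single cheap test
    if 'prims' in node:
        stats['tests'] += len(node['prims'])
        return [p for p in node['prims'] if ray_hits_box(origin, direction, p)]
    hits = []
    for child in node['children']:
        hits += traverse(child, origin, direction, stats)
    return hits


# 1,000 small boxes in a row along x; one ray travelling down -z through box 500.
prims = [((float(i), 0.0, 0.0), (i + 0.5, 0.5, 0.5)) for i in range(1000)]
bvh = build(prims)
stats = {'tests': 0}
hits = traverse(bvh, (500.25, 0.25, 5.0), (0.0, 0.0, -1.0), stats)
```

In this toy scene the ray only ever descends one branch of the tree, so it ends up testing no more than two of the 1,000 primitives (the leaf size) instead of all of them. Real scenes have overlapping bounds and the savings are less clean, but the principle is the same.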

\n
\"BVH\"<\/figure>\n<\/div>\n<\/div>\n\n\n\n

Enough Already, How Much Faster?<\/h3>\n\n\n\n

It depends. Yes, everyone hates this answer but there is no way around it here. So far, for practical scenes, we’ve seen a typical speed-up range of 1.05x to 3.00x. That is a pretty wide range, so what determines how much faster it will be? We didn’t describe what ray intersection was above just for fun.<\/p>\n\n\n\n

Notice that when we talked about ray intersection we never talked about materials, textures, lighting, global illumination, shadows or any of the other jargon commonly associated with photorealistic rendering. That is because for a renderer to do its job, it has to do much more than just ray intersections, even if it calls itself a ray-tracer.<\/p>\n\n\n\n

All of the calculations needed for solving light transport, evaluating materials, looking up textures, calculating procedural functions and so on are still being performed on the traditional GPU compute hardware using CUDA (at least in the case of Iray). This portion of the rendering calculation is not being accelerated by RTX. So how much ray intersection is being done in a typical rendering with Iray for example?<\/p>\n\n\n\n

\"RT<\/figure>\n\n\n\n

In many scenes, we found that ray intersection comprises only 20% of the total work being performed by rendering. This is a very important point. Even if the new RT Cores were to make ray intersection infinitely fast so that it takes no time, 80% of the work still remains in that scene. So a 100 second render would still take 80 seconds with RTX acceleration, giving a speed-up of 1.25x. Of course, ray intersection is not free with RTX, just faster, so the speed-up would be lower than this but this is the hypothetical upper limit.<\/p>\n\n\n\n

If you have a scene where 60% of the work is ray intersection you will naturally see a much more significant speed-up. In that case on a 100 second render, with an infinitely fast ray intersector you still have 40 seconds of rendering, giving a speed-up of 2.5x at the hypothetical upper limit. In general we have found RTX provides the greatest benefit in very complex scenes with millions of triangles and also scenes that heavily exploit instancing.<\/p>\n\n\n\n
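The arithmetic in the two examples above is an instance of Amdahl's law, and can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the intersection fractions are simply the 20% and 60% figures used above.

```python
# Illustrative sketch only: overall speed-up when just the ray intersection
# part of a render is accelerated (Amdahl's law). Names are hypothetical.


def overall_speedup(intersection_fraction, intersection_speedup=float('inf')):
    '''Overall speed-up when only the ray intersection share gets faster.

    intersection_fraction: share of render time spent on ray intersection (0..1).
    intersection_speedup: how much faster that share becomes; the default of
    infinity gives the hypothetical upper limit.
    '''
    remaining = 1.0 - intersection_fraction
    accelerated = intersection_fraction / intersection_speedup
    total = remaining + accelerated
    if total == 0.0:
        return float('inf')  # all work was intersection and it became free
    return 1.0 / total


for fraction in (0.2, 0.6):
    print(fraction, round(overall_speedup(fraction), 2))
```

With the default infinite intersection speed-up this reproduces the 1.25x and 2.5x upper limits. Passing a finite value shows how quickly the real figure drops below that limit: assuming, purely for illustration, that intersection becomes 10 times faster, `overall_speedup(0.6, 10.0)` gives roughly 2.17x.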

Real-world Performance Testing<\/h3>\n\n\n\n

We took 14 scenes we had available and tested them on a Quadro RTX 6000 card with Iray 2018.1.3 and Iray RTX 2019.1.0 to evaluate the speed-up.<\/p>\n\n\n\n