KuaFu: A Ray-Tracing Based Renderer for 100 to 1000 GB Scenes


KuaFu Renderer
Main Features:
Out-of-Core Ray-Tracing without Ray-Sorting
Universal Photon-Mapping and Universal Filtering
Novel Quasi-Monte Carlo Methods for Rendering


Out-of-Core Ray-Tracing without Ray-Sorting

One approach to out-of-core ray-tracing is to improve the coherence of the rays in order to reduce I/O overhead (e.g. Pharr M., Kolb C., Gershbein R., Hanrahan P., Rendering Complex Scenes with Memory-Coherent Ray Tracing, 1997). More recently, this idea was extended to sorting out-of-core ray batches to exploit coherence (Eisenacher C., Nichols G., Selle A., Burley B., Sorted Deferred Shading for Production Path Tracing, 2013), and it has been implemented in the Disney Hyperion renderer.

However, two questions have not been answered:
How much coherence can be exploited?
Is ray-sorting always necessary?
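
To make the coherence idea concrete, here is a toy sketch of why ray order matters for I/O. It is not KuaFu's algorithm, nor the method of either cited paper. It only counts simulated disk loads when rays touch geometry blocks through a small LRU cache, first in arrival order and then grouped by block. The block count, cache size and the function name `count_block_loads` are assumptions made for illustration.

```python
import random
from collections import OrderedDict

def count_block_loads(block_ids, cache_size):
    """Count LRU cache misses (simulated disk loads) for a sequence of block accesses."""
    cache = OrderedDict()
    loads = 0
    for block in block_ids:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            loads += 1                     # block must be read from disk
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used block
    return loads

# Hypothetical workload: 10000 rays, each touching one of 64 geometry blocks.
rng = random.Random(0)
ray_blocks = [rng.randrange(64) for _ in range(10000)]

incoherent_loads = count_block_loads(ray_blocks, cache_size=8)
coherent_loads = count_block_loads(sorted(ray_blocks), cache_size=8)
print(incoherent_loads, coherent_loads)
```

With grouped rays, each block is loaded exactly once. In arrival order, most accesses miss the cache. Real renderers pay for that grouping with ray queues or sorting, which is the cost the two questions above are about.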

Trying to answer these questions led to a different approach to out-of-core ray-tracing. Below is a test scene consisting of 104 GB of data; rendering takes 5 hours 5 minutes on a notebook with 14 GB of available memory (4096*1744, 64 SPP, path-length 4, no instancing). Plants are irregular and can cause a complex distribution of rays, yet KuaFu still works well for such challenging scenes thanks to its special algorithms.



Universal Photon-Mapping and Universal Filtering

Universal photon-mapping (UPM) is a biased but consistent method. It is as simple and fast as path-tracing, yet its rendering quality is much higher. Thanks to its simplicity, UPM is also suitable for interactive rendering and distributed rendering. Moreover, besides surface scattering, UPM can also handle sub-surface scattering and volume scattering.

Universal filtering (UF) complements UPM: it filters the global illumination without blurring texture details, geometry details, or shadow details.


Novel Quasi-Monte Carlo Methods for Rendering

Quasi-Monte Carlo (QMC) methods can achieve faster convergence rates than traditional Monte Carlo methods for numerical integration. Although there is a large body of QMC research in mathematics, applying QMC methods to rendering raises some special issues:

Trajectory Splitting: To combine QMC methods with trajectory splitting, a method that uses Halton sequences and Cranley-Patterson rotation was proposed (Keller A., Strictly Deterministic Sampling Methods in Computer Graphics, 2001). However, how to achieve minimal randomization for these low-discrepancy sequences is not known (Keller A., Trajectory Splitting by Restricted Replication, 2004).
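
For readers unfamiliar with the two building blocks named above, here is a minimal sketch of a Halton point and a Cranley-Patterson rotation (a shift modulo 1). It illustrates the ingredients only; it is not Keller's full splitting scheme and not KuaFu's method. The function names, the bases (2, 3) and the seeded random shift are assumptions for illustration; in a real splitting scheme the shift would come from the parent path's own sample.

```python
import random

def radical_inverse(i, base):
    """Mirror the base-`base` digits of integer i around the radix point."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv

def halton(i, bases=(2, 3)):
    """i-th point of the Halton sequence, one prime base per dimension."""
    return tuple(radical_inverse(i, b) for b in bases)

def cranley_patterson(point, shift):
    """Shift every coordinate by `shift` and wrap around into [0, 1)."""
    return tuple((x + s) % 1.0 for x, s in zip(point, shift))

# Stand-in shift; assumed here purely for the demo.
rng = random.Random(1)
shift = (rng.random(), rng.random())
rotated = [cranley_patterson(halton(i), shift) for i in range(1, 9)]
print(rotated[:3])
```

The rotation keeps the relative spacing of the points and only moves the whole set on the unit torus, which is why it can decorrelate the branches of a split without discarding the low-discrepancy structure.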

Correlation Patterns: QMC methods may reveal correlation patterns in rendered images when the sample count is not large enough. There is a tiled scrambling method to break these correlation patterns (Raab M., Wachter A., Keller A., System, Method, and Computer Program Product for Tiled Screen Space Sample Scrambling for Parallel Deterministic Consistent Light Transport Simulation, 2013), but to some extent the scrambling also breaks the uniform distribution of the samples constructed by QMC methods. Another potential solution is Latin supercube sampling (LSS), which only permutes the high-dimensional part of the samples (Owen A., Latin Supercube Sampling for Very High-Dimensional Simulations, 1997), but LSS is not suitable for progressive rendering and its memory consumption is large.

Reusable Trajectory Splitting: Progressive photon mapping (PPM) can be regarded as a special kind of trajectory splitting that requires the splitting to be reusable. This requirement makes it hard to port existing QMC methods to PPM. To address this issue, a method that exploits properties of upper triangular generator matrices was proposed (Keller A., Quasi-Monte Carlo Image Synthesis in a Nutshell, 2012). Though elegant, it imposes some restrictions: first, the generator matrices must be upper triangular, which rules out many other (t,s)-sequences; second, the (t,s)-sequences have to be partitioned along the dimensions; last, the count of camera rays must equal the count of light rays, which is inflexible.
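
As background for the "upper triangular generator matrices" mentioned above, here is a small sketch of how a base-2 digital sequence is built from generator matrices. It uses the identity matrix and the Pascal matrix modulo 2, which give the first two dimensions of the Sobol' sequence in natural order. It shows only the grid property that upper triangular matrices provide; it is not the cited PPM scheme and not KuaFu's method. All function names are assumptions for illustration.

```python
from math import comb

def identity_matrix(m):
    return [[1 if k == l else 0 for l in range(m)] for k in range(m)]

def pascal_matrix_mod2(m):
    """Upper triangular matrix with entries C(l, k) mod 2 (row k, column l)."""
    return [[comb(l, k) % 2 for l in range(m)] for k in range(m)]

def digital_point(i, matrix):
    """Multiply the base-2 digits of i by the matrix over GF(2), then read the result as a binary fraction."""
    m = len(matrix)
    digits = [(i >> l) & 1 for l in range(m)]  # least significant digit first
    x = 0.0
    for k in range(m):
        bit = sum(matrix[k][l] * digits[l] for l in range(m)) % 2
        x += bit * 2.0 ** -(k + 1)
    return x

M = 4  # 2**M points
C1, C2 = identity_matrix(M), pascal_matrix_mod2(M)
points = [(digital_point(i, C1), digital_point(i, C2)) for i in range(2 ** M)]
print(points[:4])
```

Because both matrices are upper triangular with ones on the diagonal, the first 2^m points of each dimension land on distinct multiples of 2^-m. Matrices without that shape do not give this guarantee, which is the first restriction listed above.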

KuaFu has its own QMC methods which can overcome the above issues.


Other Features:


KuaFu can make full use of the AVX and AVX2 instruction sets without any third-party library (Embree etc.). Its ray-tracing core and shading system are easy to maintain and port.

Precise Ray-Tracing

Due to its finite precision, floating-point arithmetic may cause numerical issues in ray-tracing, such as cracks and self-intersections. There is a recursive subdivision method that avoids these issues at the cost of a significant decrease in performance (Dammertz H., Keller A., Improving Ray Tracing Precision by Object Space Intersection Computation, 2006). KuaFu provides a different solution: its quality is as good as that of the subdivision method, while its performance impact is negligible.
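
To illustrate where self-intersections come from, here is a small sketch that computes a ray-plane hit point in simulated single precision and measures how far the result lies from the plane. It then applies the common epsilon-offset workaround. This is neither KuaFu's solution nor the subdivision method cited above. The plane, the ray, `EPSILON` and all function names are assumptions chosen for the demo.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hit_point_f32(origin, direction, t):
    """Evaluate origin + t * direction with every step rounded to single precision."""
    t32 = to_f32(t)
    return tuple(
        to_f32(to_f32(o) + to_f32(to_f32(d) * t32))
        for o, d in zip(origin, direction)
    )

# Hypothetical plane dot(normal, p) = plane_d and a ray that hits it.
normal, plane_d = (0.6, 0.0, 0.8), 5.0
origin, direction = (1.3, 2.7, -4.1), (0.3, -0.2, 0.93)

t = (plane_d - dot(normal, origin)) / dot(normal, direction)
hit = hit_point_f32(origin, direction, t)
residual = dot(normal, hit) - plane_d  # distance-like error; tiny, and its sign is not controlled

# Common workaround: push the point off the surface along the normal.
EPSILON = 1e-4
offset_hit = tuple(h + EPSILON * n for h, n in zip(hit, normal))
offset_residual = dot(normal, offset_hit) - plane_d
print(residual, offset_residual)
```

If the residual happens to fall on the wrong side, a secondary ray starting at `hit` can immediately hit the same surface again. The epsilon offset hides this, but a fixed epsilon is scene-scale dependent, which is why more robust solutions are of interest.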


During rendering, KuaFu runs as a separate process. A plug-in, Max2KF, translates scene data from 3dsmax to KuaFu and receives the rendering results from it through pipes. If KuaFu encounters an error or runs out of memory, it terminates itself without crashing 3dsmax; in other words, the scene data stays safe.
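
The isolation pattern described above can be sketched in a few lines. This is not Max2KF's actual code or protocol. It is a minimal illustration, using Python's standard `subprocess` module, of a host that sends data to a child process through a pipe and survives the child failing. The child scripts and the function name `render_out_of_process` are hypothetical stand-ins.

```python
import subprocess
import sys

# Hypothetical child programs standing in for the renderer process.
CHILD_OK = "import sys; data = sys.stdin.read(); print('rendered ' + str(len(data)) + ' bytes')"
CHILD_FAIL = "import sys; sys.exit(3)"  # simulates the renderer giving up, e.g. out of memory

def render_out_of_process(child_code, scene_data):
    """Send scene data to a child process via stdin and read the result from stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        input=scene_data,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None  # child failed; the host keeps running with its scene intact
    return proc.stdout.strip()

print(render_out_of_process(CHILD_OK, "abc"))
print(render_out_of_process(CHILD_FAIL, "abc"))
```

The design choice is that a failure in the renderer shows up in the host as a non-zero exit code rather than as a crash, so the host can report the error and keep the user's unsaved scene.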

Per-Object based Texture File Assigning

There is no need to create a new material for each object when only the texture files differ. KuaFu provides per-object texture file assignment to eliminate this redundant work.
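
One way to picture this feature is a shared material with an optional per-object texture override. The sketch below is only an illustration of that idea, not KuaFu's data model. The class names, field names and file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Material:
    name: str
    texture_file: str  # default texture used when an object has no override

@dataclass
class SceneObject:
    name: str
    material: Material
    texture_override: Optional[str] = None  # per-object texture file, if any

def resolve_texture(obj):
    """Pick the object's own texture file if set, else the material's default."""
    return obj.texture_override or obj.material.texture_file

# Two objects share one material but use different texture files.
bark = Material("bark", "bark_default.png")
tree_a = SceneObject("tree_a", bark)
tree_b = SceneObject("tree_b", bark, texture_override="bark_mossy.png")
print(resolve_texture(tree_a), resolve_texture(tree_b))
```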


Wow. Ok, you got my attention. This looks legitimately interesting. Is it all CPU? I like CPU renders :slight_smile:

So you use avx instructions without resorting to Embree libs? You wrote your own? How does it fare on Intel versus AMD cpus? Thinking Threadripper and i9


Hi, davius

  1. It is CPU based.
  2. No preference for Intel or AMD; both of their CPUs should work well, but I don’t have an AMD CPU for testing.


Looks really cool. Any chance of a Maya version? I know it’s a lot to ask…


It is definitely possible, but there is no set date yet, maybe in the future :)


Crashing here with windows 10 / 2017. Failed to launch KuaFu Renderer…
Anybody got it going ?


Hi, hotknife
Does your CPU support AVX?
If not, please wait a few days; I plan to support SSE4 CPUs soon.


Looks great. Any chance of other 3D app support? Like Blender, C4D and so on?

And how is the speed compared to path tracers, since you are using a different method?


It is possible, but there is no plan yet :slight_smile:

The ray-tracing performance is similar to Embree on CPUs supporting AVX2, while the ray-tracing precision is higher (KuaFu still uses single precision, but in a special way).


KuaFu! Is that named after the giant who chases the sun?

It’s the name of a server at my office. We use names out of Chinese mythology (Beijing based company).


Yes, it is :slight_smile:


I’d love to see a Blender port of this renderer.

Bforartists is great, but it would be greater with a render engine that isn’t Cycles.


How long before 2018 Max version?
Thanks and this looks really cool.


I haven’t begun preparing for 2018 yet, but I will soon :slight_smile: