|Team Members:||Oluwapelumi Adenikinju1,
and Thomas Stitt3 |
|Graduate Research Assistant:||Jonathan Graf4
and Samuel Khuvis4 |
|Faculty Mentor:||Matthias K. Gobbert4|
and Marc Olano1 |
The original code in rrv solves for the solution of the radiosity equation
using a fixed number of iterations in the Jacobi method.
This can be physically thought of as one light bounce per iteration,
as illustrated visually in the flow chart in Figure 1.
BiCG-STAB Method: The Jacobi method can be slow to converge. The BiCG-STAB method is an alternative iterative method, whose convergence speed increases with iterations. It requires twice as much work per iteration, hence might take less time for complex problems that require a large number of iterations with the Jacobi method.
CUDA: CUDA is NVIDIA's library for leveraging the massively parallel architecture of GPUs (graphics processing units). GPUs are designed for SIMD (single instruction multiple data) applications like those used in solving the radiosity equation. CUDA has an efficient package (cuBLAS) for solving linear system, but since the system is composed of vector elements with multiple values (arrays of structures), these methods cannot be used. We implement a dot-product and matrix-vector product for vectors of this type using binary tree reduction and a simple axpby operation for vector scaling and addition.
OpenMP: OpenMP (Open Multi-Processing) is an application programming interface that provides simplified and cross-platform shared-memory parallelization. Memory is considered "shared" when each worker (thread) shares it with all other workers, unless marked worker-private - which is beneficial here since the system matrix of the radiosity equation is large. With OpenMP, portions of serial code are flagged to be run in parallel by work distribution among available threads. We utilize parallelization much like with CUDA to speed up the matrix-vector, dot, and axpby functions.
Runtime results were obtained for four sample scenes of varied patch count and complexity. We see in Table 1 that the scenes used were not complex enough to make apparent the advantages of the reduced number of iterations of BiCG-STAB method, due to its increased cost per iteration.
|Scene ID||Patch count||iter||runtime||iter||runtime|
Parallelization on the other hand showed marked improvement with observed run times in units of seconds in Table 2 dropping over an order of magnitude.
Both computational methods and parallelization were explored with the aim of near-real to real time global illumination solutions to the radiosity algorithm. Given the scenes we tested, OpenMP and CUDA both show substantial runtime improvements while the change from the Jacobi method to the BiCG-STAB method actually resulted in increased runtime due to the method's complexity. It appears that global illumination problems are not in general best suited for mathematical reformulations, but taking advantage of work distribution seems favorable, no matter the scene complexity.