Friday, October 31, 2008

GI - Photon Mapping - Post 8 - Optimization Summary / Rant

For the last three days all the work has been on optimization. Below is a summary of the changes and the speedups they produced (speedup = old render time / new render time):
  • Changing the intersector from Möller-Trumbore to the optimized Wald projection test -> 0.25
  • 2x2 Packet Tracing (SSE/SSE2 using SIMD intrinsics) -> 3.0-4.0
  • Including ordering information in the traversal (do not traverse a BVH box if an intersection was already found in a previous box at a t closer than the box's entry t): Mono -> 1.8, 2x2 -> 1.3
  • SIMDifying primary ray packet generation (instead of looping to generate the 4 rays) -> 1.05
  • Changing BVH traversal from recursive to iterative: Mono->1.04, 2x2->soon
  • Changing BVH memory allocation scheme to improve traversal cache friendliness: soon
  • Changing the BVH build algorithm (to SAH?): soon
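The recursive-to-iterative traversal change and the ordering early-out from the list above can be sketched roughly as below. To be clear, this is a minimal illustration, not the post's actual code: the node layout and field names are assumptions, and the leaf "hit" is a placeholder (reported at the box entry t) standing in for the real triangle intersection test.

```cpp
#include <cfloat>
#include <vector>

// Hypothetical flat BVH node: inner nodes store child indices,
// leaves are marked by left < 0. Layout is illustrative only.
struct BVHNode {
    float bmin[3], bmax[3];
    int left, right;   // child indices into the node array; -1 for leaves
    int triIndex;      // triangle index, valid only for leaves
};

// Slab test: returns the ray's entry t into the box, or FLT_MAX on a miss.
inline float intersectBox(const BVHNode& n, const float orig[3],
                          const float invDir[3], float tMax) {
    float tNear = 0.0f, tFar = tMax;
    for (int a = 0; a < 3; ++a) {
        float t0 = (n.bmin[a] - orig[a]) * invDir[a];
        float t1 = (n.bmax[a] - orig[a]) * invDir[a];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tNear) tNear = t0;
        if (t1 < tFar)  tFar = t1;
        if (tNear > tFar) return FLT_MAX;
    }
    return tNear;
}

// Iterative traversal: an explicit stack replaces recursion.
// The early-out mirrors the ordering optimization: a node is skipped
// when the closest hit found so far is nearer than the box's entry t.
// (A full tracer would also visit the children front-to-back.)
float traverse(const std::vector<BVHNode>& nodes, const float orig[3],
               const float invDir[3]) {
    float bestT = FLT_MAX;
    int stack[64];
    int sp = 0;
    stack[sp++] = 0; // root index
    while (sp > 0) {
        const BVHNode& n = nodes[stack[--sp]];
        float tBox = intersectBox(n, orig, invDir, bestT);
        if (tBox >= bestT) continue;   // ordering early-out
        if (n.left < 0) {
            // Leaf: a real tracer intersects the triangle here; this
            // sketch records a placeholder hit at the box entry t.
            bestT = tBox;
        } else {
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    return bestT;
}
```

With a three-node BVH (a root and two leaf boxes along the x axis) and a ray marching down +x, the traversal returns the entry t of the nearer leaf and skips the farther one once a closer hit is known.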
All of these speedups are of course scene dependent: some were measured with the Stanford Bunny, others with the Cornell Box, so the numbers are rough and relative, and some changes might even produce slowdowns in certain scenes/configurations — but this is just a simple summary. After all of the high-level optimizations are done, I will probably try toxie's test benchmark and then have a go at lower-level optimizations using a profiler. For that I am waiting for my new hardware, on which I plan to install Linux so I can comfortably use valgrind, gprof or VTune.

I am also planning to have a look at gcc's profile-guided optimization. Acovea seems to be a fun thing to try as well.

Speaking of low-level optimization, I recently read Randall Hyde's Write Great Code, Vol. I, and am now reading Vol. II. They are nice refreshers on assembly and low-level optimization, as well as good summaries of what modern compilers do — and, more importantly, do not do, even when one might expect them to.
The publicly available Software Optimization Guide for AMD64 Processors is similar in a way: it provides a list of dos and don'ts for those processors, many of which apply in general as well. The nice thing is that there is an example for each case.

Another fun topic is superoptimizers, and the fact that perfect superoptimization is actually an NP-hard problem — no surprise, of course. The interesting part is that most if not all fun games are also NP-hard: as soon as a game is found not to be NP-hard it is considered 'solved', and it is not really fun to play once the solution is known.
This might explain why so many of us are attracted to optimization: it is actually NP-hard (fun!). The same goes for AI and other game-related technologies.

Basically all we are trying to do is to surf on the edge of the current technology and try to push the most out of it by using algorithms that were not possible before but are almost possible now if optimized.
