# 08/28/2017

low level Optimization of KFR

1. Optimization of log(|| x – x0, y – y0||)
2. Optimization of log function
3. Optimization of fast atan
4. Make the shader more complex to extend the rendering time to greater than 16ms

I will talk about every step in detail

• Optimization of log(|| x – x0, y – y0||)
• There is rendering time reduction
• Original 52.96ms
• 1/2 buffer: 15.39ms ->15.57ms
• 1/4 buffer: 4.20ms -> 4.10ms
• Optimization of log function
• The fast-log contains at least 5 branches (possibly 5 additions and 5 shifts for 32 bit calculation)
• The Nvidia log algorithm is not available on line. But the log, exp, sin, cos in AMD GPU is 4x that of add/sub. We can guess Nvidia doesn’t do worse than AMD.
• Reference1: http://www.iquilezles.org/www/articles/palettes/palettes.htm (Iq talking about sin, cos in GLSL)
• Popular wisdom (especially between old-school coders) is that trigonometric functions are expensive and that therefore it is important to avoid them (by means of LUTs or linear/triangular approximations). Often popular wisdom is wrong – despite the above still holds true in some especial cases (a CPU heavy inner loop) it does not in general: for example, in the GPU, computing a cosine is way, way faster than any attempt to approximate it. So, lets take advantage of this and go with the straight cosine expression.
• Analysis of AMD GPU: https://seblagarde.wordpress.com/tag/gpu-performance/
• Full rate (FR): mul, mad, add, sub, and, or, bit shift… Quater rate(QR): transcendental instruction like rcp, sqrt, rsqrt, cos, sin, log, exp…
• Discussion about complexity of complexity:
• 1/x, sin(x), cos(x), log2(x), exp2(x), 1/sqrt(x) – 0 or close to 0, as long as they are limited to 1/9 of all total ops (can go up to 1/5 for Maxwell).
• Optimization of fast atan (I only tried diamond angle now. I will try the CORDIC later.)
• Simple comparison of atan2 and diamond angle.