*.mtl content explanation

newmtl mtlName # mtlName is the name of the material

Ka 1.000 1.000 1.000 # ambient color of the material, declared with Ka. Colors are defined in RGB, with each channel taking a value between 0 and 1.

Kd 1.000 1.000 1.000 # diffuse color

Ks 0.000 0.000 0.000 # specular color; black means the specular highlight is turned off

Ns 10.000 # specular exponent (shininess) that weights the specular color, range 0 – 1000

illum 2 #illumination mode

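0. Color on, ambient off
1. Color on, ambient on
2. Highlight on
3. Reflection on, ray trace on
4. Transparency: glass on. Reflection: ray trace on
5. Reflection: Fresnel on, ray trace on
6. Transparency: refraction on. Reflection: Fresnel off, ray trace on
7. Transparency: refraction on. Reflection: Fresnel on, ray trace on
8. Reflection on, ray trace off
9. Transparency: glass on. Reflection: ray trace off
10. Casts shadows onto invisible surfaces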

d 0.9 # dissolve (opacity); some implementations use 'd'
Tr 0.9 # others use 'Tr' (commonly read as transparency, i.e. 1 - d)

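map_Ka lena.tga # ambient color texture map
map_Kd lena.tga # diffuse color texture map (in most cases the same as the ambient texture map)
map_Ks lena.tga # specular color texture map
map_d lena_alpha.tga # alpha channel texture map
map_bump lena_bump.tga # bump map
bump lena_bump.tga # some files use the 'bump' tag instead of 'map_bump'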
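
The statements above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_mtl` and the dictionary layout are made up for illustration, only the keywords listed in these notes are handled, and texture options or file names containing spaces are not supported.

```python
# Texture keywords from the notes above, compared case-insensitively.
TEXTURE_KEYWORDS = {"map_ka", "map_kd", "map_ks", "map_d", "map_bump", "bump"}

def parse_mtl(text):
    """Parse MTL text into {material_name: {keyword: value}} (hypothetical layout)."""
    materials = {}
    current = None
    for raw_line in text.splitlines():
        # Drop everything after '#' and split the rest into tokens.
        tokens = raw_line.split("#", 1)[0].split()
        if len(tokens) < 2:
            continue  # blank line, comment, or keyword without a value
        keyword, args = tokens[0], tokens[1:]
        if keyword == "newmtl":
            current = materials.setdefault(args[0], {})
        elif current is None:
            continue  # statement before any newmtl: ignored in this sketch
        elif keyword in ("Ka", "Kd", "Ks"):
            current[keyword] = tuple(float(a) for a in args[:3])
        elif keyword in ("Ns", "d", "Tr"):
            current[keyword] = float(args[0])
        elif keyword == "illum":
            current[keyword] = int(args[0])
        elif keyword.lower() in TEXTURE_KEYWORDS:
            # The file name is assumed to be the last token on the line.
            current[keyword.lower()] = args[-1]
    return materials

sample = """
newmtl mtlName
Ka 1.000 1.000 1.000 # ambient color
Kd 1.000 1.000 1.000 # diffuse color
Ks 0.000 0.000 0.000 # specular color
Ns 10.000
illum 2
d 0.9
map_Kd lena.tga
bump lena_bump.tga
"""
materials = parse_mtl(sample)
print(materials["mtlName"])
```

Storing texture keys in lower case means 'map_bump' and 'map_Bump' end up under the same key.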

For textures:

.dif are for diffuse

.alpha are for transparency

.spec are for specular reflection

.ddn are tangent-space normal maps



Plan for tomorrow:

  1. Try to download the dependencies of https://github.com/NCCA/Sponza
  2. If that does not work, try to load the textures
  3. Change from single-precision float to short for peripheral pixels
  4. Finish the homework for 726 in Python


Low-level optimization of KFR

  1. Optimization of log(|| x – x0, y – y0||)
  2. Optimization of log function
  3. Optimization of fast atan
  4. Make the shader more complex to extend the rendering time to greater than 16ms
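
For item 1, the notes do not say which rewrite was used. One standard option, shown here purely as an assumption, is to drop the square root inside the norm by using log(sqrt(s)) = 0.5 * log(s). The function names are hypothetical, and Python stands in for the shader code.

```python
import math

def log_radius_naive(x, y, x0, y0):
    """log of the Euclidean distance, computed with an explicit sqrt."""
    dx, dy = x - x0, y - y0
    return math.log(math.sqrt(dx * dx + dy * dy))

def log_radius_no_sqrt(x, y, x0, y0):
    """Same value without the sqrt: log(sqrt(s)) == 0.5 * log(s)."""
    dx, dy = x - x0, y - y0
    return 0.5 * math.log(dx * dx + dy * dy)

# Both variants agree up to floating-point rounding.
for point in [(3.0, 4.0), (100.5, -20.25), (0.001, 0.002)]:
    a = log_radius_naive(point[0], point[1], 0.0, 0.0)
    b = log_radius_no_sqrt(point[0], point[1], 0.0, 0.0)
    print(point, a, b)
```

Both versions are undefined at the center (x, y) = (x0, y0), so that pixel needs separate handling.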

Each step in detail:

  • Optimization of log(|| x – x0, y – y0||)
    • Rendering time is reduced compared with the original:
      • Original: 52.96ms
      • 1/2 buffer: 15.39ms -> 15.57ms
      • 1/4 buffer: 4.20ms -> 4.10ms
  • Optimization of log function
    • The fast-log contains at least 5 branches (possibly 5 additions and 5 shifts for a 32-bit calculation)
    • The Nvidia log algorithm is not available online. On AMD GPUs, however, log, exp, sin, and cos cost 4x as much as add/sub (quarter rate). We can guess that Nvidia does no worse than AMD.
      • Reference 1: http://www.iquilezles.org/www/articles/palettes/palettes.htm (iq discussing sin and cos in GLSL)
        • Popular wisdom (especially between old-school coders) is that trigonometric functions are expensive and that therefore it is important to avoid them (by means of LUTs or linear/triangular approximations). Often popular wisdom is wrong – despite the above still holds true in some especial cases (a CPU heavy inner loop) it does not in general: for example, in the GPU, computing a cosine is way, way faster than any attempt to approximate it. So, lets take advantage of this and go with the straight cosine expression.
      • Analysis of AMD GPU: https://seblagarde.wordpress.com/tag/gpu-performance/
        • Full rate (FR): mul, mad, add, sub, and, or, bit shift… Quarter rate (QR): transcendental instructions like rcp, sqrt, rsqrt, cos, sin, log, exp…
      • Discussion about the cost of these operations:
        • 1/x, sin(x), cos(x), log2(x), exp2(x), 1/sqrt(x) – 0 or close to 0, as long as they are limited to 1/9 of all total ops (can go up to 1/5 for Maxwell).
  • Optimization of fast atan (so far I have only tried the diamond angle; I will try CORDIC later)
    • Simple comparison of atan2 and the diamond angle.
    • A test on Shadertoy: https://www.shadertoy.com/view/lllyR4
  • Make the shader more complex to extend the rendering time to greater than 16ms
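
As a rough illustration of the diamond angle mentioned under fast atan: it replaces atan2 with a value in [0, 4) that grows monotonically with the true angle and needs only comparisons, additions and one division. The sketch below is the commonly circulated formulation written in Python, not the code from the Shadertoy test.

```python
import math

def diamond_angle(y, x):
    """Monotonic stand-in for atan2(y, x), returning a value in [0, 4).

    Each quadrant covers one unit: +x axis -> 0, +y axis -> 1,
    -x axis -> 2, -y axis -> 3. Undefined at (0, 0).
    """
    if y >= 0:
        return y / (x + y) if x >= 0 else 1 - x / (-x + y)
    return 2 - y / (-x - y) if x < 0 else 3 + x / (x - y)

# Compare the ordering with atan2 for directions around the unit circle.
for k in range(8):
    theta = k * math.pi / 4
    x, y = math.cos(theta), math.sin(theta)
    print(round(math.degrees(theta)), round(diamond_angle(y, x), 3))
```

The result is not linear in the true angle, so it is suitable when only ordering or a monotonic remapping is needed. Whether that distortion is acceptable for the log-polar mapping in KFR has to be checked separately.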








8:30 am – 11:30 am

  • meet with Var
    • need to figure out the advantage of our algorithm
  • Try to update VS15 to get DirectX SDK

3:00 pm – 6:00 pm

  • Variance sampling TAA
  • Write paper


  • Read push-pull paper
  • Read Europe Log polar paper
  • think about ellipse log-polar


  • https://leetcode.com/problems/integer-break/description/


Decouple shading rate & visibility rate from pixels: this leaves room for anti-aliasing and coarse pixel shading.

Texel Shading (shading rate reduction):

We show performance improvements in three ways. First, we show some improvement for the “small triangle problem”. Second, we reuse shading results from previous frames. Third, we enable dynamic spatial shading rate choices, for further speedups.

Visibility: updated at the full frame rate.

Shading rate: dynamically varying the spatial shading rate by simply biasing the mipmap level choice, texel shading and temporal shading reuse
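
A small sketch of what biasing the mipmap level choice means for the amount of shading work. The function names and the rounding rule are assumptions for illustration and are not taken from the texel shading paper.

```python
import math

def shading_mip_level(texels_per_pixel, bias=0.0, max_level=10):
    """Choose a mip level from the pixel footprint (texels per pixel along one axis) plus a bias."""
    level = math.log2(max(texels_per_pixel, 1.0)) + bias
    nearest = int(math.floor(level + 0.5))  # round to the nearest level
    return min(max(nearest, 0), max_level)

def texels_at_level(base_width, base_height, level):
    """Number of texels in a mip level; each level halves both dimensions."""
    return max(base_width >> level, 1) * max(base_height >> level, 1)

# A bias of +1 moves shading one level up, which shades a quarter of the texels.
for bias in (0.0, 1.0, 2.0):
    level = shading_mip_level(1.0, bias)
    print(bias, level, texels_at_level(1024, 1024, level))
```

Each unit of positive bias quarters the number of texels that have to be shaded, which is what makes the bias a cheap control for the spatial shading rate.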

Some reasons for increased shading cost:

  • The first is the mapping from pixels to texels
  • The second source of shading increase is in the caching system.

Process: deferred decoupled shading

Rasterization records texel accesses as shading work rather than running a shader per pixel. Shading is performed by a separate compute stage, which stores the results in a texture. A final stage collects data from the texture.
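
The three stages can be mimicked on the CPU with dictionaries. This is a toy sketch under stated assumptions: the pixel-to-texel mapping `visible` is hypothetical, and the function names are not from the paper.

```python
def rasterize(visible_samples):
    """Stage 1: record which texels are needed instead of shading each pixel."""
    return set(visible_samples.values())

def shade_texels(requests, shade):
    """Stage 2: a separate pass shades every requested texel once and stores the result."""
    return {texel: shade(texel) for texel in requests}

def resolve(visible_samples, texture):
    """Stage 3: each pixel collects its color from the shaded texture."""
    return {pixel: texture[texel] for pixel, texel in visible_samples.items()}

shade_calls = []

def toy_shade(texel):
    """Stand-in for an expensive material shader: a checkerboard pattern."""
    shade_calls.append(texel)
    u, v = texel
    return (u + v) % 2

# Four pixels that map onto only two texels (hypothetical visibility result).
visible = {(0, 0): (0, 0), (1, 0): (0, 0), (0, 1): (0, 1), (1, 1): (0, 1)}
requests = rasterize(visible)
texture = shade_texels(requests, toy_shade)
image = resolve(visible, texture)
print(len(shade_calls), image)
```

Four pixels trigger only two shader evaluations here because pixels that share a texel share its result. In the opposite case, where one pixel touches several texels, the shading count goes up instead.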


Object Space Lighting:

Inspired by REYES (Renders Everything You Ever Saw)

Overall process

  1. All objects in the game are submitted for shading and rasterization and queued for processing.
  2. During the submission step, the estimated projected area of each object is calculated, so each object requests a certain amount of shading.
  3. During shading, the system allocates texture space for all objects that require shading. If the total request is more than the available shading space, the shading rate of all objects is progressively scaled down until everything fits.
  4. Material shading occurs, processing each material layer for each object. Results are accumulated into the master shading texture(s).
  5. Mips are calculated on the master shading texture as appropriate.
  6. Rasterization step: each object references its results from the shading step. A 1:1 correspondence is not strictly required, but this feature is rarely used.
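
The allocation step can be sketched as follows. This is a simplification under stated assumptions: the notes describe progressive scaling, while this sketch applies a single uniform scale factor, and the function name and units (texels of shading space) are hypothetical.

```python
def allocate_shading_space(requested_areas, budget):
    """Scale every object's request by the same factor when the total exceeds the budget.

    requested_areas: {object_name: requested texels}, derived from projected area.
    budget: total texels available in the master shading texture.
    """
    total = sum(requested_areas.values())
    scale = min(1.0, budget / total) if total > 0 else 1.0
    return {name: area * scale for name, area in requested_areas.items()}

requests = {"terrain": 600.0, "unit": 400.0}
print(allocate_shading_space(requests, budget=2000.0))  # fits: requests unchanged
print(allocate_shading_space(requests, budget=500.0))   # over budget: all scaled by 0.5
```

Scaling all objects by the same factor keeps their relative shading rates, so no single object loses all of its detail when the budget is exceeded.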



Our architecture is also the first to support pixel shading at multiple different rates, unrestricted by the tessellation or visibility sampling rates.

automatic shading reuse between triangles in tessellated primitives

  1. we decouple pixel shading from screen space
  2. it allows lazy shading and reuse simultaneously at multiple different frequencies

enables a wider use of tessellation and fine geometry, even at very limited power budgets
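
Lazy shading with reuse at several rates can be pictured as a cache that shades a cell only on first use. The sketch below is a software analogy under stated assumptions, not the hardware architecture from the paper: the class name, the (rate, cell) key and the uniform shading grid are all hypothetical.

```python
class LazyShadingCache:
    """Shade on demand and reuse results, with one shading grid per rate."""

    def __init__(self, shade):
        self._shade = shade
        self._cache = {}
        self.evaluations = 0  # number of actual shader invocations

    def lookup(self, u, v, rate):
        """Return the shading for (u, v) in [0, 1) at `rate` cells per axis."""
        key = (rate, int(u * rate), int(v * rate))
        if key not in self._cache:
            self.evaluations += 1
            # Shade at the center of the grid cell and store the result.
            center_u = (key[1] + 0.5) / rate
            center_v = (key[2] + 0.5) / rate
            self._cache[key] = self._shade(center_u, center_v)
        return self._cache[key]

cache = LazyShadingCache(lambda u, v: u + v)
a = cache.lookup(0.1, 0.1, rate=2)  # shaded on first use
b = cache.lookup(0.4, 0.3, rate=2)  # same coarse cell: reused
c = cache.lookup(0.1, 0.1, rate=4)  # finer rate uses its own grid
d = cache.lookup(0.4, 0.3, rate=4)  # different fine cell: shaded
print(cache.evaluations)
```

Nothing is shaded until a sample asks for it, and samples from different triangles that land in the same cell at the same rate share one evaluation.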