• Due Wed: Meet Varshney
    • GrayScale Rendering
      • I searched: single channel textures, grayscale framebuffer.
      • https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glTexImage2D.xhtml
      • Solution: Dropbox\OPENGL\TrySomething\Sponza\grayScaleRendering
      • Compare GL_RGB & GL_RED
      •               Memory    Time
      • GL_RGB        386MB     24ms
      • GL_RED        350MB     24ms
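      • Reducing the number of color channels barely helps: memory drops from 386MB to 350MB (about 9%) and frame time is unchanged.
      • Open question: how to render to a single-channel framebuffer?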
    • Strange meshes
  • Due Wed: Watch video of CMSC726 (Should do this evening)
  • Due Fri: HW2 CMSC726 (should finish it on Wed)
  • Due Sep. 21: CMSC740 HW3, should finish during the weekend
  • Should summarize CMSC740 contents this weekend
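
For the GL_RGB vs GL_RED comparison in the grayscale rendering item above, a rough back-of-the-envelope estimate may explain why the saving is small. This is only a sketch: it assumes 8-bit channels, tightly packed texels, and that the format switch changed nothing except texture storage (none of which the measurements confirm). The function names are made up for illustration.

```python
# Rough estimate only. Assumptions: 8-bit channels, tightly packed texels,
# and GL_RGB -> GL_RED changed nothing but texture storage.
BYTES_PER_TEXEL = {"GL_RGB": 3, "GL_RED": 1}


def texture_bytes(width, height, fmt):
    """Bytes for one width x height texture level in the given format."""
    return width * height * BYTES_PER_TEXEL[fmt]


def implied_rgb_texture_mb(total_rgb_mb, total_red_mb):
    """Texture size (MB) under GL_RGB implied by the measured memory saving."""
    saved = total_rgb_mb - total_red_mb
    # Going from 3 bytes to 1 byte per texel removes 2/3 of the texel data.
    saved_fraction = 1 - BYTES_PER_TEXEL["GL_RED"] / BYTES_PER_TEXEL["GL_RGB"]
    return saved / saved_fraction


rgb_mb = implied_rgb_texture_mb(386, 350)
share = 100 * rgb_mb / 386
print(f"implied RGB texture data: {rgb_mb:.1f} MB ({share:.1f}% of total)")
```

Under these assumptions the textures account for only a small share of the total, so most of the memory would be in other data (presumably geometry and buffers) that a channel reduction does not touch.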


  • Bug for log polar transformation for meshes:
    • 1. Is it caused by input quads?
      • Probably not. According to https://www.opengl.org/discussion_boards/showthread.php/175665-Geometry-shader-to-handle-quads, quads are tessellated into triangles by the driver before being passed to the geometry shader: each quad is split into two triangles, and these are processed separately by the geometry shader.



  • Fixed the depth error: the depth buffer should be initialized after the OBJ file is loaded.
  • Fixed drawing triangles for the cut line:
    • I wrongly set “layout (triangle_strip, max_vertices = 3) out”. With max_vertices = 3, a triangle strip can output only one triangle (n vertices give at most n - 2 triangles)!
  • Should use vertex, tess, geom, frag shaders together now! Let's recall the whole pipeline: https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview
  • (Added 09/08) I ran into an error with tessellation:
  • Use tessellation CTRL shader to determine the level of tessellation:
    • How to determine the level dynamically? This can be done.
    • Ensure that the shared edge(s) between the patches use the same level of tessellation
  • Use tessellation eval shader to calculate the area of the triangle.
  • ——————————————————————————————————————————
  • How to dynamically solve the problem of tessellation?
  • Central Triangle??
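
One way to keep shared edges consistent, sketched here on the CPU in Python rather than in the actual tessellation control shader: derive each outer level only from that edge's own two endpoints, with a function that is symmetric in them. Then both patches sharing the edge compute the same level no matter which patch is processed. The names, the edge-length criterion, and `target_len` are assumptions for illustration, not what the project uses.

```python
import math

# 64 is the minimum value OpenGL guarantees for GL_MAX_TESS_GEN_LEVEL;
# using it as the clamp here is an assumption of this sketch.
MAX_TESS_LEVEL = 64.0


def edge_tess_level(p0, p1, target_len=0.1):
    """Tessellation level for the edge p0-p1 (hypothetical criterion).

    Uses only the edge's own endpoints and is symmetric in them, so two
    patches that share this edge get the same level.
    """
    length = math.dist(p0, p1)
    return min(MAX_TESS_LEVEL, max(1.0, math.ceil(length / target_len)))


def triangle_edge_levels(a, b, c):
    """Levels for edges (b, c), (c, a), (a, b) of one triangle patch.

    How these map to gl_TessLevelOuter indices should be checked against
    the GLSL specification before use.
    """
    return (edge_tess_level(b, c), edge_tess_level(c, a), edge_tess_level(a, b))


# Two triangles sharing edge a-b, listed with opposite winding of that edge.
a, b, c, d = (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, -1, 0)
print(triangle_edge_levels(a, b, c)[2], triangle_edge_levels(b, a, d)[2])
```

A level computed per patch (for example from the patch area or its centroid) would not have this property, which is one common source of cracks along shared edges.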





The texture loads correctly when rendering to framebuffer. However, when the framebuffer is used, only the lower-left corner of the texture is displayed.


Cause: I loaded a model, which does some texture initialization, before initializing the framebuffer, and the model affected the framebuffer's initialization.

Fix: initialize the framebuffer before loading the model.

Task for tomorrow:

Try every simple mesh in the object separately.


  1. Find large model
    1. Can load Sponza
    2. The loading library is incomplete: the .mtl file cannot be fully loaded. But the rendering time can exceed 28ms
  2. Color:
    1. Patney: people easily notice the difference if color is changed.
  3. Use short int to replace float


Plan for tomorrow:

  1. Try to download the dependencies of https://github.com/NCCA/Sponza
  2. If that does not work, try to load the textures
  3. Change from single float to short int for peripheral pixels
  4. Finish homework for 726 in Python


Low-level optimization of KFR

  1. Optimization of log(|| x – x0, y – y0||)
  2. Optimization of log function
  3. Optimization of fast atan
  4. Make the shader more complex to extend the rendering time to greater than 16ms
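
For item 1, the notes do not spell out the exact change. One standard rewrite, assumed here purely for illustration, drops the square root by using log(sqrt(s)) = 0.5 * log(s). The Python below only checks that the two forms agree numerically; in the shader the same identity would be written in GLSL.

```python
import math


def log_dist_direct(x, y, x0, y0):
    """log of the Euclidean distance, computed with an explicit norm."""
    return math.log(math.hypot(x - x0, y - y0))


def log_dist_no_sqrt(x, y, x0, y0):
    """Same value without the square root: log(sqrt(s)) == 0.5 * log(s)."""
    dx, dy = x - x0, y - y0
    return 0.5 * math.log(dx * dx + dy * dy)


# Both forms are undefined at the center (x, y) == (x0, y0).
for point in [(3.0, 4.0), (0.25, -7.5), (120.0, 64.0)]:
    diff = abs(log_dist_direct(*point, 0.0, 0.0) - log_dist_no_sqrt(*point, 0.0, 0.0))
    print(point, diff)
```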

I will talk about every step in detail

  • Optimization of log(|| x – x0, y – y0||)
    • Rendering time changes only marginally (before -> after):
      • Original 52.96ms
      • 1/2 buffer: 15.39ms ->15.57ms
      • 1/4 buffer: 4.20ms -> 4.10ms
  • Optimization of log function
    • The fast-log contains at least 5 branches (possibly 5 additions and 5 shifts for a 32-bit calculation)
    • The Nvidia log algorithm is not available online. But log, exp, sin, cos on AMD GPUs cost 4x as much as add/sub. We can guess that Nvidia does no worse than AMD.
      • Reference1: http://www.iquilezles.org/www/articles/palettes/palettes.htm (Iq talking about sin, cos in GLSL)
        • Popular wisdom (especially between old-school coders) is that trigonometric functions are expensive and that therefore it is important to avoid them (by means of LUTs or linear/triangular approximations). Often popular wisdom is wrong – despite the above still holds true in some especial cases (a CPU heavy inner loop) it does not in general: for example, in the GPU, computing a cosine is way, way faster than any attempt to approximate it. So, lets take advantage of this and go with the straight cosine expression.
      • Analysis of AMD GPU: https://seblagarde.wordpress.com/tag/gpu-performance/
        • Full rate (FR): mul, mad, add, sub, and, or, bit shift… Quarter rate (QR): transcendental instructions like rcp, sqrt, rsqrt, cos, sin, log, exp…
      • Discussion about the cost of these operations:
        • 1/x, sin(x), cos(x), log2(x), exp2(x), 1/sqrt(x) – 0 or close to 0, as long as they are limited to 1/9 of all total ops (can go up to 1/5 for Maxwell).
  • Optimization of fast atan (I have only tried the diamond angle so far. I will try CORDIC later.)
    • Simple comparison of atan2 and diamond angle.
    • A test on Shadertoy: https://www.shadertoy.com/view/lllyR4
  • Make the shader more complex to extend the rendering time to greater than 16ms
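
For the atan2 vs diamond angle comparison above, here is a minimal Python sketch of the commonly cited diamond-angle formulation (it may differ in detail from the Shadertoy test). It maps a direction to a value in [0, 4) using only comparisons, additions and one division. The result is not the true angle: it only increases monotonically with it, so it is a surrogate for ordering or coarse binning rather than a drop-in replacement for atan2.

```python
import math


def diamond_angle(y, x):
    """Monotone angle surrogate in [0, 4); undefined for x == y == 0."""
    if y >= 0:
        return y / (x + y) if x >= 0 else 1 - x / (-x + y)
    return 2 - y / (-x - y) if x < 0 else 3 + x / (x - y)


# Compare against atan2 (remapped to [0, 2*pi)) on a few directions.
for deg in (0, 45, 90, 135, 180, 225, 270, 315):
    t = math.radians(deg)
    x, y = math.cos(t), math.sin(t)
    true_angle = math.atan2(y, x) % (2 * math.pi)
    print(deg, round(true_angle, 4), round(diamond_angle(y, x), 4))
```

The axis directions land on 0, 1, 2 and 3; between them the mapping deviates from a linear rescaling of the true angle, which is the error to weigh against the saved cost.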