Xiaoxu Meng – Page 7 – Welcome to blog of Xiaoxu Meng :) Please visit http://www.mengxiaoxu.com for more information. Thanks!

08/28/2017

Low-level optimization of KFR

Optimization of log(|| x – x0, y – y0||)
Optimization of log function
Optimization of fast atan
Make the shader more complex to extend the rendering time to greater than 16ms

I will talk about each step in detail.

Optimization of log(|| x – x0, y – y0||)
- There is rendering time reduction
  - Original 52.96ms
  - 1/2 buffer: 15.39ms -> 15.57ms
  - 1/4 buffer: 4.20ms -> 4.10ms
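A minimal sketch of one way to cheapen log(||x – x0, y – y0||), assuming the rewrite is the identity log(sqrt(s)) = 0.5 * log(s), which removes the square root. The notes above do not record which rewrite produced the timings, so treat this as an illustration only; the function names are hypothetical.

```cpp
#include <cmath>

// Straightforward form: one sqrt plus one log per call.
float logDistanceNaive(float x, float y, float x0, float y0)
{
    float dx = x - x0;
    float dy = y - y0;
    return std::log(std::sqrt(dx * dx + dy * dy));
}

// Assumed optimization: log(sqrt(s)) == 0.5 * log(s), so the sqrt is dropped.
float logDistanceNoSqrt(float x, float y, float x0, float y0)
{
    float dx = x - x0;
    float dy = y - y0;
    return 0.5f * std::log(dx * dx + dy * dy);
}
```

Both forms return the same value up to floating-point rounding; the second trades the square root for a single multiply.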
Optimization of log function
- The fast-log contains at least 5 branches (possibly 5 additions and 5 shifts for a 32-bit calculation)
- The Nvidia log algorithm is not available online. On AMD GPUs, however, log, exp, sin, and cos cost about 4x as much as add/sub. We can guess that Nvidia does no worse than AMD.
  - Reference1: http://www.iquilezles.org/www/articles/palettes/palettes.htm (Iq talking about sin, cos in GLSL)
    - Popular wisdom (especially between old-school coders) is that trigonometric functions are expensive and that therefore it is important to avoid them (by means of LUTs or linear/triangular approximations). Often popular wisdom is wrong – despite the above still holds true in some especial cases (a CPU heavy inner loop) it does not in general: for example, in the GPU, computing a cosine is way, way faster than any attempt to approximate it. So, lets take advantage of this and go with the straight cosine expression.
  - Analysis of AMD GPU: https://seblagarde.wordpress.com/tag/gpu-performance/
    - Full rate (FR): mul, mad, add, sub, and, or, bit shift… Quarter rate (QR): transcendental instruction like rcp, sqrt, rsqrt, cos, sin, log, exp…
  - Discussion about the cost of these operations:
    - 1/x, sin(x), cos(x), log2(x), exp2(x), 1/sqrt(x) – 0 or close to 0, as long as they are limited to 1/9 of all total ops (can go up to 1/5 for Maxwell).
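For reference, a sketch of the kind of software log that needs 5 branches and 5 shifts on a 32-bit value: a binary search for the highest set bit, which gives floor(log2(v)). This is an assumption about what "fast-log" refers to, not necessarily the routine measured above, and the name floorLog2 is hypothetical.

```cpp
#include <cstdint>

// Floor of log2(v) for v > 0, found by binary search over the bit positions.
// Each of the 5 steps costs one branch, one shift, and one OR.
// Returns 0 for v == 0 (log2(0) is undefined; this is only a convention here).
unsigned int floorLog2(std::uint32_t v)
{
    static const std::uint32_t mask[5] = {0x2u, 0xCu, 0xF0u, 0xFF00u, 0xFFFF0000u};
    static const unsigned int shift[5] = {1, 2, 4, 8, 16};

    unsigned int result = 0;
    for (int i = 4; i >= 0; --i) {
        if (v & mask[i]) {      // is the highest set bit in the upper part?
            v >>= shift[i];     // discard the lower part
            result |= shift[i]; // record how far we shifted
        }
    }
    return result;
}
```

Compared with a quarter-rate hardware log, five dependent branch/shift steps are unlikely to win on a GPU, which is consistent with the references above.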
Optimization of fast atan (so far I have only tried the diamond angle; I will try CORDIC later.)
- Simple comparison of atan2 and diamond angle.
- A test of shadertoy: https://www.shadertoy.com/view/lllyR4
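A minimal sketch of the diamond angle, written in C++ rather than GLSL and not taken from the Shadertoy test above. It maps a direction (x, y) to a value in [0, 4) that increases monotonically with atan2(y, x) while using only additions and one division. The function name and the handling of the zero vector are assumptions.

```cpp
// Diamond angle of the vector (x, y), with the same argument order as atan2(y, x).
// Result: 0 at +x, 1 at +y, 2 at -x, 3 at -y, monotonic in between.
// The zero vector has no direction; returning 0 for it is an arbitrary choice.
double diamondAngle(double y, double x)
{
    if (x == 0.0 && y == 0.0)
        return 0.0;
    if (y >= 0.0)
        return (x >= 0.0) ? y / (x + y)          // quadrant 1: [0, 1]
                          : 1.0 - x / (-x + y);  // quadrant 2: (1, 2]
    return (x < 0.0) ? 2.0 - y / (-x - y)        // quadrant 3: (2, 3)
                     : 3.0 + x / (x - y);        // quadrant 4: [3, 4)
}
```

The result is not an angle in radians, so it only replaces atan2 where an ordering or a monotonic parameter is enough.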
Make the shader more complex to extend the rendering time to greater than 16ms

08/27/2017

Permutation:

http://blog.csdn.net/hackbuteer1/article/details/6657435

Tomorrow:

CORDIC

FastLog
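Question: Can GLSL do bitwise operations?
- Answer 1: Yes, on integer types in GLSL 1.30 and later (and GLSL ES 3.00): https://web.cs.ship.edu/~djmoon/cg/cg-notes/cg-glsl-language.pdf
- Answer 2: No in older versions such as GLSL ES 1.00 (WebGL 1), where the bitwise operators are reserved.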

How to build 5d array in C++

If I want to have Arr [const][variable][const][variable][const], what should I do?

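1. Use a 5-layer typedef

typedef int A1[9];   // innermost dimension: 9 ints (const)
typedef A1 *A2;      // pointer to A1: variable dimension
typedef A2 A3[8];    // 8 of those (const)
typedef A3 *A4;      // pointer to A3: variable dimension
typedef A4 A5[7];    // outermost dimension: 7 of those (const)

int main()
{
    A5 x = {};       // all pointers start out null; index as x[i][j][k][l][m]
    return 0;
}

2. Use only one typedef

// Same type as A5: array of 7 pointers to arrays of 8 pointers to arrays of 9 ints
typedef int (*(*B[7])[8])[9];

int main()
{
    B y = {};
    return 0;
}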
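A sketch of how the two variable dimensions could be allocated and released for such a type, assuming plain new[]/delete[] and the same size for every slice. The helper names allocate5d and free5d and the parameters n1 and n2 are hypothetical.

```cpp
#include <cstddef>

typedef int A1[9];   // const dimension of 9
typedef A1 *A2;      // variable dimension (length n2)
typedef A2 A3[8];    // const dimension of 8
typedef A3 *A4;      // variable dimension (length n1)
typedef A4 A5[7];    // const dimension of 7

// Allocates arr[7][n1][8][n2][9]; every int is zero-initialized.
void allocate5d(A5 &arr, std::size_t n1, std::size_t n2)
{
    for (int i = 0; i < 7; ++i) {
        arr[i] = new A3[n1]();                 // n1 blocks of 8 pointers
        for (std::size_t j = 0; j < n1; ++j)
            for (int k = 0; k < 8; ++k)
                arr[i][j][k] = new A1[n2]();   // n2 rows of 9 ints
    }
}

// Releases everything allocated by allocate5d with the same n1.
void free5d(A5 &arr, std::size_t n1)
{
    for (int i = 0; i < 7; ++i) {
        for (std::size_t j = 0; j < n1; ++j)
            for (int k = 0; k < 8; ++k)
                delete[] arr[i][j][k];
        delete[] arr[i];
        arr[i] = nullptr;
    }
}
```

After allocate5d(arr, n1, n2), an element is addressed as arr[i][j][k][l][m] with i < 7, j < n1, k < 8, l < n2, m < 9. In new code, nested std::vector and std::array would avoid the manual cleanup.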

08/22/2017

Research:

8:30 am – 11:30 am

Meet with Var
- Need to figure out the advantage of our algorithm
Try to update VS15 to get DirectX SDK

3:00 pm – 6:00 pm

Variance sampling TAA
Write paper

Tomorrow:

Read push-pull paper
Read Europe Log polar paper
Think about ellipse log-polar

Leetcode:

https://leetcode.com/problems/integer-break/description/

Direct3D Resources

Official Tutorial:

https://code.msdn.microsoft.com/Direct3D-Tutorial-Win32-829979ef?SRC=VSIDE

http://blog.csdn.net/xueyedie1234/article/details/51315640

08/18/2017

Decouple shading rate & visibility rate from pixels: allow for space for anti-aliasing and coarse pixel shading.

Texel Shading (shading rate reduction):

We show performance improvements in three ways. First, we show some improvement for the “small triangle problem”. Second, we reuse shading results from previous frames. Third, we enable dynamic spatial shading rate choices, for further speedups.

Visibility: updating visibility at the full frame rate.

Shading rate: dynamically varying the spatial shading rate by simply biasing the mipmap level choice, texel shading and temporal shading reuse

Some reasons for increased shading cost:

The first is the mapping from pixels to texels.
The second source of shading increase is in the caching system.

Process: deferred decoupled shading

rasterization -> records texel accesses as shading work rather than running a shade per pixel. Shading is performed by a separate compute stage, storing the results in a texture. A final stage collects data from the texture

Object Space Lighting:

Inspired by REYES (Renders Everything You Ever Saw)

Overall process

All objects in game are submitted for shading and rasterization. Queued for process
During submission step, the estimated projected area of the object is calculated. Thus an object requests a certain amount of shading
During shading, the system allocates texture space for all objects that require shading. If the total request is more than the available shading space, all objects are progressively scaled down in shading rate until they fit
Material shading occurs, processing each material layer for each object. Results are accumulated into the master shading texture(s)
MIPS are calculated on master shading texture as appropriate
Rasterization step: each object references the shading part step. No specific need that there is a 1:1 correspondence, but this feature is rarely used.
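The "progressively scaled until it fits" step above could look roughly like the sketch below. It assumes a uniform scale factor applied to every request and a fixed reduction step per iteration; the notes do not say how the real system scales, so the function name, parameters, and the 0.5 step are all hypothetical.

```cpp
#include <vector>

// Returns a uniform scale factor (<= 1) for all shading requests such that
// the scaled total fits into availableTexels. The factor is reduced by
// `step` per iteration. Returns 0 when nothing can fit or step is invalid.
double fitShadingScale(const std::vector<double>& requestedTexels,
                       double availableTexels,
                       double step = 0.5)
{
    if (availableTexels <= 0.0 || step <= 0.0 || step >= 1.0)
        return 0.0;

    double total = 0.0;
    for (double request : requestedTexels)
        total += request;

    double scale = 1.0;
    while (total * scale > availableTexels)
        scale *= step;   // progressively lower the shading rate of every object
    return scale;
}
```

For example, three requests of 400 texels against a budget of 500 are scaled by 0.25 with the default step, because 0.5 would still need 600.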

AMFS:

Our architecture is also the first to support pixel shading at multiple different rates, unrestricted by the tessellation or visibility sampling rates.

automatic shading reuse between triangles in tessellated primitives

we decouple pixel shading from screen space
it allows lazy shading and reuse simultaneously at multiple different frequencies

enables a wider use of tessellation and fine geometry, even at very limited power budgets

How do I feel about pursuing Ph.D. degree……

442. Find All Duplicates in an Array

Description:

https://leetcode.com/problems/find-all-duplicates-in-an-array/description/

Code:

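class Solution {
public:
    vector<int> findDuplicates(vector<int>& nums) {
        vector<int> result;
        // Cyclic sort: keep swapping until value v sits at index v - 1
        // (or that slot already holds another copy of v).
        for (int i = 0; i < nums.size();i++)
        {
            if (nums[i]!= i+1)
            {
                while (nums[nums[i] - 1] != nums[i])
                    swap(nums[i], nums[nums[i] - 1]);
            }
        }
        // Any slot that still holds the wrong value contains a duplicate.
        for (int i = 0; i < nums.size();i++)
        {
            if (nums[i]!= i+1)
                result.push_back(nums[i]);
        }
        return result;
    }
};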

Time & Space:
O(n) & O(1)
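As an alternative sketch, the sign-marking trick can also be applied to this problem. It assumes, as the problem states, that every value lies in 1..n and appears at most twice. The free function name findDuplicatesBySign is hypothetical, and the input is left with flipped signs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Marks value v as seen by negating nums[v - 1]; seeing an already negative
// slot means v occurred before. O(n) time, O(1) extra space besides the result.
std::vector<int> findDuplicatesBySign(std::vector<int>& nums)
{
    std::vector<int> result;
    for (std::size_t i = 0; i < nums.size(); ++i) {
        int value = std::abs(nums[i]);   // the original value, ignoring marks
        int slot = value - 1;
        if (nums[slot] < 0)
            result.push_back(value);     // second occurrence of value
        else
            nums[slot] = -nums[slot];    // first occurrence: mark as seen
    }
    return result;
}
```

Unlike the cyclic-sort version, this makes a single pass with no swaps, at the cost of leaving negative numbers in the array.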

448. Find All Numbers Disappeared in an Array

Description:

https://leetcode.com/problems/find-all-numbers-disappeared-in-an-array/description/

Code:

class Solution {
public:
    vector<int> findDisappearedNumbers(vector<int>& nums) {
        vector<int> result;
        for (int i = 0; i < nums.size();i++)
        {
            if (nums[i]!= i+1)
            {
                while (nums[nums[i] - 1] != nums[i])
                    swap(nums[i], nums[nums[i] - 1]);
            }
        }
        for (int i = 0; i < nums.size();i++)
        {
            if (nums[i]!= i+1)
                result.push_back(i+1);
        }
        return result;
    }
};

Code for fastest algorithm:

class Solution {
public:
	vector<int> findDisappearedNumbers(vector<int>& nums) {
		vector<int>result;
		// Mark value v as seen by making nums[v - 1] negative.
		for (int i = 0; i < nums.size(); i++)
		{
			int n = abs(nums[i]) - 1;
			nums[n] = nums[n] > 0 ? -nums[n] : nums[n];
		}
		// A slot that is still positive was never marked, so i + 1 is missing.
		for (int i = 0; i < nums.size(); i++)
		{
			if (nums[i] > 0) result.push_back(i + 1);
		}
		return result;
	}
};

Time & Space:
O(n) & O(1)

495. Teemo Attacking

Description:

https://leetcode.com/problems/teemo-attacking/description/

Code:

class Solution {
public:
    int findPoisonedDuration(vector<int>& timeSeries, int duration) {
        if (timeSeries.size() == 0) return 0;
        int time = duration;
        for (int i = 0; i < timeSeries.size() - 1;i++)
        {
            if (timeSeries[i+1] > timeSeries[i] + duration - 1)
                time += duration;
            else
                time += timeSeries[i+1] - timeSeries[i];
        }
        
        return time;
    }
};

Time & Space:
O(n) & O(1)
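The branch in the loop adds either the full duration or the gap to the next attack, whichever is smaller, so it can be written with std::min. A minimal equivalent sketch as a free function (the name poisonedDuration is hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Each attack contributes min(duration, gap to the next attack);
// the last attack always contributes the full duration.
int poisonedDuration(const std::vector<int>& timeSeries, int duration)
{
    if (timeSeries.empty())
        return 0;
    int total = duration;   // contribution of the last attack
    for (std::size_t i = 0; i + 1 < timeSeries.size(); ++i)
        total += std::min(duration, timeSeries[i + 1] - timeSeries[i]);
    return total;
}
```

Writing the loop bound as i + 1 < size() also avoids the unsigned underflow of size() - 1 on an empty vector, which the version above guards against with the early return.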