Paper Reading: View Direction and Bandwidth Adaptive 360 Degree Video Streaming using a Two-Tier System

Each segment is coded as a base-tier (BT) chunk and multiple enhancement-tier (ET) chunks.

BT chunks:

They represent the entire 360-degree view at a low bit rate and are prefetched into a long display buffer, which smooths out network jitter and guarantees that any desired FOV can be rendered with minimal stalls.

ET chunks:

Facebook 360 video:

PointNet, PointNet++, and PU-Net

point cloud -> deep network -> classification / segmentation / super-resolution

traditional classification / segmentation:

projection onto 2D plane and use 2D classification / segmentation

unordered set

point (Vec3) -> feature vector (Vec5) -> normalize (using the bounds of the point cloud)

N points:


features from N points -> N x K scores (K class scores per point, so each point gets a class)


features from N points -> K x 1 vector (K class scores for the whole point cloud)
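
The two output shapes above can be illustrated with a toy sketch. It assumes the PointNet-style recipe of a shared per-point function followed by max pooling, which makes the pooled feature independent of point order. The names `point_feature` and `global_feature` and the feature formula are hypothetical stand-ins for learned layers, not the actual network.

```python
import random

def point_feature(p):
    """Hypothetical shared per-point map: Vec3 -> Vec5 (stand-in for a learned MLP)."""
    x, y, z = p
    return [x, y, z, x * y, x * x + y * y + z * z]

def global_feature(points):
    """Max-pool every feature channel over all points (a symmetric function)."""
    feats = [point_feature(p) for p in points]
    return [max(channel) for channel in zip(*feats)]

points = [(0.1, 0.2, 0.3), (0.9, 0.1, 0.4), (0.5, 0.8, 0.2), (0.3, 0.3, 0.7)]
shuffled = points[:]
random.Random(0).shuffle(shuffled)

# The order of the input set does not change the pooled feature.
g = global_feature(points)
same = g == global_feature(shuffled)

# Segmentation-style input: each point's own feature joined with the global one.
per_point = [point_feature(p) + g for p in points]
```

A classifier head would map `g` to K scores for the whole cloud, while a segmentation head would map each row of `per_point` to K scores, giving N x K.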


Lecture 10: Neural Network

  1. Deep learning
  2. Representation learning
  3. Rule-based
    1. high explainability
  4. Linguistic supervision
  5. Semi-supervision
    1. have a small set of labeled data
    2. have a large set of unlabeled data
  6. Recurrent-level supervision
  7. Language structure

Description length: DL = size(lexicon) + size(encoding)

  1. lex1
    1. do
    2. the kitty
    3. you
    4. like
    5. see
  2. lex2
    1. do
    2. you
    3. like
    4. see
    5. the
    6. kitty
  3. How to evaluate the two lexicons?
    1. lex1 has 5 entries, lex2 has 6 entries
    2. Potential sequence
      1. lex1: 1 3 5 2, 5 2, 1 3 4 2
      2. lex2: 1 2 4 5 6, 4 5 6, 1 2 3 5 6
  4. MDL: minimum description length
    1. unsupervised
    2. prosodic bootstrapping
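
The comparison of the two lexicons can be sketched in code. This is a minimal sketch under stated assumptions: the three utterances are decoded from the lex1 index sequences above, size(lexicon) is counted as the number of letters in the entries, and size(encoding) as the number of index tokens. The helper names `encode` and `description_length` are hypothetical, and other cost measures (for example bits) would work the same way.

```python
def encode(utterance, lexicon):
    """Greedy longest-match encoding of an utterance as 1-based lexicon indices."""
    words = utterance.split()
    entries = [entry.split() for entry in lexicon]
    codes, pos = [], 0
    while pos < len(words):
        # All entries that match the words starting at the current position
        matches = [(len(e), i) for i, e in enumerate(entries)
                   if words[pos:pos + len(e)] == e]
        if not matches:
            raise ValueError(f"cannot encode {words[pos]!r}")
        length, index = max(matches)  # prefer the longest entry
        codes.append(index + 1)
        pos += length
    return codes

def description_length(corpus, lexicon):
    """DL = size(lexicon) + size(encoding), with the assumed cost measures."""
    lexicon_size = sum(len(entry.replace(" ", "")) for entry in lexicon)
    encoding_size = sum(len(encode(u, lexicon)) for u in corpus)
    return lexicon_size + encoding_size

corpus = ["do you see the kitty", "see the kitty", "do you like the kitty"]
lex1 = ["do", "the kitty", "you", "like", "see"]
lex2 = ["do", "you", "like", "see", "the", "kitty"]

dl1 = description_length(corpus, lex1)  # 20 letters + 10 tokens = 30
dl2 = description_length(corpus, lex2)  # 20 letters + 13 tokens = 33
```

Under these assumptions lex1 has the shorter description, because storing "the kitty" as one entry saves one token every time it occurs.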

Boltzmann machine

Lexical space

relatedness vs. similarity

  • use near neighbors: similarity
  • use far neighbors: relatedness

WS-353 covers both similarity and relatedness

loss function:



Part1: potential methods

  • LDA
  • readability
  • syntactic analysis



Questions about “Foveated 3D Graphics (Microsoft)” User Study

  1. Problem 1: They tested only one scene.
    1. The first issue is that the foveation level is highly dependent on the scene. They might get totally different parameters with another scene. Of course, this is a problem for all user studies. So far, only NVIDIA has mentioned the multiple factors affecting vision. However, they don’t have a good way to deal with this.
    2. The second issue concerns data analysis. By testing only one scene, they avoid the problem of one parameter mapping to multiple results.
  2. Problem 2: I don’t believe that their results are monotonic.
    1. They just said:
      1. Ramp Test: For the ramp test, we identified this threshold as the lowest quality index for which each subject incorrectly labeled the ramp direction or reported that quality did not change over the ramp.
      2. Pair Test: for the pair test, we identified a foveation quality threshold for each subject as the lowest variable index j he or she reported as equal to or better in quality than the non-foveated reference.
    2. Suppose the quality levels are 11, 12, 13, 14, 15. What if the responses are 1, 1, 1, 0, 1? Is the final quality level 13 or 15?
      1. I don’t believe this situation did happen in their user study.
      2. If it does happen, what should we do? Presumably we should test multiple scenes with many participants and average the results, which brings us back to Problem 1.

Key points for using a Unity texture array

  1. The texture array layout differs between GLSL and Unity + HLSL, so the order of the image indices i and j must be adjusted. Suppose the light field has a 16 x 16 structure:
      1. In GLSL Shader:

      2. In Unity Shader:

  2. Using a texture array in Unity
    1. To load the images in the folder “lytro”, create a folder named “Resources” inside the “Assets” folder, then drag the “lytro” folder into “Resources”.

Word2Vec Models

Collection of all pre-trained Word2Vec Models:

Google’s model does not seem reliable…

Here are some similarity tests of Google’s model:

The similarity between good and great is: 0.7291509541564205
The similarity between good and awesome is: 0.5240075080190216
The similarity between good and best is: 0.5467195232933185
The similarity between good and better is: 0.6120728804252082
The similarity between great and awesome is: 0.6510506701964475
The similarity between great and best is: 0.5216033921316416
The similarity between great and better is: 0.43074460922502006
The similarity between awesome and best is: 0.3584938663818339
The similarity between awesome and better is: 0.27186951236001483
The similarity between best and better is: 0.5226434484898708
The similarity between food and foodie is: 0.3837408842876883
The similarity between food and eat is: 0.5037572298482941
The similarity between foodie and eat is: 0.3050075692941569
The similarity between park and view is: 0.1288395798972001
The similarity between design and art is: 0.3347430713890944
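
For reference, here is a minimal sketch of how such a score is computed, assuming the scores above are cosine similarities between word vectors (the usual Word2Vec similarity measure). The 3-dimensional vectors in `toy_vectors` are hypothetical and only for illustration; real Word2Vec vectors have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, not taken from any real model
toy_vectors = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "park": [0.1, 0.9, 0.2],
}

sim_good_great = cosine_similarity(toy_vectors["good"], toy_vectors["great"])
sim_good_park = cosine_similarity(toy_vectors["good"], toy_vectors["park"])
```

Vectors pointing in nearly the same direction score close to 1, and nearly orthogonal vectors score close to 0, regardless of their lengths.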

Lecture 8: Evaluation

  • Information about midterm
  • PCFG
    • Start with S
    • ∑_γ Pr(A -> γ | A) = 1
      • the (conditional) probabilities of all expansions of A have to sum to one
    • Pr(O = o1,o2,…,on|µ)
      • HMM: Forward
      • PCFG: Inside (the inside pass of Inside-Outside)
    • Guess Z: argmax_Z [Pr(Z | O, µ)]
      • HMM: use Viterbi
      • PCFG: use Viterbi CKY
      • *Z is the best sequence of states (for a PCFG, the best parse tree)
    • Guess µ: argmax_µ [Pr(O | µ)]
      • HMM: use forward-backward
      • PCFG: use inside-outside
    • Example:
      • Sentence:
        • S
          • NP: people
          • VP
            • V: eats
            • NP
              • adj: roasted
              • N: peanuts
      • Problem:
        • Pr_µ(peanuts eat roasted people) = Pr_µ(people eat roasted peanuts)
      • We can try to generate head of each phrase:
        • S (Head: eat)
          • NP (Head: people): people
          • VP (Head: eat)
            • V (Head: eat): eats
            • NP (Head: peanut)
              • adj (Head: N/A): roasted
              • N (Head: peanut): peanuts
      • Should have: Pr[S (Head: eat) -> NP(Head: people) VP(Head: eat)] > Pr[ S (Head: eat) -> NP(Head: peanut) VP(Head: eat) ]
    • Dependency representation:
      • Sentence:
        • eat
          • people
            • the
          • peanuts
            • roasted
      • Lexical (bottom up)
      • NP -> det N
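
The problem noted in the PCFG example can be reproduced with a small sketch. The grammar and its probabilities below are hypothetical, chosen only so that the rules for each left-hand side sum to one. The probability of a parse is the product of the probabilities of its rules, so swapping the two nouns leaves the product unchanged.

```python
from collections import defaultdict
from math import isclose

# Hypothetical toy PCFG: (lhs, rhs) -> probability
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("N",)): 0.6,
    ("NP", ("adj", "N")): 0.4,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("eat",)): 1.0,
    ("adj", ("roasted",)): 1.0,
    ("N", ("people",)): 0.7,
    ("N", ("peanuts",)): 0.3,
}

def check_normalized(rules):
    """True if the rule probabilities for every left-hand side sum to one."""
    totals = defaultdict(float)
    for (lhs, _), p in rules.items():
        totals[lhs] += p
    return all(isclose(t, 1.0) for t in totals.values())

def tree_prob(tree):
    """Probability of a parse tree: the product of its rule probabilities."""
    if isinstance(tree, str):  # a word contributes no rule of its own
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

def parse(subject, obj):
    """Parse tree for '<subject> eat roasted <obj>' under the toy grammar."""
    return ("S",
            ("NP", ("N", subject)),
            ("VP", ("V", "eat"),
                   ("NP", ("adj", "roasted"), ("N", obj))))

p_sensible = tree_prob(parse("people", "peanuts"))
p_odd = tree_prob(parse("peanuts", "people"))
# The two values agree up to floating-point rounding.
```

Annotating each nonterminal with its head word, as in the notes, gives the grammar separate rules for the two cases, so their probabilities can differ.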
  • Evaluation
    • Reference reading: How Evaluation Guides AI Research
      • Intrinsic evaluation
      • Extrinsic evaluation
    • Kappa statistic (agreement between evaluators)
    • Metric: precision and recall
    • How to evaluate two structures which could generate the same sentence?
      • Answer: generate more than one output for each input, convert the outputs into a set, and measure with precision and recall.
    • Reader evaluation:
      • If the reader’s score agrees with the machine’s, stop
      • Otherwise, let another reader read the essay
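
The set-based precision and recall measure mentioned under Evaluation can be sketched as follows. The constituent spans are hypothetical (label, start, end) triples for a four-word sentence, and the function name `precision_recall` is illustrative rather than taken from the lecture.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical labeled constituents (label, start, end) for one sentence
reference = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)}
predicted = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 1, 3)}

precision, recall = precision_recall(predicted, reference)  # 0.75, 0.75
```

Treating each output as a set means two structures that yield the same sentence are still compared item by item rather than by surface string.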