Lecture 8: Evaluation

  • Information about midterm
  • PCFG
    • Start with S
    • ∑_γ Pr(A -> γ | A) = 1
      • The (conditional) probabilities of all rules with the same left-hand side must sum to one (a small check of this is sketched after this section)
    • Pr(O = o_1, o_2, …, o_n | µ): the likelihood of the observation sequence given the model µ
      • HMM: forward algorithm
      • PCFG: inside algorithm (the inside pass of inside-outside)
    • Guess Z: argmax_Z [ Pr(Z | O, µ) ]
      • HMM: use Viterbi
      • PCFG: use Viterbi CKY
      • *Z is the best sequence of hidden states (for a PCFG, the best parse tree)
    • Guess µ: argmax_µ [ Pr(O | µ) ]
      • HMM: use forward-backward
      • PCFG: use inside-outside
    • Example:
      • Sentence (the tree drawn in the notes, in bracketed form):
        • (S (NP people) (VP (V eats) (NP (Adj roasted) (N peanuts))))
      • Problem:
        • Pr_µ(peanuts eat roasted people) = Pr_µ(people eat roasted peanuts)
        • A plain PCFG assigns both sentences the same probability, because both derivations use exactly the same rules
      • We can try to generate the head of each phrase (the lexicalized tree from the notes, heads shown in brackets):
        • (S[eat] (NP[people] people) (VP[eat] (V[eat] eats) (NP[peanut] (Adj[N/A] roasted) (N[peanut] peanuts))))
      • Should have: Pr[ S[eat] -> NP[people] VP[eat] ] > Pr[ S[eat] -> NP[peanut] VP[eat] ]
    • Dependency representation:
      • Sentence (the dependency tree drawn in the notes): eat is the root; people and peanuts depend on eat; the depends on people; roasted depends on peanuts
      • Lexical (bottom up)
      • NP -> Det N
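    • A minimal sketch (not from the lecture) of the normalization constraint above, using a toy grammar based on the example tree; the rule probabilities are made up for illustration:

```python
# Check that the rule probabilities for each nonterminal sum to one,
# i.e. sum_gamma Pr(A -> gamma | A) = 1.
# Toy grammar based on the example tree; probabilities are illustrative.
grammar = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("people",), 0.4), (("peanuts",), 0.4), (("Adj", "N"), 0.2)],
    "VP":  [(("V", "NP"), 1.0)],
    "V":   [(("eats",), 1.0)],
    "Adj": [(("roasted",), 1.0)],
    "N":   [(("peanuts",), 1.0)],
}

def check_normalization(grammar, tol=1e-9):
    """Return the nonterminals whose rule probabilities do not sum to 1."""
    bad = {}
    for lhs, rules in grammar.items():
        total = sum(p for _, p in rules)
        if abs(total - 1.0) > tol:
            bad[lhs] = total
    return bad

print(check_normalization(grammar))   # {} means every Pr(A -> gamma | A) distribution is normalized
```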
  • Evaluation
    • Reference Reading: How Evaluation Guides AI Research
      • Intrinsic evaluation
      • Extrinsic evaluation
    • Kappa statistic (inter-annotator agreement)
    • Metrics: precision and recall (see the sketch after this list)
    • How do we evaluate two different structures that can generate the same sentence?
      • Answer: generate more than one output for each input, treat the outputs as a set, and measure with precision and recall.
    • Reader evaluation:
      • If the reader’s score agrees with the machine’s score, stop
      • Otherwise, have another reader score the essay
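    • A minimal sketch (not from the lecture) of precision/recall over sets of outputs and of the kappa statistic for two readers; the helper names and example data are illustrative:

```python
# Precision/recall when outputs are treated as sets, plus Cohen's kappa
# for two raters labelling the same items. Example data is made up.

def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

print(precision_recall({"(NP,0,3)", "(VP,3,5)"}, {"(NP,0,3)", "(VP,3,5)", "(S,0,5)"}))
print(cohens_kappa(["pass", "fail", "pass", "pass"], ["pass", "pass", "pass", "fail"]))
```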


  • Gensim Tutorial:
    • https://radimrehurek.com/gensim/models/word2vec.html
    • Use the Google News model as the pre-trained model
    • Clustering based on the distance matrix (a sketch follows below)
    • Question: how do we do the clustering?
      • Should we cluster on the keywords?
      • Or on the words related to the keywords?
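    • A minimal sketch, assuming gensim is installed and GoogleNews-vectors-negative300.bin has been downloaded; the keyword list and number of clusters are placeholders. It builds a cosine-distance matrix over the keywords and clusters on it:

```python
# Load the pre-trained Google News word2vec model and cluster a few keywords
# based on their cosine-distance matrix (hierarchical clustering via scipy).
import numpy as np
from gensim.models import KeyedVectors
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)       # pre-trained vectors

words = ["gun", "safety", "control", "rifle", "policy", "law"]   # placeholder keywords
vectors = np.array([model[w] for w in words])

# cosine-distance matrix between the keywords
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
distance = 1.0 - unit @ unit.T
np.fill_diagonal(distance, 0.0)

# agglomerative clustering on the precomputed distance matrix
Z = linkage(squareform(distance, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")               # 2 clusters, arbitrary choice
print(dict(zip(words, labels)))
```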
  • Leg dissection demo:
    • 18 cameras, 30 frames, 10 GB
    • 5 cameras, 100 frames, 6 GB
    • Question:
      • What is our task?
        • We cannot change the focal length now; we can only change the viewpoint
        • If we want dynamic scenes, do we need a dynamic mesh?
  • Foveated ray-tracing:
    • Input: eye rays + 1 spp (one sample per pixel)
    • Output: foveated image
    • Question: if we use a foveated image as the ground truth, what denoising algorithm should be used to produce that ground truth?
  • TODO:
    • read G3D code and change sample number
    • read papers (nvd, disney)
    • Homework

Lecture 6: Context-free parsing


  1. Generative model: P(X, Y)
  2. Discriminative model: P(Y | X)
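The two are related by conditioning: P(Y | X) = P(X, Y) / P(X). A generative model estimates the joint distribution and derives the conditional via Bayes' rule, while a discriminative model fits the conditional directly.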


  1. Block sampler: instead of sampling one variable at a time, we can resample a block of variables jointly in Gibbs sampling.
  2. Lag and burn-in: can be viewed as parameters (we control the number of iterations); a small loop sketch follows after this list.
    1. Lag (thinning): mark some iterations in the loop as lag and throw them away, so that the kept samples are approximately independent.
      1. Example: run 1000 iterations -> 100 lag iterations -> 1000 iterations -> 100 lag iterations -> …
    2. Burn-in: throw away the initial iterations (e.g., the first 10 or 20), where the chain has not yet converged.
      1. The more principled approach is to test whether the chain has converged.
  3. Topic model:
  4. The parameters for the words in a topic do not need to sum to one
  5. Variants (branches) of LDA (non-parametric models):
    1. Supervised LDA
    2. Chinese restaurant process (CRP)
    3. Hierarchical models
      1. example: SHLDA
      2. gun-safety (Clinton) &  gun-control (Trump)
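A minimal sketch (not from the lecture) of how burn-in and lag (thinning) fit into a Gibbs-style sampling loop; resample_one_variable is a placeholder for the model-specific conditional draw:

```python
# Generic Gibbs-style loop with burn-in and lag (thinning).
import random

def resample_one_variable(state, i):
    # Placeholder: a real sampler would draw x_i from P(x_i | x_{-i}).
    return random.random()

def gibbs(state, n_samples, burn_in=20, lag=10):
    """Collect n_samples states, discarding burn-in sweeps and keeping every lag-th sweep."""
    kept = []
    sweep = 0
    while len(kept) < n_samples:
        for i in range(len(state)):              # one full sweep over all variables
            state[i] = resample_one_variable(state, i)
        sweep += 1
        if sweep > burn_in and sweep % lag == 0:
            kept.append(list(state))             # thinned samples, treated as roughly independent
    return kept

samples = gibbs(state=[0.0, 0.0, 0.0], n_samples=5)
print(len(samples), "samples kept")
```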


Any language that can be processed with an FSA can also be processed with a CFG, but not vice versa.

Hierarchy of language classes (the flattened table from the notes):

?: Turing machine (not covered in this lecture)

CSL: Tree-Adjoining Grammars (PTAGs) (not covered in this lecture)

PDA/CFGs: allow some negative examples, and can handle some cases that cannot be processed by an FSA.

Regular: FSA/regular expressions, HMMs

For example, S -> aSb generates {a^n b^n}, which cannot be processed by an FSA: we need to keep track of n, but an FSM only remembers which state it is in; it cannot count. (A small counter-based recognizer is sketched below.)
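A minimal sketch of why {a^n b^n} needs counting: the recognizer below keeps an explicit counter, which is exactly what a finite-state machine cannot do.

```python
# Recognize {a^n b^n} with a counter (the role a PDA's stack plays).
def is_anbn(s):
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":                 # consume the a's, counting them
        count += 1
        i += 1
    while i < len(s) and s[i] == "b" and count > 0:   # consume exactly as many b's
        count -= 1
        i += 1
    return i == len(s) and count == 0

print(is_anbn("aaabbb"), is_anbn("aabbb"), is_anbn("abab"))   # True False False
```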


The rat that the cat that the dog bit chased died.


Sentence: The tall man loves Mary.

Parse tree (bracketed form of the tree drawn in the notes):

(S (NP (DT the) (Adj tall) (N man)) (VP (V loves) (NP Mary)))

Example 3: CKY Algorithm

0 The 1 tall 2 man 3 loves 4 Mary 5

[w, i, j]    A -> w ∈ G
⟹ [A, i, j]

(a word w spanning positions i..j, together with a lexical rule A -> w in the grammar G, gives the constituent A over that span)

Chart (bottom up parsing algorithm):

0 The 1 tall 2 man 3 loves 4 Mary 5

Det (The)      N (man)      V (loves)      NP (Mary)

NP (The tall man)      VP (loves Mary)

S (the whole sentence, spanning 0 to 5)

Then if I have:

[B, i, j]

[C, j, k]

and A -> B C is a rule in the grammar (B and C are non-terminal phrases), I can derive:

[A, i, k]

(this completion step is what the CKY sketch after the grammar rules implements)

NP -> Det N

VP -> V NP

S -> NP VP
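A minimal CKY recognizer sketch (not the lecture's code) for this grammar; the Adj/N1 binarization and the small lexicon are assumptions added so that "The tall man loves Mary" parses with purely binary rules:

```python
# CKY recognition: chart[i, k] holds every nonterminal A such that [A, i, k].
from collections import defaultdict

binary_rules = {              # A -> B C, stored as (B, C): A
    ("NP", "VP"): "S",
    ("Det", "N1"): "NP",      # assumed binarization so "the tall man" is an NP
    ("Adj", "N"): "N1",
    ("V", "NP"): "VP",
}
lexicon = {                   # A -> w
    "the": {"Det"}, "tall": {"Adj"}, "man": {"N"},
    "loves": {"V"}, "Mary": {"NP"},
}

def cky(words):
    n = len(words)
    chart = defaultdict(set)                       # chart[i, k] = nonterminals spanning words[i:k]
    for i, w in enumerate(words):
        chart[i, i + 1] |= lexicon.get(w, set()) | lexicon.get(w.lower(), set())
    for width in range(2, n + 1):                  # widen the span bottom-up
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):              # split point: [B, i, j] and [C, j, k]
                for B in chart[i, j]:
                    for C in chart[j, k]:
                        A = binary_rules.get((B, C))
                        if A:                      # A -> B C gives [A, i, k]
                            chart[i, k].add(A)
    return "S" in chart[0, n]

print(cky("The tall man loves Mary".split()))      # True
```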


I saw the man in the park with a telescope.

(PP-attachment ambiguity: the two bracketings drawn in the notes correspond to the PP "with a telescope" attaching inside the NP, giving one large NP "the man in the park with a telescope", or attaching higher, leaving the NP "the man in the park" and the PP as separate constituents.)



Dotted rules: A -> B • C D

B: already recognized

C D: still predicted

Item notation: [i, A -> α • β, j] means the rule A -> α β with α already recognized, spanning positions i through j.

Initial item: [0, S' -> • S, 0]

Scan: make progress through a rule by consuming the next word.

From [i, A -> α • w_{j+1} β, j], derive [i, A -> α w_{j+1} • β, j+1].

Example: A -> [the tall] • B C, where the words "the tall" span positions i through j.

Prediction: top-down prediction of B -> γ.

From [i, A -> α • B β, j], add [j, B -> • γ, j].



Complete (move the dot):

From [i, A -> α • B β, k] and [k, B -> γ •, j]

(the completed B constituent spans positions k through j)

Then I have:

[i, A -> α B • β, j]
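A compact Earley recognizer sketch (not the lecture's code) implementing predict, scan, and complete exactly as the items above describe; the toy grammar and sentence are placeholders:

```python
# Earley recognition: chart[j] holds items (i, A, rhs, dot) meaning [i, A -> alpha . beta, j].
GRAMMAR = {
    "S'": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("mary",)],
    "VP": [("V", "NP")],
    "Det": [("the",)],
    "N": [("man",)],
    "V": [("loves",)],
}

def earley(words):
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0].add((0, "S'", ("S",), 0))                      # [0, S' -> . S, 0]
    for j in range(n + 1):
        added = True
        while added:                                        # predict/complete until no new items
            added = False
            for (i, A, rhs, dot) in list(chart[j]):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:  # predict: [j, B -> . gamma, j]
                    for gamma in GRAMMAR[rhs[dot]]:
                        item = (j, rhs[dot], gamma, 0)
                        if item not in chart[j]:
                            chart[j].add(item); added = True
                elif dot == len(rhs):                       # complete: move the dot in the parent
                    for (h, B, rhs2, dot2) in list(chart[i]):
                        if dot2 < len(rhs2) and rhs2[dot2] == A:
                            item = (h, B, rhs2, dot2 + 1)
                            if item not in chart[j]:
                                chart[j].add(item); added = True
        if j < n:                                           # scan: consume word w_{j+1}
            for (i, A, rhs, dot) in chart[j]:
                if dot < len(rhs) and rhs[dot] == words[j].lower():
                    chart[j + 1].add((i, A, rhs, dot + 1))
    return (0, "S'", ("S",), 1) in chart[n]                 # [0, S' -> S ., n] was derived

print(earley("the man loves Mary".split()))                 # True
```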