- Information about midterm
- PCFG
- Start with S
- ∑_γ Pr(A -> γ | A) = 1
- the (conditional) probabilities of all rules with the same left-hand side A must sum to one (a quick check below)
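A quick sanity-check sketch of that constraint, with made-up rule probabilities (the grammar and numbers are illustrative assumptions, not from the lecture):

```python
from collections import defaultdict

rules = {  # (lhs, rhs): probability -- made-up toy PCFG
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("Name",)):    0.4,
    ("VP", ("V", "NP")):  1.0,
}

# Sum the probabilities per left-hand side; each must be 1.
totals = defaultdict(float)
for (lhs, _rhs), p in rules.items():
    totals[lhs] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values()), totals
print(dict(totals))  # every LHS sums to 1
```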

- Compute Pr(O = o1, o2, …, on | µ): the likelihood of the observations
- HMM: Forward algorithm
- PCFG: Inside algorithm (the inside pass of Inside-Outside)

- Guess Z: argmax_Z [ Pr(Z | O, µ) ]
- HMM: use Viterbi to get it (a sketch follows this list)
- PCFG: use Viterbi CKY to get it
- *Z is the best sequence of states (for a PCFG, the best parse tree)

- Guess µ: argmax_µ [ Pr(O | µ) ]
- HMM: use Forward-Backward (Baum-Welch) to get it
- PCFG: use Inside-Outside to get it
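A minimal Viterbi sketch for the HMM case, assuming µ = (π, A, B); the two-state model and its numbers are made up for illustration:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence Z for observations obs under mu = (pi, A, B)."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))            # delta[t, s]: best log-prob ending in s at t
    psi = np.zeros((T, n_states), dtype=int)   # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # scores[s_prev, s]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])                      # toy 2-state HMM (made-up numbers)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])        # transition probabilities
B  = np.array([[0.5, 0.5], [0.1, 0.9]])        # emission probabilities over 2 symbols
print(viterbi([0, 1, 1], pi, A, B))            # [0, 1, 1]
```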

- Example:
- Sentence: "people eats roasted peanuts"
- Parse: (S (NP people) (VP (V eats) (NP (Adj roasted) (N peanuts))))

- Problem: a plain PCFG uses the same rules for both word orders, so
- Pr_µ(peanuts eat roasted people) = Pr_µ(people eat roasted peanuts)

- We can try to generate the head of each phrase (lexicalization):
- S (head: eat) -> NP (head: people) VP (head: eat)
- NP (head: people) -> people (head: people)
- VP (head: eat) -> V (head: eat) NP (head: peanut)
- V (head: eat) -> eats
- NP (head: peanut) -> Adj (head: N/A) N (head: peanut)
- Adj -> roasted, N -> peanuts

- Should have:
- Pr[ S (head: eat) -> NP (head: people) VP (head: eat) ] > Pr[ S (head: eat) -> NP (head: peanut) VP (head: eat) ]

- Dependency representation of the sentence:
- eat -> people, peanuts
- people -> the
- peanuts -> roasted

- Lexical rules (bottom up), e.g.:
- NP -> Det N


- Evaluation
- Reference Reading: How Evaluation Guides AI Research
- Intrinsic evaluation
- Extrinsic evaluation

- Kappa statistic (inter-annotator agreement)
- Metrics: precision, recall
- How do we evaluate two structures that could generate the same sentence?
- Answer: generate more than one output for each input, convert the outputs into sets, and measure with precision and recall (a sketch below).
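A small sketch of that set-based comparison; the "outputs" here are hypothetical n-best results converted to sets:

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall between predicted and reference outputs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)                  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical n-best outputs for one input vs. the reference set.
p, r = precision_recall({"out1", "out2", "out3"}, {"out2", "out3", "out4"})
f1 = 2 * p * r / (p + r) if p + r else 0.0           # harmonic mean
print(round(p, 3), round(r, 3), round(f1, 3))        # 0.667 0.667 0.667
```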

- Reader (human) evaluation:
- If the reader's score agrees with the machine's, stop
- else: have another reader read the essay


# Month: March 2018

## 03102018

- Gensim tutorial:
- https://radimrehurek.com/gensim/models/word2vec.html
- Use the Google News model as the pre-trained model
- Clustering based on a distance matrix (a sketch after this list)
- Question: how do we do the clustering?
- Should we cluster on the keywords?
- Or on the keyword-related words?
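A sketch of that workflow with gensim: load the pre-trained Google News vectors, build a cosine-distance matrix over some keywords, and cluster it hierarchically. The file path and the keyword list are assumptions for illustration:

```python
import numpy as np
from gensim.models import KeyedVectors
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Pre-trained Google News word2vec model (assumed local path; large download).
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Hypothetical keywords; distance = 1 - cosine similarity.
keywords = ["gun", "safety", "school", "teacher", "rifle", "classroom"]
dist = np.array([[1.0 - model.similarity(a, b) for b in keywords]
                 for a in keywords])

# Agglomerative clustering on the precomputed distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(keywords, labels)))
```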

- Leg dissection demo:
- 18 cameras, 30 frames, 10 GB
- 5 cameras, 100 frames, 6 GB
- Questions:
- What is our task?
- We cannot change the focal length now; we can only change the viewpoint
- If we want dynamic content, should we have a dynamic mesh?


- Foveated ray-tracing:
- Input: eye ray + 1 spp (sample per pixel)
- Output: foveated image
- Question: if we use a foveated image as the ground truth, what should the denoising algorithm for the ground truth be?

- TODO:
- Read G3D code and change the sample number
- Read papers (NVIDIA, Disney)
- Homework

## Lecture 6: Context-free parsing

Questions:

- Generative model: models the joint distribution P(X, Y)
- Discriminative model: models the conditional distribution P(Y|X)

Main points:

- Block sampler: instead of sampling one element at a time, we can sample a batch of elements in Gibbs sampling.
- Lag and burn-in: can be viewed as parameters (we can control the number of iterations); a sketch follows this list.
- Lag: mark some iterations in the loop as lag and throw those iterations away, so the remaining samples are (closer to) independent.
- Example: run 1000 iterations -> discard 100 as lag -> run 1000 iterations -> discard 100 as lag -> …

- Burn-in: throw away the initial 10 or 20 iterations (the burn-in iterations), where the model has not converged yet.
- The right way is to test whether the chain has actually converged.

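A sketch of burn-in and lag (thinning) in a Gibbs sampler; the bivariate-normal target and all the numbers are assumptions for illustration:

```python
import numpy as np

def gibbs(n_keep=100, burn_in=20, lag=100, rho=0.9, seed=0):
    """Gibbs-sample a toy bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    x = y = 0.0
    kept, it = [], 0
    while len(kept) < n_keep:
        # One full Gibbs sweep: sample each variable given the other.
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        it += 1
        if it <= burn_in:                   # burn-in: drop early, unconverged samples
            continue
        if (it - burn_in) % lag == 0:       # lag: keep only every lag-th sample
            kept.append((x, y))
    return np.array(kept)

samples = gibbs()
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```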
- Topic models:
- The parameters over the words in a topic do not need to sum to one
- Variants (branches) of LDA (non-parametric models):
- Supervised LDA
- Chinese restaurant process (CRP)
- Hierarchical models
- Example: SHLDA
- e.g., gun-safety (Clinton) & gun-control (Trump)

Parsing:

Any language that can be processed with an FSA can also be processed with CFGs, but not vice versa.

| Class | Automaton / grammar | Probabilistic model |
| --- | --- | --- |
| ? (recursively enumerable) | Turing machine | (not covered in this lecture) |
| CSL | Tree-adjoining grammar | PTAGs (not covered in this lecture) |
| CF | PDA / CFGs | PCFGs |
| Regular | FSA / regular expressions | HMMs |

PDAs/CFGs can handle some cases that cannot be processed by an FSA. For example, S -> aSb generates {a^n b^n}, which cannot be processed by an FSA because we need to keep track of the count n; an FSM only remembers which state it is in, it cannot count. A counter-based sketch follows.
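To make the {a^n b^n} point concrete, here is a minimal recognizer sketch: a single counter (the extra memory a PDA-like device has and an FSA lacks) suffices, assuming the grammar is S -> aSb | ε so n = 0 is allowed:

```python
# Recognize {a^n b^n} with one counter; no finite-state machine can do this,
# since it would need a distinct state for every possible count n.
def is_anbn(s):
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' breaks the a^n b^n shape
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False
    return count == 0

print(is_anbn("aaabbb"), is_anbn("aabbb"))  # True False
```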

Example 1 (center embedding):

The rat that the cat that the dog bit chased died.

Example 2:

Sentence: The tall man loves Mary.

Dependency tree (head -> dependents):

loves -> man, Mary

man -> The, tall

Structure:

(S (NP (DT The) (Adj tall) (N man)) (VP (V loves) (NP Mary)))

Example 3: CKY algorithm

0 The 1 tall 2 man 3 loves 4 Mary 5

Lexical axiom: if A -> w is a rule in G and w spans positions (i, j), infer [A, i, j].

Chart (bottom-up parsing algorithm):

0 The 1 tall 2 man 3 loves 4 Mary 5

--Det ------ N ------- V ---- NP ---

----------- NP ---------- VP -------

---------------- S -----------------

Then if I have:

[B, i, j]

[C, j, k]

and a rule A -> BC in G, I can infer:

[A, i, k] # B and C are non-terminals

NP -> Det N

VP -> V NP

S -> NP VP
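A CKY recognizer sketch for roughly this chart. Since CKY wants binary rules, the grammar below adds N -> Adj N to cover "tall man" (an assumption; the notes' chart leaves the Adj cell implicit):

```python
from collections import defaultdict
from itertools import product

# Toy grammar (assumed for this sketch): preterminals per word, binary rules.
UNARY = {"the": {"Det"}, "tall": {"Adj"}, "man": {"N"},
         "loves": {"V"}, "Mary": {"NP"}}
BINARY = {("Det", "N"): {"NP"}, ("Adj", "N"): {"N"},
          ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

def cky(words):
    n = len(words)
    chart = defaultdict(set)                  # chart[(i, k)] = {A : [A, i, k] holds}
    for i, w in enumerate(words):             # lexical axioms: [A, i, i+1] if A -> w
        chart[(i, i + 1)] |= UNARY.get(w, set())
    for width in range(2, n + 1):             # fill longer spans bottom-up
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):         # split: [B, i, j] + [C, j, k] => [A, i, k]
                for B, C in product(chart[(i, j)], chart[(j, k)]):
                    chart[(i, k)] |= BINARY.get((B, C), set())
    return "S" in chart[(0, n)]

print(cky("the tall man loves Mary".split()))  # True
```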

Example 4 (PP attachment ambiguity):

I saw the man in the park with a telescope.

One reading nests both PPs inside the object NP:

(NP (NP (NP the man) (PP in the park)) (PP with a telescope))

Other readings attach "in the park" or "with a telescope" to the verb instead, which is why the sentence is ambiguous.

Earley parsing works on dotted rules:

A -> B . C D

B: already recognized

C D: predicted

Items have the form [i, A -> α . β, j], meaning α spans positions i through j.

Initial item:

[0, S' -> . S, 0]

Scan: make progress through a rule by consuming the next input word.

[i, A -> α . w_{j+1} β, j] => [i, A -> α w_{j+1} . β, j+1]

For example, in A -> [the tall] . B C, the recognized prefix [the tall] spans (i, j).

Prediction: top-down prediction of each rule B -> γ.

[i, A -> α . B β, j] => [j, B -> . γ, j]

Complete (combine; move the dot): a finished constituent advances the item that predicted it.

[i, A -> α . B β, k] and [k, B -> γ ., j]

i --[A -> α . B β]-- k --[B -> γ .]-- j

Then I have:

[i, A -> α B . β, j]
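Putting scan, predict, and complete together, a minimal Earley recognizer sketch; the grammar and lexicon are illustrative assumptions matching Example 2:

```python
from collections import namedtuple

# An item [start, lhs -> rhs[:dot] . rhs[dot:], j]; j is the chart column.
Item = namedtuple("Item", "lhs rhs dot start")

GRAMMAR = {  # assumed toy grammar for "the tall man loves Mary"
    "S":  [("NP", "VP")],
    "NP": [("Det", "Adj", "N"), ("Name",)],
    "VP": [("V", "NP")],
}
LEXICON = {"the": "Det", "tall": "Adj", "man": "N", "loves": "V", "Mary": "Name"}

def recognize(words):
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(Item("S'", ("S",), 0, 0))              # [0, S' -> . S, 0]
    for j in range(len(words) + 1):
        changed = True
        while changed:                                   # predict/complete to a fixpoint
            changed = False
            for item in list(chart[j]):
                if item.dot < len(item.rhs) and item.rhs[item.dot] in GRAMMAR:
                    # predict: [j, B -> . gamma, j] for each rule B -> gamma
                    for rhs in GRAMMAR[item.rhs[item.dot]]:
                        new = Item(item.rhs[item.dot], rhs, 0, j)
                        if new not in chart[j]:
                            chart[j].add(new); changed = True
                elif item.dot == len(item.rhs):
                    # complete: [i, A -> a.Bb, k] + [k, B -> g., j] => [i, A -> aB.b, j]
                    for prev in list(chart[item.start]):
                        if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == item.lhs:
                            new = Item(prev.lhs, prev.rhs, prev.dot + 1, prev.start)
                            if new not in chart[j]:
                                chart[j].add(new); changed = True
        if j < len(words):                               # scan: match the POS of the next word
            pos = LEXICON.get(words[j])
            for item in chart[j]:
                if item.dot < len(item.rhs) and item.rhs[item.dot] == pos:
                    chart[j + 1].add(Item(item.lhs, item.rhs, item.dot + 1, item.start))
    return Item("S'", ("S",), 1, 0) in chart[len(words)]

print(recognize("the tall man loves Mary".split()))      # True
```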