- Noah Smith: squash network
- Main points:
- difference between LSA & SVD
- Bayesian graphical models
- informative priors are useful in the model
- Bayesian network
- Variables X1, X2, …, Xn
- Joint distribution: P(X1, X2, …, Xn) = ∏_i P(Xi | parents(Xi))
- Generative story: HMM (dependencies)
- A and B are conditionally independent given C iff P(A,B|C) = P(A|C) * P(B|C)
- Graph: C → A and C → B (A and B share the parent C)
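Here is a small numeric check of the definition, using the graph C → A, C → B. The probability tables are made-up values chosen only for illustration. The code builds the joint from the graph as P(A, B, C) = P(C) P(A | C) P(B | C), then tests the independence condition by brute force.

```python
from itertools import product

# Hypothetical conditional probability tables for the graph C -> A, C -> B.
p_c = {0: 0.6, 1: 0.4}          # P(C = c)
p_a_given_c = {0: 0.2, 1: 0.9}  # P(A = 1 | C = c)
p_b_given_c = {0: 0.3, 1: 0.7}  # P(B = 1 | C = c)

def bern(p_one, x):
    """Probability of binary outcome x under Bernoulli(p_one)."""
    return p_one if x == 1 else 1.0 - p_one

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(c) P(a|c) P(b|c), read off the graph."""
    return p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)

def cond_indep_given_c(tol=1e-12):
    """Check P(A,B|C) == P(A|C) P(B|C) for every assignment."""
    for c in (0, 1):
        pc = sum(joint(a, b, c) for a, b in product((0, 1), repeat=2))
        for a, b in product((0, 1), repeat=2):
            p_ab = joint(a, b, c) / pc
            p_a = sum(joint(a, bb, c) for bb in (0, 1)) / pc
            p_b = sum(joint(aa, b, c) for aa in (0, 1)) / pc
            if abs(p_ab - p_a * p_b) > tol:
                return False
    return True

def marginal_indep(tol=1e-12):
    """Check P(A,B) == P(A) P(B) without conditioning on C."""
    for a, b in product((0, 1), repeat=2):
        p_ab = sum(joint(a, b, c) for c in (0, 1))
        p_a = sum(joint(a, bb, c) for bb in (0, 1) for c in (0, 1))
        p_b = sum(joint(aa, b, c) for aa in (0, 1) for c in (0, 1))
        if abs(p_ab - p_a * p_b) > tol:
            return False
    return True

print(cond_indep_given_c())  # True
print(marginal_indep())      # False
```

Conditioning on the shared parent makes A and B independent. Without conditioning, they stay dependent because both carry information about C.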
- Example:
- Cloudy → Sprinkler, Cloudy → Rain
- Sprinkler → Wet grass, Rain → Wet grass
- Wet grass (final node)
- The w’s are observed (the other variables are hidden)
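For the sprinkler network, the joint factorizes as P(C) P(S | C) P(R | C) P(W | S, R). Any posterior given the observed wet grass can be computed by summing that product over the hidden variables. The CPT numbers below are illustrative assumptions, not values from the lecture.

```python
from itertools import product

# Illustrative CPT values (hypothetical; not from the lecture).
P_C = 0.5                              # P(Cloudy = 1)
P_S = {1: 0.1, 0: 0.5}                 # P(Sprinkler = 1 | Cloudy = c)
P_R = {1: 0.8, 0: 0.2}                 # P(Rain = 1 | Cloudy = c)
P_W = {(1, 1): 0.99, (1, 0): 0.9,      # P(WetGrass = 1 | Sprinkler = s, Rain = r)
       (0, 1): 0.9, (0, 0): 0.0}

def bern(p_one, x):
    """Probability of binary outcome x under Bernoulli(p_one)."""
    return p_one if x == 1 else 1.0 - p_one

def joint(c, s, r, w):
    """P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r)
            * bern(P_W[(s, r)], w))

def posterior(query, w=1):
    """P(query = 1 | W = w), summing the joint over the hidden variables."""
    num = den = 0.0
    for c, s, r in product((0, 1), repeat=3):
        p = joint(c, s, r, w)
        den += p
        if {"C": c, "S": s, "R": r}[query] == 1:
            num += p
    return num / den

print(round(posterior("R"), 4))  # 0.7079
print(round(posterior("S"), 4))  # 0.4298
```

Enumeration is exponential in the number of hidden variables. It is only meant to make the factorization concrete.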
- State transition graph: π -> S1 -> S2 -> S3 -> … -> St (each St emits an observation wt)
- HMM parameters μ = (A, B, π): transition matrix A, emission matrix B, initial state distribution π
- Put a prior on the parameter of C: π ~ Beta(γ1, γ2)
- C ~ Bernoulli(π)
- S ~ Bernoulli(π(S | C))
- R ~ Bernoulli(π(R | C))
- W ~ Bernoulli(τ(ω | S, R))
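The generative story can be run forward by sampling parents before children (ancestral sampling). In this sketch only π gets a Beta prior, as in the notes. The Beta hyperparameters and the fixed conditional tables for S, R, and W are hypothetical values chosen for illustration.

```python
import random

def bernoulli(p, rng):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

# Hypothetical hyperparameters and conditional tables (illustration only).
GAMMA1, GAMMA2 = 2.0, 2.0                                      # Beta prior on pi = P(C = 1)
PI_S = {1: 0.1, 0: 0.5}                                        # pi(S = 1 | C = c)
PI_R = {1: 0.8, 0: 0.2}                                        # pi(R = 1 | C = c)
TAU_W = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.0}  # tau(W = 1 | S, R)

def sample_story(rng):
    """One pass of the generative story, parents before children."""
    pi = rng.betavariate(GAMMA1, GAMMA2)  # pi ~ Beta(gamma1, gamma2)
    c = bernoulli(pi, rng)                # C ~ Bernoulli(pi)
    s = bernoulli(PI_S[c], rng)           # S ~ Bernoulli(pi(S | C))
    r = bernoulli(PI_R[c], rng)           # R ~ Bernoulli(pi(R | C))
    w = bernoulli(TAU_W[(s, r)], rng)     # W ~ Bernoulli(tau(W | S, R))
    return pi, c, s, r, w

rng = random.Random(0)
samples = [sample_story(rng) for _ in range(10_000)]
frac_wet = sum(x[4] for x in samples) / len(samples)
print(frac_wet)  # fraction of draws with wet grass
```

With a Beta(2, 2) prior, E[π] = 0.5. The marginal of C is therefore Bernoulli(0.5), and frac_wet should land near the P(W = 1) implied by these tables (about 0.647).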
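- π ~ Dirichlet(1)
- S1 ~ Cat(π)
- S2 ~ Cat(a_(s1,1), a_(s1,2), …, a_(s1,n)): the parameters of S2 are the row of the transition matrix A indexed by s1
- ω1 ~ Cat(b_(s1,1), b_(s1,2), …, b_(s1,V)), ω2 ~ Cat(b_(s2,1), b_(s2,2), …, b_(s2,V)): each observation's parameters are the row of the emission matrix B indexed by its state (V = number of observation symbols)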
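Here is a sketch of the HMM generative story. It draws μ = (A, B, π) from symmetric Dirichlet(1) priors, then samples a state/observation sequence. The notes only state the Dirichlet prior for π. Using the same prior for every row of A and B is an assumption, as are the sizes (3 states, 5 observation symbols). Python's standard library has no Dirichlet sampler, so one is built from normalized Gamma draws.

```python
import random

def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) draw via normalized Gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def sample_hmm(n_states, n_symbols, length, rng):
    # mu = (A, B, pi); Dirichlet(1) on the rows of A and B is an assumption.
    pi = dirichlet(1.0, n_states, rng)
    A = [dirichlet(1.0, n_states, rng) for _ in range(n_states)]   # A[i][j] = P(S_t+1 = j | S_t = i)
    B = [dirichlet(1.0, n_symbols, rng) for _ in range(n_states)]  # B[i][k] = P(w_t = k | S_t = i)
    states, words = [], []
    s = categorical(pi, rng)                  # S1 ~ Cat(pi)
    for _ in range(length):
        states.append(s)
        words.append(categorical(B[s], rng))  # w_t ~ Cat(row s of B)
        s = categorical(A[s], rng)            # S_t+1 ~ Cat(row s of A)
    return states, words

rng = random.Random(42)
states, words = sample_hmm(n_states=3, n_symbols=5, length=10, rng=rng)
print(states)  # hidden state sequence
print(words)   # observed symbol sequence
```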
- What was just introduced is a first-order (bigram) HMM; a second-order (trigram) model conditions each state on the two previous states:
- P(s1, s2, …, sn) = P(s1) P(s2 | s1) P(s3 | s1, s2) P(s4 | s2, s3) …
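The factorization above, where each state depends on the two preceding states, can be evaluated directly. Working in log space avoids underflow on long sequences. The two-state probability tables are hypothetical.

```python
import math
from itertools import product

# Hypothetical tables for a two-state, second-order chain.
P1 = {0: 0.6, 1: 0.4}                                          # P(s1)
P2 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6}      # P(s2 = b | s1 = a), keyed (a, b)
P3_ONE = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(s_t = 1 | s_t-2, s_t-1)
P3 = {(a, b, c): (q if c == 1 else 1 - q)                      # P(s_t = c | s_t-2 = a, s_t-1 = b)
      for (a, b), q in P3_ONE.items() for c in (0, 1)}

def log_prob(seq):
    """log P(s1, ..., sn) under the second-order factorization."""
    lp = math.log(P1[seq[0]])
    if len(seq) > 1:
        lp += math.log(P2[(seq[0], seq[1])])
    for t in range(2, len(seq)):
        lp += math.log(P3[(seq[t - 2], seq[t - 1], seq[t])])
    return lp

# Sanity check: the probabilities of all length-4 sequences should sum to 1.
total = sum(math.exp(log_prob(s)) for s in product((0, 1), repeat=4))
print(total)
```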
- For EM, the updates must be re-derived, because the conditional distributions are different
- Some distributions:
- Binomial vs. Bernoulli
- Multinomial vs. discrete / categorical
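In each pair, the Bernoulli and the categorical describe a single draw. The binomial and the multinomial describe the counts from n independent draws. The sketch below checks this relationship. It compares the closed-form count pmfs with brute-force sums over every sequence of single draws that produces those counts. The parameter values are arbitrary.

```python
import math
from collections import Counter
from itertools import product

def bernoulli_pmf(x, p):
    return p if x == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def categorical_pmf(x, probs):
    return probs[x]

def multinomial_pmf(counts, probs):
    """P(category counts) for sum(counts) independent Categorical(probs) draws."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    return coef * math.prod(p**c for p, c in zip(probs, counts))

def brute_binomial(k, n, p):
    """Sum the probabilities of all 0/1 sequences with exactly k ones."""
    return sum(math.prod(bernoulli_pmf(x, p) for x in seq)
               for seq in product((0, 1), repeat=n) if sum(seq) == k)

def brute_multinomial(counts, probs):
    """Sum the probabilities of all category sequences with the given counts."""
    total = 0.0
    for seq in product(range(len(probs)), repeat=sum(counts)):
        c = Counter(seq)
        if all(c[i] == counts[i] for i in range(len(probs))):
            total += math.prod(categorical_pmf(x, probs) for x in seq)
    return total

print(binomial_pmf(2, 4, 0.3), brute_binomial(2, 4, 0.3))
print(multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2)),
      brute_multinomial((2, 1, 1), (0.5, 0.3, 0.2)))
```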
- Document squashing
- π_MLE = argmax_π P(X | π); plug-in prediction P(y | X) ≈ P(y | π_MLE)
- π_MAP = argmax_π P(π | X); plug-in prediction P(y | X) ≈ P(y | π_MAP)
- Fully Bayesian: P(y | X) = ∫ P(y | π) P(π | X) dπ
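For a concrete case, the following are assumed for illustration: π is the parameter of a Bernoulli, X is n coin flips with k ones, y is the next flip, and the prior is Beta(a, b). All three predictions then have closed forms. π_MLE = k / n and π_MAP = (k + a - 1) / (n + a + b - 2). The fully Bayesian P(y = 1 | X) is the posterior mean, (k + a) / (n + a + b). The quadrature function checks the last formula against the integral directly.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def estimates(data, a, b):
    n, k = len(data), sum(data)
    pi_mle = k / n
    pi_map = (k + a - 1) / (n + a + b - 2)  # mode of posterior Beta(a + k, b + n - k); assumes both params > 1
    pred_bayes = (k + a) / (n + a + b)      # mean of posterior Beta = P(y = 1 | X)
    return pi_mle, pi_map, pred_bayes

def predictive_by_quadrature(data, a, b, steps=20_000):
    """P(y = 1 | X) = integral of pi * Beta(pi; a + k, b + n - k), midpoint rule."""
    n, k = len(data), sum(data)
    h = 1.0 / steps
    return h * sum((i + 0.5) * h * beta_pdf((i + 0.5) * h, a + k, b + n - k)
                   for i in range(steps))

data = [1, 1, 1, 0]  # hypothetical observations: 3 ones, 1 zero
a, b = 2.0, 2.0      # hypothetical Beta prior
print(estimates(data, a, b))  # (0.75, 0.6666666666666666, 0.625)
print(predictive_by_quadrature(data, a, b))
```

The prior pulls both the MAP and the Bayesian prediction toward 0.5. The Bayesian prediction averages over π instead of committing to a single value.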
- To avoid the integral, use Monte Carlo (average over random samples)
- E_p(Z)[f(Z)] = ∫ f(Z) p(Z) dZ = lim_(N→∞) (1/N) ∑_(t = 1:N) f(z(t)) ≈ (1/T) ∑_(t = 1:T) f(z(t)) for large T
- where z(t) ~ p(Z) are samples drawn from p
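As an application to the predictive integral, take f(π) = P(y = 1 | π) = π and draw the samples from the posterior. The setup is a hypothetical conjugate example: a Bernoulli parameter π with a Beta(2, 2) prior and data with 3 ones and 1 zero, which gives posterior Beta(5, 3). Under that setup, the sample average should approach the exact posterior mean 5/8.

```python
import random

def mc_expectation(f, sampler, n_samples, rng):
    """(1/T) * sum_t f(z(t)) with z(t) ~ p(Z) drawn by `sampler`."""
    return sum(f(sampler(rng)) for _ in range(n_samples)) / n_samples

def posterior_sampler(rng):
    """Hypothetical posterior over pi: Beta(5, 3)."""
    return rng.betavariate(5.0, 3.0)

def bernoulli_likelihood_of_one(pi):
    """f(pi) = P(y = 1 | pi)."""
    return pi

rng = random.Random(0)
estimate = mc_expectation(bernoulli_likelihood_of_one, posterior_sampler, 100_000, rng)
print(estimate)  # should be close to 5/8 = 0.625
```

The error shrinks like 1/sqrt(T) regardless of the dimension of Z, which is why sampling replaces the integral.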
- MCMC theory:
- z(0): random start (“state”)
- For t = 0, 1, 2, …: z(t+1) = g(z(t)), where the transition g is constructed so that, after a burn-in period (t > τ), the frequency of visits to each state is proportional to p(z)
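The notes do not specify g, so this sketch swaps in one standard choice, the Metropolis rule. The proposal is uniform over the K states, which makes it symmetric. A move is accepted with probability min(1, p(z′) / p(z)), and only visits after the burn-in τ are counted. Only unnormalized weights of p are needed. The weights below are hypothetical.

```python
import random
from collections import Counter

def metropolis(weights, n_steps, burn_in, rng):
    """Metropolis sampler on states 0..K-1; weights are proportional to p(z)."""
    k = len(weights)
    z = rng.randrange(k)                  # z(0): random start
    visits = Counter()
    for t in range(n_steps):
        proposal = rng.randrange(k)       # symmetric proposal: uniform over states
        if rng.random() < min(1.0, weights[proposal] / weights[z]):
            z = proposal                  # accept; otherwise stay at z
        if t >= burn_in:                  # count visits only after the burn-in tau
            visits[z] += 1
    total = sum(visits.values())
    return [visits[i] / total for i in range(k)]

weights = [1.0, 2.0, 3.0, 4.0]            # hypothetical unnormalized p(z)
freqs = metropolis(weights, n_steps=200_000, burn_in=1_000, rng=random.Random(0))
print(freqs)  # should approach [0.1, 0.2, 0.3, 0.4]
```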
- Gibbs sampling:
- Assume Z = <z1, z2, z3>
- Define Z’ = <z1′, z2′, z3′>
- New value: z1′ ~ P(z1 | z2, z3)
- New value: z2′ ~ P(z2 | z1′, z3)
- New value: z3′ ~ P(z3 | z1′, z2′)
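The following is a minimal Gibbs sampler for three binary variables that follows the update order above. Each z_i is resampled from its conditional given the current values of the others, including values already updated in this sweep. The conditional comes from normalizing the joint over the two values of z_i. The unnormalized joint table is hypothetical, which lets the estimate of P(z1 = 1) be compared with its exact value.

```python
import random
from itertools import product

# Hypothetical unnormalized joint p(z1, z2, z3) over binary variables.
WEIGHTS = dict(zip(product((0, 1), repeat=3), [1, 2, 3, 4, 5, 6, 7, 8]))

def gibbs_p_z1(n_sweeps, burn_in, rng):
    """Estimate P(z1 = 1) by Gibbs sampling."""
    z = [rng.randrange(2) for _ in range(3)]  # random start
    ones = kept = 0
    for sweep in range(n_sweeps):
        for i in range(3):                    # update z1, then z2, then z3
            w = []
            for v in (0, 1):
                cand = list(z)
                cand[i] = v
                w.append(WEIGHTS[tuple(cand)])  # p(z_i = v | rest) is proportional to the joint
            z[i] = 1 if rng.random() < w[1] / (w[0] + w[1]) else 0
        if sweep >= burn_in:
            ones += z[0]
            kept += 1
    return ones / kept

exact = sum(w for z, w in WEIGHTS.items() if z[0] == 1) / sum(WEIGHTS.values())
est = gibbs_p_z1(n_sweeps=50_000, burn_in=1_000, rng=random.Random(0))
print(est, exact)
```

Each update needs only a one-variable conditional. That is what makes Gibbs sampling convenient when the full joint is hard to sample directly.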
- Good reference:
- How do we get the original distribution?
- Use the model