- watch the new talk and write a summary
- Noah Smith: squash network
- Main points:
- difference between LSA & SVD
- Bayesian graphical models
- informative priors are useful in the model
- Bayesian network
- a DAG over random variables X1, X2, …, Xn
- the joint factorizes over the graph: P(X1, X2, …, Xn) = ∏_i P(Xi | parents(Xi))
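A minimal sketch of that factorization; the three-variable network and all CPT numbers below are invented for illustration:

```python
# Joint probability of a full assignment under a Bayesian network:
# P(x1, ..., xn) = prod_i P(xi | parents(xi)).
# CPTs stored as {variable: (parents, table)}, where table maps
# (parent values) -> {value: probability}. Toy numbers only.
cpts = {
    "A": ((), {(): {0: 0.6, 1: 0.4}}),
    "B": (("A",), {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}}),
    "C": (("A", "B"), {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                       (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}),
}

def joint(assignment):
    """P(assignment) as the product of the local conditionals."""
    prob = 1.0
    for var, (parents, table) in cpts.items():
        parent_vals = tuple(assignment[par] for par in parents)
        prob *= table[parent_vals][assignment[var]]
    return prob

print(joint({"A": 1, "B": 0, "C": 1}))  # 0.4 * 0.2 * 0.6 = 0.048
```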
- Generative story: HMM (dependencies)
- A and B are conditionally independent given C iff P(A,B|C) = P(A|C) * P(B|C)
- Graph: C → A and C → B (C is the common parent of A and B)
- Example:
- Cloudy (to Sprinkler and Rain)
- Sprinkler (to Wet grass), Rain (to Wet grass)
- Wet grass (final node); see the sketch below
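A sketch of this network in Python, with made-up conditional probability tables, computing the marginal P(WetGrass = 1) by summing the factorized joint over all parent configurations:

```python
p_c = 0.5                                    # P(Cloudy = 1), assumed
p_s = {1: 0.1, 0: 0.5}                       # P(Sprinkler = 1 | Cloudy), assumed
p_r = {1: 0.8, 0: 0.2}                       # P(Rain = 1 | Cloudy), assumed
p_w = {(1, 1): 0.99, (1, 0): 0.9,
       (0, 1): 0.9, (0, 0): 0.0}             # P(WetGrass = 1 | S, R), assumed

def bern(p, x):
    """P(X = x) for X ~ Bernoulli(p)."""
    return p if x == 1 else 1.0 - p

# P(W=1) = sum over C, S, R of P(C) P(S|C) P(R|C) P(W=1|S,R)
p_wet = sum(bern(p_c, c) * bern(p_s[c], s) * bern(p_r[c], r) * p_w[(s, r)]
            for c in (0, 1) for s in (0, 1) for r in (0, 1))
print(round(p_wet, 4))  # 0.6471 with these toy numbers
```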
- State transition graph: π -> S1 -> S2 -> S3 -> … -> St
- each state St emits an observation wt; the w’s are observable, the states are hidden
- HMM parameters: μ = (A, B, π) (transition matrix A, emission matrix B, initial distribution π)
- Put a prior on π for C: π ~ Beta(γ1, γ2)
- C ~ Bernoulli(π)
- S ~ Bernoulli(π(S | C))
- R ~ Bernoulli(π(R | C))
- W ~ Bernoulli(τ(W | S, R)); a sampling sketch follows below
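A sketch of this generative story via ancestral sampling; the Beta hyperparameters and the conditional probability values are assumptions, not from the talk:

```python
import random

def ancestral_sample(gamma1=1.0, gamma2=1.0):
    """Draw one world from the prior: pi -> C -> (S, R) -> W."""
    pi = random.betavariate(gamma1, gamma2)          # pi ~ Beta(g1, g2)
    c = int(random.random() < pi)                    # C ~ Bernoulli(pi)
    p_s = {1: 0.1, 0: 0.5}[c]                        # assumed pi(S | C)
    p_r = {1: 0.8, 0: 0.2}[c]                        # assumed pi(R | C)
    s = int(random.random() < p_s)                   # S ~ Bernoulli(pi(S|C))
    r = int(random.random() < p_r)                   # R ~ Bernoulli(pi(R|C))
    tau = {(1, 1): 0.99, (1, 0): 0.9,
           (0, 1): 0.9, (0, 0): 0.0}[(s, r)]         # assumed tau(W | S, R)
    w = int(random.random() < tau)                   # W ~ Bernoulli(tau)
    return c, s, r, w

print([ancestral_sample() for _ in range(5)])
```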
- π~Dirichlet(1)
- S1 ~ Cat(π); S2 ~ Cat(a_{S1,1}, a_{S1,2}, …, a_{S1,n}), where the Cat parameters are the row of the transition matrix A picked out by S1
- w1 ~ Cat(b_{S1,1}, b_{S1,2}, …, b_{S1,m}); w2 ~ Cat(b_{S2,1}, b_{S2,2}, …, b_{S2,m}), where the rows come from the emission matrix B, indexed by the current state
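A sketch of this HMM generative story with assumed π, A, and B (two hidden states, three observation symbols):

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.5, 0.5])                  # initial distribution, assumed
A = np.array([[0.9, 0.1],                  # transition matrix: row s = Cat(a_{s,.})
              [0.3, 0.7]])
B = np.array([[0.8, 0.1, 0.1],             # emission matrix: row s = Cat(b_{s,.})
              [0.1, 0.2, 0.7]])

def sample_hmm(T):
    """Ancestral sampling: S1 ~ Cat(pi), St ~ Cat(A[S_{t-1}]), wt ~ Cat(B[St])."""
    states, obs = [], []
    s = rng.choice(2, p=pi)                # S1 ~ Cat(pi)
    for _ in range(T):
        states.append(s)
        obs.append(rng.choice(3, p=B[s]))  # wt from the emission row of state s
        s = rng.choice(2, p=A[s])          # next state from the transition row
    return states, obs

print(sample_hmm(10))
```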
- What was just introduced is the unigram model; here is the bigram (first-order Markov) model:
- P(s1, s2, …, sn) = P(s1) P(s2 | s1) P(s3 | s2) ⋯ P(sn | s_{n-1})
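For instance, scoring a sequence under a bigram model, with toy initial and transition probabilities:

```python
# P(s1..sn) = P(s1) * prod_t P(s_t | s_{t-1}); toy tables for illustration.
p_init = {"a": 0.6, "b": 0.4}
p_trans = {("a", "a"): 0.7, ("a", "b"): 0.3,
           ("b", "a"): 0.4, ("b", "b"): 0.6}

def bigram_prob(seq):
    prob = p_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= p_trans[(prev, cur)]
    return prob

print(bigram_prob(["a", "a", "b", "b"]))  # 0.6 * 0.7 * 0.3 * 0.6 = 0.0756
```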
- For EM we would need to recompute everything, since the conditional distributions are different
- Some distributions:
- Binomial vs. Bernoulli (Bernoulli is a single trial; Binomial counts successes over n trials)
- Multinomial vs. discrete / categorical (categorical is a single draw; Multinomial counts outcomes over n draws)
- Document squashing
MCMC
- X = HHHH TTTTTT (observed coin flips: 4 heads, 6 tails)
- π_MLE = argmax_π P(X | π); prediction: P(y | X) ≈ P(y | π_MLE)
- π_MAP = argmax_π P(π | X); prediction: P(y | X) ≈ P(y | π_MAP)
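A quick check of both estimates on the coin data above; the Beta(2, 2) prior used for the MAP is an assumption:

```python
heads, tails = 4, 6                  # X = HHHH TTTTTT

# MLE for a Bernoulli likelihood: argmax_pi P(X | pi)
pi_mle = heads / (heads + tails)     # 0.4

# MAP with a Beta(alpha, beta) prior (assumed alpha = beta = 2):
# argmax_pi P(pi | X) = (heads + alpha - 1) / (n + alpha + beta - 2)
alpha, beta = 2.0, 2.0
pi_map = (heads + alpha - 1) / (heads + tails + alpha + beta - 2)  # 5/12 ≈ 0.417

print(pi_mle, pi_map)
```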
- The fully Bayesian alternative integrates over the posterior: P(y | X) = ∫ P(y | π) P(π | X) dπ
- To avoid the integration, use Monte Carlo (random sampling):
- E_{p(Z)}[f(Z)] = ∫ f(Z) p(Z) dZ = lim_{T→∞} (1/T) ∑_{t=1}^{T} f(z^{(t)}) ≈ (1/T) ∑_{t=1}^{T} f(z^{(t)}), where z^{(t)} ~ p(Z)
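A sketch of that Monte Carlo estimate on the coin data: assuming a Beta(2, 2) prior, the posterior is Beta(6, 8) and the exact predictive P(y = H | X) is 6/14, which sampling recovers:

```python
import random

alpha, beta, heads, tails = 2.0, 2.0, 4, 6   # prior is an assumption
T = 100_000

# z(t) ~ p(Z): here pi(t) ~ P(pi | X) = Beta(alpha + heads, beta + tails)
samples = (random.betavariate(alpha + heads, beta + tails) for _ in range(T))

# P(y=H | X) = E_{P(pi|X)}[P(y=H | pi)] ≈ (1/T) * sum_t pi(t)
estimate = sum(samples) / T
print(estimate, "vs exact", (alpha + heads) / (alpha + beta + heads + tails))
```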
- MCMC theory:
- z^{(0)}: random start (“state”)
- for t = 0, 1, 2, …: z^{(t+1)} = g(z^{(t)}), where the transition g is built so that the chain’s long-run visits to each state are proportional to p(z); discard the first τ burn-in steps
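One way to make this concrete (an assumption; the notes don't fix a particular g) is a Metropolis-style chain over three states, whose visit frequencies converge to p(z):

```python
import random
from collections import Counter

p = [0.2, 0.5, 0.3]                      # target p(z) over states {0, 1, 2}

def g(z):
    """One Metropolis step: propose a uniform state, accept w.p. min(1, p(z')/p(z))."""
    proposal = random.randrange(3)
    if random.random() < min(1.0, p[proposal] / p[z]):
        return proposal
    return z

z = random.randrange(3)                  # z(0): random start
burn_in, T = 1_000, 100_000
visits = Counter()
for t in range(burn_in + T):
    z = g(z)
    if t >= burn_in:
        visits[z] += 1                   # visit counts become proportional to p(z)

print({s: visits[s] / T for s in range(3)})  # ≈ {0: 0.2, 1: 0.5, 2: 0.3}
```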
Gibbs Sampling
- Assume Z = ⟨z1, z2, z3⟩
- Define Z′ = ⟨z1′, z2′, z3′⟩
- new value: z1′ ~ P(z1 | z2, z3)
- new value: z2′ ~ P(z2 | z1′, z3)
- new value: z3′ ~ P(z3 | z1′, z2′)
- How do we get the original (full) distribution the conditionals come from?
- Use the model: its joint distribution determines every conditional
- Good reference: http://blog.csdn.net/pipisorry/article/category/3128727