Lecture 5: Reduced-dimensionality representations for documents: Gibbs sampling and topic models

    • Watch the new talk and write a summary
    • Noah Smith: squash network
    • Main points:
      • difference between LSA & SVD
      • Bayesian graphical models
        • informative priors are useful in the model
      • Bayesian network
        • DAG
        • Random variables X1, X2, …, Xn
        • Joint distribution: P(X1, X2, …, Xn) = ∏_i P(Xi | parents(Xi))
        • Generative story: HMM (dependencies)
        • A and B are conditionally independent given C iff P(A,B|C) = P(A|C) * P(B|C)
          • Diagram: C → A and C → B (A and B share the common parent C)
        • Example:
          • Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet grass, Rain → Wet grass (Wet grass is the final node; see the sampling sketch after this list)
          • State change graph: π -> S1 -> S2 -> S3 -> … -> St
          • The w’s (observations emitted by the states) are observable; the states themselves are hidden:
            • μ = (A, B, π), i.e. the HMM parameters: transition matrix A, emission matrix B, initial distribution π
            • Start with a prior on π for C: π ~ Beta(γ1, γ2)
            • C ~ Bernoulli(π)
            • S ~ Bernoulli(π(S | C))
            • R ~ Bernoulli(π(R | C))
            • W ~ Bernoulli(τ(W | S, R))
            • π~Dirichlet(1)
            • S1 ~ Cat(π), S2 ~ Cat(a_s1,1, a_s1,2, …, a_s1,n), …; each next state is drawn from the row of the transition matrix A indexed by the current state
            • ω1 ~ Cat(b_s1,1, b_s1,2, …, b_s1,m), ω2 ~ Cat(b_s2,1, b_s2,2, …, b_s2,m), …; each observation is drawn from the row of the emission matrix B indexed by the current state (see the HMM sketch after this list)
        • What was just introduced is the unigram model; here is the bigram (first-order Markov) model
        • P(s1, s2, …, sn) = P(s1) P(s2 | s1) P(s3 | s2) ⋯ P(sn | s(n−1)); conditioning on the two previous states instead, e.g. P(s3 | s1, s2) P(s4 | s2, s3) …, gives a trigram model
        • For EM, we need to recalculate everything, because the conditional distributions are different
      • Some distributions:
        • Binomial vs. Bernoulli
        • Multinomial vs. discrete / categorical
    • Document squashing
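
As a concrete companion to the sprinkler network above (referenced in the example bullet), here is a minimal ancestral-sampling sketch in Python. The CPT numbers are made up for illustration; only the graph structure Cloudy → {Sprinkler, Rain} → Wet grass comes from the notes. It also checks empirically that Sprinkler and Rain are conditionally independent given Cloudy.

```python
import random

# Made-up conditional probability tables (CPTs); only the graph
# structure Cloudy -> {Sprinkler, Rain} -> WetGrass is from the notes.
P_C = 0.5                                   # P(Cloudy = 1)
P_S_given_C = {1: 0.1, 0: 0.5}              # P(Sprinkler = 1 | Cloudy)
P_R_given_C = {1: 0.8, 0: 0.2}              # P(Rain = 1 | Cloudy)
P_W_given_SR = {(1, 1): 0.99, (1, 0): 0.9,  # P(WetGrass = 1 | Sprinkler, Rain)
                (0, 1): 0.9,  (0, 0): 0.0}

def sample():
    """Ancestral sampling: draw each node given its parents."""
    c = int(random.random() < P_C)
    s = int(random.random() < P_S_given_C[c])
    r = int(random.random() < P_R_given_C[c])
    w = int(random.random() < P_W_given_SR[(s, r)])
    return c, s, r, w

draws = [sample() for _ in range(200_000)]

# Empirical check of conditional independence: P(S, R | C) ≈ P(S | C) P(R | C)
given_c = [(s, r) for c, s, r, _ in draws if c == 1]
p_sr = sum(s and r for s, r in given_c) / len(given_c)
p_s = sum(s for s, _ in given_c) / len(given_c)
p_r = sum(r for _, r in given_c) / len(given_c)
print(f"P(S=1,R=1|C=1) = {p_sr:.3f}  vs  P(S=1|C=1)P(R=1|C=1) = {p_s * p_r:.3f}")
```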

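Similarly, the HMM generative story above (π ~ Dirichlet(1), next state drawn from a row of the transition matrix A, observation drawn from a row of the emission matrix B) can be sketched as follows. The sizes and randomly drawn parameter values are illustrative placeholders, not anything from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_obs, T = 3, 4, 10          # illustrative sizes, not from the notes

pi = rng.dirichlet(np.ones(n_states))  # pi ~ Dirichlet(1): uniform over the simplex
A = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix, rows a_{s,.}
B = rng.dirichlet(np.ones(n_obs), size=n_states)     # emission matrix, rows b_{s,.}

states, words = [], []
s = rng.choice(n_states, p=pi)         # S1 ~ Cat(pi)
for t in range(T):
    w = rng.choice(n_obs, p=B[s])      # w_t ~ Cat(b_{s_t,.}), a row of B
    states.append(s)
    words.append(w)
    s = rng.choice(n_states, p=A[s])   # S_{t+1} ~ Cat(a_{s_t,.}), a row of A

print("hidden states:", states)
print("observations: ", words)
```
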
MCMC

  • X = HHHH TTTTTT
  • π_MLE = argmax_π P(X | π),  P(y | X) ≈ P(y | π_MLE)
  • π_MAP = argmax_π P(π | X),  P(y | X) ≈ P(y | π_MAP)
  • Fully Bayesian: P(y | X) = ∫ P(y | π) P(π | X) dπ
  • To avoid the integration, use Monte Carlo (random sampling); see the sketch after this list
    • E_p(Z)[f(Z)] = ∫ f(Z) p(Z) dZ = lim_(T→∞) 1/T ∑_(t=1:T) f(z(t)) ≈ 1/T ∑_(t=1:T) f(z(t))
      • z(t) ~ p(Z)
  • MCMC theory:
    • z(0): random starting state
    • for t = 1, …, T: z(t+1) = g(z(t)), where the transition rule g is chosen so that the long-run fraction of visits to each state is proportional to p(z)
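
To make the Monte Carlo approximation above concrete (the sketch referenced in the list): for the coin data X with 4 heads and 6 tails, and assuming a Beta(1, 1) prior on π (my choice; the notes do not specify one), the posterior is Beta(5, 7), and P(y = H | X) = E[π | X] can be estimated by averaging posterior samples of π:

```python
import numpy as np

rng = np.random.default_rng(0)

# Coin data: X = HHHH TTTTTT  ->  4 heads, 6 tails
n_heads, n_tails = 4, 6

# Assumed Beta(1, 1) prior on pi (not specified in the notes);
# the posterior is then Beta(1 + n_heads, 1 + n_tails).
samples = rng.beta(1 + n_heads, 1 + n_tails, size=100_000)   # pi(t) ~ p(pi | X)

# Monte Carlo estimate of P(y = H | X) = ∫ P(y = H | pi) p(pi | X) dpi = E[pi | X]
mc_estimate = samples.mean()                      # (1/T) * sum_t f(pi(t)) with f(pi) = pi
exact = (1 + n_heads) / (2 + n_heads + n_tails)   # closed-form posterior mean
print(f"Monte Carlo: {mc_estimate:.4f}   exact: {exact:.4f}")

# For comparison, the plug-in point estimate from the notes:
pi_mle = n_heads / (n_heads + n_tails)            # argmax_pi P(X | pi)
print(f"MLE plug-in P(y=H|X) ≈ {pi_mle:.4f}")
```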

Gibbs Sampling
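
The notes stop at the heading, so here is a minimal, generic Gibbs-sampling sketch of my own (not from the lecture). For a standard bivariate Gaussian with correlation ρ, the full conditionals are x | y ~ N(ρy, 1−ρ²) and y | x ~ N(ρx, 1−ρ²); Gibbs sampling alternates draws from these conditionals, and the long-run distribution of visited states matches the joint p(x, y):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: standard bivariate Gaussian with correlation rho (illustrative choice).
# Full conditionals: x | y ~ N(rho * y, 1 - rho^2),  y | x ~ N(rho * x, 1 - rho^2)
rho = 0.8
sd = np.sqrt(1 - rho**2)

T, burn_in = 50_000, 1_000
x, y = 0.0, 0.0            # z(0): arbitrary starting state
xs, ys = [], []

for t in range(T):
    x = rng.normal(rho * y, sd)   # resample x from p(x | y)
    y = rng.normal(rho * x, sd)   # resample y from p(y | x)
    if t >= burn_in:
        xs.append(x)
        ys.append(y)

xs, ys = np.array(xs), np.array(ys)
print("empirical correlation:", np.corrcoef(xs, ys)[0, 1])   # should be close to rho
print("empirical means:      ", xs.mean(), ys.mean())        # should be close to 0
```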
