Lecture 8: Evaluation

Information about the midterm.

PCFG:
- Start with S.
- ∑_γ Pr(A → γ | A) = 1: the (conditional) probabilities of the rules for each nonterminal have to sum to one.

Likelihood Pr(O = o1, o2, …, on | µ):
- HMM: Forward algorithm
- PCFG: Inside–Outside

Guess Z: argmax_Z [ Pr(Z | O, µ) ], where Z is the best sequence of states (a sketch follows below).
- HMM: use Viterbi
- PCFG: use Viterbi CKY

Guess µ: argmax_µ [ Pr(O | µ) ]
- HMM: use …
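To make the "Guess Z" step concrete, here is a minimal Viterbi decoder for an HMM. The toy model below (π, A, B and the observation sequence) is my own illustration, not from the lecture:

```python
import numpy as np

# Hypothetical toy HMM: 2 states, 3 observation symbols.
pi = np.array([0.6, 0.4])            # initial state distribution
A = np.array([[0.7, 0.3],            # A[i, j] = Pr(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # B[i, o] = Pr(obs o | state i)
              [0.1, 0.3, 0.6]])

def viterbi(obs):
    """argmax_Z Pr(Z | O, mu): best state sequence for observations obs."""
    n, k = len(obs), len(pi)
    delta = np.zeros((n, k))           # best log-prob of a path ending in state j at time t
    psi = np.zeros((n, k), dtype=int)  # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, n):
        scores = delta[t - 1, :, None] + np.log(A)  # scores[i, j]: come from i, move to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrace the best path from the best final state.
    z = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        z.append(int(psi[t, z[-1]]))
    return z[::-1]

print(viterbi([0, 1, 2, 2]))  # prints the best state sequence
```

Viterbi CKY for a PCFG applies the same max-instead-of-sum idea over the CKY chart.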

03102018

Gensim Tutorial: https://radimrehurek.com/gensim/models/word2vec.html
- Use the Google News model as the pre-trained model.
- Clustering based on a distance matrix (one possible sketch follows below).

Question: how do we do the clustering? Should we cluster on the keywords, or on the keyword-related words?

Leg dissection demo:
- 18 cameras, 30 frames: 10 GB
- 5 cameras, 100 frames: 6 GB

Question: what is our task? We cannot change the focal length now. …
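One possible answer to the clustering question, as a hedged sketch: load the pre-trained Google News vectors with Gensim, build a cosine-distance matrix over a keyword list, and run hierarchical clustering on it. The file name, the keyword list, and the choice of average-linkage clustering are all my assumptions, not decisions from the meeting:

```python
from gensim.models import KeyedVectors
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
import numpy as np

# Assumes the Google News vectors have been downloaded locally.
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# Hypothetical keywords; each must be in the model's vocabulary.
keywords = ['camera', 'lens', 'surgery', 'anatomy', 'dissection']

# Pairwise cosine-distance matrix from word similarities.
n = len(keywords)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - model.similarity(keywords[i], keywords[j])

# Hierarchical clustering on the precomputed distance matrix.
Z = linkage(squareform(dist), method='average')
labels = fcluster(Z, t=2, criterion='maxclust')  # ask for 2 clusters
print(dict(zip(keywords, labels)))
```

Clustering on keyword-related words instead would just mean extending `keywords` with `model.most_similar(...)` neighbors before building the matrix.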

Lecture 6: Context-free parsing

Questions:
- Generative model: P(X, Y)
- Discriminative model: P(Y | X)

Main points:
- Block sampler: instead of sampling one element at a time, we can sample a batch of variables jointly in Gibbs sampling (see the sketch after this list).
- Lag and burn-in can be viewed as parameters (we can control the number of iterations).
- Lag: mark some iterations in the loop as lag, then throw away …
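For the sampling terms above, here is a minimal Gibbs sampler on a toy bivariate standard normal, showing exactly where burn-in and lag enter the loop; the target distribution and all parameter values are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                        # correlation of the toy bivariate standard normal
burn_in, lag, n_keep = 500, 5, 1000

x, y = 0.0, 0.0
samples = []
for it in range(burn_in + lag * n_keep):
    # Gibbs step: sample each coordinate from its full conditional,
    # which here is x | y ~ N(rho * y, 1 - rho^2) and symmetrically for y.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    # Burn-in: throw away early iterations; lag: keep every lag-th one after that.
    if it >= burn_in and (it - burn_in) % lag == 0:
        samples.append((x, y))

samples = np.array(samples)
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])  # ≈ [0, 0] and ≈ rho
```

A block sampler would replace the two single-coordinate draws with one joint draw of (x, y).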

Lecture 5: Reduced-dimensionality representations for documents: Gibbs sampling and topic models

Watch the new talk and write a summary. Noah Smith: squash network.

Main points:
- Difference between LSA & SVD.
- Bayesian graphical models: informative priors are useful in the model.
- Bayesian network: a DAG over X1, X2, …, Xn defining P(X1, X2, …, Xn).
- Generative story: HMM (dependencies).
- A and B are conditionally independent given C iff P(A, B | C) = P(A | C) · P(B | C); see the numeric check after this list. …
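The conditional-independence definition is easy to check numerically. The sketch below builds a toy joint P(A, B, C) so that A ⊥ B | C holds by construction, then verifies P(A, B | C) = P(A | C) · P(B | C); all numbers are invented:

```python
import numpy as np

# Toy joint over binary A, B, C: choose P(C), P(A|C), P(B|C) and multiply out,
# so A and B are conditionally independent given C by construction.
P_C = np.array([0.3, 0.7])
P_A_given_C = np.array([[0.9, 0.1],   # rows index c, columns index a
                        [0.2, 0.8]])
P_B_given_C = np.array([[0.6, 0.4],   # rows index c, columns index b
                        [0.5, 0.5]])
# joint[a, b, c] = P(C=c) * P(A=a|C=c) * P(B=b|C=c)
joint = np.einsum('c,ca,cb->abc', P_C, P_A_given_C, P_B_given_C)

# Check P(A, B | C) == P(A | C) * P(B | C) for every (a, b, c).
P_AB_given_C = joint / joint.sum(axis=(0, 1), keepdims=True)
product = np.einsum('ca,cb->abc', P_A_given_C, P_B_given_C)
print(np.allclose(P_AB_given_C, product))  # True
```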

Lecture 3: Information Theory

Today’s class is about:
- Hypothesis testing
- Collocations
- Information theory

Hypothesis testing: last lecture covered the methodology.

Collocations (e.g., "religion war"): PMI and PPMI (a worked example follows below).
- PMI = pointwise mutual information: PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ) = I(x, y)
- PPMI = positive PMI = max(0, PMI)
- Example: "Hong Kong" — the individual frequencies of "Hong" and "Kong" are low, but the frequency of "Hong Kong" …
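The PMI/PPMI formulas transcribe directly into code; the counts below are invented to mimic the "Hong Kong" example (two individually rare words that almost always co-occur):

```python
import math

# Invented counts: "Hong" and "Kong" are individually rare,
# but nearly every occurrence is the bigram "Hong Kong".
N = 1_000_000                        # total tokens (and, loosely, total bigrams)
c_hong, c_kong, c_hong_kong = 120, 130, 110

p_x = c_hong / N
p_y = c_kong / N
p_xy = c_hong_kong / N

pmi = math.log2(p_xy / (p_x * p_y))  # PMI(x, y) = log2( P(x,y) / (P(x)P(y)) )
ppmi = max(0.0, pmi)                 # PPMI = max(0, PMI)
print(f"PMI = {pmi:.2f}, PPMI = {ppmi:.2f}")  # large positive: strong collocation
```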

The test speed of neural networks?

Basically, the time spent on testing depends on:
- The complexity of the neural network. For example, the fastest architecture should be the fully-connected network; a CNN should be faster than an LSTM because the LSTM is sequential (sequential = slow); see the timing sketch after this list. Currently, there are many ways to compress a deep learning model (e.g., remove the nodes with lighter weights).
- The complexity of …
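To illustrate the "sequential = slow" point, here is a rough NumPy timing sketch comparing one big parallel matrix multiply (feed-forward style) against a step-by-step recurrence (LSTM style); the sizes and the toy recurrence are arbitrary assumptions, not a real LSTM cell:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
T, d = 512, 256                      # arbitrary sequence length and hidden size
X = rng.standard_normal((T, d))
W = rng.standard_normal((d, d))

# Feed-forward style: all timesteps processed in one parallel matmul.
t0 = time.perf_counter()
Y = X @ W
t_parallel = time.perf_counter() - t0

# LSTM style: each step depends on the previous hidden state,
# so the loop cannot be parallelized across timesteps.
h = np.zeros(d)
t0 = time.perf_counter()
for t in range(T):
    h = np.tanh(X[t] @ W + h)        # toy recurrence for timing only
t_seq = time.perf_counter() - t0

print(f"parallel: {t_parallel*1e3:.2f} ms, sequential: {t_seq*1e3:.2f} ms")
```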