Lecture 10: Neural Network

Deep learning and representation learning. Rule-based approaches offer high explainability.
Linguistic supervision; semi-supervision: a small set of labeled data plus a large set of unlabeled data. Recurrent-level supervision.
Language structure and description length: DL = size(lexicon) + size(encoding).
lex1: do the kitty you like see
lex2: do you like see the kitty
How to evaluate the two lexicons? (A rough sketch of the DL computation follows below.) …
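The excerpt cuts off before the evaluation, but the DL formula itself is easy to illustrate. The following is a minimal Python sketch, not the lecture's actual procedure: it measures size(lexicon) as the characters needed to spell out the entries and size(encoding) as the bits needed to index lexicon entries for a toy utterance; the chunk-level lexicon and the per-character bit cost are made up for illustration.

```python
import math

def description_length(lexicon, corpus_tokens):
    """DL = size(lexicon) + size(encoding), both measured in bits.

    size(lexicon): characters needed to spell out every lexicon entry
    (assuming ~5 bits per character, an arbitrary choice for this sketch).
    size(encoding): each corpus token is an index into the lexicon,
    costing log2(|lexicon|) bits per token.
    """
    bits_per_char = 5
    lexicon_bits = sum(len(entry) for entry in lexicon) * bits_per_char
    encoding_bits = len(corpus_tokens) * math.log2(len(lexicon))
    return lexicon_bits + encoding_bits

# Toy comparison: a word-level lexicon vs. a hypothetical chunk-level one
# for the same utterance; the preferred lexicon is the one with smaller DL.
lex_words = ["do", "the", "kitty", "you", "like", "see"]
lex_chunks = ["do you", "like", "see the kitty"]
print(description_length(lex_words, ["do", "you", "like", "see", "the", "kitty"]))
print(description_length(lex_chunks, ["do you", "like", "see the kitty"]))
```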

Word2Vec Models

Collection of pre-trained Word2Vec models: http://ahogrammer.com/2017/01/20/the-list-of-pretrained-word-embeddings/
Google’s model does not seem reliable. Here are some similarity tests of Google’s model (a gensim sketch for reproducing them follows below):
The similarity between good and great is: 0.7291509541564205
The similarity between good and awesome is: 0.5240075080190216
The similarity between good and best is: 0.5467195232933185
The similarity between good and better is: 0.6120728804252082
The similarity between …
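The post does not show how these numbers were produced; a plausible way to reproduce this kind of similarity test with gensim and Google's pre-trained vectors (the file name and path are assumptions) would be:

```python
from gensim.models import KeyedVectors

# Load Google's pre-trained vectors; the file name is an assumption and
# the binary file must be downloaded separately.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

for other in ["great", "awesome", "best", "better"]:
    # Cosine similarity between the two word vectors.
    print("The similarity between good and %s is: %s"
          % (other, model.similarity("good", other)))
```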

Lecture 8: Evaluation

Information about the midterm.
PCFG: start with S; for each nonterminal A, the rule probabilities must sum to one: ∑_gamma Pr(A -> gamma | A) = 1.
Three standard problems (an HMM Viterbi sketch follows below):
Likelihood Pr(O = o1, o2, …, on | µ): HMM uses the Forward algorithm; PCFG uses Inside-Outside.
Decoding: guess Z = argmax_Z Pr(Z | O, µ), where Z is the best sequence of states; HMM uses Viterbi, PCFG uses Viterbi CKY.
Learning: guess µ = argmax_µ Pr(O | µ); HMM uses …
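As an illustration of the decoding problem (argmax_Z Pr(Z | O, µ)), here is a minimal Viterbi sketch for an HMM; the two-state model at the bottom is made up for the example and is not from the lecture.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence Z = argmax_Z Pr(Z | O, mu) for an HMM.

    obs: list of observation indices o_1..o_n
    pi:  initial state distribution, shape (S,)
    A:   transition probabilities A[i, j] = Pr(s_j | s_i), shape (S, S)
    B:   emission probabilities B[i, o] = Pr(o | s_i), shape (S, V)
    """
    S, n = len(pi), len(obs)
    delta = np.zeros((n, S))          # best log-prob of a path ending in state s at time t
    back = np.zeros((n, S), dtype=int)
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, n):
        scores = delta[t - 1, :, None] + np.log(A)   # (prev state, next state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Follow back-pointers from the best final state.
    z = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        z.append(int(back[t, z[-1]]))
    return z[::-1]

# Tiny hypothetical 2-state, 2-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 0], pi, A, B))
```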

Lecture 6: Context-free parsing

Questions: a generative model models P(X, Y); a discriminative model models P(Y|X).
Main points (a Gibbs-sampling sketch with burn-in and lag follows below):
Block sampler: instead of sampling one element at a time, we can sample a block of variables jointly in Gibbs sampling.
Lag and burn-in can be viewed as parameters (we can control the number of iterations). Lag: mark some iterations in the loop as lag, then throw away …
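To make the burn-in and lag parameters concrete, here is a minimal Gibbs-sampling sketch on a toy bivariate normal (not the lecture's model); `burn_in` and `lag` control which draws are kept.

```python
import random

def gibbs_bivariate_normal(n_samples, rho=0.5, burn_in=500, lag=10):
    """Toy Gibbs sampler for a bivariate standard normal with correlation rho.

    burn_in: initial iterations thrown away so the chain can mix.
    lag:     keep only every `lag`-th draw after burn-in (thinning) to
             reduce autocorrelation between the kept samples.
    """
    x, y = 0.0, 0.0
    kept = []
    total = burn_in + n_samples * lag
    for i in range(total):
        # Full conditionals of a bivariate normal are themselves normal.
        x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
        y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
        if i >= burn_in and (i - burn_in) % lag == 0:
            kept.append((x, y))
    return kept

samples = gibbs_bivariate_normal(1000)
print(len(samples), samples[0])
```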

Lecture 5: Reduced-dimensionality representations for documents: Gibbs sampling and topic models

To do: watch the new talk (Noah Smith: squash network) and write a summary.
Main points:
Difference between LSA and SVD (an LSA-via-truncated-SVD sketch follows below).
Bayesian graphical models: informative priors are useful in the model.
A Bayesian network is a DAG over X1, X2, …, Xn with joint distribution P(X1, X2, …, Xn).
Generative story: HMM (dependencies).
A and B are conditionally independent given C iff P(A,B|C) = P(A|C) * P(B|C). …
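LSA is usually implemented as a truncated SVD of a (weighted) term-document matrix, which makes the LSA-vs-SVD distinction concrete; the count matrix below is made up for illustration.

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix (rows: terms, cols: documents).
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

# LSA: keep only the top-k singular values/vectors of the term-document matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Documents in the k-dimensional latent space (one column per document).
doc_vectors = np.diag(s_k) @ Vt_k
print(doc_vectors.round(3))
```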

Lecture 3: Information Theory

Today’s class is about: hypothesis testing, collocations, and information theory.
Hypothesis testing: last lecture covered the methodology.
Collocations: e.g. “religion war”.
PMI and PPMI (a small computation sketch follows below):
PMI = pointwise mutual information: PMI(x, y) = log2( P(x,y) / (P(x)P(y)) ) = I(x,y).
PPMI = positive PMI = max(0, PMI).
Example: “Hong Kong”: the frequencies of “Hong” and “Kong” on their own are low, but the frequency of “Hong Kong” …
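A small sketch of the PMI/PPMI formulas; the probabilities for the Hong Kong example are invented to show the effect of two rare words that almost always co-occur.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log2( P(x, y) / (P(x) P(y)) )."""
    return math.log2(p_xy / (p_x * p_y))

def ppmi(p_xy, p_x, p_y):
    """Positive PMI: negative associations are clipped to zero."""
    return max(0.0, pmi(p_xy, p_x, p_y))

# Made-up probabilities for the "Hong Kong" example: each word is rare,
# but they nearly always occur together, so PMI is large.
p_hong, p_kong, p_hong_kong = 1e-4, 1e-4, 9e-5
print(pmi(p_hong_kong, p_hong, p_kong))    # ~13.1 bits
print(ppmi(p_hong_kong, p_hong, p_kong))
```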

Lecture 2: Lexical association measures and hypothesis testing

Pre-lecture readings: lexical association; named entities: http://www.nltk.org/book/ch07.html
Information extraction architecture: raw text -> sentence segmentation -> tokenization -> part-of-speech tagging -> entity detection -> relation detection.
Chunking: segments and labels multi-token sequences, as illustrated in 2.1.
Noun-phrase (NP) chunking; tag patterns describe sequences of tagged words.
Chunking with regular expressions (an NLTK sketch follows below). Exploring text corpora.
Chinking: define a chink to be a sequence of tokens that is not included in …
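The chunking-with-regular-expressions step corresponds to NLTK's RegexpParser; this sketch uses the standard NP tag pattern from the linked chapter on a hand-tagged sentence.

```python
import nltk

# Tag-pattern grammar for NP chunks: an optional determiner, any number of
# adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(grammar)

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]

tree = chunker.parse(sentence)   # returns an nltk.Tree with NP subtrees
print(tree)
```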

CMSC773: HW1

Question 2:
Word order: Explanation: word order refers to the structure of a sentence, e.g.
Alexa, when is your birthday? (Alexa answers)
Alexa, when your birthday is? (Alexa answers)
This will test whether Alexa handles some wrong word order.
Inflectional morphology: Explanation:
Question 3:
Question 4: Reference: http://statweb.stanford.edu/~serban/116/bayes.pdf
Question 5:
Question 6: New definitions: log-entropy weighting, cosine … (a weighting-and-cosine sketch follows below)
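Since only the terms are listed, here is one common reading of log-entropy weighting followed by cosine similarity, sketched in Python; the exact definition used in the homework may differ, and the count matrix is made up.

```python
import numpy as np

def log_entropy_weight(tf):
    """One common log-entropy weighting of a term-document count matrix tf
    (rows: terms, cols: documents); the HW's exact definition may differ.

    local weight  l_ij = log(1 + tf_ij)
    global weight g_i  = 1 + sum_j p_ij * log(p_ij) / log(n_docs),
    where p_ij = tf_ij / gf_i and gf_i is the total count of term i.
    """
    n_docs = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)
    p = np.where(tf > 0, tf / gf, 0.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return g[:, None] * np.log1p(tf)

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-term x 3-document counts.
tf = np.array([[3, 0, 1], [0, 2, 0], [1, 1, 4], [2, 0, 0]], dtype=float)
W = log_entropy_weight(tf)
print(cosine(W[:, 0], W[:, 2]))   # similarity of document 0 and document 2
```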