CMSC773: HW1

• Question 2:
• Word order:
• Explanation: Word order refers to the arrangement of words in a sentence, e.g.:
• Alexa, when is your birthday?
• Alexa, when your birthday is?
• This tests whether Alexa can handle incorrect word order.
• Inflectional morphology:
• Explanation: Inflectional morphology covers word-form changes that mark features like tense, number, and agreement (e.g., is/are); this tests whether Alexa handles incorrectly inflected questions.
• Question 3:
• Question 4:
• Question 5:
• Question 6:
• New definitions:
• log-entropy weighting
• cosine similarity
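A minimal numpy sketch of these two definitions as they are standardly used in LSA; the function names and the terms-by-documents matrix layout are my own choices:

```python
import numpy as np

def log_entropy(tf):
    """Log-entropy weighting (standard LSA form).
    tf: term-document count matrix, shape (n_terms, n_docs).
    Weight = local * global = log(1 + tf_ij) * (1 + sum_j p_ij log p_ij / log n),
    with p_ij = tf_ij / gf_i and gf_i the total count of term i.
    Assumes every term occurs at least once."""
    n_docs = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)                 # global frequency of each term
    p = np.where(tf > 0, tf / np.maximum(gf, 1), 1.0)  # p=1 makes p*log(p)=0 for empty cells
    global_w = 1.0 + (p * np.log(p)).sum(axis=1) / np.log(n_docs)
    return np.log1p(tf) * global_w[:, None]

def cosine_similarity(u, v):
    """cos(u, v) = u.v / (|u| |v|)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
```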

CMSC773: pre-course knowledge

• Dependency Parsing
• Dependency: focuses on relations between words
• Typed: labels indicate the relationship between the words
• Untyped: only indicates which words depend on which (small example below)
• Phrase structure: focuses on identifying phrases and their recursive structure
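A tiny illustration of typed vs. untyped dependencies, using a made-up sentence (the nsubj/obj labels follow the common convention):

```python
# "John saw Mary": untyped arcs say only head -> dependent;
# typed arcs add the grammatical relation.
untyped = [("saw", "John"), ("saw", "Mary")]
typed = [("saw", "John", "nsubj"),  # John is the subject of "saw"
         ("saw", "Mary", "obj")]    # Mary is the object of "saw"
```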
• Dependency parsing methods:
• Shift-reduce:
• Predict parse actions from left to right
• Fast, but slightly less accurate
• MaltParser
• Spanning tree:
• Calculate the full tree at once
• Slightly more accurate, but slower
• MSTParser, Eda
• CaboCha (for Japanese)
• Example
• Maximum Spanning Tree
• Each dependency is an edge in a directed graph
• Assign each edge a score
• Keep the spanning tree with the highest total score (see the sketch below)
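A simplified sketch of the scoring idea, assuming a precomputed edge-score matrix. For readability it just picks the best incoming edge per word; a real MST parser (e.g. MSTParser) then runs Chu-Liu/Edmonds to repair cycles and guarantee a tree:

```python
import numpy as np

def best_heads(score):
    """score[h][d]: score of the edge head h -> dependent d (index 0 = ROOT).
    Keep the highest-scoring incoming edge for each word.  Chu-Liu/Edmonds
    would then break any cycles; that step is omitted here."""
    n = score.shape[0]
    heads = [None]                      # ROOT has no head
    for d in range(1, n):
        heads.append(max((h for h in range(n) if h != d),
                         key=lambda h: score[h, d]))
    return heads

# Toy sentence with ROOT, w1, w2
scores = np.array([[0.0, 5.0, 1.0],
                   [0.0, 0.0, 4.0],
                   [0.0, 2.0, 0.0]])
print(best_heads(scores))  # [None, 0, 1]: ROOT -> w1, w1 -> w2
```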
• Shift-Reduce
• Two data structures:
• Queue: unprocessed words
• Stack: partially processed words
• Process: at each step, choose one of:
• Shift: move one word from the queue to the stack
• Reduce left: the top word on the stack becomes the head of the second word
• Reduce right: the second word on the stack becomes the head of the top word
• Classification for shift-reduce
• We have one weight vector each for “shift”, “reduce left”, and “reduce right”: Ws, Wl, Wr
• Calculate feature functions from the queue and stack: φ(queue, stack)
• Multiply weights by the features to get each score: Ss = Ws · φ(queue, stack) (similarly Sl, Sr)
• Choose the action with the highest score
• Features for Shift Reduce:
• Features should generally cover at least the top two stack entries and the first queue entry
• Algorithm:
• Input:
• weights: Ws, Wl, Wr
• queue: [(1, word1, Pos1), (2, word2, Pos2), …]
• stack: [(0, ROOT, ROOT)]
• Output: the head of each word (a dependency tree)
• Pseudo code: see the sketch below
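A minimal runnable sketch of this algorithm in Python, assuming the weights are defaultdict(float) maps from feature names to values; the exact feature template and the guard that keeps ROOT from becoming a dependent are my own simplifications:

```python
from collections import defaultdict

def featurize(stack, queue):
    """Features covering the top two stack entries and the first queue entry."""
    feats = []
    if stack:
        feats += ["S0w=" + stack[-1][1], "S0p=" + stack[-1][2]]
    if len(stack) >= 2:
        feats += ["S1w=" + stack[-2][1], "S1p=" + stack[-2][2]]
    if queue:
        feats += ["Q0w=" + queue[0][1], "Q0p=" + queue[0][2]]
    return feats

def score(w, feats):
    return sum(w[f] for f in feats)

def shift_reduce_parse(queue, w_s, w_l, w_r):
    """queue: [(1, word1, pos1), ...]; returns {dependent_id: head_id}."""
    stack = [(0, "ROOT", "ROOT")]
    heads = {}
    while queue or len(stack) > 1:
        feats = featurize(stack, queue)
        s_s = score(w_s, feats) if queue else float("-inf")
        # never reduce left when ROOT would become a dependent
        s_l = score(w_l, feats) if len(stack) >= 2 and stack[-2][0] != 0 else float("-inf")
        s_r = score(w_r, feats) if len(stack) >= 2 else float("-inf")
        if s_s >= s_l and s_s >= s_r:       # shift
            stack.append(queue.pop(0))
        elif s_l >= s_r:                    # reduce left: top is head of second
            dep = stack.pop(-2)
            heads[dep[0]] = stack[-1][0]
        else:                               # reduce right: second is head of top
            dep = stack.pop(-1)
            heads[dep[0]] = stack[-1][0]
    return heads

# Usage: untrained (all-zero) weights, so the output tree is arbitrary
w_s, w_l, w_r = defaultdict(float), defaultdict(float), defaultdict(float)
queue = [(1, "John", "NNP"), (2, "saw", "VBD"), (3, "Mary", "NNP")]
print(shift_reduce_parse(queue, w_s, w_l, w_r))
```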
• Training
• Can be trained with the perceptron algorithm
• Do parsing; if the correct action corr differs from the classifier's answer ans, update the weights
• Shift-reduce training algorithm: see the sketch below
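Continuing the parsing sketch above (it reuses featurize and score from there), a minimal perceptron update pass; deriving the oracle action sequence from the gold tree is assumed done elsewhere:

```python
def train_one(gold_actions, sent, weights):
    """One pass over a sentence.  gold_actions: oracle sequence of
    "shift" / "left" / "right" from the gold tree (extraction omitted);
    sent: [(1, word1, pos1), ...]; weights: {action: defaultdict(float)}."""
    queue, stack = list(sent), [(0, "ROOT", "ROOT")]
    for corr in gold_actions:
        feats = featurize(stack, queue)
        # classifier's answer: the highest-scoring action
        ans = max(weights, key=lambda a: score(weights[a], feats))
        if ans != corr:                  # disagreement: perceptron update
            for f in feats:
                weights[corr][f] += 1.0
                weights[ans][f] -= 1.0
        if corr == "shift":              # always execute the *correct* action,
            stack.append(queue.pop(0))   # so training stays on the gold path
        elif corr == "left":
            stack.pop(-2)
        else:
            stack.pop(-1)
```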
• CKY (Cocke-Kasami-Younger) Parsing Algorithm & Earley Parsing Algorithms
• Definitions:
• CFG (Context-free Grammar): G = (N, Σ, P, S)
• N is a finite set of non-terminal symbols
• Σ is a finite set of terminal symbols (tokens)
• P is a finite set of productions (rules)
• A production (α, β) ∈ P is written α → β, where α ∈ N and β ∈ (N ∪ Σ)*
• S is the start symbol in N
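A toy instance of the definition (my own example), with N = {S, NP, VP, V}, Σ = {John, saw, Mary}, and start symbol S:

```python
# Productions P as (lhs, rhs) pairs
grammar = [
    ("S",  ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["John"]),
    ("NP", ["Mary"]),
    ("V",  ["saw"]),
]
```

This particular grammar already happens to be in CNF, so the CKY sketch further below can use it directly.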
• CNF (Chomsky Normal Form): a context-free grammar where the right-hand side of each production is either two non-terminals or a single terminal, i.e. every production has the form A → BC or A → a.

Any CFG can be converted to a weakly equivalent grammar in CNF (one that generates the same language, though possibly with different trees).
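For example, the ternary rule VP → V NP PP is binarized into VP → V X and X → NP PP (X a fresh non-terminal), and a terminal mixed into a longer rule, e.g. NP → the N, becomes NP → DT N plus DT → the.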
• Parsing Algorithms:
• Should know:
• CFGs are the basis for describing the (syntactic) structure of NL sentences
• Parsing algorithms are the core of NL analysis systems
• Recognition vs. Parsing:
• Recognition: deciding membership in the language
• Parsing: recognition plus producing a parse tree for the input
• Parsing is more “difficult” than recognition (higher time complexity)
• Ambiguity: an input may have exponentially many parses (classic example: “I saw the man with the telescope”)
• Top-down vs. bottom-up
• Top-down (goal driven): from start symbol down
• Bottom-up (data driven): from the symbols up
• Naive vs. dynamic programming
• Naive: enumerate everything
• Backtracking: try something, discard partial solutions that fail
• DP: save partial solutions in a table
• CKY: Bottom-up DP
• Earley Parsing: top-down DP
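A minimal CKY recognizer sketch in Python, reusing the toy CNF grammar from the CFG section above; it only decides membership (a parser would also store backpointers to recover trees):

```python
def cky_recognize(words, grammar, start="S"):
    """grammar: (lhs, rhs) pairs in CNF; chart[i][j] = non-terminals deriving words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                     # width-1 spans: A -> a
        for lhs, rhs in grammar:
            if rhs == [w]:
                chart[i][i + 1].add(lhs)
    for width in range(2, n + 1):                     # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # split point
                for lhs, rhs in grammar:              # binary rules A -> B C
                    if len(rhs) == 2 and rhs[0] in chart[i][k] and rhs[1] in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n]

grammar = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]),
           ("NP", ["John"]), ("NP", ["Mary"]), ("V", ["saw"])]
print(cky_recognize(["John", "saw", "Mary"], grammar))  # True
```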
• Chart Parsing:
• Lambda Calculus
• Perceptron Learning
• Restricted Boltzmann Machines
• Statistical Hypothesis Testing
• Latent Semantic Analysis
• Latent Dirichlet Allocation
• Gibbs Sampling
• Inter-rater Reliability
• IBM Models for MT
• Phrase-based Statistical MT
• Hierarchical Phrase-based Statistical MT
• Lambda Reductions
• Probabilistic Soft Logic
• Open Information Extraction
• Semantic Role Labeling
• Combinatory Categorial Grammar (CCG)