 Dependency Parsing
 Dependency: focuses on relations between words
 Typed: Label indicating relationship between words
 Untyped: Only which words depend

Phrase structure: focuses on identifying phrases and their recursive structure
 Dependency parsing methods:
 Shift reduce:
 Predict from left to right
 Fast, but slightly less accurate
 MaltParser
 Spanning tree:
 Calculate full tree at once
 Slightly more accurate, slower
 MSTParser, Eda
 Cascaded chunking
 Chunk words into phrases, find heads, delete nonheads, repeat
 CaboCha
 Shift reduce:
 Example
 Maximum Spanning Tree
 Each dependency is an edge in a directed graph
 Assign each edge a score
 Keep the tree with the highest score

Cascaded ChunkingFor Japanese
 ShiftReduce
 Two data Structures:
 Queue: of unprocessed words
 Stack: of partially processed words
 Process: at each point, choose”
 Shift: move one word from queue to stack
 Reduce left: top word on stack is head of second word
 Reduce right: second word on stack is head of top word
 Classification for shiftreduce

We have a weight vector for “shift” “reduce left”,“reduce right”: Ws, Wl, Wr
 Calculate feature functions from the queue and stack: φ(queue, stack)
 Multiply the feature functions to get scores: Ss = Ws * φ(queue, stack)
 take the highest score and choose the function

 Features for Shift Reduce:

Features should generally cover at least the last stack entries and first queue entry

 Algorithm:
 Input:
 weights:Ws, Wl, Wr
 queue: [(1 word1, Pos1), (2, word2, Pos2), …]
 stack: [(0, ROOT, ROOT)]
 Output:
 heads = [1, head1, head2, …]
 Pseudo Code
 Training
 Can be trained using perceptron algorithm

Do parsing, if correct answer, corr different from classifier answer ans, update weights
 Shiftreduce training algorithm
 Input:
 Two data Structures:
 Maximum Spanning Tree
 CKY (CockeKasamiYounger) Parsing Algorithm & Earley Parsing Algorithms
 Definitions:
 CFG (Contextfree Grammar): G = (N, Σ, P, S)

N is a finite set of nonterminal symbols
 Σ is a finite set of terminal symbols (tokens)
 P is a finite set of productions (rules)

Production : (α, β)∈P will be written α → β where α is in N and β is in (N∪Σ)*

 S is the start symbol in N

 CNF (Chomsky Normal Form):
A contextfree grammar where the right side of each production rule is restricted to be either two non terminals or oneterminal. Production can be one of following formats: A>a  A>BC.Any CFG can be converted to a weakly equivalent grammar in CNF
 CFG (Contextfree Grammar): G = (N, Σ, P, S)
 Parsing Algorithms:
 Should know:

CFGs are basis for describing (syntactic) structure of NL sentences

Parsing Algorithms are core of NL analysis systems
 Recognition vs. Parsing:

Recognitiondeciding the membership in the language

Parsing–Recognition+ producing a parse tree for it


Parsing is more “difficult” than recognition (time complexity)

Ambiguityan input may have exponentially many parses.

 Topdown VS Bottomup
 Topdown (goal driven): from start symbol down
 Bottomup (data driven): from the symbols up
 Naive vs. dynamic programming
 Naive: enumerate everything
 Back tracking: try something, discard partial solutions
 DP: save partial solutions in a table
 CKY: Bottomup DP
 Earley Parsing: topdown DP
 Should know:
 Definitions:
 Chart Parsing:
 Lambda Calculus
 Perceptron Learning
 Restricted Boltzmann Machines
 Statistical Hypothesis Testing
 Latent Semantic Analysis
 Latent Dirichlet Allocation
 Gibbs Sampling
 Interrater Reliability
 IBM Models for MT
 Phrasebased Statistical MT
 Hierarchical Phrasebased Statistical MT
 Lambda Reductions
 Probabilistic Soft Logic
 Open Information Extraction
 Semantic Role Labeling
 Combinatory Categorial Grammar (CCG)
 Tree Adjoining Grammar
 Abstract Meaning Representation (AMR)
 Wikification
 Dependency: focuses on relations between words