- Question 2:
- Word order:
- Explanation: Word order refers to the structure of a sentence:
- Alexa, when is your birthday?
- (Alexa answers)
- Alexa, when your birthday is?
- (Alexa answers)
- This tests whether Alexa handles incorrect word order.
- Inflectional morphology:
- Explanation:
- Question 3:
- Question 4:
- Question 5:
- Question 6:
- New definitions:
- log-entropy weighting
- cosine similarity
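- These two terms are only listed here; as a hedged illustration (the exact variant covered in class may differ), one common log-entropy scheme weights each cell of a term-document matrix by log(1 + tf) times a global entropy weight, and cosine similarity then compares the weighted vectors:
```python
import numpy as np

def log_entropy_weight(counts):
    """counts: term-by-document matrix of raw frequencies (each term assumed to
    occur at least once, and more than one document).  One common variant:
    local weight log(1 + tf_ij) times global weight
    1 + sum_j p_ij * log(p_ij) / log(n_docs), with p_ij = tf_ij / gf_i."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)        # global frequency of each term
    p = counts / gf
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log(p[mask])       # treat 0 * log(0) as 0
    g = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_docs)
    return np.log1p(counts) * g

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|)"""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```
- With a weighted matrix W, cosine_similarity(W[:, i], W[:, j]) compares two documents; in latent semantic analysis the comparison is usually done after an SVD projection.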
Category: CMSC773
CMSC773: pre-course knowledge
- Dependency Parsing
- Dependency: focuses on relations between words
- Typed: a label indicating the relationship between the words
- Untyped: only which words depend on which, with no labels (a toy example follows below)
- Phrase structure: focuses on identifying phrases and their recursive structure
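- As a toy illustration of the typed/untyped distinction (the sentence and labels are my own example, not from the notes; labels follow Universal Dependencies conventions):
```python
# Dependencies for the sentence "She ate fresh fish".
# Untyped: each word only records which word it depends on.
untyped = {"She": "ate", "fresh": "fish", "fish": "ate", "ate": "ROOT"}

# Typed: the same arcs, each labeled with the relation between head and dependent.
typed = [
    ("ate",  "She",   "nsubj"),  # nominal subject
    ("ate",  "fish",  "obj"),    # direct object
    ("fish", "fresh", "amod"),   # adjectival modifier
    ("ROOT", "ate",   "root"),
]
```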
- Dependency parsing methods:
- Shift reduce:
- Predict from left to right
- Fast, but slightly less accurate
- MaltParser
- Spanning tree:
- Calculate full tree at once
- Slightly more accurate, slower
- MSTParser, Eda
- Cascaded chunking
- Chunk words into phrases, find heads, delete non-heads, repeat
- CaboCha
- Shift reduce:
- Example
- Maximum Spanning Tree
- Each dependency is an edge in a directed graph
- Assign each edge a score
- Keep the tree with the highest score
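- A minimal sketch of the edge-scoring idea (the scores below are made up for illustration). The greedy per-word argmax shown here can create cycles; a real system such as MSTParser runs the Chu-Liu/Edmonds algorithm to extract the highest-scoring tree:
```python
import numpy as np

# scores[h][d] = score of an arc from head h to dependent d (index 0 is ROOT).
# The numbers are invented for illustration only.
words = ["ROOT", "She", "ate", "fish"]
NEG = -np.inf
scores = np.array([
    # columns: ROOT, She, ate, fish (dependents)
    [NEG, 1.0, 9.0, 2.0],   # head = ROOT
    [NEG, NEG, 3.0, 0.5],   # head = She
    [NEG, 8.0, NEG, 7.0],   # head = ate
    [NEG, 0.2, 1.0, NEG],   # head = fish
])

# Greedy approximation: every dependent keeps its highest-scoring head.
# This can produce cycles; Chu-Liu/Edmonds repairs them to guarantee a tree.
heads = {words[d]: words[int(np.argmax(scores[:, d]))] for d in range(1, len(words))}
print(heads)  # {'She': 'ate', 'ate': 'ROOT', 'fish': 'ate'}
```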
- Cascaded Chunking: for Japanese
- Shift-Reduce
- Two data structures:
- Queue: of unprocessed words
- Stack: of partially processed words
- Process: at each point, choose one of:
- Shift: move one word from queue to stack
- Reduce left: top word on stack is head of second word
- Reduce right: second word on stack is head of top word
- Classification for shift-reduce
- We have a weight vector for each of “shift”, “reduce left”, and “reduce right”: Ws, Wl, Wr
- Calculate feature functions from the queue and stack: φ(queue, stack)
- Multiply the weights by the feature functions to get scores: Ss = Ws · φ(queue, stack) (and likewise Sl, Sr)
- Take the action with the highest score
- Features for Shift Reduce:
- Features should generally cover at least the last stack entries and the first queue entry; a small sketch of such features and scores follows
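- A small sketch of the classifier (the feature names and sparse-dict weight representation are my own choices, not from the notes); stack and queue entries are (index, word, POS) tuples as in the algorithm's Input section below:
```python
def phi(queue, stack):
    """Binary features over the top two stack entries and the first queue entry."""
    feats = set()
    if len(stack) >= 1:
        feats.add("s0_word=" + stack[-1][1])
        feats.add("s0_pos=" + stack[-1][2])
    if len(stack) >= 2:
        feats.add("s1_word=" + stack[-2][1])
        feats.add("s1_pos=" + stack[-2][2])
    if queue:
        feats.add("q0_word=" + queue[0][1])
        feats.add("q0_pos=" + queue[0][2])
    return feats

def score(w, feats):
    """Sparse dot product: weights are a dict, features are binary."""
    return sum(w.get(f, 0.0) for f in feats)

def best_action(w_s, w_l, w_r, queue, stack):
    """Pick the highest-scoring action among shift, reduce-left, reduce-right."""
    feats = phi(queue, stack)
    scores = {"shift": score(w_s, feats),
              "reduce_left": score(w_l, feats),
              "reduce_right": score(w_r, feats)}
    return max(scores, key=scores.get)
```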
- Algorithm:
- Input:
- weights: Ws, Wl, Wr
- queue: [(1, word1, POS1), (2, word2, POS2), …]
- stack: [(0, ROOT, ROOT)]
- Output:
- heads = [-1, head1, head2, …]
- Pseudo Code
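- The pseudo-code entry above has no body in these notes; the following is one possible Python rendering of the loop just described, reusing the phi, score, and best_action helpers sketched earlier (details such as the forced shift/reduce cases are my own choices):
```python
def shift_reduce_parse(queue, w_s, w_l, w_r):
    """queue: [(1, word1, POS1), (2, word2, POS2), ...]; returns the heads list."""
    heads = [-1] * (len(queue) + 1)        # heads[0] stays -1 for ROOT
    stack = [(0, "ROOT", "ROOT")]
    while queue or len(stack) > 1:
        if len(stack) < 2:                 # nothing to reduce: must shift
            action = "shift"
        elif not queue:                    # nothing to shift: must reduce
            feats = phi(queue, stack)
            action = ("reduce_left" if score(w_l, feats) >= score(w_r, feats)
                      else "reduce_right")
        else:
            action = best_action(w_s, w_l, w_r, queue, stack)
        if action == "shift":
            stack.append(queue.pop(0))
        elif action == "reduce_left":      # top of stack is head of the second word
            dep = stack.pop(-2)
            heads[dep[0]] = stack[-1][0]
        else:                              # reduce_right: second word is head of the top word
            dep = stack.pop(-1)
            heads[dep[0]] = stack[-1][0]
    return heads
```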
- Training
- Can be trained using perceptron algorithm
- Do parsing; if the correct answer corr differs from the classifier's answer ans, update the weights (see the update sketch below)
- Shift-reduce training algorithm
- Input:
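- The notes do not spell out the training pseudo-code; here is a hedged sketch of the perceptron update step described above (how the oracle action corr is derived from the gold tree depends on the training setup and is omitted):
```python
def perceptron_update(weights, feats, corr, ans):
    """weights: {"shift": {...}, "reduce_left": {...}, "reduce_right": {...}}.
    Called during training whenever the classifier's action ans differs from
    the correct action corr for the current (queue, stack) features."""
    if corr == ans:
        return
    for f in feats:
        weights[corr][f] = weights[corr].get(f, 0.0) + 1.0   # promote correct action
        weights[ans][f] = weights[ans].get(f, 0.0) - 1.0     # demote predicted action
```
- Training repeats this over the corpus for several iterations, usually with weight averaging to reduce overfitting.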
- CKY (Cocke-Kasami-Younger) Parsing Algorithm & Earley Parsing Algorithms
- Definitions:
- CFG (Context-free Grammar): G = (N, Σ, P, S)
- N is a finite set of non-terminal symbols
- Σ is a finite set of terminal symbols (tokens)
- P is a finite set of productions (rules)
- Production: (α, β) ∈ P will be written α → β, where α is in N and β is in (N ∪ Σ)*
- S is the start symbol in N
- CNF (Chomsky Normal Form): a context-free grammar where the right-hand side of each production is restricted to either two non-terminals or one terminal, i.e., every production has the form A → BC or A → a. Any CFG can be converted to a weakly equivalent grammar in CNF (for example, S → NP VP PP can be binarized into S → NP X and X → VP PP).
- Parsing Algorithms:
- Should know:
- CFGs are the basis for describing the (syntactic) structure of NL sentences
- Parsing algorithms are the core of NL analysis systems
- Recognition vs. Parsing:
- Recognition: deciding membership in the language
- Parsing: recognition plus producing a parse tree for the input
- Parsing is more “difficult” than recognition (time complexity)
- Ambiguity: an input may have exponentially many parses.
- Top-down vs. Bottom-up
- Top-down (goal driven): from start symbol down
- Bottom-up (data driven): from the symbols up
- Naive vs. dynamic programming
- Naive: enumerate everything
- Backtracking: try something, discard partial solutions
- DP: save partial solutions in a table
- CKY: Bottom-up DP
- Earley Parsing: top-down DP
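- Since CKY is the bottom-up DP option above, here is a minimal recognizer sketch (the toy CNF grammar and sentence are assumptions for illustration, not course material); a full parser would additionally store backpointers to recover the tree:
```python
from collections import defaultdict

# Toy CNF grammar: binary rules A -> B C and lexical rules A -> terminal.
binary_rules = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "DT", "N")]
lexical_rules = [("DT", "the"), ("N", "dog"), ("N", "cat"), ("V", "saw")]

def cky_recognize(words):
    """Bottom-up DP: chart[(i, j)] holds every non-terminal deriving words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                  # spans of length 1 (lexical rules)
        for lhs, term in lexical_rules:
            if term == w:
                chart[(i, i + 1)].add(lhs)
    for span in range(2, n + 1):                   # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # every split point
                for lhs, b, c in binary_rules:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(lhs)
    return "S" in chart[(0, n)]

print(cky_recognize("the dog saw the cat".split()))   # True
```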
- Chart Parsing:
- Lambda Calculus
- Perceptron Learning
- Restricted Boltzmann Machines
- Statistical Hypothesis Testing
- Latent Semantic Analysis
- Latent Dirichlet Allocation
- Gibbs Sampling
- Inter-rater Reliability
- IBM Models for MT
- Phrase-based Statistical MT
- Hierarchical Phrase-based Statistical MT
- Lambda Reductions
- Probabilistic Soft Logic
- Open Information Extraction
- Semantic Role Labeling
- Combinatory Categorial Grammar (CCG)
- Tree Adjoining Grammar
- Abstract Meaning Representation (AMR)
- Wikification